Blog Category: Research

Lack of progress as an opportunity for progress

by Gene Golovchinsky on October 1, 2009

Timothy G. Armstrong, Alistair Moffat, William Webber, and Justin Zobel have written what will undoubtedly be a controversial, discussion-inspiring paper for the upcoming CIKM 2009 conference. The paper compares over 100 studies of information retrieval systems based on various TREC collections and concludes that little progress has been made over the last decade in terms of Mean Average Precision (MAP). The authors also found that studies that use TREC data outside the TREC competition tend to pick weak baselines, which lets them show short-term improvements (which are publishable) without demonstrating long-term gains in system performance. This interesting analysis is summarized in a blog post by William Webber.
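For readers unfamiliar with the metric, here is a minimal Python sketch of how MAP is typically computed, assuming the usual TREC-style non-interpolated definition: for each query, precision is taken at the rank of every relevant document retrieved, relevant documents that are never retrieved count as zero, and the per-query scores are then averaged over all judged queries. The function names and the toy rankings and judgments are illustrative only; they are not taken from the paper.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one query.

    Precision is sampled at the rank of each relevant document that appears in
    the ranking; relevant documents that are never retrieved contribute zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Mean of per-query AP over every query that has relevance judgments."""
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)


# Toy data (hypothetical): system rankings and relevance judgments for two queries.
runs = {"q1": ["d1", "d2", "d3", "d4"], "q2": ["d5", "d6"]}
qrels = {"q1": {"d1", "d3"}, "q2": {"d6", "d7"}}

print(round(average_precision(runs["q1"], qrels["q1"]), 4))  # 0.8333
print(round(mean_average_precision(runs, qrels), 4))          # 0.5417
```

Because MAP is a single average over topics, a reported gain only means something relative to the baseline it is compared against, which is why the choice of a weak baseline matters so much.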

eBooks aren’t just for reading anymore

by Gene Golovchinsky on September 30, 2009

There has been more news on the eBook hardware front recently. Microsoft is floating a two-screen device idea reminiscent of Nick Chen’s thesis work, part of which he published at CHI 2008. The video is worth watching. The rendering of the MS ‘Courier’ device is slick, but at this point no specs are available. A UX mockup video shows some nice ideas, but it is not clear how much of this will survive in the product. And of course it will need to compete with the Apple tablet, if that thing materializes.

More interestingly, IREX announced a digital reader that is a follow-on to the Iliad.

IREX DR 800 eBook reader

CFP: 2nd Workshop on Collaborative Information Seeking

by Gene Golovchinsky on September 28, 2009

Jeremy and I have been blogging about collaborative search for a while, and it is our pleasure to announce that we, together with Merrie Morris, are organizing another workshop on Collaborative Information Seeking. The first workshop was held in conjunction with the JCDL 2008 conference. We had many interesting presentations and a lot of discussion about systems, algorithms, and evaluation. You can find the proceedings from the workshop on arXiv.org (metadata and papers) and on the workshop web site.

It’s time to revisit this topic, this time in conjunction with the CSCW 2010 conference. The workshop call for participation is here. Our goal is

to bring together researchers with backgrounds in CSCW, social computing, information retrieval, library sciences and HCI to discuss the research challenges associated with the emerging field of collaborative information seeking.

To participate, please submit a 2-4 page position paper in the ACM format by November 20th. The workshop will take place in February, in Savannah, Georgia. Hope to see you there!

The many faces of PubMed search

by Gene Golovchinsky on September 24, 2009

The number of third-party tools for searching PubMed data seems to have been growing recently. As the NLM is about to roll out a new search interface, companies are starting to offer alternative interfaces for searching this important collection. The attraction is obvious: a large, motivated group of searchers, an important information need, and a manageable collection size. A decade ago, over 20 million searches were run monthly through the NLM site, and the numbers are surely higher today. The collection is large but not huge: it currently holds over 17 million entries (some with full text), occupying somewhat more than 60GB of disk space. Thus we see an increasing number of sites offering search over this collection, including PubGet, GoPubMed, TexMed, and HubMed. The offerings range from basic to flashy, and appear to be aimed at different groups of searchers.

Should IR Objective Functions be Obfuscated?

by Jeremy Pickens on September 23, 2009

I have a question. It’s a general question, directed at anyone and everyone.

When one is building an Information Retrieval system, one uses target objective function(s) that give an indication of the performance of the system, and designs the system (algorithms, interfaces, etc.) toward those targets. Sometimes, those functions are open and well understood. Other times, those functions are proprietary and hidden.

My question is: Does it do the users of an IR system a service or disservice to hide from them the function that is being optimized? Or is it completely neutral? In other words, does the user have to understand, or at least be given the chance to understand, what it is that the system is trying to do for them in order to get the best value out of that system? Or can a user get results just as good without having to have a clear mental model of what the retrieval engine is trying to do? In short, does it matter if the user does not understand what the system is trying to do for him or her?

Can someone point me to research that has looked at this question? If one were trying to publish original research on the topic, how would one go about designing an experiment that (1) tests this hypothesis and (2) does so in a way that generalizes, or at least hints at possible generalization?

Contextualizing IR

by Gene Golovchinsky on September 14, 2009

In a recent post, Miles Efron proposed a distinction between different kinds of information retrieval: “macro IR,” which is concerned with generic tasks such as searching the web, and “micro IR,” which represents more focused interaction. My sense is that one key distinction between the two is the degree to which the system represents the context of the search, and is therefore able to act on the results. Miles’ examples (finding restaurants, books, music, people) have a transactional quality about them. The system has a sufficient representation of the task both to structure the query in an appropriate manner (e.g., Yelp! metadata about restaurants) and to act on the selected result (e.g., offer to make a reservation). Macro IR, on the other hand, lacks a strong contextual representation and leaves it to the user to act on the retrieved information.

Perhaps they measured the wrong thing…

by Gene Golovchinsky on September 11, 2009

Ian Soboroff commented on yesterday’s blog post that although mental models were important, they were insufficient. He cited a paper which found that legal staff had trouble using a full-text search engine to search a collection of documents in a legal discovery scenario, where the information need was recall-oriented. The paper concludes that formulating effective keyword searches is difficult for people who are not search experts. The paper is interesting and worth reading, but I believe the authors’ conclusions are not warranted by their methodology.

Search is not Magic

by Gene Golovchinsky on September 10, 2009

A discussion among commenters on a post about PubMed search strategies raised the issue of how people make sense of the results that a search engine provides. For precision-oriented searches, a “black box” approach may make sense: as long as the system manages to identify a useful document, it doesn’t matter much how it does that. For exploratory search, which may be more recall-oriented, a comprehensible representation of the system’s computations is important for assessing the coverage of your results. This suggests the need to foster useful mental models, rather than relying on the system to divine your intent and magically produce the “right” result.
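The difference between precision-oriented and recall-oriented searching can be made concrete with a small Python sketch of set-based precision and recall for a single query. The function name and the document IDs and relevance judgments are hypothetical toy data chosen for illustration.

```python
from typing import List, Set, Tuple


def precision_recall(retrieved: List[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one query (rank order is ignored)."""
    retrieved_set = set(retrieved)
    hits = len(retrieved_set & relevant)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: four documents are relevant to the information need.
relevant_docs = {"a", "b", "c", "d"}

# A precision-oriented search that returns one good document.
print(precision_recall(["a"], relevant_docs))                      # (1.0, 0.25)
# A broader, recall-oriented search that also returns two non-relevant documents.
print(precision_recall(["a", "b", "c", "x", "y"], relevant_docs))  # (0.6, 0.75)
```

A single relevant hit yields perfect precision, which may satisfy a precision-oriented searcher, yet recall is only 0.25. Computing recall requires knowing the full set of relevant documents, which the searcher normally does not have, so judging coverage depends on understanding what the system actually did.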

What a difference 200 years makes

by Gene Golovchinsky on September 9, 2009

Recently, I had an opportunity to see the Babbage Difference Engine No. 2 (serial #2) in action. It’s an impressive piece of machinery, weighing in at about five tons and consisting of 25,000 parts, mostly metal. It’s on display at the Computer History Museum in Mountain View through December, when Nathan Myhrvold takes it home and installs it in his living room, next to the T-Rex. Babbage built a few smaller models, but never saw the project completed after a falling out with his master builder and the subsequent loss of government funding. Still, he had something like 12 years of funding to attempt to build the device. (He also made money on other inventions, such as the cowcatcher at the front of steam engines.)

The Science Museum in London built Difference Engine No. 2 serial #1 in the late 1980s to commemorate the 200th anniversary of Babbage’s birth.

Front view showing the registers

Tree-books to e-books

by Gene Golovchinsky on September 8, 2009

I recall from my youth in the Soviet Union a series of jokes structured around a fake talk radio call-in show. One example stuck with me:

Q: Is it possible to create a Communist regime in an arbitrary country? Say France, for example.

A: In principle, yes. But what has France ever done to deserve that?

I was reminded of this joke by a recent article describing how a school would be replacing its library with electronic devices. The plan is to replace the stacks with three large monitors, “laptop-friendly” study carrels, and 18 e-book readers (Amazon Kindles and Sony eReaders). They are also planning to replace textbooks with electronic versions, at least in math, and possibly in other subjects as well.

I can see many problems with this vision of the future of reading, which rests on the notion that books are an outdated technology. I’ve written about e-books before (and I am still fond of the research we did in this space), and I find myself wondering about the wisdom of this venture by the headmaster of Cushing Academy.
