Blog Archive: 2009

Lack of progress as an opportunity for progress

on Comments (2)

Timothy G. Armstrong, Alistair Moffat, William Webber, and Justin Zobel have written what will undoubtedly be a controversial, discussion-inspiring paper for the upcoming CIKM 2009 conference. The paper compares over 100 studies of information retrieval systems based on various TREC collections, and concludes that not much progress has been made over the last decade in terms of Mean Average Precision (MAP). They also found that studies using TREC data outside the TREC competition tend to pick weak baselines to show short-term improvements (which are publishable) without demonstrating long-term gains in system performance. This interesting analysis is summarized in a blog post by William Webber.
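For readers who haven't worked with the metric, MAP is easy to state: average precision at each rank where a relevant document appears, then average that over queries. A minimal sketch (function names are illustrative, not taken from the paper):

```python
def average_precision(ranked, relevant):
    """Average precision for one query: mean of precision@k taken
    at each rank k where a relevant document is retrieved."""
    hits, score = 0, 0.0
    for k, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            score += hits / k
    return score / len(relevant) if relevant else 0.0

def mean_average_precision(runs):
    """MAP: average of per-query AP. `runs` is a list of
    (ranked_result_list, set_of_relevant_docs) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

Because MAP averages over the whole ranking, a system can improve early precision while barely moving MAP, which is one reason single-number comparisons across a decade of papers are so contentious.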

Continue Reading

eBooks aren’t just for reading anymore

on Comments (1)

There has been more news on the eBook hardware front recently. Microsoft is floating a two-screen device idea reminiscent of Nick Chen‘s thesis work, part of which he published at CHI 2008. The video is worth watching. The rendering of the MS ‘Courier’ device is slick, but at this point no specs are available. A UX mockup video shows some nice ideas, but it is not clear how much of this will survive in the product. And of course it will need to compete with the Apple tablet, if that device ever materializes.

More interestingly, IREX announced a digital reader that is a follow-on to the Iliad.

IREX DR 800 eBook reader

Continue Reading

Getting a CLuE

on Comments (1)

An NSF-funded cloud computing event is coming to the Bay Area.

In October 2007, Google and IBM announced the first pilot phase of the Academic Cloud Computing Initiative (ACCI), which granted several prominent U.S. universities access to a large computer cluster running Hadoop, an open source distributed computing platform inspired by Google’s file system and MapReduce programming model. In February 2008, the ACCI partnered with the National Science Foundation to provide grant funding to academic researchers interested in exploring large-data applications that could take advantage of this infrastructure. This resulted in the creation of the Cluster Exploratory (CLuE) program led by Dr. Jim French, which currently funds 14 projects. See this NSF Press Release for a short description of all the projects funded under the CLuE program.
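The MapReduce model that inspired Hadoop can be illustrated in miniature. The sketch below is plain Python, not the Hadoop API; it shows the three conceptual stages (map, shuffle, reduce) using the classic word-count example:

```python
from collections import defaultdict

def map_phase(documents):
    # Mapper: emit a (word, 1) pair for every word in every input record.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework
    # does between the map and reduce stages.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Reducer: sum the counts emitted for each word.
    return {word: sum(counts) for word, counts in grouped.items()}

counts = reduce_phase(shuffle(map_phase(["a b a", "b c"])))
# counts == {"a": 2, "b": 2, "c": 1}
```

The appeal for the CLuE projects is that each stage parallelizes trivially: mappers and reducers run independently across the cluster, and the framework handles the shuffle and fault tolerance.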

The event will be held on October 5th in the Computer History Museum (the current home of the Babbage Difference Engine No2 Serial #2), and will feature a great lineup of researchers reporting on their accomplishments in a variety of disciplines, including indexing for search, data processing, machine translation, text processing, databases, visualization, and other cloud computing topics. You can get more details about the schedule and the speakers here, and click here to register.

CFP: 2nd Workshop on Collaborative Information Seeking

on Comments (2)

Jeremy and I have been blogging about collaborative search for a while, and it is our pleasure to announce that Merrie Morris and we are organizing another workshop on Collaborative Information Seeking. The first workshop was held in conjunction with the JCDL 2008 conference. We had many interesting presentations and a lot of discussion about systems, algorithms, and evaluation. You can find the proceedings from the workshop on arXiv.org (metadata and papers) and on the workshop web site.

It’s time to revisit this topic, this time in conjunction with the CSCW 2010 conference. The workshop call for participation is here. Our goal is

to bring together researchers with backgrounds in CSCW, social computing, information retrieval, library sciences and HCI to discuss the research challenges associated with the emerging field of collaborative information seeking.

To participate, please submit a 2-4 page position paper in the ACM format by November 20th. The workshop will take place in February, in Savannah, Georgia. Hope to see you there!

A tale of two islands

on Comments (2)

ECDL 2009 is taking place this week, and those of us who could not make it to Corfu will have to settle for the island experience of the Second (Life) Kind. Just as JCDL 2009 did earlier this summer, the ECDL 2009 Poster Session is available for viewing online through SecondLife. The real Poster Session will take place Monday, September 28th (7-9pm EET, 12:00-14:00 EST, 9-11am PDT), with a parallel session in SecondLife that will continue long after the real one ends.

The complete list of posters is available here; I am looking forward to “Improving annotations in digital documents,” “Searching in a book,” and “Workspace narrative exploration: overcoming interruption-caused context loss in information seeking tasks.”

There are some interesting papers at ECDL as well, including

In pursuit of impact

on Comments (3)

The impact of academic research is often measured through citation counts. Arguably, this is a more sensitive measure than the raw number of publications, or even the number of publications in prestigious journals. Innovative work often gets published in venues with mixed reputations because prestigious journals and conferences may reject ideas that don’t fit well with the orthodoxy of the discipline. In its heyday, for example, the ACM Hypertext Conference rejected Tim Berners-Lee’s paper on the World Wide Web because (among perhaps other reasons) that work contradicted then-established standards of what makes interesting Hypertext research.

Thus it is useful to measure the citation counts of papers to understand their impact on the field. Traditionally, this has been the purview of librarians and citation indexes, but the proliferation of publication venues, and the desire to recognize work that was not published in the mainstream (or perhaps not officially published at all, as Daniel Lemire points out), make the task of collation difficult.

Continue Reading

The many faces of PubMed search

on Comments (2)

The number of third-party tools for searching PubMed data has been increasing recently. As the NLM prepares to roll out a new search interface, companies are starting to offer alternative interfaces for searching this important collection. The attraction is obvious: a large, motivated group of searchers, an important information need, and a manageable collection size. A decade ago, over 20 million searches were run monthly through the NLM site, and the numbers are surely higher today; the collection is large but not huge, currently over 17 million entries (some with full text), occupying somewhat more than 60GB of disk space. Thus we see an increasing number of sites offering search over this collection, including PubGet, GoPubMed, TexMed, and HubMed. The offerings range from basic to flashy, and appear to be aimed at different groups of searchers.

Continue Reading

Should IR Objective Functions be Obfuscated?

on Comments (3)

I have a question. It’s a general question, directed at anyone and everyone.

When one is building an Information Retrieval system, one uses target objective function(s) that give an indication of the performance of the system, and designs the system (algorithms, interfaces, etc.) toward those targets.  Sometimes, those functions are open and well understood.  Other times, those functions are proprietary and hidden.
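As one concrete example of an open, well-understood objective of the kind described above, Discounted Cumulative Gain (DCG) rewards graded relevance, discounted by rank so that results placed higher matter more. A minimal sketch (chosen here purely for illustration, not because any particular system optimizes it):

```python
import math

def dcg(relevances):
    """Discounted Cumulative Gain: each result's graded relevance
    (e.g. 0 = irrelevant, 3 = highly relevant), discounted by the
    log of its rank, summed down the ranked list."""
    return sum(rel / math.log2(k + 1)
               for k, rel in enumerate(relevances, start=1))
```

A user who knew a system optimized something like this might phrase queries to exploit it; whether that knowledge actually helps, or is neutral, is precisely the open question posed below.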

My question is: Does it do the users of an IR system a service or disservice to hide from them the function that is being optimized?  Or is it completely neutral?  In other words, does the user have to understand, or at least be given the chance to understand, what it is that the system is trying to do for them in order to get the best value out of that system?  Or can a user get results just as good without having to have a clear mental model of what the retrieval engine is trying to do?  In short, does it matter if the user does not understand what the system is trying to do for him or her?

Can someone point me to research that may have looked at this question?  If one were trying to publish original research on the topic, how would one go about designing an experiment in which both (1) this hypothesis is tested, and (2) done so in a way that generalizes, or at least hints at possible generalization?

Larry Rowe wins ACM SIGMM Outstanding Technical Achievement Award 2009

on Comments (3)

The 2009 winner of the prestigious ACM Special Interest Group on Multimedia Award for Outstanding Technical Achievement is our own Dr. Lawrence Rowe. I have seen this award referred to in a number of different ways (even on the ACM SIGMM website), but the above and “Outstanding Technical Contributions to Multimedia Computing, Communications and Applications” seem to be the most common. It is only the second year of the award, so we will have to wait a while before a cute nickname arises. (The Mummy award?)

Continue Reading

It’s not what you know, it’s whom you know

on Comments (4)

Almost 10 years ago, Latanya Sweeney published an analysis of summary census data that was used to identify 87% of respondents based only on their ZIP code, gender, and date of birth, data that we all think of (and that the census treats) as relatively anonymous. At about the same time, I visited a friend at a large consulting firm who demonstrated data mining software that combined data from multiple sources and was able to discover many facts about people that, while not particularly revealing individually, painted a much more complete picture when federated. Now comes the news (thanks, Daniel) that a group at MIT was able to make better-than-chance predictions about people’s sexual orientation using Facebook friends as training data. Whereas the census analysis and the data mining tools could be considered academic exercises on datasets to which most people don’t have access, the MIT results have much more immediate and potentially damaging implications.

Continue Reading