Building the Ivory Tower

I recently read on Jeff Dalton’s blog that Jimmy Lin has released a new open-source search engine called Ivory. Ivory is built on Hadoop and is designed to handle terabyte-sized collections. Unlike Lucene, this is a research project, Jimmy Lin writes,

aimed at information retrieval researchers who need access to low-level data structures and who generally know their way around retrieval algorithms. As a result, a lot of “niceties” are simply missing—for example, fancy interfaces or ingestion support for different file types. It goes without saying that Ivory is a bit rough around the edges, but our philosophy is to release early and release often. In short, Ivory is experimental!

Living Laboratory

In her talk at the IR Eval workshop at SIGIR 09, Sue Dumais called for an experimental platform for conducting research in information seeking (thanks Sakai-san!). She called it a Living Laboratory. This is a tremendous idea, the high tide that lifts all boats. Whether you’re interested in doing log analysis, interface design evaluation, building new indexing algorithms, or other kinds of research, having real data sets with real users and real information needs can move the field forward in ways that Cranfield-style experiments do not.

Can’t find that symbol?

Via Dave Bacon’s blog, I came across Detexify, a cool tool that lets you find the LaTeX command for a symbol by drawing the symbol. LaTeX is the standard typesetting system for researchers in the mathematical sciences. One indication of its popularity is that Scott Aaronson lists “The authors don’t use TeX” as the first of his “Ten Signs a Claimed Mathematical Breakthrough is Wrong.” Unfair, I know, but so it is.

SIGIR Twitter Archives

We’ve created some archives of Twitter conversations for SIGIR 2009 and for some of the workshops associated with the conference. These archives are useful because Twitter messages tend to evaporate after a while.

I know of the following archives:

If the other workshops had significant traffic, I am happy to archive them and update the list above. TwapperKeeper is a service that archives Twitter searches based on a specified hashtag. The data is then available through the web site and for download in tab- or semicolon-separated format. Saving your own copy means that you can refer to it later, and also makes it easier to do data mining or other research on the use of Twitter. I encourage people to download the archives to make sure they persist even if TwapperKeeper doesn’t, keeping in mind that the archives on TwapperKeeper will continue to be updated as new tweets come in. Archive early, archive often.
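As a rough sketch of the kind of data mining a saved copy makes possible, the snippet below reads a tab- or semicolon-separated export and counts tweets per user. The column names `from_user` and `text`, and the sample rows, are assumptions made for illustration; they are not taken from TwapperKeeper’s actual export format, so check the header row of a real download before relying on them.

```python
import csv
import io
from collections import Counter


def read_archive(text):
    """Parse a tab- or semicolon-separated export into a list of dicts."""
    header = text.splitlines()[0]
    # Pick whichever of the two delimiters appears more often in the header row.
    delimiter = "\t" if header.count("\t") >= header.count(";") else ";"
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def tweets_per_user(rows, user_field="from_user"):
    """Count rows per user; `from_user` is a hypothetical column name."""
    return Counter(row[user_field] for row in rows)


# Made-up sample data standing in for a downloaded archive.
sample = (
    "from_user\ttext\n"
    "alice\tGreat keynote #sigir09\n"
    "bob\tAgreed #sigir09\n"
    "alice\tPoster session next #sigir09\n"
)

rows = read_archive(sample)
counts = tweets_per_user(rows)
print(counts.most_common())
```

For a real file, the same functions could be applied to the contents of the download after reading it with `open(path, encoding="utf-8").read()`.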

SIGIR09: An aspectual interface for supporting complex search tasks

Faceted search interfaces for metadata-rich datasets such as product information have been around for a while. eBay and Amazon are two obvious examples. Faceted search for textual data is only slowly making its way into the commercial realm (see NewsSift, for example) but has been receiving increasing attention in research. Villa et al. presented an interesting paper at SIGIR09 in which they compared different interface layouts for handling aspects, and compared the effectiveness of aspectual search with that of a conventional interface for different tasks.

Query suggestion vs. term suggestion

Diane Kelly presented an interesting (and much tweeted-about) paper at SIGIR this week. The paper, “A Comparison of Query and Term Suggestion Features for Interactive Searching,” co-written with Karl Gyllstrom and Earl Bailey, looks at the effects that query and term suggestions have on users’ performance and preferences. These are important topics for interactive information seeking, both for known-item and exploratory search.

Sue Dumais, HCIR Poster Child

At CHI 2007, in a workshop on exploratory search, we had a long discussion of the definition of exploratory search, during which Sue Dumais kept challenging the room to look broadly, bringing in examples and counter-examples not only from full text search, but from more structured datasets that were also fair game.

Exploratory search is just one part of HCIR. Her work on adapting systems to users’ vocabulary (not vice versa) that led to LSI, on innovative search interfaces (“If in 10 years we are still using a rectangular box and a list of results, I should be fired.”), on finding and re-finding information on your personal computer, and on personalization of search results all fits squarely into the HCIR space.

Those who attended the HCIR’08 workshop organized by Daniel Tunkelang (Endeca), Ryen White (MSR), and Bill Kules (CUA) got a great overview of Sue’s research. This week, during her opening keynote at SIGIR (see notes from Jeff Dalton and Jonathan Elsas, who, unlike me, were actually there!), Sue described the course of her career as an IR researcher, first at Bellcore and then at Microsoft Research. In her career, she has consistently focused on the user, both as inspiration for design and for evaluating the systems.

“If you have an operational system and you don’t use what your users are doing to improve, you should have your head examined” (from Jeff Dalton)

I expect we’ll be seeing more interesting and innovative results from her group, both at SIGIR and at the HCIR workshop series.

Which future of search?

Alex Iskold recently wrote on ReadWriteWeb about potential improvements in search that could be derived from incorporating evidence from one’s social network to affect the ranking of documents. The idea is that people you know, people with similar interests, friends-of-friends, authorities, and “the crowd” could all contribute to changing the ranking of documents that a search engine delivers to you, because the opinions or interests of all these people can provide some information to help disambiguate queries.
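To make the idea concrete, here is a minimal sketch of one way such evidence could be folded into a ranking: each document’s base score from the search engine is nudged by a weighted count of endorsements from the groups listed above. The group weights, the `alpha` mixing parameter, and the endorsement data are all invented for illustration; nothing here is taken from Iskold’s article or from any real search engine.

```python
from dataclasses import dataclass


@dataclass
class Result:
    doc_id: str
    base_score: float  # score from the ordinary, non-social ranker


# Hypothetical weights: closer social ties count for more than the crowd.
WEIGHTS = {
    "friend": 1.0,
    "similar_interests": 0.6,
    "authority": 0.5,
    "friend_of_friend": 0.4,
    "crowd": 0.1,
}


def social_score(doc_id, endorsements):
    """Weighted sum of endorsement counts for one document."""
    groups = endorsements.get(doc_id, {})
    return sum(WEIGHTS.get(group, 0.0) * count for group, count in groups.items())


def rerank(results, endorsements, alpha=0.5):
    """Sort by base score plus alpha times the social score, highest first."""
    return sorted(
        results,
        key=lambda r: r.base_score + alpha * social_score(r.doc_id, endorsements),
        reverse=True,
    )


results = [Result("a", 1.0), Result("b", 0.9)]
endorsements = {"b": {"friend": 1}}  # one friend endorsed document "b"

reranked = rerank(results, endorsements)
print([r.doc_id for r in reranked])
```

With no endorsements the order is unchanged; a single endorsement from a friend is enough in this toy setup to move document "b" ahead of "a". A linear combination is only one option, and how to weight the groups is exactly the open question.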
