Blog Category: Information seeking

Finding vs. retrieving

Comments (3)

Having stumbled onto the IR Museum on the SIGIR web site, I decided to investigate why I had not come across it before. I missed the SIGIR announcement about the museum when it was made in 2008, and since that item was no longer on the first page of the web site, I didn’t find it by browsing. Site search might have helped, but there wasn’t any.

So I tried searching the web.

Continue Reading

Impossible to find

Comments (6)

Thanks to Fernando’s post on Probably Irrelevant, I discovered the IR Museum, an interesting resource hosted on the SIGIR web site. The site was created to archive documents related to the IR community that are not found in the ACM Digital Library or similar archives, yet are considered fundamental to the field. The collection, as far as I can tell, includes the Cranfield reports (Cleverdon, 1962), Rocchio’s PhD thesis (Rocchio, 1966), a variety of SMART reports (ISR-11, ISR-12, ISR-13, and ISR-15), and other things that are impossible to find.

Continue Reading

Research advice and a search challenge

Comments (9)

I was intending to write a post on the varied reasons mathematicians give for taking long walks as an aid to research. I couldn’t find my favorite quote, so instead I’m posting a search challenge.

I thought I remembered reading, in the book Littlewood’s Miscellany, something along the lines of the following advice:

Researchers spend the vast majority of their time feeling frustrated. To improve the ratio of time feeling fulfilled to time feeling frustrated, whenever you find a new result or succeed in completing a proof, take the time to enjoy it, preferably by taking a long walk.  Definitely don’t dive into the next problem, or go back and check the proof. There is plenty of time for that later.

However, it doesn’t seem to be in that book. Littlewood certainly approved of walking, and the tone of much of his advice is consistent with this quote, but this particular piece of advice doesn’t appear to be there.  I couldn’t find it in a web search either.

I would love to know the true source for this piece of wisdom.

A Tcho chocolate bar to anyone who can track down the source!

Aggregating Twitter

Comments (2)

There are lots of ways to display search results, and the familiar (if impoverished) ranked list of links with snippets is just one possibility. It doesn’t work particularly well for Twitter, for example, because for many kinds of searches it’s hard to make sense of the tweets individually; a more holistic approach is more appropriate. I described one such approach in Making Sense of Twitter Search (a position paper co-authored with Miles Efron and presented at a CHI 2010 workshop on microblogging).

Paper.li is another approach to visualizing sets of tweets. For a given topic or user, it identifies documents referred to by your followers and builds a two-column online newspaper-style layout out of those documents. It classifies documents by broad categories (media, education, technology, etc.) and prominent hashtags (e.g., #facebook), and shows the leading paragraph or two of each document along with the person who tweeted it. Media such as YouTube videos are embedded directly into the layout. And you can, of course, switch to a list view.

Continue Reading

Migratory Words

Comments (1)

Building effective search interfaces is hard, particularly when the goal is to support exploratory search rather than the precision-oriented fact finding that the major search engines excel at. The challenge is to support a complex, evolving, information-rich task in a generalizable, understandable, and manageable way. We have some good ideas about how to make various components of information exploration interfaces; Marti Hearst’s book, for example, details much of the science and engineering that goes into good design for information seeking interfaces. Nonetheless, the challenge remains of how to combine these techniques into usable, effective, and engaging interfaces that make serious information seeking possible.

A team of students at SIMS took a step in this direction with their Master’s thesis project called Migratory Words. The system allows people to search and browse a collection of news articles. Results are presented in a combination of visualizations and text lists that highlight terms, documents, and collections. The use of terms and phrases that represent the ideas latent in the documents is a particularly welcome addition to traditional document-focused interfaces.

Continue Reading

Enterprise Search Summit 2010

The Enterprise Search Summit is taking place right now, and I am sorry to be missing it. The program looks quite interesting, including keynotes by Marti Hearst and Peter Morville, among others. Marti’s talk this morning, related to her recent book on information retrieval, was summarized by Daniel Tunkelang on his blog. While she did touch on topics covered in her book, including some of the collaborative search work done here at FXPAL, she has shifted her focus somewhat to address the more social issues around information seeking. While I don’t know the details of her presentation, she did mention similar topics when she participated in a recent panel on search at the WWW2010 conference. The Twitter streams from both events capture her “socialize vs. personalize” comments. (Since Twitter search sunsets quickly, here are the TwapperKeeper archives for #ess10 and the www2010 Search Is Dead panel.)

Peter Morville should be an interesting speaker on information retrieval-related topics, some of which he covers in his books Search Patterns and Ambient Findability. I wrote about some of his ideas earlier, but am curious to hear how he is presenting his work.

I hope that both talks are recorded and made available on the web.

Update: Daniel Tunkelang’s summary of Peter Morville’s talk

Search is Dead. Long Live Search!

Comments (3)

Yesterday, the WWW 2010 conference featured a panel with representatives of Yahoo! (Andrei Broder – Fellow and VP, Search & Computational Advertising, Yahoo! Research), Bing (Barney Pell – Partner, Search Strategist for Bing, Microsoft), Google (Andrew Tomkins – Director of Engineering at Google Research), and academia (Marti Hearst – Professor, School of Information, University of California-Berkeley) on the current state of search on the web. The title was meant to be provocative, but I doubt that anyone in the room thought that this was a solved problem. I wasn’t at the conference, but was able to follow it on Twitter and through a video feed kindly provided by Wayne Sutton. A persistent recording of the event is available through qik by Kevin Marks, although the audio is rather faint. (Wayne’s feed had great audio, but the panelists were sitting down, and were blocked by the podium!)

The panel covered a lot of ground, and some of this has already been summarized by Jeff Dalton on his blog. In short, the big search engines are moving beyond the top 10 links and exploring additional capabilities — both in the ranking algorithms and in the style of interaction — to satisfy a broader range of information needs.

Continue Reading

The Map Trap

Comments (4)

Are maps better than text for presenting information on mobile devices? That was the question explored by Karen Church, Joachim Neumann, Mauro Cherubini and Nuria Oliver in a paper (about to be) presented at the WWW 2010 conference. They present evidence that in some cases a textual display of information supports people’s information needs more effectively than a map-based one.

The two interfaces were evaluated over the course of a month of use “in the wild” (but in Ireland, not in Spain). Each participant had access to both interfaces, and was shown how to use them to ask location-specific questions, which would be answered by others nearby. Availability of answers was communicated via SMS messages.

Continue Reading

Risky Business

A while ago I wrote about the general threats to one’s privacy posed by search engine histories. It appears that the threat is more than theoretical, as researchers at INRIA and UCI have shown recently. They were able to exploit security weaknesses in the Google Web History used to generate personalized suggestions through what they termed a “Historiographer” attack.

Google appears to be taking the researchers’ warnings seriously, and has modified some of its services to use HTTPS. Not all aspects have yet been secured, however.

Continue Reading

SIGIR Papers Announced

The complete list of accepted SIGIR papers was announced yesterday:

http://members.unine.ch/jacques.savoy/Events/SIGIR.html

I think there is much greater diversity in topics this year, a trend that has been growing in recent years. In fact, the only topic with more than a single session is clustering. A few titles that look intriguing to me include:

Assessing the Scenic Route:  Measuring the Value of Search Trails in Web Logs
White Ryen (Microsoft Research Redmond), Huang Jeff (University of Washington)

Relevance and Ranking in Online Dating Systems
Diaz Fernando, Metzler Donald, Amer-Yahia Sihem (Yahoo! Labs)

Comparing User Preferences, for Relevance and Diversity with Test Collection Outputs
Sanderson Mark, Paramita Monica Lestari, Clough Paul, Kanoulas Evangelos (University of Sheffield)

Evaluating Verbose Query Processing Techniques
Huston Samuel, Croft Bruce (University of Massachusetts Amherst)

In particular, the question of “going the scenic route” is one that deserves much more study. Information Retrieval is most often concerned with effectiveness and efficiency. The straight path to relevance. As well it should be. But there are other valuable goals that are just as much a part of information seeking, such as serendipity, diversity, and, well, scenery. It becomes interesting, and difficult to evaluate, when the goal rather than the process is exploration.