Blog Category: Information seeking

Timing thoughts

Comments (2)

I’ve written about Google Instant before, but Daniel Tunkelang’s recent post triggered some additional reactions. Daniel writes that Instant is good because

Users spend less–and hopefully no time–in a limbo where they don’t know if the system has understood the information-seeking intent they have expressed as a query.

Thus the argument goes: by saving the user a few hundred milliseconds (and the need to press the Enter key), users will be better off because they will get feedback on their queries more quickly, and thus will be able to find what they are looking for more quickly.

I am not sure that the accountants and the psychologists would necessarily agree, in this case.

Continue Reading

Visualizing search progress

Comments (1)

I’ve been re-reading a paper by Joho et al. that explored the effectiveness of a number of strategies with respect to collaborative search. The paper finds that

…looking at the top 20 documents in more queries was more effective than looking at the top, say, 100 documents in one fifth the number of queries.

This finding, supported by some of the observations by Vakkari, suggests that encouraging users (working individually or collaboratively) to issue multiple queries, and supporting them in subsequent sense-making activities, should improve the overall effectiveness of the search process.

Continue Reading

Reverted Indexing

Comments (8)

Traditional interactive information retrieval systems function by creating inverted lists, or term indexes. For every term in the vocabulary, a list is created that contains the documents in which that term occurs and its relative frequency within each document. Retrieval algorithms then use these term frequencies alongside other collection statistics to identify the matching documents for a query.
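The structure described above can be sketched in a few lines of Python. This is a minimal illustration with a toy whitespace tokenizer and a hypothetical two-document collection, not the implementation of any particular retrieval system:

```python
from collections import defaultdict, Counter

def build_inverted_index(docs):
    """Map each term to the documents containing it and its frequency there."""
    index = defaultdict(dict)  # term -> {doc_id: term_frequency}
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index

# Hypothetical document collection for illustration.
docs = {
    "d1": "retrieval systems index terms",
    "d2": "terms map to documents in an index",
}
index = build_inverted_index(docs)
# index["index"] -> {"d1": 1, "d2": 1}
```

A retrieval algorithm would combine these per-document frequencies with collection statistics (e.g. document frequency, collection size) to score matching documents.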

In a paper to be published at CIKM 2010, Jeremy Pickens, Matt Cooper and I describe a way of using the inverted index to associate document ids with the queries that retrieve them. Our approach combines the inverted index with the notion of retrievability to create an efficient query expansion algorithm that is useful for a number of applications, including relevance feedback. We call this kind of index a reverted index because rather than mapping terms onto documents, it maps document ids onto queries that retrieved the associated documents.
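The mapping can be sketched as follows. This is a simplified illustration of the general idea, assuming a hypothetical `search(query)` function that returns scored results; the actual construction and scoring in the CIKM paper differ in detail:

```python
from collections import defaultdict

def build_reverted_index(basis_queries, search):
    """Map each document id to the basis queries that retrieve it.

    `search(query)` is assumed to return a list of (doc_id, score)
    pairs; the basis queries here are hypothetical single-term queries.
    """
    reverted = defaultdict(dict)  # doc_id -> {query: score}
    for query in basis_queries:
        for doc_id, score in search(query):
            reverted[doc_id][query] = score
    return reverted

def expand(feedback_docs, reverted, k=3):
    """Toy relevance feedback: aggregate query scores over the feedback
    documents and return the top-k queries as expansion candidates."""
    totals = defaultdict(float)
    for doc_id in feedback_docs:
        for query, score in reverted.get(doc_id, {}).items():
            totals[query] += score
    return sorted(totals, key=totals.get, reverse=True)[:k]
```

Given a set of documents judged relevant, looking them up in the reverted index directly yields the queries that retrieve them, which is what makes the query expansion efficient.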

Continue Reading

Instant success?

Comments (10)

So I fired up IE-8 and I tried Google Instant. It’s fast: as fast as I can type, it’s showing me search results. Mind you, the results aren’t always sensible, but they are delivered quickly. It works great for short queries, such as looking for a popular sense of some word. In this case, it saves me the trouble of hitting Enter. Nice, but not earth-shattering.

When I am looking for something less obvious, it guesses wrong. For example, the query “information processing and management” (an academic journal) first produced a set of results for the partial string “inform” that matched informatica.com. Nice, but not the journal. After I typed “information,” it showed me the Wikipedia page for “information” (oh the irony) and a bunch of other links highly associated with the term. But no journal. “information proc” produced a bunch of hits on “information processing.” Better, but not what I am after. Completing the second word and pressing the space bar yielded a number of links to “information processing theory,” which also happens to be the top query suggestion. But no journal. Only when I typed “information processing and” did I get the results I wanted.

So what are we to make of this new addition to Google’s bag of tricks?

Continue Reading

Affect and design

Comments (4)

Daniel Tunkelang wrote an interesting review/commentary on Clifford Nass and Corina Yen’s new book on affective computing, which cites many examples showing that biasing results toward one’s expectations can improve users’ satisfaction with the results. Another class of responses (also well-documented in the affective computing literature) is the tendency for people to anthropomorphize computers.

Daniel’s conclusion is that it’s relatively straightforward to use these techniques to deceive people, to subvert personalization, to mislead rather than to inform. I’ve got two reactions to this work, one related to system design, and one more specifically to information seeking.

Continue Reading

TalkMiner

Comments (6)

While many of the systems we build at FXPAL are either deployed internally or transferred to our parent company, in some cases we get to deploy them in the real world. This week, we released TalkMiner, a system for indexing and searching video of lecture broadcasts. We’ve indexed broadcasts from a variety of sources, including the U.C. Berkeley webcast.berkeley site, the blip.tv site, and various channels on YouTube, including Google Tech Talks, Stanford University, MIT OpenCourseWare, O’Reilly Media, TED Talks, and NPTEL Indian Institute of Technology.

But all of these videos are already indexed by web search engines, you say; why do we need TalkMiner?

Continue Reading

The Copenhagen Interpretation

Comments (2)

The IIiX conference series (the latest installment of which took place recently at Rutgers University) arose from IRiX (Information Retrieval In conteXt) workshops (2004, 2005) held in conjunction with SIGIR 2004 and 2005. The workshops were organized by what I think of as the Scandinavian contingent of the IR community — the likes of Peter Ingwersen, Kalervo Järvelin, Pia Borlund, Birger Larsen and others — who collectively represented a more user-centered (as opposed to system-centered) approach to studying information retrieval. Yes, others were involved, but it still seems that the Scandinavians somehow inspired and led the movement. Given the success of the workshops, they organized the IIiX conference series to create a more formal venue for these topics.

One of the highlights of the 2010 conference was a debate between the system camp and the user camp about the value of simulating users. (See Saturday August 21 in the program.) This was a reprise of the theme of a workshop held at this year’s SIGIR conference, this time on the other side’s turf.

Continue Reading

Searching deeper

Comments (1)

Daniel Russell wrote up a nice summary of my search for the origins of Daniel Tunkelang’s name. Daniel R. drew two lessons from the exercise: one, that social search (although I would say the social was bordering on the collaborative, in this case) can be effective because it integrates insights of multiple people; and two, that some domain knowledge helped me navigate the search results more effectively.

I’d like to expand his second point a bit.

Continue Reading

HCIR Search Challenge

Comments (2)

The fourth HCIR workshop was held this past weekend at Rutgers University in conjunction with the IIiX 2010 conference. This was, in my opinion, the best workshop of the four so far. Part of the strength of the workshop has been the range of presentations, covering more mature work in traditional 30-minute presentations, a poster and demo session, and, new this year, reports from the HCIR search challenge.

From the web site:

The aims of the challenge are to encourage researchers and practitioners to build and demonstrate information access systems satisfying at least one of the following:

  • Not only deliver relevant documents, but provide facilities for making meaning with those documents.
  • Increase user responsibility as well as control; that is, the systems require and reward human effort.
  • Offer the flexibility to adapt to user knowledge / sophistication / information need.
  • Are engaging and fun to use.

Participants would be given access to the New York Times annotated corpus, which consists of 1.8 million articles published in the Times between 1987 and 2007, and they would be expected to do something interesting in searching or browsing this collection.

Continue Reading