Blog Archive: 2011

Recall vs. Precision

Stephen Robertson’s talk at the CIKM 2011 Industry event got me thinking about recall and precision again. Over the last decade, precision-oriented search has become synonymous with web search, while recall has been relegated to narrow verticals. But is precision@5 or NDCG@1 really the right way to measure the effectiveness of interactive search? For a known-item search, or for looking up a common factoid, perhaps it is. But for most searches, even ones that might be classified as precision-oriented, the searcher may need several attempts to get at the answer. Dan Russell’s A Google a Day poses exactly those kinds of challenges: find a fact that’s hard to find.
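For readers less familiar with the two measures named above, here is a minimal Python sketch of precision@k and NDCG@k. It assumes graded relevance judgments supplied as a list of gains in rank order and uses the common log2(rank + 1) discount; other NDCG variants (for example, exponential gains) exist, so treat this as one illustrative formulation rather than the definitive one.

```python
import math


def precision_at_k(gains, k):
    """Fraction of the top k results judged relevant (gain > 0)."""
    return sum(1 for g in gains[:k] if g > 0) / k


def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at rank i (1-based) is divided by log2(i + 1)."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains[:k], start=1))


def ndcg_at_k(gains, k):
    """DCG normalized by the DCG of the best possible ordering of the same judgments.

    Assumption: the ranked list contains every judged document; in practice the
    ideal ordering is usually built from all relevance judgments for the query.
    """
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Hypothetical graded judgments (0 = not relevant, 2 = highly relevant), in rank order.
ranking = [0, 2, 1, 0, 0]
print(precision_at_k(ranking, 5))  # 0.4
print(ndcg_at_k(ranking, 1))       # 0.0
```

NDCG@1 is zero here even though relevant results appear at ranks 2 and 3, which is exactly the kind of information a single-cutoff measure discards.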

So how should we think about evaluating the kinds of searches that take more than one query, ones we might term session-based searches?
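One way to make the question concrete is to score a whole session rather than a single query. The Python sketch below discounts each query’s DCG by its position in the session, in the spirit of the session DCG (sDCG) measure proposed by Järvelin and colleagues. The log base `bq = 4` and the example judgments are assumptions for illustration, and the sketch ignores query cost and overlap between result lists.

```python
import math


def dcg(gains):
    """DCG of one ranked list, using the log2(rank + 1) discount."""
    return sum(g / math.log2(i + 1) for i, g in enumerate(gains, start=1))


def session_dcg(session, bq=4.0):
    """Sum per-query DCG, dividing query j (1-based) by 1 + log_bq(j).

    Results found by later reformulations count for less, reflecting the
    extra effort the searcher spent to reach them.
    """
    return sum(dcg(gains) / (1 + math.log(j, bq))
               for j, gains in enumerate(session, start=1))


# Hypothetical session: the first two queries mostly miss, the third finds the answer.
session = [[0, 0, 0], [0, 1, 0], [2, 0, 0]]
print(session_dcg(session))
```

Finding the same highly relevant result on the first query instead of the third would produce a higher session score, which is the intuition such a measure tries to capture.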

HCIR 2011 keynote

HCIR 2011 took place almost three weeks ago, but I am only now catching up after a week at CIKM 2011 and an actual, almost internet-free vacation. I want to start my reflections on HCIR with a summary of Gary Marchionini’s keynote, titled “HCIR: Now the Tricky Part.” Gary coined the term “HCIR” and has been a persuasive advocate of the concepts it represents. The talk used three case studies of HCIR projects as a lens to focus the audience’s attention on one of the main challenges of HCIR: how to evaluate the systems we build.

CFP: 3rd Workshop on Collaborative Information Retrieval

We are organizing a third workshop on collaborative information retrieval, this time in conjunction with CIKM 2011. The first workshop, held in conjunction with JCDL 2008, focused on definitional issues, models for collaboration, and use cases. The second, held in conjunction with CSCW 2010, explored communication and awareness in collaborative search. This third workshop will focus on system building, algorithms, and user interfaces for collaboration.

Looking for an HCIR intern

It’s intern time again! I am looking for someone to help me run an exploratory study of a collaborative, session-based search tool that I’ve been building over the last few months. Session-based search frames information seeking as an ongoing activity consisting of many queries on a particular topic, conducted over the course of hours, days, or even longer. Collaborative search describes how people coordinate their information-seeking activities in pursuit of a common goal.
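To make the two ideas concrete, the sketch below models a collaborative, session-based search as a long-lived record of queries on one topic, each tagged with who ran it and when. The class names, fields, and example data are hypothetical and are not taken from the tool described in this post.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class QueryEvent:
    user: str            # collaborator who ran the query
    text: str            # the query string
    issued_at: datetime  # when it was run


@dataclass
class SearchSession:
    """A hypothetical long-lived, shared search session on a single topic."""
    topic: str
    events: list[QueryEvent] = field(default_factory=list)

    def log(self, user: str, text: str, issued_at: datetime) -> None:
        self.events.append(QueryEvent(user, text, issued_at))

    def collaborators(self) -> set[str]:
        return {e.user for e in self.events}

    def span_days(self) -> int:
        """Calendar days between the first and last query in the session."""
        if not self.events:
            return 0
        dates = [e.issued_at.date() for e in self.events]
        return (max(dates) - min(dates)).days

    def queries_by(self, user: str) -> list[str]:
        return [e.text for e in self.events if e.user == user]


# Hypothetical usage: two people working on one topic over several days.
session = SearchSession(topic="planning a group trip to Rome")
session.log("alice", "rome hotels near termini", datetime(2011, 6, 1, 9, 15))
session.log("bob", "colosseum tickets", datetime(2011, 6, 3, 14, 30))
session.log("alice", "rome day trips", datetime(2011, 6, 3, 16, 0))
```

Keeping the whole history on the session object, rather than on each user, is what lets collaborators see each other’s queries and pick up where the others left off.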

The intern for this project will help frame a set of research questions around collaborative, session-based search, and then take the lead on an experiment to gain insight into this rich space and to understand how to improve our search tool. The intern will also participate in writing up this work for publication at a major conference such as CHI, CSCW, or JCDL.

No such thing as bad press?

A recent NY Times article exposed the machinations of a sleazy guy whose online business relied on links to his web site (positive, negative, it didn’t matter) to get it promoted in Google search results. In fact, he found that being nasty to his customers improved his rankings.

The Times article implies that it was his customers’ negative comments that drove up his PageRank score, but Get Satisfaction (at least one of the sites on which many of the comments were posted) says that it marks links with the “rel=nofollow” attribute, which excludes those links from PageRank computation.
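To illustrate what “rel=nofollow” is meant to do, here is a minimal power-iteration PageRank in Python that simply drops nofollow edges before computing scores. This is the textbook formulation with a damping factor of 0.85 on a tiny hypothetical link graph; Google’s production ranking uses many other signals, and its exact handling of nofollow is not public, so this is only a sketch of the idea.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links: list of (source, target, nofollow) tuples. Edges marked nofollow are
    dropped before iterating, so they pass no score to their target.
    """
    nodes = {s for s, _, _ in links} | {t for _, t, _ in links}
    out = {node: [] for node in nodes}
    for source, target, nofollow in links:
        if not nofollow:
            out[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Pages with no followed out-links spread their score evenly over all pages.
        dangling = sum(rank[node] for node in nodes if not out[node])
        new_rank = {node: (1 - damping + damping * dangling) / n for node in nodes}
        for source in nodes:
            for target in out[source]:
                new_rank[target] += damping * rank[source] / len(out[source])
        rank = new_rank
    return rank


# Hypothetical link graph: two complaint pages link to "shop", one of them with nofollow.
graph = [
    ("complaint_a", "shop", True),   # nofollow: ignored
    ("complaint_b", "shop", False),  # followed: passes score
    ("shop", "home", False),
]
with_nofollow = pagerank(graph)
all_followed = pagerank([(s, t, False) for s, t, _ in graph])
print(with_nofollow["shop"], all_followed["shop"])
```

In this toy graph the nofollow link lowers the shop’s score relative to the all-followed case, which is why the attribute on complaint sites should have blunted the strategy, at least as far as PageRank alone is concerned.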

So why was he as successful as the article makes it seem?

Genealogical search

October is Family History Month, and I thought I would start it with some reflections on genealogical searching. This post builds on some earlier observations on genealogy and information retrieval.

Genealogical search illustrates many aspects of information seeking. In some ways, it reveals the limitations of how we classify information-seeking systems and behaviors: recall-oriented vs. precision-oriented search, known-item vs. exploratory, and so on. Each individual query should be high-precision (find me records for the person I am interested in at the moment), but a person’s life has many aspects (dates and places of birth and death, details of immigration, residence, occupation), and each aspect calls for its own queries. And often you really do want to find as much as can be found, so the overall task is recall-oriented.

Similarly, you start by searching for facts about people whose existence you are documenting, and you can often recognize relevant records when you see them. This has all the hallmarks of known-item search. On the other hand, you may also discover relatives you didn’t know existed, facts you had not expected, new kinds of historical records, and so on. This feels much more like exploratory search.
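The tension between per-query precision and task-level recall can be made concrete with a little arithmetic. In the Python sketch below, the record IDs, the queries, and the set of relevant records are all hypothetical: individual queries are highly precise, yet the task as a whole has found only a fraction of what exists.

```python
# Hypothetical: every record that actually exists for one ancestor.
relevant = {
    "birth_1850", "census_1860", "census_1870", "immigration_1872",
    "marriage_1875", "census_1880", "death_1901", "obituary_1901",
}

# Hypothetical results of separate, narrowly targeted queries (one per aspect).
results_by_query = {
    "birth record": {"birth_1850"},
    "1870 census": {"census_1870"},
    "death record": {"death_1901", "death_1901_namesake"},  # one false match
}


def precision(retrieved, relevant):
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved, relevant):
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


# Precision is judged query by query...
for query, retrieved in results_by_query.items():
    print(f"{query}: precision = {precision(retrieved, relevant):.2f}")

# ...but recall belongs to the whole task: the union of everything found so far.
found = set().union(*results_by_query.values())
print(f"task recall = {recall(found, relevant):.2f}")
```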

Finally, there is the question of where to search for information and which databases to use. The range of potential sources for the serious genealogist is quite broad, but for those just starting out there are a few obvious choices beyond interviewing your relatives. Ancestry.com is a family of web sites that federates access to a large range of historical records about individuals. While it’s not the only place to start, it’s not a bad choice.
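Federated access means sending one query to several collections and merging what comes back. The Python sketch below shows that general pattern with hypothetical in-memory sources, made-up records, and a naive merge that deduplicates on an exact (name, event, year) key; it is not a description of how Ancestry.com or any particular site works.

```python
from collections.abc import Callable, Iterable

# A "source" is any function that takes a name and returns matching records.
Source = Callable[[str], list[dict]]


def make_source(label: str, records: Iterable[dict]) -> Source:
    """Build a toy in-memory source; a stand-in for a real, separately hosted database."""
    stored = list(records)

    def search(name: str) -> list[dict]:
        # Return copies tagged with the source label so the merged list shows provenance.
        return [dict(r, source=label) for r in stored if name.lower() in r["name"].lower()]

    return search


def federated_search(name: str, sources: list[Source]) -> list[dict]:
    """Send one query to every source, then merge, dropping duplicate events."""
    seen = set()
    merged = []
    for search in sources:
        for record in search(name):
            key = (record["name"], record["event"], record["year"])
            if key not in seen:  # keep the first source's copy of a duplicate
                seen.add(key)
                merged.append(record)
    return sorted(merged, key=lambda r: r["year"])


# Hypothetical sources and records.
census = make_source("census", [{"name": "Anna Berg", "event": "residence", "year": 1880}])
passengers = make_source("passenger lists", [{"name": "Anna Berg", "event": "immigration", "year": 1872}])
index = make_source("aggregated index", [{"name": "Anna Berg", "event": "residence", "year": 1880}])

results = federated_search("berg", [census, passengers, index])
for record in results:
    print(record["year"], record["event"], "from", record["source"])
```

Real federated search has harder problems than this sketch shows, such as variant name spellings and conflicting dates across sources, so the exact-match key here is only a placeholder.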

Timing thoughts

I’ve written about Google Instant before, but Daniel Tunkelang’s recent post triggered some additional reactions. Daniel writes that Instant is good because

Users spend less–and hopefully no time–in a limbo where they don’t know if the system has understood the information-seeking intent they have expressed as a query.

Thus, the argument goes, by saving users a few hundred milliseconds (and the need to press the Enter key), Instant gives them faster feedback on their queries, so they can find what they are looking for more quickly.

I am not sure that the accountants and the psychologists would necessarily agree in this case.

HCIR Search Challenge

The fourth HCIR workshop was held this past weekend at Rutgers University in conjunction with the IIiX 2010 conference. In my opinion, it was the best of the four workshops so far. Part of its strength was the range of presentations: more mature work in traditional 30-minute talks, a poster and demo session, and, new this year, reports from the HCIR search challenge.

From the web site:

The aims of the challenge are to encourage researchers and practitioners to build and demonstrate information access systems satisfying at least one of the following:

  • Not only deliver relevant documents, but provide facilities for making meaning with those documents.
  • Increase user responsibility as well as control; that is, the systems require and reward human effort.
  • Offer the flexibility to adapt to user knowledge / sophistication / information need.
  • Are engaging and fun to use.

Participants would be given access to the New York Times Annotated Corpus, which consists of 1.8 million articles published in the Times between 1987 and 2007, and would be expected to do something interesting in searching or browsing this collection.

HCIR hat trick

The IIiX 2010 conference is coming up, and it promises to be a great week. For me it will start with the Doctoral Consortium, continue with the conference proper, and be capped off by the HCIR workshop. I’ve sat in on some doctoral consortia in the past, but this will be the first one in which I am a full participant. I am looking forward to the presentations and the discussion, and I will be blogging about them in the coming week.

I don’t expect to get much sleep!

Pivot

Because I did not attend SIGIR 2010, I missed Gary Flake’s keynote address, in which he described and demonstrated Microsoft Pivot, a zoomable, faceted search interface that his group built. Jeff Dalton has a good summary of the talk, which parallels Gary’s previous presentations, including a TED talk. The demos are pretty slick, and the scale at which the system operates is impressive.

In some ways, his emphasis on rich clients and interactive control over large, pre-computed datasets is a great illustration of HCIR principles. The user is encouraged to explore through fluid, immediate, reversible operations on large data sets, with the goal of finding useful information.
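Fluid, reversible operations map naturally onto a stack of facet filters. The Python sketch below shows that idea on a few hypothetical items: each refinement narrows the current view, facet counts are recomputed for that view, and undo simply pops the most recent filter. It illustrates faceted browsing in general and is not a description of how Pivot itself is implemented.

```python
from collections import Counter


class FacetedView:
    """Faceted browsing over a list of dicts, with an undoable stack of filters."""

    def __init__(self, items):
        self.items = items
        self.filters = []  # (facet, value) pairs, applied in order

    def current(self):
        return [it for it in self.items
                if all(it.get(facet) == value for facet, value in self.filters)]

    def refine(self, facet, value):
        self.filters.append((facet, value))
        return self.current()

    def undo(self):
        # Reversibility: removing the most recent filter restores the previous view.
        if self.filters:
            self.filters.pop()
        return self.current()

    def counts(self, facet):
        """How many items in the current view have each value of the facet."""
        return Counter(it[facet] for it in self.current() if facet in it)


# Hypothetical items.
items = [
    {"title": "A", "year": 2009, "type": "video"},
    {"title": "B", "year": 2010, "type": "article"},
    {"title": "C", "year": 2010, "type": "video"},
]
view = FacetedView(items)
view.refine("year", 2010)
print(view.counts("type"))
```

Because every operation is cheap and can be undone, the user pays little for trying a refinement that turns out to be a dead end, which is much of what makes this style of exploration feel fluid.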
