Blog Category: Information seeking

Google eBooks

So Google has unveiled its eBook store, setting itself up to compete with Amazon, Barnes & Noble, and everyone else selling books. Google offers its editions through the browser and on a range of devices such as Android phones and the iPad. The reading experience in the browser on my laptop was OK: not great, but the text was legible enough, and it would even switch to a two-page layout in a wide window. On the iPad, Google offers two choices: the browser and a free app. The browser interface implements a swipe gesture for page turning, although there is no visible indication that it’s possible, nor any visual feedback until the page flips. The iPad app sports an animated page-turning transition, but does not have a two-page mode.

No such thing as bad press?

A recent NY Times article exposed the machinations of a sleazy guy who ran an online business that relied on links — positive, negative, whatever — to his web site that caused it to be promoted in Google search results. In fact, he found that by being nasty to his customers, his rankings improved.

The Times article implies that it was his customers’ negative comments that drove up his PageRank score, but Get Satisfaction (at least one of the sites on which many of the comments were posted) claims that it marks links with the “rel=nofollow” attribute, which removes those links from PageRank considerations.
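To make the nofollow point concrete, here is a minimal sketch of textbook power-iteration PageRank in which links flagged as nofollow are simply dropped before scores are propagated. This is an illustration under stated assumptions, not a description of how Google actually ranks pages: the page names, the link graph, and the damping factor of 0.85 are all made up for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank over (source, target, nofollow) triples."""
    pages = sorted({p for src, dst, _ in links for p in (src, dst)})
    # Assumption: a nofollow link passes no score at all, so it is dropped.
    out = {p: [] for p in pages}
    for src, dst, nofollow in links:
        if not nofollow:
            out[src].append(dst)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            # Pages with no followed out-links spread their score evenly.
            targets = out[p] or pages
            share = damping * rank[p] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return rank


def graph(review_nofollow):
    """Hypothetical graph: a review site, a blog, and a news page around a shop."""
    return [
        ("review_site", "shop", review_nofollow),
        ("blog", "shop", False),
        ("news", "blog", False),
    ]


followed = pagerank(graph(review_nofollow=False))
nofollowed = pagerank(graph(review_nofollow=True))
print("shop score, review link followed:  ", round(followed["shop"], 3))
print("shop score, review link nofollowed:", round(nofollowed["shop"], 3))
```

In this toy graph the shop scores lower once the review site's link is marked nofollow, which is the effect Get Satisfaction describes.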

So why was he as successful as the article makes it seem?

Evidence

Those of you who’ve followed this blog and Jeremy Pickens’ blog will recall his many comments about Google’s un-Googly behavior. Recently, Benjamin Edelman actually tested the hypothesis about Google injecting bias into organic results. His post details several kinds of queries that don’t produce organic results. Which ones? Ones that are related to Google properties such as finance, health, and travel. While it’s clear why Google pushes its own properties, it seems that this behavior is inconsistent with the image it tries to project.

Slides from CIKM 2010 Reverted Indexing talk

Here are the slides from our talk at CIKM 2010 last week. More details on reverted indexing can be found in an earlier post and on the FXPAL site. The full paper is available here, and the previous post describes why the technique works. The contribution of the paper can be summarized as follows:

We treat query result sets as unstructured text “documents” — and index them.
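As a rough illustration of that one-line summary, the sketch below runs a set of single-term basis queries against a tiny collection, treats each result set as a "document" whose "terms" are document IDs, and indexes those. The toy collection, the term-frequency scoring in `search`, and the helper `expansion_terms` are assumptions made for this example; the paper's actual retrieval model and weighting are not reproduced here.

```python
from collections import defaultdict

# Hypothetical toy collection (not data from the paper).
docs = {
    "d1": "google ebook store android",
    "d2": "pagerank google search ranking",
    "d3": "ebook reader ipad app",
}


def search(term):
    """Toy retrieval: score each document by raw term frequency."""
    results = {}
    for doc_id, text in docs.items():
        tf = text.split().count(term)
        if tf:
            results[doc_id] = tf
    return results


# Basis queries: here, every single term in the collection.
basis_queries = sorted({t for text in docs.values() for t in text.split()})

# Reverted index: document ID -> {basis query: retrieval score}.
# Each basis query's result set plays the role of a "document",
# and the document IDs in it play the role of "terms".
reverted = defaultdict(dict)
for query in basis_queries:
    for doc_id, score in search(query).items():
        reverted[doc_id][query] = score


def expansion_terms(doc_ids, k=3):
    """Query the reverted index with document IDs; return the best basis queries."""
    totals = defaultdict(float)
    for doc_id in doc_ids:
        for query, score in reverted.get(doc_id, {}).items():
            totals[query] += score
    return sorted(totals, key=lambda q: (-totals[q], q))[:k]


print(expansion_terms(["d1", "d3"], k=1))
```

Querying with documents d1 and d3 returns the basis query that retrieves both of them, which is how a reverted index can be used to suggest expansion terms from a set of relevant documents.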

On term selection in reverted indexing

Jeremy Pickens contributed to this post.

Jeremy did a great job of presenting our Reverted Indexing paper, but the short session made it difficult to answer all questions and comments thoroughly. For example, William Webber wrote up a post summarizing our work, in which he observed:

The authors surmise that the reverted index is more effective because it suggests more selective expansion terms, and they reproduce example term sets as evidence. This explanation is convincing enough as far as it goes; but what is not explained is why the reverted index’s expansion terms are more selective. The reason is not obvious. A single-term reverted index is not much more than a weighted direct index, mapping from documents to the terms that occur in them

I would like to address his comments because this is a key aspect of Reverted Indexing.

Sue Dumais at CIKM 2010

Sue Dumais of MSR gave an excellent keynote address at CIKM last week, in which she emphasized the temporal nature of collections used for information retrieval and of the way people access information on the web. This was by far the most user-oriented talk that I attended at the conference, and a refreshing change from the vast array of machine learning papers in the rest of the program.

The slides from the talk will be available on her site, but are substantially similar to her ECDL 2010 keynote talk. In short, Sue described how collections and documents change over time, and how people’s patterns of visiting web sites change in response to content evolution. She also introduced a new browser plugin for Internet Explorer called Diff-IE that helps people understand changes to the web sites they visit.

A future of search

Jamie Callan of CMU gave an interesting and thought-provoking keynote talk at CIKM 2010. While traditionally search engines have been used in a more or less direct manner to identify useful documents that the user would then (manually) incorporate into other tasks, Jamie suggested a new class of applications that would use search engines for the purposes of identifying documents or parts of documents in some collection, but then would apply this information in pursuit of some other, more specialized, task.

While the notion of using a search engine as a component of another system is not particularly novel, the kinds of requirements that his proposed use imposes on search engines would certainly push the envelope.

Genealogical search

October is Family History Month, and I thought I would start it with some reflections on genealogical searching. This post builds on some earlier observations on genealogy and information retrieval.

Genealogy searches are an interesting example of many aspects of information seeking. In some ways, this endeavor reveals the limitations of our classification of information seeking systems and behaviors, such as recall-oriented vs. precision-oriented search, known-item vs. exploratory, etc. While each query one runs should be high precision (find me records for the person I am interested in at the moment), there are many aspects (dates and places of birth and death, details of immigration, residence, occupation) resulting in many queries. And often you really do want to try to find as much as can be found, so the overall task is recall-oriented. Similarly, you start with searching for facts for people whose existence you are documenting, and you can often recognize relevant records when you see them. This has all the hallmarks of known-item search. On the other hand, you may also discover relatives you didn’t know existed, facts you had not expected, new kinds of historical records, etc. This feels much more like exploratory search.

Finally, there is the issue of where to search for information, which databases to use, etc. The range of potential sources for the serious genealogist is quite broad, but for those just starting out there are a few obvious choices beyond interviewing your relatives. Ancestry.com is a family of web sites that federates access to a large range of historical data on individuals. While it’s not the only place one can start, it’s not a bad choice.

IIiX 2010 Proceedings

The proceedings of IIiX 2010 are finally available through the ACM Digital Library! It turns out that ACM also has a special series page that links to all IIiX proceedings. In addition, here are the slides from Tefko Saracevic’s keynote address.

The Best Paper award went to Sanna Kumpulainen and Kalervo Järvelin (University of Tampere, Finland) for “Information Interaction in Molecular Medicine: Integrated Use of Multiple Channels.” Two other papers were nominated: “Evaluating search systems using result page context” by Bailey et al., and “Supporting polyrepresentation in a quantum-inspired geometrical retrieval framework” by Frommholz et al. The Best Poster award was shared by Loizides and Buchanan, for “Performing Document Triage on Small Screen Devices. Part 1: Structured Documents,” and Liu et al., for “Identifying Queries in the Wild, Wild Web.”