Blog Category: Information seeking

Boolean illogic

I am trying to understand how Google patent search works, and am encountering some quite odd behavior. I am not talking about the inventor search bug (which is still un-fixed), but about Boolean logic.

If I run the query [“information retrieval”], the system retrieves 323 documents. Similarly, [“dynamic hypertext”] retrieves 368 documents. The combination, [“information retrieval” “dynamic hypertext”], yields 16. Putting a plus in front of either quoted phrase does not affect the results. So far, this seems reasonable.
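One way to make "reasonable" concrete: a conjunction can never return more documents than either of its operands. Below is a minimal Python sketch of that sanity check using the counts above. The function name is my own invention for illustration and is not part of any Google interface.

```python
def conjunction_is_plausible(count_a: int, count_b: int, count_both: int) -> bool:
    """Return True if the hit count for [A B] is consistent with Boolean AND.

    Documents matching both A and B form a subset of each operand's result
    set, so their number cannot exceed the smaller of the two counts.
    """
    return 0 <= count_both <= min(count_a, count_b)


# Counts reported above for the two phrases and their combination.
print(conjunction_is_plausible(323, 368, 16))   # True
# A hypothetical count larger than either operand would signal a problem.
print(conjunction_is_plausible(323, 368, 400))  # False
```

The check is only a necessary condition: passing it does not prove the engine is applying AND, but failing it shows that it is not.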

Session-based search

Exploratory search often takes place over time. Searchers may run multiple queries to understand the collection, to refine their information needs, or to explore various aspects of the topic of interest. Many web search engines keep a history of a user’s actions: Bing makes that history readily available for backtracking, and all major search engines presumably use the click-through history of search results to affect subsequent searches. Yahoo Search Pad diagnoses exploratory search situations and switches to a more elaborate note-taking mode to help users manage the found information.

But none of these approaches makes it easy for a searcher to manage an on-going exploratory search. So what could be done differently? We explore this topic in a paper we’ll be presenting at the IIiX 2010 conference this August. Our paper reviews the literature on session-based search, and proposes a framework for designing interactions around information seeking. This framework uses the structure of the process of exploratory search to help searchers reflect on their actions and on the retrieved results. It treats queries, terms, metadata, documents, sets of queries, and sets of documents as first-class objects that the user can manipulate, and describes how information seeking context can be preserved across these transitions.

Reading on Papers

I am trying to understand the capabilities of existing iPad applications with respect to active reading. In this spirit, I have reviewed iAnnotate, and have written about e-books in general. Mekentosj Papers is a Mac application for managing academic papers; a version of it has been ported to the iPad. The idea is that you can use it to find papers you need to read, read them, and also manage their re-finding. The app fails on all counts.

Patent Search workshop at CIKM 2010

The 3rd workshop on Patent Information Retrieval (PAIR 2010) will be held in conjunction with CIKM 2010 on October 26th. Patents pose specific challenges with respect to information retrieval, and thus it’s unsurprising that the topic should receive focused attention in a series of workshops. What’s particularly interesting about this workshop is that rather than focusing solely on technical issues, its CFP specifically invites participation from patent retrieval practitioners:

We encourage IP professionals to present their special information needs and IR&KM researchers to present relevant technical ideas, for example for high recall search in prior art searching.

I really like this grounded approach to a complex problem space. Bringing together researchers and domain experts should benefit both groups: researchers should be able to draw on specific use cases and get a better understanding of searchers’ information needs, while patent search domain experts can get exposure to new tools and interfaces. I would love to see this approach repeated for other domains that involve information seeking, such as medicine, law, and intelligence analysis.

Now all I have to do is figure out how to attend it and the BooksOnline’10 workshop at the same time.

Google’s Patent Search “feature”

While poking around on the USPTO and Google sites to try to figure out how to get single PDF documents for my indexing project, I discovered that the Google advanced search interface won’t retrieve any documents based on the inventor field. I ran the searches three ways: by typing an inventor’s name into the Google patent search box, by typing it into the advanced search form on Google, and by entering it into the USPTO’s advanced search form. I expected the first set of results to be the largest, as it may include hits where the inventor is referenced by some other patent, but the latter two should return the same number of hits. The results for a few searches are shown below; you can run your own vanity search.

Inventor             Google   Google advanced   USPTO
Gene Golovchinsky    41       0                 21
Andreas Girgensohn   52       0                 29
Daniel Tunkelang     9        0                 8

I don’t know if this is a metadata problem (along the lines of the metadata issues that came up in the context of Google Books), or if it is a UI/front-end issue. In any case, it seems odd that testing didn’t catch this bug.

Parsing patents

Since Google announced its distribution of patents, I have been poking around the data trying to understand what’s in there and starting to index it for retrieval. The first challenge I’ve had to deal with is data formats. The second is how to display documents to users efficiently.

The full text of the patents is available in ZIP files, one file per week, based on the date patents were granted. The files cover patents issued from 1976 to (as of this writing) the first week of 2010. In addition to the text, they contain all manner of metadata such as when the patent was filed, who the inventors and assignees were, etc. Interestingly, the zipped up files are in two different formats: patents from 2001 on are in XML, while earlier ones are in a funky ad hoc text format.
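Here is a minimal Python sketch of how an indexer might cope with the two formats, choosing a parser by grant year before reading a weekly archive. It rests on assumptions: the function names, the member file name, and the idea that the grant year is known from outside the archive are all hypothetical, and actual parsing of either format is left out.

```python
import io
import zipfile

XML_START_YEAR = 2001  # per the text: patents from 2001 on are in XML


def format_for_year(grant_year: int) -> str:
    """Name the format expected for a weekly file from the given grant year."""
    return "xml" if grant_year >= XML_START_YEAR else "text"


def read_weekly_archive(zip_bytes: bytes, grant_year: int):
    """Yield (member_name, format, raw_bytes) for each file in one weekly ZIP."""
    fmt = format_for_year(grant_year)
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            yield name, fmt, archive.read(name)


# Build a tiny in-memory archive to stand in for a real weekly download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("week01.xml", "<patent/>")  # hypothetical member name

for name, fmt, raw in read_weekly_archive(buffer.getvalue(), 2010):
    print(name, fmt, len(raw))
```

Keeping the format decision in one small function means the rest of the indexing code never needs to know where the XML cutoff falls.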

Intended to deceive

The ‘sphere is a-twitter about BP’s buying keywords (e.g., “oil spill”, “BP”, “gulf disaster”) to place links to their versions of the story at the top of the search results. ABC News writes:

According to Kevin Ryan, the CEO of California-based Motivity Marketing, research shows that most people can’t tell the difference between a paid result pages, like the ones BP have, and actual news pages.

So we have two issues: one related to BP, and one related to the search engines.

Searching for a Houzz

Miles Efron and I have written about micro-IR in the past (see here, here, and here), and I recently came across another interesting example in the form of the Houzz App for the iPad. Houzz is an interface that fronts a collection of photographs of house interiors, the kind of stuff you might find in magazines and interior design/decoration books. It provides an (imperfect) browsing and search interface to find images by geographic area, by room function, etc. It also has a mode which brings together sets of images on a theme, curated by a designer with a blog. Each set of images comes with an introduction by the blogger, a bit of background on the person, commentary on each image, and even blog-like discussions among readers and designers associated with each theme.

How far to generalize?

The importance of understanding people’s activity to inform design is one of the central tenets of HCI. When design is grounded in actual work practice, it is much more likely to produce artifacts that fit with the way people work and the way they think. One key challenge when studying people for the purpose of informing design is to understand what aspects of existing work practice are essential to the work and what aspects are side-effects of existing technology (or lack thereof) and are fair game for innovation.

While HCIR research often relies on recall and precision measures to compare systems, qualitative methods are used as well. For example, Vakkari and his colleagues studied several students performing research for their Master’s thesis work. Researchers used a variety of techniques including diary entries and interviews to assess the evolution of searchers’ behavior over the course of a few months. Their findings led them to fill in some of the details of Kuhlthau’s model of information seeking.