Ian Soboroff commented on yesterday’s blog post that although mental models were important, they were insufficient. He cited a paper that found that legal staff had trouble using a full-text search engine to satisfy a recall-oriented information need over a collection of documents in a legal discovery scenario. The paper concludes that coming up with effective keyword searches is difficult for non-search experts. The paper is interesting and worth reading, but I believe the authors’ conclusions are not warranted by their methodology.
The paper describes an experimental setup in which paralegals and attorneys collaborated on an information seeking task: the attorneys would specify information needs, the paralegals would run searches, and the attorneys would then evaluate the results for relevance; they were free to iterate until they thought they had found a query that identified 75% of the ‘vital’, ‘satisfactory’, and ‘marginally relevant’ documents. This setup was meant to simulate a realistic work scenario in which groups of people work on a recall-oriented, exploratory search.
It turned out, however, that only about 20% of the relevant documents were identified through this method. In their analysis, the authors argued that it is unrealistic to expect users without much training to find relevant documents using disjunctions of keywords when searching over large collections. But the authors’ analysis is flawed because in realistic scenarios, single queries are not expected to achieve high recall; rather, the idea is that many queries, run in an interactive, iterative manner, will identify small clusters of relevant documents, the union of which may produce high recall. Ironically, the authors describe an elaborate, multi-step process they used to approximate the number of relevant documents in the collection. The process involved learning which vocabulary was useful for identifying some relevant documents, and then learning new vocabulary from those documents. This, of course, is precisely how exploratory search is performed. So it is the expectation that people would be effective at this task without engaging in such a process that is unrealistic.
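To make the contrast concrete, here is a minimal sketch of that kind of iterative process. The function names, the dictionary-of-documents collection, and the `judge_relevant` callback are my own illustrative assumptions, not the system used in the study:

```python
def run_query(terms, collection):
    """Return ids of documents matching any query term (a simple disjunction)."""
    return {doc_id for doc_id, text in collection.items()
            if any(term in text for term in terms)}

def exploratory_search(seed_terms, collection, judge_relevant, rounds=5):
    """Accumulate relevant documents across several refined queries.

    Each round runs a query, asks the assessor which retrieved documents are
    relevant, and mines those documents for new vocabulary to add to the next
    query. Recall comes from the union across rounds, not from any one query.
    """
    terms = set(seed_terms)
    found_relevant = set()
    for _ in range(rounds):
        retrieved = run_query(terms, collection)
        relevant = {d for d in retrieved if judge_relevant(d)}
        found_relevant |= relevant
        # Learn new vocabulary from the documents just judged relevant
        # (no stopword handling here; this is only a sketch).
        for d in relevant:
            terms.update(collection[d].lower().split())
    return found_relevant
```

The point is simply that recall should be measured over the accumulated set returned at the end, the union of many small result clusters, rather than over the output of any single query.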
The way the study structured people’s interactions with information fostered an inappropriate mental model in the users. Rather than being expected to construct single high-precision, high-recall queries, the users should have been encouraged to be more interactive and exploratory. My sense is that a more transparent approach would yield better results:
- The system should let both attorneys and paralegals interact with the collection. This should make the entire team more familiar with the system and may encourage people to act on their insights in an exploratory fashion rather than communicating with co-workers indirectly through written notes. A collaborative framework such as SearchTogether might help mediate communication.
- The system showed the attorneys (domain experts) what was retrieved, but hid from them how the searches were performed and therefore why certain documents were retrieved. This separation leads to loss of transparency, and may have contributed to their poor ability to estimate the number of relevant documents in the collection. The lesson here for transparency in collaboration is that in addition to exposing the results of a team member’s efforts, the system should also make available some useful explanation for how that information was identified. Coherent accounts of team members’ activity should improve collaboration.
- The system should indicate what fraction of the collection has been seen by the searchers. Immersion in the collection should help people assess how likely they are to have seen all the relevant documents.
- Query and term suggestion techniques can surface other terms to use; domain experts (such as the attorneys described in this study) may then recognize potentially useful terms. In addition to using the terms directly, this may help search novices learn to diversify their query terms in the future. (A rough sketch of this item and the previous one appears after this list.)
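As promised above, here is a minimal sketch of the coverage indicator and the term-suggestion idea. The `{doc_id: text}` collection and the sets of seen and relevant document ids are assumptions made for illustration, not features of the study’s system:

```python
from collections import Counter

def coverage(seen_ids, collection):
    """Fraction of the collection the searchers have looked at so far."""
    return len(seen_ids) / len(collection) if collection else 0.0

def suggest_terms(relevant_ids, collection, current_terms, k=10):
    """Rank terms that occur often in documents already judged relevant but
    are not yet part of the query; a domain expert can then pick out the
    suggestions that are actually meaningful."""
    counts = Counter()
    for doc_id in relevant_ids:
        counts.update(w for w in collection[doc_id].lower().split()
                      if w not in current_terms)
    return [term for term, _ in counts.most_common(k)]
```

Even something this simple makes the searchers’ progress visible and gives the domain experts raw material to react to, rather than leaving term selection entirely to the intermediary.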
Above all, the system should strive to be learnable and predictable, to encourage the expectation that it can be learned, understood, and appropriated. Another thing to note (for the librarians in the audience) is that a search intermediary might have helped not only to create more useful search strategies, but also to educate the searchers about how to conduct more effective searches. While these days we take the ability to search for granted (not so in 1985, when this paper was written), we may still benefit from better basic education in information seeking at the high school or college level.
I think you’ve nailed a lot of the concerns I have with search systems and what you describe actually simulates the way many (if not most) professional searchers work with clients.
If a search project is recall-oriented, it would be rare that a single search statement would do the job. As you mentioned, it’s usually an iterative process where the results provide new ways of looking at the same question with new vocabulary. Working in Boolean systems, I may come to my final search statement and AND all my little subsearches together, but the process is definitely iterative, as you describe. One of my issues with some of the Internet search engines I use (such as Google) is that I can’t NOT out results I’ve already seen. A good system would remember each of my attempts on the same topic and let me omit items previously seen (and a great system might let me have control of this behavior).
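Something like this little sketch is what I have in mind; the `search` function is just a stand-in for whatever engine sits underneath, and the point is that the session, not the searcher, remembers what has already been shown, with the `hide_seen` flag leaving the behavior under my control:

```python
class SearchSession:
    """Remembers every result retrieved across repeated searches on a topic,
    so later attempts can omit items already seen (or show them again if the
    searcher prefers)."""

    def __init__(self, search):
        self.search = search   # placeholder: a function mapping a query to doc ids
        self.seen = set()

    def run(self, query, hide_seen=True):
        results = self.search(query)
        shown = [d for d in results if d not in self.seen] if hide_seen else list(results)
        self.seen.update(results)   # remember everything retrieved on this topic
        return shown
```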
Regarding your comments on transparency, I think it is interesting that what you’re describing is how we (librarian = human search engine) often work. When I provide results to my clients, I usually provide a list of the keywords that I used. While I’m fairly knowledgeable in the pharma/biotech domain and usually expand their keyword lists considerably, they are still the domain experts and can spot concepts that have been left out. This helps create a sense of trust that I’m doing a thorough job. As a searcher, I actually don’t trust Google or the “black box” engines because there is no way for me to pull back the curtain and see how it was done.
There are some systems (PubMed is one) where you can take a look at the search strategy that the system used. It doesn’t force you to look at it if you’re not interested but it’s there if you want to learn. I become comfortable & trusting with these systems far faster than I do with black boxes.
I’m not really sure if most people are better able to do recall-oriented searching now than in 1985. In 1985 most systems were much more difficult to use. Those who were using them generally were better trained and were at least aware of some of the pitfalls of recall-oriented searching. You had to really want to search in 1985 to put up with the esoteric command line search engines. (I don’t know anything about the search engine used in this paper. It may have been the epitome of simplicity but most weren’t back then.) In my experience casual searchers today often don’t think to ask the question “What might I have missed?” because they’ve gotten such great on-target results with the precision-oriented search engines that are available.
I’d love to see more systems with the type of interface you’re suggesting. It could be a really great way to help information seekers improve their queries and understand their results.
Thanks for the feedback on the post! My sense is that professional searchers and people in library schools and iSchools who study how people look for information know all sorts of useful tactics for getting and communicating quality results, but that knowledge is not picked up and used by computer scientists and engineers who build search engines. Perhaps eventually these ideas will have some influence.