{"id":1802,"date":"2009-09-11T07:02:58","date_gmt":"2009-09-11T14:02:58","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=1802"},"modified":"2009-09-10T23:19:30","modified_gmt":"2009-09-11T06:19:30","slug":"perhaps-they-measured-the-wrong-thing","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=1802","title":{"rendered":"Perhaps they measured the wrong thing&#8230;"},"content":{"rendered":"<p>Ian Soboroff <a title=\"@ian_soboroff | Twitter\" href=\"http:\/\/twitter.com\/ian_soboroff\/status\/3889134755\" target=\"_blank\">commented<\/a> on yesterday&#8217;s blog <a title=\"Search is not Magic | FXPAL Blog\" href=\"http:\/\/palblog.fxpal.com\/?p=1788\" target=\"_blank\">post<\/a> that although mental models were important, they were insufficient. He cited a <a title=\"Blair, D. C. and Maron, M. E. (1985) An evaluation of retrieval effectiveness for a full-text document-retrieval system. Commun. ACM 28, 3 (Mar. 1985), 289-299.\" href=\"http:\/\/doi.acm.org\/10.1145\/3166.3197\" target=\"_blank\">paper<\/a> that found that legal staff had experienced problems with using a full-text search engine to search (with a recall-oriented information need) a collection of documents in a legal discovery scenario. The paper concludes that coming up with effective keyword searches is difficult for non-search experts. The paper is interesting and worth reading, but I believe the authors&#8217; conclusions are not warranted by their methodology.<\/p>\n<p><!--more-->The paper is interesting in that it describes an experimental setup in which paralegals and attorneys collaborated in an information-seeking task: the attorneys would specify information needs, the paralegals would do searches, and the attorneys would then evaluate the results for relevance; they were free to iterate until they thought they had found a query that identified 75% of the &#8216;vital&#8217;, &#8216;satisfactory&#8217;, and &#8216;marginally relevant&#8217; documents. 
This setup was meant to simulate a realistic work scenario where groups of people worked on a recall-oriented, exploratory search.<\/p>\n<p>It turned out, however, that only about 20% of relevant articles were identified through this method. In their analysis, the authors argued that it is unrealistic to expect users without much training to find relevant documents using disjunctions of keywords when searching over large collections. But the authors&#8217; analysis is flawed because in realistic scenarios, single queries are not expected to achieve high recall; rather, the idea is that many queries run in an interactive, iterative manner will identify small clusters of relevant documents, the union of which may produce high recall. Ironically, the authors describe an elaborate, multi-step process they used to approximate the number of relevant documents. The process involved learning which vocabulary was useful for identifying some relevant documents, and learning new vocabulary from those documents. This, of course, is precisely how exploratory search is performed. So it is the expectation that people would be effective at this task without engaging in such a process that is unrealistic.<\/p>\n<p>The way the study structured people&#8217;s interactions with information fostered an inappropriate mental model in the users. Rather than being expected to construct single high-precision, high-recall queries, the users should have been encouraged to be more interactive and exploratory. My sense is that a more transparent approach would yield better results:<\/p>\n<ul>\n<li>The system should let both attorneys and paralegals interact with the collection. This should make the entire team more familiar with the system and may encourage people to act on their insights in an exploratory fashion rather than communicating with co-workers indirectly through written notes. 
A collaborative framework such as <a title=\"SearchTogether (Beta) | Microsoft Research\" href=\"http:\/\/research.microsoft.com\/en-us\/um\/redmond\/projects\/searchtogether\/\" target=\"_blank\">SearchTogether<\/a> might help mediate communication.<\/li>\n<li>The system showed the attorneys (domain experts) <em>what<\/em> was retrieved, but hid from them <em>how<\/em> the searches were performed and therefore <em>why<\/em> certain documents were retrieved. This separation leads to a loss of transparency, and may have contributed to their poor ability to estimate the number of relevant documents in the collection. The lesson here for transparency in collaboration is that in addition to exposing the results of a team member&#8217;s efforts, the system should also make available some useful explanation for how that information was identified. Coherent accounts of team members&#8217; activity should improve collaboration.<\/li>\n<li>The system should indicate what fraction of the collection has been seen by the searchers. Immersion should help people assess how likely they are to have seen all the relevant articles.<\/li>\n<li><a title=\"Query suggestion vs. term suggestion | FXPAL Blog\" href=\"http:\/\/palblog.fxpal.com\/?p=1435\" target=\"_blank\">Query and term suggestion techniques<\/a> can offer suggestions of other terms to use; domain experts (such as the attorneys described in this study) may then recognize potentially useful terms. In addition to using the terms directly, this may help search novices learn to diversify query terms in the future.<\/li>\n<\/ul>\n<p>Above all, the system should strive to be learnable and predictable to encourage the <em>expectation<\/em> that it can be learned, understood, and appropriated. 
Another thing to note (for the librarians in the audience) is that a search intermediary might have helped not only to create more useful search strategies, but also to educate the searchers about how to conduct more effective searches. While these days we take the ability to search for granted (not so in 1985, when this paper was written), we may still benefit from better basic education in information seeking at the high school or college level.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Ian Soboroff commented on yesterday&#8217;s blog post that although mental models were important, they were insufficient. He cited a paper that found that legal staff had experienced problems with using a full-text search engine to search (with a recall-oriented information need) a collection of documents in a legal discovery scenario. The paper concludes that coming [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[22,15],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1802"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1802"}],"version-history":[{"count":15,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1802\/revisions"}],"predecessor-version":[{"id":1815,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1802\/revisions\/1815"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1802"}],"wp:term":[{"taxonomy":"ca
tegory","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1802"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1802"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}