{"id":1256,"date":"2009-07-06T07:12:50","date_gmt":"2009-07-06T14:12:50","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=1256"},"modified":"2009-08-12T17:10:46","modified_gmt":"2009-08-13T00:10:46","slug":"is-trec-good-for-infromation-retrieval-research","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=1256","title":{"rendered":"Is TREC good for Information Retrieval research?"},"content":{"rendered":"<p>In his <a href=\"http:\/\/palblog.fxpal.com\/?p=1246#comment-6317\" target=\"_blank\">comment<\/a> to an earlier <a title=\"On the Science of IR\" href=\"http:\/\/palblog.fxpal.com\/?p=1246\" target=\"_blank\">post<\/a>, <a href=\"http:\/\/www.ischool.utexas.edu\/%7Emiles\/\" target=\"_blank\">Miles Efron<\/a> reiterated the usefulness of the various TREC competitions to fostering IR research. I agree with him (and with others) that TREC has certainly been a good incubator both in its annual competition and in follow-on studies that use its data in other ways.\u00a0 And, as Miles points out, we have seen a proliferation of collections: everything from the original newspaper articles to blogs, video, large corpora, etc.<\/p>\n<p><!--more-->But its major limitation is the reliance on recall and precision\u2014that is, on an <em>a priori<\/em> gold standard\u2014to measure performance. These metrics allow systems to be compared in a lab, but are difficult to transfer to real-world situations. While algorithms may transfer from the lab to the web (or elsewhere), the evaluation methodology based on these metrics doesn&#8217;t. 
By focusing researchers&#8217; attention so closely on recall and precision (and on MAP in particular), I argue that TREC has in a real sense discouraged innovation in <em>evaluation<\/em>.<\/p>\n<p>In the recently published book, &#8220;<a href=\"http:\/\/www.amazon.com\/Information-Retrieval-Biomedical-Perspective-Informatics\/dp\/038778702X\/ref=sr_1_13?ie=UTF8&amp;s=books&amp;qid=1246000338&amp;sr=1-13\">Information Retrieval: A Health and Biomedical Perspective <\/a>,&#8221; William Hersh lists several questions that try to get at evaluation of IR systems in the wild. These are:<\/p>\n<ol>\n<li>Was the system used?<\/li>\n<li>For what was the system used?<\/li>\n<li>Were the users satisfied?<\/li>\n<li>How well did they use the system?<\/li>\n<li>What factors were associated with successful or unsuccessful use of the system?<\/li>\n<li>Did the system have an impact?<\/li>\n<\/ol>\n<p>This seems like a great starting point for exploration of evaluation in real-world settings. While the answers to these questions are surely dependent on the domain, and will differ for medical, legal, and other disciplines, by collecting such answers in a systematic way, we may be able to identify useful generalizations that can inform the design and evaluation of many systems. For example, one way to assess the impact of a document is to measure its re-use in certain contexts, as suggested by <a title=\"Jon Elsas | Probably Irrelevant\" href=\"http:\/\/probablyirrelevant.org\/author\/jelsas\/\" target=\"_blank\">Jon Elsas<\/a> for <a title=\"Finding relevance judgements in the wild\" href=\"http:\/\/probablyirrelevant.org\/2009\/04\/finding-relevance-judgements-in-the-wild\/\" target=\"_blank\">comments on a forum<\/a> or as we did in a <a title=\"Shipman, F., Price, M., Marshall, C. and Golovchinsky, G. (2003) Identifying Useful Passages in Documents based on Annotation Patterns. In Proc. 
ECDL 2003\" href=\"http:\/\/www.fxpal.com\/?p=abstract&amp;abstractID=216\" target=\"_blank\">study<\/a> of how law students write from sources.<\/p>\n<p>Perhaps a way to go forward without burning bridges is to extend the mandate for TREC, and to fill in an important missing aspect. We should honor the <strong>T<\/strong>ext <strong>RE<\/strong>trieval <strong>C<\/strong>onference by transforming it into the <strong>T<\/strong>ext <strong>R<\/strong>etrieval and <strong>E<\/strong>valuation <strong>C<\/strong>onference.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In his comment to an earlier post, Miles Efron reiterated the usefulness of the various TREC competitions to fostering IR research. I agree with him (and with others) that TREC has certainly been a good incubator both in its annual competition and in follow-on studies that use its data in other ways.\u00a0 And, as Miles [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[91,90],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1256"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1256"}],"version-history":[{"count":12,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1256\/revisions"}],"predecessor-version":[{"id":1582,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1256\/revisions\/1582"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1256"}],"wp:
term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1256"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1256"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}