{"id":1246,"date":"2009-07-01T11:37:19","date_gmt":"2009-07-01T18:37:19","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=1246"},"modified":"2009-07-01T11:37:19","modified_gmt":"2009-07-01T18:37:19","slug":"on-the-science-of-ir","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=1246","title":{"rendered":"On the Science of IR"},"content":{"rendered":"<p><a href=\"http:\/\/www.ischool.utexas.edu\/~miles\/\" target=\"_blank\">Miles Efron<\/a> posted recently on his take on the <a title=\"Is the science of IR improving? | Probably irrelevant\" href=\"http:\/\/probablyirrelevant.org\/2008\/10\/is-the-science-of-ir-improving\/\" target=\"_blank\">progress of the IR field<\/a> in response to a question posted by <a href=\"http:\/\/sentra.ischool.utexas.edu\/%7Eadillon\/blog\/\" target=\"_blank\">Andrew Dillon<\/a> at the last <a title=\"American Society for Info Science and Technology\" href=\"http:\/\/www.asist.org\/\" target=\"_blank\">ASIST<\/a> conference. Miles&#8217; take was that progress was indeed being made for two reasons: the <a title=\"ACM Special Interest Group on Information Retrieval\" href=\"http:\/\/www.sigir.org\/\" target=\"_blank\">SIGIR<\/a> conference has become more competitive over the years, and the diversity of corpora in the <a title=\"Text REtrieval Conference | National Institute of Standards and Technology\" href=\"http:\/\/trec.nist.gov\/\" target=\"_blank\">TREC<\/a> umbrella has also increased. 
Unfortunately, I wasn&#8217;t there to hear the question or the subsequent discussion, but my guess is that Andrew Dillon was asking not about statistical significance but about magnitude.<\/p>\n<p><!--more-->Every year we see incremental improvements in <a title=\"Definition of Mean Average Precision | NIST\" href=\"http:\/\/www-nlpir.nist.gov\/works\/presentations\/spie99\/sld016.htm\" target=\"_blank\">Mean Average Precision<\/a> (MAP) scores reported in SIGIR (and in CIKM, and in other venues) for some narrow conceptions of the search task. The gains are real, but they may not matter. Similarly, Google recently <a href=\"http:\/\/code.google.com\/speed\/files\/delayexp.pdf\" target=\"_blank\">reported<\/a> (thanks <a title=\"Speed Matters. So Does the Metric.  | IR Gupf\" href=\"http:\/\/irgupf.com\/2009\/06\/29\/speed-matters-so-does-the-metric\/\" target=\"_blank\">Jeremy<\/a>, thanks <a title=\"New Google study on speed in search results | Geeking with Greg\" href=\"http:\/\/glinden.blogspot.com\/2009\/06\/new-google-study-on-speed-in-search.html\" target=\"_blank\">Greg<\/a>) that a change in latency from 100 msec to 400 msec reduced the number of queries people ran by about 0.5%. Statistically significant, yes. Important? Maybe not.<\/p>\n<p>The scientists among us like to measure things. That&#8217;s how we (and others) know we did something interesting. But it seems that what we really want to measure is difficult to observe, and so we settle on some plausible proxy. And so begins the slippery slope.<\/p>\n<p>Ongoing improvement in indexing and retrieval algorithms is certainly a good thing. But in some ways this line of work has become a victim of its own success, and, like commercial agriculture, now produces decent commodity goods at ridiculously low cost. 
To continue with the analogy, we need to diversify our notion of information retrieval to include not only the supermarket (where, at any time of day, you can find exactly the same product that you&#8217;ve always bought but without the ability to really understand or control what&#8217;s in the box) but also the farmer&#8217;s market, where you can find more variety, more surprises, and more interaction with the people who grow the food you will be eating.<\/p>\n<p>So there is still room in the field of information retrieval for progress, but the low-hanging fruit of precision-oriented search has been harvested. We now need to look to <a title=\"The Craft of Exploratory Search | FXPAL\" href=\"http:\/\/palblog.fxpal.com\/?p=634\" target=\"_blank\">more difficult tasks<\/a>, to <a title=\"Exploratory Search | Wikipedia\" href=\"http:\/\/en.wikipedia.org\/wiki\/Exploratory_search\" target=\"_blank\">exploratory search<\/a>, to <a title=\"Human-Computer information retrieval | Wikipedia\" href=\"http:\/\/en.wikipedia.org\/wiki\/Human%E2%80%93computer_information_retrieval\" target=\"_blank\">interaction<\/a>, to <a title=\"Communicating about Collaboration | FXPAL Blog\" href=\"http:\/\/palblog.fxpal.com\/?p=249\" target=\"_blank\">collaboration<\/a>. Looking beyond the ranked list is not only a pragmatic strategy for innovation; it&#8217;s also good science.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Miles Efron posted recently on his take on the progress of the IR field in response to a question posted by Andrew Dillon at the last ASIST conference. 
Miles&#8217; take was that progress was indeed being made for two reasons: the SIGIR conference has become more competitive over the years, and the diversity of corpora [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1246"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1246"}],"version-history":[{"count":9,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1246\/revisions"}],"predecessor-version":[{"id":1255,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1246\/revisions\/1255"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1246"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1246"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1246"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}