{"id":1666,"date":"2009-08-26T07:31:21","date_gmt":"2009-08-26T14:31:21","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=1666"},"modified":"2009-08-26T15:07:54","modified_gmt":"2009-08-26T22:07:54","slug":"what-a-tangled-mesh-we-weave","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=1666","title":{"rendered":"What a tangled MeSH we weave"},"content":{"rendered":"<p><a href=\"http:\/\/www.cs.mu.oz.au\/~wew\/\" target=\"_blank\">William Webber<\/a> recently <a title=\"The Cranfield tests | IREvalEtAl\" href=\"http:\/\/blog.codalism.com\/?p=845\" target=\"_blank\">wrote<\/a> an interesting analysis of the reports of the original Cranfield experiments that were so influential in establishing the primacy of evaluation in information seeking, and in particular a certain kind of evaluation methodology around recall and precision based on a ground truth. One reason that the experiments were so influential was that they provided strong evidence that previously-held assumptions about the effectiveness of various indexing techniques were unfounded. Specifically, the experiments showed that full-text indexing outperformed controlled vocabularies. While this result was shocking in the 1950s, 50 years later it seems banal. Or almost.<\/p>\n<p><!--more--><\/p>\n<p>My initial reaction after having read this post was that sure, this was true for general searches, but for specialized domains there were advantages for using controlled vocabulary.\u00a0 For example, PubMed uses <a title=\"Medical Subject Headings | NCBI\" href=\"http:\/\/www.ncbi.nlm.nih.gov\/sites\/entrez?db=mesh\" target=\"_blank\">MeSH<\/a> to augment (expand) users&#8217; queries behind the scenes and that system is used by millions of people, sometimes in <a title=\"Johns Hopkins\u2019 Tragedy: Could Librarians Have Prevented a Death? | infotoday.com\" href=\"http:\/\/newsbreaks.infotoday.com\/nbreader.asp?ArticleID=17534\" target=\"_blank\">life-or-death situations<\/a>. 
Surely its continued use is a strong indicator of its effectiveness.<\/p>\n<p>But is it? Has the use of MeSH for query expansion been evaluated? I turned to William Hersh&#8217;s third edition of &#8220;<a href=\"http:\/\/www.amazon.com\/Information-Retrieval-Biomedical-Perspective-Informatics\/dp\/038778702X\/ref=sr_1_13?ie=UTF8&amp;s=books&amp;qid=1246000338&amp;sr=1-13\">Information Retrieval: A Health and Biomedical Perspective<\/a>,&#8221; which has a whole section (4.3) on controlled vocabularies, including a very nice description of MeSH. But there wasn&#8217;t much there about the benefits to the user of this (or any other) controlled vocabulary. In another chapter (section 7.4), Hersh discusses some studies that compared full-text vs. abstract searching, and reports briefly on the results of a study by <a title=\"McKinin, E.J., Sievert, M.C., Johnson, E.D., and Mitchell, J.A. (1991) The Medline\/full-text research project. JASIS 42(4) pp. 297-307\" href=\"#\">McKinin, Sievert, et al. (1991)<\/a> that compared MEDLINE searching with indexing terms vs. text words. The results, as summarized by Hersh, suggest no difference in recall (42% for indexing terms vs. 41% for full text), and a considerable disadvantage for indexing terms in precision (55% vs. 62%). Surprisingly, there is no discussion of the significance of these results.<\/p>\n<p>I then poked around on the web, and after some digging found a recent <a title=\"Lu, Z., Kim, W. and Wilbur, W.J. (2009) Evaluation of query expansion using MeSH in PubMed. Information Retrieval, 12 (1), Feb 2009. 
Springer\" href=\"http:\/\/www.springerlink.com\/content\/a7665w69716219v7\/?p=eeb0d378e4124328bc000bfdcdd40da9&amp;pi=0\" target=\"_blank\">article<\/a> by Lu <em>et al.<\/em> that claims that<\/p>\n<blockquote><p>&#8230; to the best of our knowledge, this is the first formal evaluation on the benefits of applying the query expansion technique to a daily operational search system as opposed to retrieval systems mainly designed and tested in research laboratories. Therefore, our analysis plays a critical role in the understanding of future technology development needs for PubMed, along with its mission to better fulfill the information needs of millions of PubMed users.<\/p><\/blockquote>\n<p>The authors say that while MeSH-based query expansion has been implemented and evaluated in several research systems in the context of the TREC Genomics <em>ad hoc<\/em> retrieval track, the results reported by those studies are mixed, with some reporting improved performance and others reporting no improvement or even worse performance with the inclusion of the expansion terms from thesauri.<\/p>\n<p>The authors then describes an evaluation they performed on the same corpus. They constructed full-text and MeSH-expanded queries for 55 topics from the 2006 and 2007 TREC Genomics <em>ad hoc<\/em> retrieval tasks, and used the F-measure to evaluate the resulting performance. For the 2006 data (21 topics), they found better performance for expanded queries for 7 topics, worse for 5, and unchanged for 9; these results were not statistically significant. For 2007 data (34 topics), they found benefits to expansion in 20 topics, with 8 performing worse and 6 unchanged; these results were statistically significant.<\/p>\n<p>When comparing results using precision at 5, 10, and 20 documents (measures useful to indicate the likelihood that a person would actually see a document retrieved by the system) results were mixed, no clear advantage to either method. 
In the end, the authors conclude that while thesaural expansion tends to improve recall somewhat, the improvement comes at the expense of precision, just as McKinin <em>et al. <\/em>had found.<\/p>\n<p>So the lessons of the Cranfield experiments seem to remain current even after 50 years of progress in information seeking algorithms. What&#8217;s even more surprising, however, is the lack of a solid body of work that characterizes the effectiveness of such an established thesaurus.<\/p>\n<p><strong>Update<\/strong>: Fixed typo in an author&#8217;s name.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>William Webber recently wrote an interesting analysis of the reports of the original Cranfield experiments that were so influential in establishing the primacy of evaluation in information seeking, and in particular a certain kind of evaluation methodology around recall and precision based on a ground truth. One reason that the experiments were so influential was [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[113,112],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1666"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1666"}],"version-history":[{"count":8,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1666\/revisions"}],"predecessor-version":[{"id":1672,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/1666\/revisions\/1672"}],"wp:attachment":[{"href":"
https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1666"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1666"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1666"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}