{"id":4477,"date":"2010-08-26T07:37:16","date_gmt":"2010-08-26T14:37:16","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=4477"},"modified":"2010-08-26T14:01:48","modified_gmt":"2010-08-26T21:01:48","slug":"hcir-search-challenge","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=4477","title":{"rendered":"HCIR Search Challenge"},"content":{"rendered":"<p>The fourth HCIR workshop was held this past weekend at Rutgers University in conjunction with the IIiX 2010 conference. This was, in my opinion, the best workshop of the four so far. Part of the strength of the workshop has been the range of presentations, covering more mature work in traditional 30-minute presentations, a poster and demo session, and, new this year, reports from the <a title=\"HCIR 2010 Search Challenge | HCIR 2010\" href=\"http:\/\/research.microsoft.com\/en-us\/um\/people\/ryenw\/hcir2010\/challenge.html\" target=\"_blank\">HCIR search challenge<\/a>.<\/p>\n<p>From the web site:<\/p>\n<blockquote><p>The aims of the challenge are to encourage researchers and practitioners to build and demonstrate information access systems satisfying at least one of the following:<\/p>\n<ul>\n<li>Not only deliver relevant documents, but provide facilities for making meaning with those documents.<\/li>\n<li>Increase user responsibility as well as control; that is, the systems require and reward human effort.<\/li>\n<li>Offer the flexibility to adapt to user knowledge \/ sophistication \/ information need.<\/li>\n<li>Are engaging and fun to use.<\/li>\n<\/ul>\n<\/blockquote>\n<p>Participants were given access to the New York Times Annotated Corpus, which consists of 1.8 million articles published in the Times between 1987 and 2007, and were expected to do something interesting in searching or browsing this collection.<\/p>\n<p><!--more--><\/p>\n<p>Several teams competed in the event, and their entries were judged by the workshop participants. 
The entries (available as part of the <a title=\"HCIR 2010 Proceedings\" href=\"http:\/\/research.microsoft.com\/en-us\/um\/people\/ryenw\/hcir2010\/docs\/HCIR2010Proceedings.pdf\" target=\"_blank\">proceedings<\/a>) were:<\/p>\n<ul>\n<li><strong>Search for Journalists: New York Times Challenge Report<\/strong><br \/>\nCorrado Boscarino, Arjen P. de Vries, and Wouter Alink<br \/>\n(Centrum Wiskunde and Informatica)<\/li>\n<li><strong>Exploring the New York Times Corpus with NewsClub<\/strong><br \/>\nChristian Kohlsch\u00fctter (Leibniz Universit\u00e4t Hannover)<\/li>\n<li><strong>Searching Through Time in the New York Times<\/strong><br \/>\nMichael Matthews, Pancho Tolchinsky, Roi Blanco, Jordi Atserias, Peter Mika, and<br \/>\nHugo Zaragoza (Yahoo! Labs)<\/li>\n<li><strong>News Sync: Three Reasons to Visualize News Better<\/strong><br \/>\nV.G. Vinod Vydiswaran (University of Illinois),<br \/>\nJeroen van den Eijkhof (University of Washington),<br \/>\nRaman Chandrasekar (Microsoft Research), Ann Paradiso (Microsoft Research),<br \/>\nand Jim St. George (Microsoft Research)<\/li>\n<li><strong>Custom Dimensions for Text Corpus Navigation<\/strong><br \/>\nVladimir Zelevinsky (Endeca Technologies)<\/li>\n<li><strong>A Retrieval System Based on Sentiment Analysis<\/strong><br \/>\nWei Zheng and Hui Fang (University of Delaware)<\/li>\n<\/ul>\n<p>I liked Vladimir Zelevinsky&#8217;s system that constructed a faceted browsing interface on the fly: given a search term, it generated facets using WordNet relations, and then used terms obtained this way to populate the facets. Of course he got great performance using the Endeca back end, which was able to compute facet value counts quickly enough to populate multiple facets in a few seconds. Search results themselves were presented in a two-column newspaper-like layout which appealed to me with its clean look. 
Then again, I am somewhat partial to newspaper layouts for presenting search results.<\/p>\n<p>Microsoft&#8217;s entry performed automatic clustering of matching articles into several groups, and offered a pleasant UI for browsing the results. It was a much heavier system whose success depended on the quality of the clustering algorithm. In demos, it performed well for the most part, although some of the smaller category labels were a bit odd.<\/p>\n<p>The winner of this competition was the entry from Yahoo, which used NLP techniques to identify references to time in the articles. These references were used to construct a variety of timeline visualizations intended to help journalists make sense of the data. I didn&#8217;t get to play with the system, but it looked like a solid piece of work, judging from the presentation. It also introduced me to <a title=\"Open NLP\" href=\"http:\/\/opennlp.sourceforge.net\/\" target=\"_blank\">OpenNLP<\/a>, a set of NLP tools written in Java. Apparently it&#8217;s quite useful for doing things like POS tagging and named entity extraction.<\/p>\n<p>These presentations highlighted a successful attempt to put HCIR techniques into practice in an open-ended manner. 
The challenge was also a success in terms of press coverage, thanks to Daniel Tunkelang&#8217;s efforts, with the Yahoo entry receiving coverage from the <a title=\"Technology Review\" href=\"http:\/\/www.technologyreview.com\/\" target=\"_blank\">Technology Review<\/a> in an article titled &#8220;<a title=\"A Search Service that Can Peer into the Future | Technology Review\" href=\"http:\/\/www.technologyreview.com\/computing\/26113\/\" target=\"_blank\">A Search Service that Can Peer into the Future<\/a>,&#8221; which was also picked up by <a title=\"A Search Service that Can Peer into the Future | Techmeme\" href=\"http:\/\/www.techmeme.com\/100825\/p62#a100825p62\" target=\"_blank\">Techmeme<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The fourth HCIR workshop was held this past weekend at Rutgers University in conjunction with the IIiX 2010 conference. This was, in my opinion, the best workshop of the four so far. Part of the strength of the workshop has been the range of presentations, covering more mature work in traditional 30 minute presentations, a 
[&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[24,15],"tags":[94,164],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/4477"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=4477"}],"version-history":[{"count":4,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/4477\/revisions"}],"predecessor-version":[{"id":4480,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/4477\/revisions\/4480"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=4477"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=4477"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=4477"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}