{"id":74,"date":"2009-02-23T08:01:07","date_gmt":"2009-02-23T16:01:07","guid":{"rendered":"http:\/\/palblog\/?p=74"},"modified":"2009-03-09T23:48:09","modified_gmt":"2009-03-10T06:48:09","slug":"search-log-mining","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=74","title":{"rendered":"Search log mining"},"content":{"rendered":"<p>Max Wilson recently <a title=\"search interaction is short | Max L. Wilson\" href=\"http:\/\/maxlwilson.blogspot.com\/2009\/02\/search-interaction-is-short.html\" target=\"_blank\">commented<\/a> on an article by <a title=\"Zhang, Y., et al. Time series analysis of a Web search engine transaction log. Information Processing and Management (2008), doi:10.1016\/j.ipm.2008.07.003\" href=\"http:\/\/ist.psu.edu\/faculty_pages\/jjansen\/academic\/jansen_time_series_analysis.pdf\" target=\"_blank\">Zhang et al.<\/a> on time series analysis of search logs.\u00a0 (Thanks to <a title=\"Daniel Tunkelang | The Noisy Channel\" href=\"http:\/\/thenoisychannel.com\/about\/\">Daniel<\/a> for the web link.) This is a topic I&#8217;ve been interested in for a while, in particular for finding evidence of exploratory search in the logs.\u00a0 Max notes that the average session length is just under three interactions, and concludes that this really amounts to a single query\/selection interaction. If that&#8217;s so, then the average behavior characterized in the Zhang paper is not in fact exploratory.<\/p>\n<p>I was looking forward to reading the paper, but my excitement was short-lived:<\/p>\n<p><!--more--><\/p>\n<ul>\n<li>The abstract is wrong!\u00a0 It claims that &#8220;results show that the average length of a searcher session is approximately 2.9 interactions&#8221; whereas what they are really measuring is the average number of query terms. How disappointing! 
Given that the authors had access to the IP address and cookie information associated with each query, they could have estimated session length, calculated the range of terms (and how many terms were repeated), and derived all sorts of other useful statistics about users&#8217; behavior.<\/li>\n<li>The authors find that the rank of the lowest selected document ranges mostly from five to eight, meaning that searchers typically don&#8217;t get past the first page of results. This is not surprising for <a title=\"Wright, P. 1991. Cognitive overheads and prostheses: some issues in evaluating hypertexts. In Proc. Hypertext '91.  ACM, New York, NY, 1-12. DOI= http:\/\/doi.acm.org\/10.1145\/122974.122975  | ACM Digital Library\" href=\"http:\/\/portal.acm.org\/citation.cfm?id=122974.122975&amp;type=series\" target=\"_blank\">cognitive reasons<\/a>. In fact, in my PhD research, I found that increasing the number of search results displayed per page increased the portion of the result set viewed by searchers (<a title=\"The Newspaper as an Information Exploration Metaphor | www.fxpal.com\" href=\"http:\/\/www.fxpal.com\/?p=abstract&amp;abstractID=155\" target=\"_blank\">abstract<\/a>, <a title=\"The Newspaper as an Information Exploration Metaphor | IP&amp;M 33 (5), 1997, pp 663-683\" href=\"http:\/\/www.sciencedirect.com\/science?_ob=ArticleURL&amp;_udi=B6VC8-3SX1FSW-S&amp;_user=10&amp;_coverDate=09%2F30%2F1997&amp;_alid=868155716&amp;_rdoc=2&amp;_fmt=high&amp;_orig=search&amp;_cdi=5948&amp;_docanchor=&amp;view=c&amp;_ct=8&amp;_acct=C000050221&amp;_version=1&amp;_urlVersion=0&amp;_userid=10&amp;md5=70d45b578227511e416d70e6e2226c61\" target=\"_blank\">full paper<\/a>).<\/li>\n<li>The correlation analysis suggested that for informational searches, &#8220;searchers who typed in the fewest query terms one period ahead (i.e., less than the average query length) were more likely to click higher ranked links (i.e., top ranked results) in the following period.&#8221;\u00a0 This result 
is characteristic of naive searchers, or perhaps the authors&#8217; classification of informational vs. navigational searches was not sufficiently accurate. The authors do not report any analysis of query terms, so it is difficult to know if there is a relationship between the types of search terms used and the exhibited search behaviors.<\/li>\n<\/ul>\n<p>I hope that the authors revisit their analysis with an eye toward capturing multiple metrics of users&#8217; behaviors, and look for patterns that indicate a diversity of searching styles and tactics. Another direction to take would be to classify search terms in some manner (e.g., part of speech, proper name, URL, etc.) to see if these classifications predict behavior. Such a result could be operationalized by, for example, adjusting search interfaces based on a classification of query terms.<\/p>\n<p>I tried to find the editor responsible for the issue in which this article appears, but could not find a reliable reference to it. The pre-print lists it as being published in 2008, but no issue of IP&amp;M published in 2008 listed that article in its table of contents. 
I hope that the editors catch the error and at least have the authors edit the abstract\u2014I would hate to pay money for this based on the abstract, only to have the article offer a completely different analysis.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I was looking forward to a search log analysis of exploratory search behavior&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[21,19,20],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/74"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=74"}],"version-history":[{"count":18,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/74\/revisions"}],"predecessor-version":[{"id":135,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/74\/revisions\/135"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=74"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=74"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=74"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}