{"id":5092,"date":"2011-03-29T06:01:16","date_gmt":"2011-03-29T13:01:16","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=5092"},"modified":"2011-03-29T05:56:47","modified_gmt":"2011-03-29T12:56:47","slug":"released-reverted-indexing-source-code","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=5092","title":{"rendered":"Released: Reverted Indexing source code"},"content":{"rendered":"<p>I am pleased to announce that we are releasing a version of the reverted indexing framework as open source software! The release includes the framework and an implementation in Lucene.<\/p>\n<p>Reverted indexing is an information retrieval technique for query expansion, relevance feedback, and a variety of other operations. The details are described on our <a title=\"Reverted Indexing | FXPAL\" href=\"http:\/\/www.fxpal.com\/?p=RevertedIndexing\" target=\"_blank\">web site<\/a>, in several <a title=\"Posts tagged RevertedIndex | FXPAL Blog\" href=\"http:\/\/palblog.fxpal.com\/?tag=revertedindex\" target=\"_blank\">posts<\/a> on this blog, and in our <a title=\"Pickens, J., Cooper, M., and Golovchinsky, G. (2010) Reverted Indexing for Feedback and Expansion. In Proc. CIKM 2010. | FXPAL\" href=\"http:\/\/www.fxpal.com\/?p=abstract&amp;abstractID=581\" target=\"_blank\">CIKM 2010<\/a> paper. The source code and JAR file can be downloaded from the <a title=\"Reverted Indexing (Source code) | FXPAL\" href=\"http:\/\/www.fxpal.com\/?p=RevertedIndexing#source\" target=\"_blank\">Reverted Indexing<\/a> page; see the <a href=\"http:\/\/www.fxpal.com\/RevertedIndexing\/docs\/\" target=\"_blank\">Javadocs<\/a> for details of the API.<\/p>\n<p><!--more--><\/p>\n<p>I&#8217;ve tried to make the code as easy to use as possible, and I would appreciate feedback on what could be improved to make the library more useful. Of course, others are welcome to implement the framework for other search engines, and I am happy to help with such efforts. 
In principle, though, it&#8217;s quite simple: the Lucene implementation is just a few classes, with a few methods each.<\/p>\n<h3>Examples<\/h3>\n<p>Here is how you can create a reverted index given that you already have a Lucene inverted index:<\/p>\n<pre class=\"code\"><code class=\"comment\">\r\n \/\/ Create the spec for the inverted index<\/code>\r\n<code>IndexSpecification spec = \r\n  new IndexSpecification(\"inverted\",\"id\",\"body\",1000);<\/code>\r\n\r\n<code class=\"comment\">\r\n\/\/ Create a searcher that will execute the basis \r\n\/\/ queries.<\/code>\r\n<code>Searcher searcher = new LuceneSearcher( spec );<\/code>\r\n\r\n<code class=\"comment\">\r\n\/\/ Create the iterator that will generate the index \r\n\/\/ terms from an inverted index to be used as \r\n\/\/ basis queries<\/code>\r\n<code>BasisQuerySource termSource =\r\n  LuceneIndexTermIterator.createStandardIterator(\r\n    spec.getLocation(),null,5,\"basis-queries.txt\");<\/code>\r\n\r\n<code class=\"comment\">\r\n\/\/ Create a RevertedDocument iterator with the \r\n\/\/ searcher and the iterator over the basis queries<\/code>\r\n<code>RevertedDocumentIterator rdi =\r\n  new RevertedDocumentIterator(searcher, termSource);<\/code>\r\n\r\n<code class=\"comment\">\r\n\/\/ Build the index<\/code>\r\n<code>RevertedIndexing indexer = \r\n  new RevertedIndexingLucene();\r\nindexer.buildRevertedIndex(rdi);\r\nindexer.close();\r\nsearcher.close();<\/code><\/pre>\n<p>Once the reverted index is created, you can query it to perform a number of standard operations. 
First, you need to set up searching, which involves creating a <code>Searcher<\/code> on each index, and putting them together like this:<\/p>\n<pre class=\"code\"><code class=\"comment\">\/\/ Create the searcher on the inverted index<\/code>\r\n<code>IndexSpecification spec = \r\n  new IndexSpecification(\"inverted\",\"id\",\"body\",1000);\r\nSearcher inverted = new LuceneSearcher(spec);\r\n<\/code>\r\n\r\n<code class=\"comment\">\r\n\/\/ Create the searcher on the reverted index<\/code>\r\n<code>IndexSpecification revertedSpec = \r\n  RevertedIndexingLucene.revertedIndexSpec(\r\n    \"reverted\", 500);\r\nRevertedSearcher reverted =\r\n    new LuceneRevertedSearcher(revertedSpec);<\/code>\r\n\r\n<code class=\"comment\">\r\n\/\/ Create the reverted querying instance<\/code>\r\n<code>RevertedQuerying revertedQuerying =\r\n    new RevertedQuerying(inverted, reverted);\r\n<\/code><\/pre>\n<p>Now you can do relevance feedback, assuming some user input:<\/p>\n<pre class=\"code\"><code>\r\nString userQuery = \"good stuff\";\r\nString[] docids = {\"docid1\", \"docid2\"};\r\nRankedList relDocs = RankedList.fromDocIds(docids);\r\nExpansionResults results =\r\n    revertedQuerying.runRelevanceFeedbackQuery(\r\n        userQuery, relDocs, false );\r\n<\/code><\/pre>\n<p>Pseudo-relevance feedback works like this:<\/p>\n<pre class=\"code\"><code>\r\nresults =\r\n    revertedQuerying.runPseudoRelevanceFeedbackQuery(\r\n        userQuery, 5, false );\r\n<\/code><\/pre>\n<p>In each case, the <code>ExpansionResults<\/code> instance will contain the set of documents retrieved by the original (unexpanded) query, the set of expansion terms, and the final set of documents. 
When doing relevance feedback, you can also ask for the residual document list that excludes the documents used for relevance feedback.<\/p>\n<p>If you just want to get some expansion terms given a collection of document ids, you can do this, without bothering with a <code>RevertedQuerying<\/code> object:<\/p>\n<pre class=\"code\"><code>\r\nString[] docids = {\"doc1\", \"doc2\", \"doc27\"};\r\nRankedList terms = reverted.runDocumentQuery(docids);\r\n<\/code><\/pre>\n<p>Each item in <code>terms<\/code> will contain a docid and a score, where the docid is a basis query that can be used in the <em>inverted<\/em> index. Typically, it will be of the form <code>field:value<\/code>, as generated by the indexing process.<\/p>\n<h3>Acknowledgments<\/h3>\n<p>While I am the one who bundled the code together and pushed it out, much of the credit for this work belongs to <a title=\"Jeremy Pickens | IR Gupf\" href=\"http:\/\/irgupf.com\/about\/\" target=\"_blank\">Jeremy Pickens<\/a> who conceived of the idea in the first place. Of course his inspiration was <a href=\"http:\/\/www.dcs.gla.ac.uk\/~leif\/\">Leif Azzopardi<\/a> and Vishwa Vinay&#8217;s work on <a href=\"http:\/\/www.dcs.gla.ac.uk\/~leif\/papers\/azzopardi2008retrievability.pdf\">retrievability<\/a>, and Leif also planted the idea that we should release the software as open source. 
Finally, I would like to thank <a title=\"Abdigani Diriye | UCL\" href=\"http:\/\/www.ucl.ac.uk\/uclic\/people\/a_diriye\" target=\"_blank\">Abdigani Diriye<\/a> for beta-testing this API, and <a title=\"Andreas Girgensohn | FXPAL\" href=\"http:\/\/www.fxpal.com\/?p=andreasg\" target=\"_blank\">Andreas Girgensohn<\/a> for his suggestions.<\/p>\n<p>Comments and suggestions are always welcome!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Open-source release of Reverted Indexing library.<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15],"tags":[256,23,275],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5092"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5092"}],"version-history":[{"count":78,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5092\/revisions"}],"predecessor-version":[{"id":5178,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5092\/revisions\/5178"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5092"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5092"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5092"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}