{"id":5786,"date":"2014-07-01T09:52:10","date_gmt":"2014-07-01T16:52:10","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=5786"},"modified":"2014-07-01T09:52:10","modified_gmt":"2014-07-01T16:52:10","slug":"to-cluster-or-to-hash","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=5786","title":{"rendered":"To cluster or to hash?"},"content":{"rendered":"<p>Visual search has developed a basic processing pipeline in the last decade or so on top of the\u00a0<a href=\"http:\/\/en.wikipedia.org\/wiki\/Bag-of-words_model_in_computer_vision\">&#8220;bag of visual words&#8221;<\/a>\u00a0representation based on local image descriptors. \u00a0You know it&#8217;s established when it&#8217;s in Wikipedia. \u00a0There&#8217;s been a steady stream of work on image matching using the representation in combination with approximate nearest neighbor search and various downstream geometric verification strategies.<\/p>\n<p>In practice, the most computationally daunting stage can be the construction of the visual codebook which is usually accomplished via k-means or\u00a0<a href=\"http:\/\/www.vis.uky.edu\/~stewe\/publications\/nister_stewenius_cvpr2006.pdf\">tree structured vector quantization<\/a>. \u00a0The problem is to cluster (possibly billions of) local descriptors, and this offline clustering may need to be repeated when there are any significant changes to the image database. \u00a0Each descriptor cluster is represented by one element in a visual vocabulary (codebook). 
In turn, each image is represented by a bag (vector) of these visual words (quantized descriptors).<\/p>\n<p>Building on previous work on high accuracy <a href=\"http:\/\/www.fxpal.com\/research-projects\/visual-search-via-hashing\/\">scalable visual search<\/a>, a recent FXPAL <a href=\"http:\/\/www.fxpal.com\/publications\/scalable-image-search-with-multiple-index-tables\/\">paper<\/a> at <a href=\"http:\/\/www.icmr2014.org\/\">ACM ICMR 2014<\/a> proposes Vector Quantization Free (VQF) search using projective hashing in combination with binary-valued local image descriptors. Recent years have seen the development of binary descriptors such as <a href=\"http:\/\/www.vision.cs.chubu.ac.jp\/CV-R\/pdf\/Rublee_iccv2011.pdf\">ORB<\/a> or <a href=\"http:\/\/cvlab.epfl.ch\/research\/detect\/brief\">BRIEF<\/a> that improve efficiency with negligible loss of accuracy in various matching scenarios. Rather than clustering descriptors harvested globally from the image database, the codebook is implicitly defined via projective hashing: subsets of the elements of each ORB descriptor are hashed by projection (i.e. all but a small number of bits are discarded) to form an index table, as below.<\/p>\n<p style=\"text-align: center;\"><a href=\"http:\/\/palblog.fxpal.com\/wp-content\/uploads\/2014\/05\/VQFMarch2014.jpg\"><img decoding=\"async\" loading=\"lazy\" class=\" wp-image-5790 aligncenter\" alt=\"VQFMarch2014\" src=\"http:\/\/palblog.fxpal.com\/wp-content\/uploads\/2014\/05\/VQFMarch2014-300x164.jpg\" width=\"300\" height=\"164\" srcset=\"https:\/\/blog.fxpal.net\/wp-content\/uploads\/2014\/05\/VQFMarch2014-300x164.jpg 300w, https:\/\/blog.fxpal.net\/wp-content\/uploads\/2014\/05\/VQFMarch2014.jpg 659w\" sizes=\"(max-width: 300px) 100vw, 300px\" \/><\/a><\/p>\n<p>With multiple such tables, image matching is implemented as a voting scheme based on the number of collisions (i.e. 
partial matches) between the descriptors in a test image and those in a database image.<\/p>\n<p>The paper presents experimental results on image databases that validate the expected significant gains in efficiency and scalability from the VQF approach. The results also show improved performance over several competitive baselines in near-duplicate image search. Interesting questions remain for future work on the tradeoffs between the size of the hash tables (governed by the number of bits projected) and the number of tables required to deliver a desired level of performance.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Visual search has developed a basic processing pipeline in the last decade or so on top of the\u00a0&#8220;bag of visual words&#8221;\u00a0representation based on local image descriptors. \u00a0You know it&#8217;s established when it&#8217;s in Wikipedia. \u00a0There&#8217;s been a steady stream of work on image matching using the representation in combination with approximate nearest neighbor search and 
[&hellip;]<\/p>\n","protected":false},"author":50,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[128,7],"tags":[317],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5786"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/50"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5786"}],"version-history":[{"count":10,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5786\/revisions"}],"predecessor-version":[{"id":5819,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5786\/revisions\/5819"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5786"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5786"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5786"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}
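To make the projective-hashing and voting scheme from the post concrete, here is a minimal, hypothetical Python/NumPy sketch. Toy 32-bit binary descriptors stand in for 256-bit ORB, and all names and parameters are illustrative, not taken from the paper:

```python
import numpy as np
from collections import defaultdict

def make_projections(num_tables, desc_bits, bits_per_hash, seed=0):
    # one random subset of bit positions per table; projecting a descriptor
    # onto that subset (discarding all other bits) is the hash
    rng = np.random.default_rng(seed)
    return [rng.choice(desc_bits, size=bits_per_hash, replace=False)
            for _ in range(num_tables)]

def project(descriptor, positions):
    # pack the selected bits into a single integer hash key
    key = 0
    for bit in descriptor[positions]:
        key = (key << 1) | int(bit)
    return key

def build_index(database, projections):
    # database: dict mapping image_id -> (n, desc_bits) array of 0/1 bits;
    # each table maps a hash key to the set of images with a colliding descriptor
    tables = [defaultdict(set) for _ in projections]
    for image_id, descriptors in database.items():
        for table, positions in zip(tables, projections):
            for d in descriptors:
                table[project(d, positions)].add(image_id)
    return tables

def query(descriptors, tables, projections):
    # vote once per (query descriptor, table) collision with a database image,
    # then rank database images by total votes
    votes = defaultdict(int)
    for table, positions in zip(tables, projections):
        for d in descriptors:
            for image_id in table.get(project(d, positions), ()):
                votes[image_id] += 1
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

The tradeoff flagged in the closing paragraph shows up directly here: more bits per hash means smaller buckets and fewer chance collisions, while more tables recover matches a single projection would miss.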