{"id":5089,"date":"2011-02-07T07:04:02","date_gmt":"2011-02-07T15:04:02","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=5089"},"modified":"2011-02-04T16:16:26","modified_gmt":"2011-02-05T00:16:26","slug":"recommendations-needed","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=5089","title":{"rendered":"Recommendations needed"},"content":{"rendered":"<p>In one of our research projects, we are trying to compare some alternative algorithms for generating recommendations based on content similarity. As you might expect, we have some data we&#8217;re playing with, but the data is noisy and sometimes it&#8217;s hard to make sense of the variability: is it due to noise in the data, or is the algorithm trying to tell us something?<\/p>\n<p>So my thought was to break the problem into two parts: first evaluate our algorithms on well-understood data, and then apply what we learn to the new, noisy data to see what&#8217;s there. My purpose in writing this post is to solicit suggestions about which publicly available datasets we should be using.<\/p>\n<p><!--more-->Ideally, the data would have the following characteristics:<\/p>\n<ul>\n<li>Each item should belong to one or more categories.<\/li>\n<li>Each item should be characterized by at least one attribute.<\/li>\n<li>There should be a large number of attributes overall, but each item may have only a few of them assigned.<\/li>\n<li>If possible, attributes should have numerical scores or ratings.<\/li>\n<li>There should be lots of items.<\/li>\n<\/ul>\n<p>I look forward to hearing your suggestions. In the end, we&#8217;ll probably try to do the analysis on several different datasets to check the robustness of our findings.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In one of our research projects, we are trying to compare some alternative algorithms for generating recommendations based on content similarity. 
As you might expect, we have some data we&#8217;re playing with, but the data is noisy and sometimes it&#8217;s hard to make sense of the variability: is it due to noise in the data, [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[286],"tags":[287],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5089"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5089"}],"version-history":[{"count":2,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5089\/revisions"}],"predecessor-version":[{"id":5091,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5089\/revisions\/5091"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5089"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5089"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5089"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}