{"id":5062,"date":"2010-12-21T07:30:45","date_gmt":"2010-12-21T15:30:45","guid":{"rendered":"http:\/\/palblog.fxpal.com\/?p=5062"},"modified":"2010-12-20T14:33:12","modified_gmt":"2010-12-20T22:33:12","slug":"numbers-counting-books","status":"publish","type":"post","link":"https:\/\/blog.fxpal.net\/?p=5062","title":{"rendered":"When is one>two and seven==eight?"},"content":{"rendered":"<p>So Google recently released the Google Books <a href=\"http:\/\/ngrams.googlelabs.com\/\">N-gram viewer<\/a> along with the <a href=\"http:\/\/ngrams.googlelabs.com\/datasets\">datasets<\/a>.<\/p>\n<p>There&#8217;s been plenty of press about it, and the <a href=\"http:\/\/www.librarian.net\/wp-content\/uploads\/science-googlelabs.pdf\">Science article<\/a> based on this data is an interesting read.<\/p>\n<p>I was trying to come up with a simple yet insightful query. My initial trial was <a href=\"http:\/\/\">modernism,postmodernism<\/a>, which immediately had me wondering about hyphenation or the lack thereof&#8230; \u00a0In any case, the upshot seems to be that use of the term &#8220;postmodernism&#8221; started around 1978. Neat, though I don&#8217;t think I&#8217;ll need to clear space for my Nobel Prize anytime soon.<\/p>\n<p>I toyed a little bit with other terms like <a href=\"http:\/\/ngrams.googlelabs.com\/graph?content=generation+X&amp;year_start=1900&amp;year_end=2000&amp;corpus=0&amp;smoothing=3\">generation X<\/a>, which has an odd sort of bump in the graph around 1970. Not sure what&#8217;s up with that, though perhaps it&#8217;s a data-collection artifact like the ones <a href=\"http:\/\/searchengineland.com\/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181\">discussed in this article<\/a>. 
\u00a0I wasn&#8217;t inclined to dig any deeper and was happy enough to have my prior knowledge confirmed: use of &#8220;generation X&#8221; took off in the mid-1990s.<\/p>\n<p>My final trial was a bit more on the minimal side: <a href=\"http:\/\/ngrams.googlelabs.com\/graph?content=one%2Ctwo%2Cthree%2Cfour%2Cfive%2Csix%2Cseven%2Ceight%2Cnine%2Cten&amp;year_start=1700&amp;year_end=2000&amp;corpus=0&amp;smoothing=3\">one,two,three,four,five,six,seven,eight,nine,ten<\/a>. There shouldn&#8217;t be any surprise here that &#8220;one&#8221; is more common than &#8220;two&#8221;, which is more common than &#8220;three&#8221;, which in turn is more common than &#8220;four&#8221;. It probably shouldn&#8217;t be a surprise either that each succeeding number is less frequent by roughly a factor of 2.<\/p>\n<div id=\"attachment_5063\" style=\"width: 430px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/ngrams.googlelabs.com\/graph?content=one%2Ctwo%2Cthree%2Cfour%2Cfive%2Csix%2Cseven%2Ceight%2Cnine%2Cten&amp;year_start=1700&amp;year_end=2000&amp;corpus=0&amp;smoothing=3\"><img aria-describedby=\"caption-attachment-5063\" decoding=\"async\" class=\"wp-image-5063   \" title=\"CropperCapture[192]\" src=\"http:\/\/palblog.fxpal.com\/wp-content\/uploads\/2010\/12\/CropperCapture192.png\" alt=\"Occurrence of numbers in the Google Books N-gram viewer\" width=\"420\" \/><\/a><p id=\"caption-attachment-5063\" class=\"wp-caption-text\">Google Books N-gram viewer for numbers<\/p><\/div>\n<p>Less intuitive (to me, anyway) is that &#8220;ten&#8221; squeezes in front of &#8220;seven&#8221; and &#8220;eight&#8221; (OK, so maybe it&#8217;s a round number) and that &#8220;seven&#8221; and &#8220;eight&#8221; are basically tied. Even odder is that <a href=\"http:\/\/ngrams.googlelabs.com\/graph?content=five%2Csix%2Cseven%2Ceight%2Cnine%2Cten&amp;year_start=1780&amp;year_end=1820&amp;corpus=0&amp;smoothing=3\">before 1790 or so, putative occurrences of &#8220;six&#8221; and &#8220;seven&#8221; 
were virtually non-existent<\/a>.<\/p>\n<div id=\"attachment_5066\" style=\"width: 430px\" class=\"wp-caption alignnone\"><a href=\"http:\/\/ngrams.googlelabs.com\/graph?content=five%2Csix%2Cseven%2Ceight%2Cnine%2Cten&amp;year_start=1780&amp;year_end=1820&amp;corpus=0&amp;smoothing=3\"><img aria-describedby=\"caption-attachment-5066\" decoding=\"async\" class=\"wp-image-5066   \" title=\"CropperCapture[191]\" src=\"http:\/\/palblog.fxpal.com\/wp-content\/uploads\/2010\/12\/CropperCapture1911.png\" alt=\"Detail on number occurrences\" width=\"420\" \/><\/a><p id=\"caption-attachment-5066\" class=\"wp-caption-text\">Detail on number occurrences<\/p><\/div>\n<p>It appears to be the same issue with the &#8220;medial s&#8221; (the long s, &#383;) that Danny Sullivan describes in greater detail in his <a href=\"http:\/\/searchengineland.com\/when-ocr-goes-bad-googles-ngram-viewer-the-f-word-59181\">post<\/a>: OCR tends to misread the long s as an &#8220;f&#8221;, so &#8220;six&#8221; and &#8220;seven&#8221; in older printings can end up indexed as &#8220;fix&#8221; and &#8220;feven&#8221;. In other words, it&#8217;s an artifact of OCR and an indication of the evolution of typography rather than the evolution of language.<\/p>\n<p>One mystery solved; now why are &#8220;seven&#8221; and &#8220;eight&#8221; tied in frequency?<\/p>\n<p>Kudos to Google for releasing the viewer and data.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So Google recently released the Google books N-gram viewer along with the datasets. There&#8217;s been plenty of press about it, and the Science article based on this data is an interesting read. I was trying to come up with a simple, yet insightful query. 
My initial trial was modernism,postmodernism which immediately had me wondering about [&hellip;]<\/p>\n","protected":false},"author":14,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[15,61],"tags":[123,45],"jetpack_featured_media_url":"","_links":{"self":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5062"}],"collection":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/users\/14"}],"replies":[{"embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=5062"}],"version-history":[{"count":18,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5062\/revisions"}],"predecessor-version":[{"id":5072,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=\/wp\/v2\/posts\/5062\/revisions\/5072"}],"wp:attachment":[{"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=5062"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=5062"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blog.fxpal.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=5062"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}