A concept by any name


Miles Efron wrote about a research project he is starting on statistical processing of 17th- and 18th-century English texts, with the goal of establishing similarities between passages written with different spelling and vocabulary. This is a problem that humanities scholars might face when applying modern information retrieval tools to historical texts, as accepted English spelling and vocabulary were considerably more varied than they are now. (For a fun read about some of the issues, see Bill Bryson’s The Mother Tongue on the history of the English language.)

This problem reminds me of some of the issues related to information retrieval in the patent domain, particularly in the area of invalidity search. Invalidity search (also known as validity search) is the process of trying to discover whether a patent application describes something that has already been patented or otherwise disclosed. One tactic employed by those who write patents is to obscure the relationship to prior art by choosing idiosyncratic vocabulary to describe the invention rather than reusing existing terminology. The patent examiner’s job is to try to penetrate that obfuscation to identify related patents. This kind of search has been explored in the IR community in the NTCIR 2005 patent retrieval task for Japanese patents.

Miles’ proposal is interesting from an IR perspective because it combines aspects of temporal issues (as he notes at the end of his post) with cross-language retrieval, and has clear applications beyond the historical domain.

