Finding facets
I’ve been messing around with Twitter search, which (on a small scale) led me to store structured tweet, people and document data. I used a relational database to store the data I got from Twitter, and everything worked just fine. (That is, performance was limited by the Twitter API and Twitter search API, not by my database.) But say you have lots of data, and it includes text and structure, and you want to search it. What if you’re Twitter or LinkedIn? Can you still use MySQL or Oracle or whatever to store your data and serve up search results?
At a recent SDForum talk on the search capabilities of LinkedIn, John Wang described how LinkedIn handles its faceted search. The talk covered a wide range of topics around managing scalability that are undoubtedly shared by many web companies: how to handle real-time updates, how to scale to millions of users, etc. LinkedIn uses Lucene and other related tools, and to their credit has made contributions to the Lucene open source tool set, including Bobo and Zoie.