Blog Category: Information seeking

Making sense of Twitter search

Last week Jeremy and I attended the SSM2010 workshop, held in conjunction with WSDM2010. In addition to chairing one of the panels, I had the opportunity to demonstrate an interface I built for browsing Twitter search results, which Daniel mentioned in his summary of the workshop. The system is described in a position paper, co-authored with Miles Efron, that has been accepted to the Microblogging workshop held in conjunction with CHI 2010.

The motivation for this interface is that Twitter orders its search results only by date, which makes it difficult to learn anything about the result set beyond what the last few tweets said. But tweets are structurally rich: they carry metadata such as the identity of the tweeter, possible threaded conversations, and mentioned documents. The system we built explores how HCIR techniques might be applied to this task.
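
Some of that structure can be pulled out of the tweet text itself. The Python sketch below is a minimal illustration, not the system described in the paper: it assumes plain-text tweets and uses simplified, hypothetical regular expressions (MENTION_RE, HASHTAG_RE, URL_RE, RETWEET_RE) to find mentions, hashtags, links, and "RT @user" retweets. Reply threading would have to come from other metadata and is not handled here.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical, deliberately simple patterns; real tweet entity extraction has many more edge cases.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
URL_RE = re.compile(r"https?://\S+")   # keeps any trailing punctuation, a known simplification
RETWEET_RE = re.compile(r"^RT @(\w+)")


@dataclass
class TweetStructure:
    author: str
    mentions: List[str] = field(default_factory=list)
    hashtags: List[str] = field(default_factory=list)
    urls: List[str] = field(default_factory=list)   # the "mentioned documents"
    retweet_of: Optional[str] = None


def extract_structure(author: str, text: str) -> TweetStructure:
    """Turn one tweet's author and text into structured fields that can be grouped or faceted."""
    rt = RETWEET_RE.match(text)
    return TweetStructure(
        author=author,
        mentions=MENTION_RE.findall(text),
        hashtags=HASHTAG_RE.findall(text),
        urls=URL_RE.findall(text),
        retweet_of=rt.group(1) if rt else None,
    )
```

Grouping search results by any of these fields (author, hashtag, linked URL) gives one way to organize them other than by date. Note that an email address in the text would also register as a mention under these patterns.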

SSM2010

Last Wednesday Jeremy and I participated in the SSM2010 workshop organized by Ian Soboroff (NIST), Eugene Agichtein (Emory University), Daniel Tunkelang (Google), and Marti Hearst (University of California, Berkeley). It was a full day of panels, discussions, and poster presentations on a variety of topics related to search, social media, and how to combine the two. In an earlier post, I wrote about one way to characterize the space, and Daniel did an excellent job of summarizing the workshop; his summary was also cross-posted at BLOG@CACM.

I am still digesting everything I learned during the day, but I can say that one of the challenges was live-tweeting the event. I was one of several people tweeting about what was happening in the panels and the issues being raised. Over 500 tweets were sent and retweeted with the workshop’s hashtag by people at the event and elsewhere. It was interesting to see other people pick up some of the topics and comment on them. In particular, several of my Twitter friends who are not part of the SSM research community commented on the tweets and retweeted parts of the discussion.

What do we mean by “Search in Social Media”?

Jeremy and I have been busy preparing for the Search in Social Media (SSM2010) workshop. We thought we would start at the beginning and ask what people understood by the term “search in social media.” Workshops often spend a bunch of time on definitions, and we thought we’d jump in early. We’ve talked about social search before, but that was without reference to social media.

We think the phrase “search in social media” has been used to refer both to the information being searched and to the process of searching it. The information is standard user-generated content: tweets, blog posts, comment threads, tags, etc. The process, however, seems less well understood.

Finding facets

I’ve been messing around with Twitter search, which (on a small scale) led me to store structured data about tweets, people, and documents. I used a relational database to store the data I got from Twitter, and everything worked just fine. (That is, performance was limited by the Twitter API and Twitter search API, not by my database.) But say you have lots of data, it includes both text and structure, and you want to search it. What if you’re Twitter or LinkedIn? Can you still use MySQL or Oracle or whatever to store your data and serve up search results?
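
For a sense of what the small-scale relational approach involves, here is a minimal SQLite sketch. The schema (people, tweets, documents, and a tweet_documents link table) and the helper names open_db, add_tweet, and most_cited_documents are hypothetical, chosen only to show tweet, people, and document data stored side by side; they are not the schema I actually used.

```python
import sqlite3

# Hypothetical schema: people, tweets, linked documents, and a many-to-many link table.
SCHEMA = """
CREATE TABLE people (id INTEGER PRIMARY KEY, screen_name TEXT UNIQUE NOT NULL);
CREATE TABLE tweets (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES people(id),
    text TEXT NOT NULL,
    created_at TEXT NOT NULL,
    in_reply_to INTEGER REFERENCES tweets(id)
);
CREATE TABLE documents (id INTEGER PRIMARY KEY, url TEXT UNIQUE NOT NULL);
CREATE TABLE tweet_documents (
    tweet_id INTEGER NOT NULL REFERENCES tweets(id),
    document_id INTEGER NOT NULL REFERENCES documents(id),
    PRIMARY KEY (tweet_id, document_id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_tweet(conn, tweet_id, screen_name, text, created_at, urls=()):
    """Store one tweet, creating its author and any linked documents on first sight."""
    conn.execute("INSERT OR IGNORE INTO people (screen_name) VALUES (?)", (screen_name,))
    (author_id,) = conn.execute(
        "SELECT id FROM people WHERE screen_name = ?", (screen_name,)).fetchone()
    conn.execute(
        "INSERT INTO tweets (id, author_id, text, created_at) VALUES (?, ?, ?, ?)",
        (tweet_id, author_id, text, created_at))
    for url in urls:
        conn.execute("INSERT OR IGNORE INTO documents (url) VALUES (?)", (url,))
        (doc_id,) = conn.execute("SELECT id FROM documents WHERE url = ?", (url,)).fetchone()
        conn.execute("INSERT OR IGNORE INTO tweet_documents VALUES (?, ?)", (tweet_id, doc_id))


def most_cited_documents(conn, limit=10):
    """Documents ranked by how many stored tweets link to them."""
    return conn.execute(
        """SELECT d.url, COUNT(*) AS n
           FROM tweet_documents td JOIN documents d ON d.id = td.document_id
           GROUP BY d.id ORDER BY n DESC, d.url LIMIT ?""",
        (limit,),
    ).fetchall()
```

At this scale a query like most_cited_documents is cheap; the question in the paragraph above is what happens when the tables hold millions of rows and the text itself also has to be searched.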

At a recent SDForum talk on LinkedIn’s search capabilities, John Wang described how LinkedIn handles faceted search. The talk covered a wide range of scalability issues that are undoubtedly shared by many web companies: how to handle real-time updates, how to scale to millions of users, and so on. LinkedIn uses Lucene and related tools and, to its credit, has contributed to the Lucene open source tool set, including Bobo and Zoie.
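
At its core, faceted search means counting, for the current result set, how many records carry each value of each facet, and then filtering when the user picks a value. The sketch below does this naively in memory. It illustrates the idea only; it is not how Bobo or Zoie work internally, and the profile fields (industry, location, skills) are hypothetical. Doing these counts quickly over millions of documents, with real-time updates, is the kind of problem the talk was about.

```python
from collections import Counter
from typing import Dict, Iterable, List, Mapping


def _as_values(value):
    """Treat list/tuple/set fields as multi-valued and everything else as a single value."""
    return value if isinstance(value, (list, tuple, set)) else [value]


def facet_counts(results: Iterable[Mapping[str, object]], fields: List[str]) -> Dict[str, Counter]:
    """For each facet field, count how many records in the result set carry each value.
    A multi-valued field contributes at most one count per distinct value per record."""
    counts = {f: Counter() for f in fields}
    for record in results:
        for f in fields:
            value = record.get(f)
            if value is None:
                continue
            counts[f].update(set(_as_values(value)))
    return counts


def drill_down(results, field, value):
    """Keep only the records whose facet field contains the chosen value."""
    return [r for r in results if value in _as_values(r.get(field))]
```

For example, given a few hypothetical profiles, facet_counts reports how many match each industry or skill, and drill_down narrows the set to, say, one location; recounting on the narrowed set gives the next round of facet counts.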

What If Everyone Were Number One?

I’ve been doing a bit of thinking lately about search engines, algorithmic openness, and spammers. I suppose this was all prompted by a recent Google blog post, “The Meaning of Open”: http://googleblog.blogspot.com/2009/12/meaning-of-open.html

The post claims that openness is good: open systems, open source, open data. This claim is held forth as true…for everything except search algorithms. In the case of algorithms, the secret sauce must be kept exactly that: secret. Otherwise, spammers would have too much power.

That claim makes me want to play around with a little thought experiment. What if the search algorithm were indeed fully open? What if everyone in the world knew exactly how rankings were computed and could modify their web pages to adapt to whatever the ranking function is? In short, what if everyone were number one?
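
One way to make the premise concrete is a deliberately simplistic toy: a fully public ranking function that is a weighted sum of capped page features. The weights, feature names, and cap below are hypothetical and bear no relation to any real search engine. If every page owner pushes every rewarded feature to its maximum, every page ends up with the same score.

```python
from typing import Dict, List

# Hypothetical, fully public scoring function: a weighted sum of page features.
WEIGHTS = {"keyword_density": 2.0, "inbound_links": 1.5, "freshness": 0.5}
FEATURE_MAX = 1.0  # assumed cap on every feature (e.g., a normalized value in [0, 1])


def score(page: Dict[str, float]) -> float:
    return sum(w * page.get(f, 0.0) for f, w in WEIGHTS.items())


def optimize(page: Dict[str, float]) -> Dict[str, float]:
    """A page owner who knows WEIGHTS pushes every positively weighted feature to its cap."""
    return {f: (FEATURE_MAX if w > 0 else page.get(f, 0.0)) for f, w in WEIGHTS.items()}


def rank(pages: Dict[str, Dict[str, float]]) -> List[str]:
    """Page names from highest to lowest score (ties keep insertion order)."""
    return sorted(pages, key=lambda name: score(pages[name]), reverse=True)
```

After optimize, the order returned by rank is decided purely by tie-breaking (here, Python's stable sort keeps insertion order), so the score no longer distinguishes one page from another.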

SSM2010 panel: Research Directions for Search in Social Media

The third workshop on Search in Social Media (SSM2010) will be held in conjunction with WSDM 2010 in early February. The workshop, organized this year by Eugene Agichtein (Emory University), Marti Hearst (University of California, Berkeley), Ian Soboroff (NIST), and Daniel Tunkelang (Google), will bring together academics and people from industry (including the major search engines). The keynote will be given by Jan Pedersen, who is now Chief Scientist for Core Search at Microsoft. The workshop will address what the big players are doing and what the more specialized social media companies are up to, and will also tackle important research problems in the field.

#Google #search for #Twitter? #fail!

For a while now, Google has been serving up tweets related to searches as part of its real-time search effort. Now it is making it possible to search the Twitter stream in exactly the way Twitter doesn’t allow: searching for tweets older than a few days. A query like

cyberwarfare site:twitter.com

will return a bunch of tweets, formatted as Google search results. When I ran this query, it identified 1,380 hits from Twitter, while Twitter’s own search yielded about 250 tweets, going back no more than 10 days. So far, so good.
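
The same site-restricted query can be built programmatically. This small sketch uses only the standard library's urllib.parse.urlencode to put the terms and the site: operator into the q parameter of a Google search URL; the helper name site_query_url is hypothetical, and the hit counts for the resulting URL will likely differ from the ones reported above.

```python
from urllib.parse import urlencode


def site_query_url(terms: str, site: str = "twitter.com") -> str:
    """Build a Google web-search URL whose query is restricted to one site via site:."""
    return "https://www.google.com/search?" + urlencode({"q": f"{terms} site:{site}"})
```

For the query above, site_query_url("cyberwarfare") yields https://www.google.com/search?q=cyberwarfare+site%3Atwitter.com, with the space and colon percent-encoded.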

Position papers for Collab Info Seeking workshop

We received a record crop of position papers for the Collaborative Information Seeking (CIS) workshop we’re organizing at CSCW 2010. Underscoring the ubiquity of collaboration in information seeking, the position papers address everything from health care to emergency response to Second Life to the information seeking ecology within the enterprise. The papers fell into several broad categories, although some could easily have been classified in more than one way.

CFP: IIiX 2010

If you are doing research in interactive information retrieval, information seeking, collaborative search, and the like (that is, you’re concerned with what users do when they look for information), you might consider submitting a paper to IIiX 2010.

IIiX will explore the relationships between the contexts that affect information retrieval and information seeking, how these contexts impact information behavior, and how knowledge of information contexts and information behaviors can help design truly interactive information systems.

Never mind about the Turkers, what do YOU think?

Let’s do an experiment. Here’s a TREC topic that specifies an information need:

Food/Drug Laws

Description: What are the laws dealing with the quality and processing of food, beverages, or drugs?

Narrative: A relevant document will contain specific information on the laws dealing with such matters as quality control in processing, the use of additives and preservatives, the avoidance of impurities and poisonous substances, spoilage prevention, nutritional enrichment, and/or the grading of meat and vegetables. Relevant information includes, but is not limited to, federal regulations targeting three major areas of label abuse: deceptive definitions, misleading health claims, and untrue serving sizes and proposed standard definitions for such terms as high fiber and low fat.

Below are links to four documents that have been identified by some systems as being relevant to the above topic. Are they?

(I apologize in advance for the primitive nature of this form and its many usability defects.)