Should IR Objective Functions be Obfuscated?

I have a question. It’s a general question, directed at anyone and everyone.

When one is building an Information Retrieval system, one uses target objective function(s) that give an indication of the performance of the system, and designs the system (algorithms, interfaces, etc.) toward those targets.  Sometimes, those functions are open and well understood.  Other times, those functions are proprietary and hidden.
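For concreteness, one widely used and openly documented example of such a target is normalized discounted cumulative gain (NDCG). The sketch below, with relevance grades invented purely for illustration, shows what an "open and well understood" objective function actually computes:

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: graded relevance, discounted by rank."""
    return sum(rel / math.log2(rank + 2)          # rank is 0-based here
               for rank, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalize by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Hypothetical relevance grades for the top 4 results of one query:
print(ndcg([3, 2, 3, 0]))  # near 1.0: close to the ideal ranking
print(ndcg([0, 2, 3, 3]))  # lower: relevant results ranked too late
```

A user who understands that the system is rewarded for placing highly relevant documents early can, at least in principle, interpret the result list accordingly; the question is whether that understanding actually changes the value they extract.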

My question is: Does it do the users of an IR system a service or disservice to hide from them the function that is being optimized?  Or is it completely neutral?  In other words, does the user have to understand, or at least be given the chance to understand, what it is that the system is trying to do for them in order to get the best value out of that system?  Or can a user get results just as good without having to have a clear mental model of what the retrieval engine is trying to do?  In short, does it matter if the user does not understand what the system is trying to do for him or her?

Can someone point me to research that may have looked at this question?  If one were trying to publish original research on the topic, how would one go about designing an experiment in which both (1) this hypothesis is tested, and (2) done so in a way that generalizes, or at least hints at possible generalization?

Larry Rowe wins ACM SIGMM Outstanding Technical Achievement Award 2009

The 2009 winner of the prestigious ACM Special Interest Group on Multimedia Award for Outstanding Technical Achievement is our own Dr. Lawrence Rowe. I have seen this award referred to in a number of different ways (even on the ACM SIGMM website), but the above, and “Outstanding Technical Contributions to Multimedia Computing, Communications and Applications,” seem to be the most common. It is only the second year of the award, so we will have to wait a while before a cute nickname arises. (The Mummy award?)

Continue Reading

It’s not what you know, it’s whom you know

Almost 10 years ago, Latanya Sweeney published an analysis of summary census data showing that 87% of respondents could be identified based only on their ZIP code, gender, and date of birth, data that we all think of (and the census treats) as relatively anonymous. At about the same time, I visited a friend at a large consulting firm who demonstrated data mining software that combined data from multiple sources and was able to discover many facts about people that, while not particularly revealing individually, painted a much more complete picture when federated. Now comes the news (thanks Daniel) that a group at MIT was able to make better-than-chance predictions about people’s sexual orientation using Facebook friends as training data. Whereas the census analysis and the data mining tools could be considered academic exercises on datasets to which most people don’t have access, the MIT results have much more immediate and potentially damaging implications.
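Sweeney’s result is essentially a statement about how often a combination of supposedly anonymous fields is unique within a population. Here is a minimal sketch of that measurement, using a fabricated toy dataset (not census data):

```python
from collections import Counter

def unique_fraction(records, quasi_ids):
    """Fraction of records whose quasi-identifier combination is unique,
    i.e. records these 'anonymous' fields alone would re-identify."""
    combo = lambda r: tuple(r[f] for f in quasi_ids)
    counts = Counter(combo(r) for r in records)
    return sum(1 for r in records if counts[combo(r)] == 1) / len(records)

# Toy population, fabricated for illustration:
people = [
    {"zip": "94304", "gender": "F", "dob": "1970-01-02"},
    {"zip": "94304", "gender": "F", "dob": "1970-01-02"},  # shares a combo
    {"zip": "94304", "gender": "M", "dob": "1981-06-15"},
    {"zip": "10001", "gender": "F", "dob": "1990-12-31"},
]
print(unique_fraction(people, ["zip", "gender", "dob"]))  # 0.5
```

On this toy data, half the records are pinned down by the three fields together, while any single field identifies far fewer; Sweeney’s point was that on real census data the combined figure is alarmingly high.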

Continue Reading

Data Liberation: What do you Own?

Recently Google announced a new initiative: The Data Liberation Front:

The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to “liberate” their products. This is our mission statement: Users should be able to control the data they store in any of Google’s products. Our team’s goal is to make it easier for them to move data in and out.

This is a fantastically worthy goal, and I whole-heartedly applaud it.  However, I am beginning to wonder: What data is yours to own, in the first place?

For example, consider web searching.  Continue Reading

The End of Summer, and Building Rome in a Day

Over the last few weeks we have been sad to see our great crop of summer interns leave. Yesterday, my intern, Kathleen Tuite, left, and, coincidentally, Slashdot picked up a project closely related to her graduate work. Check out the Building Rome in a Day website to see videos of point cloud models of landmark buildings. The system her colleagues at the University of Washington built makes these models from millions of photographs found on Flickr.

Also check out a very early version of Kathleen’s cool Photocity game, which complements the Building Rome in a Day work. The game encourages people to take photos that help fill in point cloud models, so the photos collected as people play will improve the Building Rome in a Day results. Conversely, her project involves managing many more photos than an average Photosynth, so she will be borrowing and building on the technologies developed by Sameer Agarwal, Yasutaka Furukawa, and the rest of the Building Rome in a Day team.

Digging in the past

Not content with reading about Cranfield experiments that defined the modern approach to information retrieval, William Webber is now looking to Ancient Rome for inspiration. Actually, he was just looking for images of the original Yahoo! home page, back when it was a directory rather than a search engine. He settled for an image from a 1998 article, but he can do better than that. The Internet Archive has a bunch of snapshots of the Yahoo home page, dating back to 1996.

The Internet Archive is a wonderful thing, and I am certain it’s good for many hours of procrastination. Here, for example, is the first capture of FXPAL’s home page, from mid-1998, complete with the spouting geyser of our ideas. Interestingly, of the seven people who have home pages listed on that site, four are still at FXPAL, but with considerably less hair.

Of course the past didn’t change much (and still doesn’t), so archiving it is straightforward. A bigger challenge is how to archive modern dynamic sites whose content is loaded at run-time through JavaScript, Flash, or Silverlight. How will the Wayback Machine handle the list of today’s world leaders, for example?
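To see why this is hard, consider what an archiver that saves raw HTML actually captures when a page’s content is generated by a script. The toy page and `/api/leaders` endpoint below are hypothetical:

```python
from html.parser import HTMLParser

# A toy page whose visible list is built at run-time by a script:
snapshot = """
<html><body>
  <h1>World leaders</h1>
  <ul id="leaders"></ul>
  <script>
    // In a browser this would populate the list from a live data feed;
    // a crawler that saves raw HTML never executes it.
    fetch('/api/leaders').then(render);
  </script>
</body></html>
"""

class TextExtractor(HTMLParser):
    """Collect the visible text an HTML-only archiver would preserve."""
    def __init__(self):
        super().__init__()
        self.text = []
        self.in_script = False
    def handle_starttag(self, tag, attrs):
        self.in_script = (tag == "script")
    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False
    def handle_data(self, data):
        if not self.in_script and data.strip():
            self.text.append(data.strip())

p = TextExtractor()
p.feed(snapshot)
print(p.text)  # ['World leaders'] -- the list itself is simply not there
```

An archive built from such snapshots preserves the markup but not the data, which is exactly the gap a crawler would have to close by executing the page the way a browser does.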

Feedback wanted

Over the life of this blog, we’ve been tweaking its appearance and functionality to make it easier to find information and to make it more user-friendly. In addition to the standard tags and categories, we’ve added lists of popular posts, recent comments, a way to share links to posts via various social media sites, etc. Recently, we’ve improved the way author pages are organized.

In the spirit of user-centered design, we’d like to know which features people find useful, confusing, unnecessary, missing, etc. If you have any suggestions for improving the usability of the site (sorry, improvements to the quality of posts are out of scope, but topic suggestions are welcome!), please let us know!

Contextualizing IR

In a recent post, Miles Efron proposed a distinction between different kinds of information retrieval: “macro IR,” which concerns generic tasks such as searching the web, and “micro IR,” which represents more focused interaction. My sense is that one key distinction between the two is the degree to which the system represents the context of the search, and is therefore able to act on the results. Miles’ examples–finding restaurants, books, music, people–have a transactional quality about them. The system has a sufficient representation of the task both to structure the query in an appropriate manner (e.g., Yelp! metadata about restaurants) and to act on the selected result (e.g., offer to make a reservation). Macro IR, on the other hand, lacks a strong contextual representation, and leaves it to the user to act on the retrieved information.

Continue Reading

Perhaps they measured the wrong thing…

Ian Soboroff commented on yesterday’s blog post that although mental models are important, they are insufficient. He cited a paper which found that legal staff had trouble using a full-text search engine to search a collection of documents (with a recall-oriented information need) in a legal discovery scenario. The paper concludes that coming up with effective keyword searches is difficult for non-search experts. The paper is interesting and worth reading, but I believe the authors’ conclusions are not warranted by their methodology.

Continue Reading

Search is not Magic

A discussion among commenters on a post about PubMed search strategies raised the issue of how people need to make sense of the results that a search engine provides. For precision-oriented searches a “black box” approach may make sense because as long as the system manages to identify a useful document, it doesn’t matter much how it does that. For exploratory search, which may be more recall-oriented, having a comprehensible representation of the system’s computations is important to assess coverage of your results. This suggests the need to foster useful mental models, rather than relying on the system to divine your intent and magically produce the “right” result.
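The precision/recall distinction above can be made concrete with a small sketch; the document IDs and counts here are invented for illustration:

```python
def precision_recall(retrieved, relevant):
    """Precision: how much of what came back is relevant.
    Recall: how much of what is relevant came back."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    return len(hits) / len(retrieved), len(hits) / len(relevant)

# Hypothetical query: 20 relevant docs exist; the engine returns 10,
# of which 8 are relevant.
retrieved = [f"d{i}" for i in range(10)]
relevant  = [f"d{i}" for i in range(2, 22)]
p, r = precision_recall(retrieved, relevant)
print(p, r)  # 0.8 0.4
```

A result like this is fine for a precision-oriented lookup, where any useful document will do, but for an exploratory, recall-oriented search the 60% of relevant material that never surfaced is exactly what the user needs a mental model of the system to reason about.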

Continue Reading