Blog Archive: 2009

Data Liberation: What do you Own?

Comments (3)

Recently Google announced a new initiative: The Data Liberation Front:

The Data Liberation Front is an engineering team at Google whose singular goal is to make it easier for users to move their data in and out of Google products. We do this because we believe that you should be able to export any data that you create in (or import into) a product. We help and consult other engineering teams within Google on how to “liberate” their products. This is our mission statement: Users should be able to control the data they store in any of Google’s products. Our team’s goal is to make it easier for them to move data in and out.

This is a fantastically worthy goal, and I whole-heartedly applaud it.  However, I am beginning to wonder: What data is yours to own, in the first place?

For example, consider web searching.

Continue Reading

The End of Summer, and Building Rome in a Day

Comments (1)

Over the last few weeks we have been sad to see our great crop of summer interns leave. Yesterday my intern, Kathleen Tuite, left and, coincidentally, Slashdot picked up a project closely related to her graduate work. Check out the Building Rome in a Day website to see videos of point cloud models for landmark buildings. The system her colleagues at the University of Washington built makes these models from millions of photographs found on Flickr.

Also check out a very early version of Kathleen’s cool Photocity game that complements the Building Rome in a Day work. The game encourages people to take photos that will help fill in point cloud models, so the photos collected as people play her game will improve the Building Rome in a Day results. Conversely, her project involves managing many more photos than an average Photosynth, so she will be borrowing and building on the technologies developed by Sameer Agarwal, Yasutaka Furukawa, and the rest of the Building Rome in a Day team.

Digging in the past

Not content with reading about Cranfield experiments that defined the modern approach to information retrieval, William Webber is now looking to Ancient Rome for inspiration. Actually, he was just looking for images of the original Yahoo! home page, back when it was a directory rather than a search engine. He settled for an image from a 1998 article, but he can do better than that. The Internet Archive has a bunch of snapshots of the Yahoo home page, dating back to 1996.

The Internet Archive is a wonderful thing, and I am certain it’s good for many hours of procrastination. Here, for example, is the first capture of FXPAL’s home page, from mid-1998, complete with the spouting geyser of our ideas. Interestingly, of the seven people who have home pages listed on that site, four are still at FXPAL, but with considerably less hair.

Of course the past didn’t change much (and still doesn’t), so archiving it is straightforward. A bigger challenge is how to archive modern dynamic sites that are loaded entirely at run-time through JavaScript, Flash, or Silverlight. How will the Wayback Machine handle the list of today’s world leaders, for example?

Feedback wanted

Comments (2)

Over the life of this blog, we’ve been tweaking its appearance and functionality to make it easier to find information and to make it more user-friendly. In addition to the standard tags and categories, we’ve added lists of popular posts, recent comments, a way to share links to posts via various social media sites, etc. Recently, we’ve improved the way author pages are organized.

In the spirit of user-centered design, we’d like to know which features people find useful, confusing, unnecessary, missing, etc. If you have any suggestions for improving the usability of the site (sorry, improvements to the quality of posts are out of scope, but topic suggestions are welcome!), please let us know!

Contextualizing IR

Comments (5)

In a recent post, Miles Efron proposed a distinction between different kinds of information retrieval: “macro IR” that is concerned with generic tasks such as searching the web, and “micro IR” that represents more focused interaction. My sense is that one key distinction between the two is the degree to which the system represents the context of the search, and therefore is able to act on the results. Miles’ examples–finding restaurants, books, music, people–have a transactional quality about them. The system has a sufficient representation of the task to both structure the query in an appropriate manner (e.g., Yelp! metadata about restaurants) and to act on the selected result (e.g., offer to make a reservation). Macro IR, on the other hand, lacks a strong contextual representation, and leaves it to the user to act on the retrieved information.

Continue Reading

Perhaps they measured the wrong thing…

Comments (3)

Ian Soboroff commented on yesterday’s blog post that although mental models were important, they were insufficient. He cited a paper that found that legal staff had experienced problems with using a full-text search engine to search (with a recall-oriented information need) a collection of documents in a legal discovery scenario. The paper concludes that coming up with effective keyword searches is difficult for non-search experts. The paper is interesting and worth reading, but I believe the authors’ conclusions are not warranted by their methodology.

Continue Reading

Search is not Magic

Comments (7)

A discussion among commenters on a post about PubMed search strategies raised the issue of how people need to make sense of the results that a search engine provides. For precision-oriented searches a “black box” approach may make sense because as long as the system manages to identify a useful document, it doesn’t matter much how it does that. For exploratory search, which may be more recall-oriented, having a comprehensible representation of the system’s computations is important to assess coverage of your results. This suggests the need to foster useful mental models, rather than relying on the system to divine your intent and magically produce the “right” result.

Continue Reading

What a difference 200 years makes

Comments (1)

Recently, I had an opportunity to see the Babbage Difference Engine No. 2 (serial #2) in action. It’s an impressive piece of machinery, weighing in at about five tons, consisting of 25,000 parts. Mostly metal. It’s on display at the Computer History Museum in Mountain View through December, when Nathan Myhrvold takes it home and installs it in his living room, next to the T-Rex. Babbage built a few smaller models, but never saw the completion of the project after a falling out with his master builder and subsequent loss of funding from the government. Still, he had something like 12 years of funding to attempt to build the device. (He also made money on other inventions such as the cowcatcher at the front of steam engines.)

The Science Museum in London built Difference Engine No. 2 serial #1 in the late 1980s to commemorate the 200th anniversary of Babbage’s birth.

Front view showing the registers

Continue Reading

Tree-books to e-books

Comments (3)

I recall from my youth in the Soviet Union a series of jokes structured around a fake talk radio call-in show. One example stuck with me:

Q: Is it possible to create a Communist regime in an arbitrary country? Say France, for example.

A: In principle, yes. But what has France ever done to deserve that?

I was reminded of this joke by a recent article describing how a school would be replacing its library with electronic devices. The plan is to replace the stacks with three large monitors, “laptop-friendly” study carrels, and 18 e-book readers (Amazon Kindles and Sony eReaders). They are also planning to replace textbooks with electronic versions, at least in math, and possibly in other subjects as well.

I can see many problems with this vision of the future of reading based on the notion that books are an outdated technology. I’ve written about e-books before (and I am still fond of the research we did in this space), and I find myself wondering about the wisdom of this venture by the headmaster of Cushing Academy.

Continue Reading

Open-access publishing

Comments (2)

Laurent and I recently published an article (SeeReader: An (Almost) Eyes-Free Mobile Rich Document Viewer) in the special issue on Pervasive Computing in the International Journal of Computer Science Issues (IJCSI). The IJCSI is open-access, meaning that the content is not hidden behind a paywall. Open-access journals are still seen as dubious by many, and perhaps rightly so. These journals are universally new and tend to carry less prestige, and often lower quality, than mainstream journals. In return, though, they offer fast turn-around times and wide indexing.

Continue Reading