TalkMiner update

Since its debut a few months ago, TalkMiner has been busily crawling the web and indexing all sorts of talks and lectures. In the meantime, we have been engaging in some self-promotion. As the press release details, we’ve now indexed over 15,000 talks, so there is likely to be something for everyone here, whether you’re into 3D models or big data.

So when you think about turning to YouTube for some lecture, think TalkMiner instead. And if you have any comments or content you’d like to have indexed, let us know.

Want to help make computer science history?

Scott Aaronson has been asked by MIT to put together a list of the top 150 events in computer science history as part of the celebration of MIT’s 150th anniversary. You can vote on the potential entries here (you will need to register by providing a login name, password, and e-mail address). For more information about the project, see this blog post which includes an early version of the list, and a more recent blog post of his on the subject.

I’ve mentioned some of Scott’s work before, in a post about classical computer science results inspired by quantum information processing, and in a post on an overview of quantum computing for technology managers that I wrote a couple of years ago. His results don’t make it into the top 150 computer science results of all time, but are good candidates for a list of the top 150 results of the last decade.

A magical way to learn computer science

Former FXPAL intern Jeremy Kubica’s Computational Fairy Tales is a fresh new entry into the blogosphere that introduces an unusual way to learn computer science: read a series of charming fairy tales. Each post contains a few sentences of introduction to a computer science concept, followed by a fairy tale illustrating that concept.

I particularly enjoyed Loops and Making Horseshoes, which illustrates…

Are device drivers your kind of thing?

FXPAL would like to hire a summer intern who wants to work on Android internals. In particular, we are talking with a vendor for an innovative pen/touch interface sensor. We are exploring how to effectively support pen and multi-touch interface controls on forthcoming tablets using this sensor. While Android (Gingerbread) has some interesting touch events, there are some things this hardware provides that are not reflected through to the Android event system. Depending on exactly what happens in Honeycomb, we are thinking about modifying the device driver and event mapping code to show through some additional device information. This information would then be reflected in an application we are developing that combines pen and touch inputs in novel ways.

While it would be nice to find someone with Android source code experience, we would be happy to offer an internship to someone with solid OS experience (Unix?) who is interested in learning Android.

Please see the FXPAL internship page for more information about applying, and do not hesitate to ask me any questions you might have about this project. Please disregard the January application time-frame.

Released: Reverted Indexing source code

I am pleased to announce that we are releasing a version of the reverted indexing framework as open source software! The release includes the framework and an implementation in Lucene.

Reverted indexing is an information retrieval technique for query expansion, relevance feedback, and a variety of other operations. The details are described on our web site, in several posts on this blog, and in our CIKM 2010 paper. The source code and JAR file can be downloaded from the Reverted Indexing page; see the Javadocs for details of the API.

Obviously wrong

So Microsoft is suing Barnes & Noble for patent infringement. Well, that’s what patents are for: the right to sue. And that’s what licenses are for: the right to avoid getting sued. The only thing is, if you’re going to sue someone with half a brain, you should at least make sure your patent is reasonably solid.

With that said, one of the patents that Microsoft claims that Nook is violating deals with annotating documents, an area I know a bit about. The patent, filed in December 1999, claims a system and method to associate annotations with a non-modifiable document. The idea is that file positions in the document associated with user-selected objects are used to retrieve annotations from some other location, and to display these annotations for the user.

Sounds obvious, no? So obvious, in fact, that when we built such a system in 1997, we didn’t bother patenting this.
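To make the idea concrete, here is a minimal sketch of position-keyed annotations kept apart from the document they describe. It is only an illustration of the general approach described above, not the patent's claims and not the code of the 1997 system; the names `Annotation`, `AnnotationStore`, `add`, and `lookup` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Annotation:
    start: int  # offset of the first annotated position in the document
    end: int    # offset just past the last annotated position (exclusive)
    note: str   # the annotation text itself


@dataclass
class AnnotationStore:
    """Keeps annotations outside the document, keyed by document id.

    The document is never modified; only positions within it are recorded.
    """
    by_doc: dict = field(default_factory=dict)

    def add(self, doc_id: str, start: int, end: int, note: str) -> None:
        if not 0 <= start < end:
            raise ValueError("expected 0 <= start < end")
        self.by_doc.setdefault(doc_id, []).append(Annotation(start, end, note))

    def lookup(self, doc_id: str, start: int, end: int) -> list:
        """Return annotations overlapping the selected range [start, end)."""
        return [
            a for a in self.by_doc.get(doc_id, [])
            if a.start < end and start < a.end
        ]


# Hypothetical usage: annotate a span, then retrieve it from a selection.
store = AnnotationStore()
store.add("book.pdf", 120, 180, "Check this claim")
selected = store.lookup("book.pdf", 150, 160)
```

The selection only has to overlap the stored span for the annotation to be retrieved, which is the simplest reasonable choice for a sketch like this.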

Eye tracking on a laptop

Tobii and Lenovo presented a laptop with a built-in eye tracker at CeBIT last week. The eye tracker allows the user to control the laptop, for instance by selecting files to open or by selecting the active window from an Exposé-like view. Engadget has a video of a demonstration of the eye control on the laptop here. I wish I could get my hands on it for some testing. A laptop with a built-in eye tracker certainly has potential: it could make eye tracking easier and more flexible for disabled users, and it could make usability testing with eye tracking more flexible by allowing usability specialists to move from their labs into the field.

Recommendations needed

In one of our research projects, we are trying to compare some alternative algorithms for generating recommendations based on content similarity. As you might expect, we have some data we’re playing with, but the data is noisy and sometimes it’s hard to make sense of the variability: is it due to noise in the data, or is the algorithm trying to tell us something?

So my thought was to break the problem into two parts: first deal with our algorithms on known data, and then apply the results to the new, noisy data to see what’s there. My purpose in writing this post is to solicit suggestions about which publicly-available data we should be using.
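For readers who want something concrete to react to, here is a minimal sketch of one very simple content-similarity recommender: bag-of-words vectors compared by cosine similarity. It is not one of the algorithms we are comparing, just a baseline of the general kind; the function names and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase the text, split on whitespace, and count the tokens."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(seed_id: str, docs: dict, k: int = 3) -> list:
    """Return the ids of the k documents most similar to the seed document."""
    vectors = {doc_id: bag_of_words(text) for doc_id, text in docs.items()}
    seed = vectors[seed_id]
    scored = [
        (cosine(seed, vec), doc_id)
        for doc_id, vec in vectors.items()
        if doc_id != seed_id
    ]
    # Highest similarity first; ties broken by document id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


# Made-up toy collection, purely for illustration.
toy_docs = {
    "a": "eye tracking on a laptop",
    "b": "eye tracking for usability testing",
    "c": "patent infringement lawsuit",
}
ranked = recommend("a", toy_docs, k=2)
```

On clean, well-understood data, a baseline like this makes it easier to tell whether odd rankings come from the algorithm or from the data.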

Looking for an HCIR intern

It’s intern time again! I am looking for someone to help me run an exploratory study of a collaborative, session-based search tool that I’ve been building over the last few months. Session-based search frames information seeking as an on-going activity, consisting of many queries on a particular topic, with searches conducted over the course of hours, days, or even longer. Collaborative search describes how people can coordinate their information-seeking activities in pursuit of a common goal.

The intern for this project will help frame a set of research questions around collaborative, session-based search, and then take the lead on an experiment to gain insight into this rich space and to help understand how to improve our search tool. The intern will also participate in writing up this work for publication at a major conference such as CHI, CSCW, JCDL, etc.

When is one>two and seven==eight?

So Google recently released the Google books N-gram viewer along with the datasets.

There’s been plenty of press about it, and the Science article based on this data is an interesting read.

I was trying to come up with a simple, yet insightful query. My initial trial was modernism,postmodernism which immediately had me wondering about hyphenation or the lack thereof…  In any case, the upshot seems to be that the use of the term postmodernism started 1978ish. Neat, though I think I won’t need to clear space for my Nobel Prize anytime soon.

I toyed a little bit with other terms like generation X, which has an odd sort of bump in the graph around 1970. Not sure what’s up with that, though perhaps there’s some data collection artifact as discussed in this article. I wasn’t inclined to go off the deep end on this and was happy enough to have my prior knowledge confirmed by noting that the use of “generation X” took off in the mid-1990s.

My final trial was a bit more on the minimal side: one,two,three,four,five,six,seven,eight,nine,ten. There shouldn’t be any surprise here that “one” is more common than “two”, which is more common than “three”, which is more common than “four”. It probably shouldn’t be a surprise that each succeeding number is less frequent by roughly a factor of 2.

Occurrence of numbers in the Google Books N-gram viewer

Less intuitive (to me anyway) is that “ten” squeezes in front of “seven” and “eight” (OK, so maybe it’s a round number), and that “seven” and “eight” are basically tied. Even more odd is that before 1790 or so, the putative occurrences of “six” and “seven” were virtually non-existent.

Detail on number occurrences

Turns out it appears to be the same issue with the “medial S” that Danny Sullivan describes in greater detail in his post. In other words, it’s an artifact of OCR and an indication of the evolution of typography rather than the evolution of language.
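Here is a small sketch of how that artifact could swallow “six” and “seven”: if every non-final “s” was printed as a long s and the OCR read it as “f”, then “six” shows up as “fix” and “seven” as “feven”. The rule below is a simplification of real typesetting practice, the function names are my own, and the counts are made-up numbers for illustration, not values from the Google data.

```python
from collections import Counter


def long_s_misreading(word: str) -> str:
    """Return the form OCR might produce if each non-final 's' was printed
    as a long s and misread as 'f' (simplified rule, an assumption)."""
    if len(word) < 2:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", "f") + last


def merge_misreadings(counts: Counter, words: list) -> Counter:
    """Add the count of each word's misread form back onto the word itself.

    Caveat: this over-counts when the misread form is a real word,
    e.g. genuine uses of "fix" get credited to "six".
    """
    merged = Counter()
    for word in words:
        variant = long_s_misreading(word)
        merged[word] = counts[word]
        if variant != word:
            merged[word] += counts[variant]
    return merged


# Made-up counts for an early-typography period, for illustration only.
toy_counts = Counter({"six": 3, "fix": 40, "seven": 2, "feven": 35, "eight": 30})
merged = merge_misreadings(toy_counts, ["six", "seven", "eight"])
```

With the misread forms folded back in, the toy “six” and “seven” counts land in the same range as “eight”, which is the kind of correction one would want before reading anything linguistic into the pre-1790 curves.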

One mystery solved; now why are “seven” and “eight” tied in frequency?

Kudos to Google for releasing the viewer and data.