I’ve been going on and on in blog posts and in comments about the business of reviewing papers as a socially useful activity (given the right incentives) and how the reviews themselves should be rated to identify effective reviewers. The idea behind it is not new—Amazon implemented something like this a long time ago—but it is useful to understand it better. This article by Jared Spool offers a good account of the history, the mechanics, and the effect.
After describing the response to the latest Harry Potter book (3,000 reviews, 4,000,000 sales), the article goes on to say that
Using the pattern we see frequently in web use, we can predict that the number of reviews that will get any votes follows a power-law distribution. This means that only a few will get a substantial number of votes (helped by the fact they’ll be promoted to the top). A handful will get a small number of helpfulness votes, but most reviews won’t get any.
Because of the power-law distribution, on sites with substantially lower traffic than Amazon, it’s very possible that the helpfulness question won’t be answered enough to be useful. That said, if it’s implemented like Amazon and reviews without votes avoid saying “0 of 0 people found this helpful”, there’s probably no harm in implementing it for lower volume sites.
The upshot is that we’d need to add a lot more drama and wizards to our papers to have the readership enjoyed by J. K. Rowling. On the other hand, even a small cumulative reward for reviewing (or even rating, à la SciRate) articles might induce a large enough fraction of readers to engage with the content for the reviews to have real value.
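To make the power-law claim concrete, here is a minimal simulation. The total vote budget and the Zipf exponent are my own assumptions rather than figures from Spool’s article; the point is only the shape of the resulting distribution:

```python
import random
from collections import Counter

random.seed(1)

NUM_REVIEWS = 3000   # roughly the Harry Potter review count cited above
NUM_VOTES = 10000    # assumed total helpfulness votes; not from the article
EXPONENT = 1.0       # assumed Zipf exponent

# A vote lands on the review of popularity rank r with probability ~ 1/r^s.
weights = [1.0 / (rank ** EXPONENT) for rank in range(1, NUM_REVIEWS + 1)]
votes = Counter(random.choices(range(NUM_REVIEWS), weights=weights, k=NUM_VOTES))

top_ten_share = sum(votes[r] for r in range(10)) / NUM_VOTES
unvoted = NUM_REVIEWS - len(votes)
print(f"top 10 reviews collect {top_ten_share:.0%} of all votes")
print(f"{unvoted} of {NUM_REVIEWS} reviews received no votes at all")
```

With these made-up parameters, the top ten reviews soak up roughly a third of all the votes while a large share of the reviews get none, which is exactly the “most reviews won’t get any” regime Spool describes.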
But let’s consider some other interfaces for commenting, in a more relevant context. Sites like BibSonomy, CiteULike, and Connotea are all designed explicitly to collect references to academic papers. These references (bookmarks on steroids) can be construed as votes for papers. The more votes, the more likely the paper is to be interesting or influential.
So how good are these tools? Are they used? Are they useful only as fancy bookmarks for the people who saved them, or do others benefit from this activity? How does the use of these social bookmarking sites compare with the way SciRate (and Eleanor) uses scites?
I compared the services on ease of use as a voting mechanism, on the ability to add comments, and on the ability to apply social search techniques such as collaborative filtering. I left out sites such as del.icio.us, digg, and reddit because they are geared at web sites in general rather than primarily at scientific publications.
CiteULike: There is implicit voting, since each paper shows how many people saved a reference to it, and papers can be tagged to help with grouping and retrieval, but there is no obvious way to rank search results by the number of people who saved the paper. CiteULike does make it possible to identify the person who made the citation, and it has some provisions for contacting that person via the “connect” button. CiteULike currently has about 2.8M articles, but I could not figure out how many people have contributed entries, or what the distribution of entries per paper is.
BibSonomy: This site also keeps track of how many people marked each paper and likewise allows keyword tagging. Furthermore, it has a notion of FolkRank by which search results can be ranked (a sketch of the idea follows these reviews), but I could not get it to work. I was expecting a list of papers matching my query, starting with the most-often-tagged paper and going down from there, but the list was in no discernible order, and certainly not in an order correlated with the number of people who marked each paper. The “popular” menu yielded three small sets of papers deemed popular in the last 7, 30, and 120 days, and it appeared to let you search those sets; unfortunately, typing in a keyword switched the view to “search,” and the notion of popularity as an organizing principle was dropped. Finally, I could not figure out how to get contact information for an arbitrary user, but there are handy features for following a user’s additions and for identifying groups of users with similar annotation patterns. I could not find any information about the number of papers in the collection or the number of users contributing entries.
Connotea: I could not evaluate this site, because its dynamic features seemed to be broken for the three days prior to this post.
SciRate: SciRate can be used to vote for (“scite”) papers, to leave comments (although this practice seems rare), and to contact people based on their scites. The collection that can be annotated is large, but it consists predominantly of papers stored in arXiv.org, making it less useful for disciplines that do not (or cannot, for copyright reasons) deposit their manuscripts into arXiv.
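As promised above, here is a much-simplified sketch of the FolkRank idea. FolkRank, introduced by Hotho, Jäschke, Schmitz, and Stumme, adapts PageRank to the graph a folksonomy naturally forms over users, tags, and resources. The toy data, damping factor, and iteration count below are my own assumptions, and this is certainly not BibSonomy’s actual implementation:

```python
from collections import defaultdict

# Toy tag assignments (user, tag, paper); illustrative data only.
assignments = [
    ("alice", "quantum", "paperA"), ("bob", "quantum", "paperA"),
    ("carol", "quantum", "paperB"), ("bob", "codes", "paperB"),
]

# Each assignment links user<->tag, tag<->paper, and user<->paper in an
# undirected, weighted graph over all three kinds of nodes.
graph = defaultdict(lambda: defaultdict(float))
for user, tag, paper in assignments:
    for a, b in ((f"u:{user}", f"t:{tag}"),
                 (f"t:{tag}", f"p:{paper}"),
                 (f"u:{user}", f"p:{paper}")):
        graph[a][b] += 1.0
        graph[b][a] += 1.0

def spread(preference, damping=0.7, iterations=50):
    """PageRank-style weight spreading biased by a preference vector."""
    w = {n: 1.0 / len(graph) for n in graph}
    for _ in range(iterations):
        w = {n: damping * sum(w[m] * graph[m][n] / sum(graph[m].values())
                              for m in graph[n])
                + (1 - damping) * preference.get(n, 0.0)
             for n in graph}
    return w

# The FolkRank of a node is its preference-biased rank minus its
# baseline rank, so merely-popular nodes cancel out.
uniform = {n: 1.0 / len(graph) for n in graph}
prefer = dict(uniform)
prefer["t:quantum"] += 1.0               # the "query": prefer this tag
baseline, biased = spread(uniform), spread(prefer)
ranking = sorted(graph, key=lambda n: biased[n] - baseline[n], reverse=True)
for node in ranking:
    if node.startswith("p:"):
        print(node, round(biased[node] - baseline[node], 4))
```

The differential (biased rank minus baseline rank) is the interesting design choice: it separates papers specifically relevant to the query tag from papers that are merely popular overall.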
My overall impression is that these sites have value without needing to achieve the scale of Amazon and Harry Potter. Having better tools to sift through the collections, to use collaborative filtering to find interesting papers, and to indicate problems with papers as well as what’s good about them would make these sites more valuable.
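To be concrete about the collaborative-filtering part: the bookmark data these sites already collect is enough for a simple neighborhood-style recommender. The sketch below uses made-up users and papers, and scores unseen papers by how often they are co-bookmarked with papers you already saved; nothing about it requires Amazon-scale traffic:

```python
from collections import Counter

# Hypothetical bookmark data: user -> set of saved papers.
bookmarks = {
    "alice": {"paper1", "paper2", "paper3"},
    "bob":   {"paper1", "paper3", "paper4"},
    "carol": {"paper2", "paper5"},
}

def recommend(user):
    """Score papers the user has not saved, weighting each candidate by
    the number of bookmarks its savers share with the user."""
    mine = bookmarks[user]
    scores = Counter()
    for other, theirs in bookmarks.items():
        if other == user:
            continue
        overlap = len(mine & theirs)    # shared papers = similarity weight
        for paper in theirs - mine:
            scores[paper] += overlap
    return scores.most_common()

print(recommend("alice"))   # [('paper4', 2), ('paper5', 1)]
```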
Hi (from CiteULike)
I’d like to correct a possible misunderstanding – you said “CiteULike does make it possible to identify the person who made the citation”. I’m simplifying a bit, but all of the (public) data on CiteULike is sucked directly from the publishers’ websites, so no user “makes” the citation – that user was just the first to post it. A subsequent poster will get the same (untainted) data from the originating website – which user posted first is largely irrelevant. In fact, many articles are pre-loaded into CiteULike via RSS feeds from the publishers.
(This is not quite true: when you add an article to CiteULike directly from a website, you get an untainted version, but if you COPY another user’s entry, you’ll get any modifications he or she may have made.)
Fergus,
Thanks for clarifying! What I was trying to get at was the ability to identify other people who are interested in the same article that interests you so that, as Eleanor suggested in an earlier post, you could contact them with ideas, discussion, etc. After all, that’s one of the powerful things about social media — the ability to find others with shared interests.
In reviewing these social bookmarking sites, I was struck by the difficulty in some cases of establishing a person’s interests or reputation. It seems to me that leveraging people’s bookmarking (which is done for personal benefit — being able to find the paper again) to benefit the group is a tremendous opportunity that has been under-utilized on CiteULike and on BibSonomy. Fortunately, the data are all there, so it should be possible to grow that capability and fine-tune the interfaces to facilitate the discovery of interesting papers and interesting people.
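As a small illustration of what I mean (entirely made-up data, not either site’s actual API), the bookmark sets alone are enough to rank other users by shared interests:

```python
# Hypothetical bookmark sets; in practice these would come from the
# sites' stored annotations.
bookmarks = {
    "me":    {"paper1", "paper2", "paper3"},
    "alice": {"paper2", "paper3", "paper4"},
    "bob":   {"paper7", "paper8"},
}

def jaccard(a, b):
    """Jaccard similarity: shared bookmarks over combined bookmarks."""
    return len(a & b) / len(a | b)

mine = bookmarks["me"]
peers = sorted(((jaccard(mine, theirs), who)
                for who, theirs in bookmarks.items() if who != "me"),
               reverse=True)
for score, who in peers:
    print(f"{who}: {score:.2f}")   # alice: 0.50, bob: 0.00
```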
Looking forward to more improvements in the future!