Do not try this at home: playing with arXiv.org

Having written about reforming the academic publication process, and having suggested that arXiv.org be used to archive workshop papers for HCIR’09 (and ’07 and ’08), I decided to upload (with the authors’ permissions) papers from the JCDL 2008 workshop on collaborative information seeking that I co-organized last year with Jeremy Pickens and Merrie Morris. I read the info on the arXiv.org site and decided to give it a shot. It turned out to be less straightforward than one might imagine.

The process (ongoing as I write this) consists of the following steps:

  1. Read the instructions for uploading papers.
  2. Somehow stumbled on the page for conference proceedings (and workshops, I presume).
  3. Created an HTML file that represents the contents of the workshop. In addition to the intro material that I wanted to be visible when the page contents were displayed, I added a link for each paper, according to the example given in the instructions. This involved making up unique ids for each paper that I then needed to upload separately; more on that later.
  4. Slogged through the submission form. This is a three-step process, each step littered with dire warnings. In the end, this produced my first success. I received two confirmatory emails, each full of more caveats, saying that my submission was accepted and would be published tomorrow afternoon.
  5. I then proceeded onto the workshop papers, repeating steps 3 and 4, but with PDF files rather than with the HTML.
  6. The first file I tried worked, and I quickly had my first paper uploaded and linked from the conference proceedings page.
  7. I had no such luck with the second file I tried to upload. It turns out that, for reasons comprehensible only to mathematicians and physicists, I could not upload a perfectly good PDF file that had been submitted by its authors to the workshop because the file was generated (gasp!) in LaTeX. Apparently, arXiv requires the LaTeX source, from which it generates the PDF.
  8. I contacted the author, and shortly thereafter received a ZIP archive with all the files required to generate the PDF, which I uploaded instead of the PDF. Was I done?
  9. No! The LaTeX parser complained about a missing .bbl file in the presence of a ref.bib file. Having no experience with this stuff to speak of, I reported this result to the author, who suggested that I rename a .bbl file to ref.bbl and try again.
  10. Success? Not exactly: the file was parsed and uploaded, and a PDF was generated, but it contained no references. Sigh.

I am now in the process of soliciting all the LaTeX sources from my workshop participants so that I can appease the arXiv gods. On the other hand, I am tempted to print the damn things out, scan them on my nice color scanner that generates PDF, and upload that. Crude? Yes. Effective? Ought to be.

The upshot is that arXiv.org is an interesting idea that I will continue to explore, and will probably make use of for other workshops, but it is not for the faint of heart. I am not sure that the complexity that is exposed in their web interface is warranted, and I am certain that no HCI person was ever allowed within a one-mile radius of the team that built this tool. To foster adoption by authors in other disciplines, my recommendation is to build an upload client that manages the complexity, runs the content checks before uploading, and generally strives to prevent the errors that the web site so frequently admonishes the user not to make.

Oh, and they should add reader comments to the web site.

Update: A paper was rejected for being 3060kb in size, in excess of their 3000kb threshold. I am appealing.

Update: The LaTeX problem was solved by editing the .tex file and placing the bibliography inline.

Update: Another LaTeX submission required me to edit the image references to add the .jpg extensions, which apparently were optional on the system that the author used.
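
The three problems in these updates (an oversized upload, a .bib file without a matching .bbl, and image references without extensions) are the sort of thing a pre-upload check could flag before the web form does. Here is a minimal sketch in Python of what such a check might look like. The function name `check_submission`, the reading of "kb" as 1024 bytes, and the rule that the .bbl must share the .bib file's name are all my assumptions based on the errors I ran into, not anything taken from arXiv's documentation.

```python
import re
from pathlib import Path

# Assumed limit, based on the rejection described above (3060kb vs. 3000kb).
# Interpreting "kb" as 1024 bytes is a guess.
MAX_KB = 3000

# Matches \includegraphics{name} and \includegraphics[options]{name}.
INCLUDE_RE = re.compile(r"\\includegraphics(?:\[[^\]]*\])?\{([^}]+)\}")


def check_submission(folder):
    """Return a list of warnings for a folder of LaTeX sources (hypothetical helper)."""
    root = Path(folder)
    warnings = []

    # 1. Total size of everything that would be uploaded.
    files = [p for p in root.rglob("*") if p.is_file()]
    total_kb = sum(p.stat().st_size for p in files) / 1024
    if total_kb > MAX_KB:
        warnings.append(f"total size {total_kb:.0f} kB exceeds {MAX_KB} kB")

    # 2. A .bib file with no .bbl of the same name next to it
    #    (assumed rule, mirroring the complaint about ref.bib).
    for bib in root.rglob("*.bib"):
        if not bib.with_suffix(".bbl").exists():
            warnings.append(f"{bib.name} has no matching {bib.stem}.bbl")

    # 3. Image references that omit the file extension.
    for tex in root.rglob("*.tex"):
        text = tex.read_text(encoding="utf-8", errors="replace")
        for name in INCLUDE_RE.findall(text):
            if not Path(name).suffix:
                warnings.append(f"{tex.name}: image '{name}' has no file extension")

    return warnings


if __name__ == "__main__":
    import sys

    for warning in check_submission(sys.argv[1] if len(sys.argv) > 1 else "."):
        print("WARNING:", warning)
```

Run against an unpacked ZIP of sources, this prints one warning per problem it finds. It does not try to compile anything, so it would not have caught the PDF that came out with no references.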
