The Geomblog: arxiv


Tuesday, January 11, 2011

Are open tech report sites taking off in CS?

For a while now, the math and physics communities have amused themselves by wondering why the CS community is slow to adopt the arxiv. In the past year or so, I've noticed an uptick in postings on the arxiv (especially around conference deadlines).

Prompted by David Eppstein's review of 2010 in cs.DS, I decided to get some stats on publication counts at the arxiv and ECCC for the past four years. My method:

go to arxiv.org/list/FIELD/YY (thanks, David)
Read off the total number of papers listed

For the ECCC, papers are numbered by YEAR-COUNT, so looking at the last paper published each year sufficed to get the count.

I did this for cs.{CC, DS, CG, LG} (LG is machine learning/learning theory)

Caveat: I ignored cross submissions, so there's some overcounting. I'm hoping that at least to determine trends this is not a major issue.
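The counting method above can be sketched in a few lines of Python. This is only an illustration, not the script actually used: the function names, the `https://` prefix, the choice of 2007 through 2010 as "the past four years", and the exact shape of the YEAR-COUNT string are all assumptions. The totals themselves still have to be read off each listing page by hand.

```python
# Minimal sketch of the counting method described above.
# All names here are hypothetical; nothing is fetched from the network.

FIELDS = ["cs.CC", "cs.DS", "cs.CG", "cs.LG"]
YEARS = [2007, 2008, 2009, 2010]  # assumed meaning of "the past four years"


def arxiv_listing_url(field: str, year: int) -> str:
    """Build a yearly listing URL of the form arxiv.org/list/FIELD/YY."""
    return f"https://arxiv.org/list/{field}/{year % 100:02d}"


def eccc_count(last_id: str) -> int:
    """Return COUNT from a YEAR-COUNT identifier of the year's last paper."""
    _year, count = last_id.rsplit("-", 1)
    return int(count)


def growth_rates(counts: list[int]) -> list[float]:
    """Year-over-year ratios for a list of yearly paper counts."""
    return [later / earlier for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Print the pages to visit; the total is read off each page manually.
    for field in FIELDS:
        for year in YEARS:
            print(arxiv_listing_url(field, year))
```

Once the yearly totals are collected, passing them to `growth_rates` gives a quick check on whether growth is steady or accelerating. Because cross submissions are not removed, the ratios carry the same overcounting as the raw totals.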

Here are the results:

Overall, it's clear that arxiv submissions in theory CS are climbing (and rapidly in the case of cs.DS), which I'm quite pleased to see. The growth rates themselves seem quite steady, so it's not clear to me whether the fraction of papers going on the arxiv is itself increasing (there's good evidence that the total number of papers people are writing in general is increasing).

Monday, June 07, 2010

Why double blind review occasionally annoys me.

Submit a paper to a conference that expects blind submissions
Resist the urge to place the paper on the arxiv, because of said blind submission policy, and the misguided belief that placing the paper online will violate the spirit of said policy
Watch as a stream of papers on the conference's topics magically appears on the arxiv.

Friday, January 08, 2010

Guest Post: Question on posting to the arxiv

ed. note: this post is by Jeff Phillips. For another recent post on arxiv publishing issues, see Hal Daume on the arxiv, NLP and ML.

It seems that over the last few months, the number of papers posted to the arXiv has been noticeably increasing, especially in the categories of Computational Geometry and Data Structures and Algorithms.

I have posted several (but not all) of my papers on the arXiv. I still do not have a consistent set of rules under which I post papers. Here are a couple of circumstances under which I have posted papers to the arXiv.

A: Along with Proceedings Version:
When the conference version does not have space for full proofs, post the full version to the arXiv in conjunction with the proceedings version. This is a placeholder for the full version until the journal version appears. Additionally, the arXiv paper can be updated when the final journal version appears if it has changed.

Sometimes, I link to the arXiv version in the proceedings version. This makes it easy for a reader of the proceedings to find the full proofs.

If more conferences move to the SODA model where proceedings versions can be much longer (~20 pages), then this may not often be necessary.

B: Along with Submitted Version:
When you want to advertise a piece of work, but it has only been submitted, post a version to the arXiv. This is useful if you are giving talks on the work and want a documented time stamp so you can't get scooped, or if, say, you are applying for jobs and want to make your work very available and public.

This is closer to the math philosophy where many (most?) people submit a version of a paper to the arXiv as soon as they submit it to a journal. I think it would be great if CS adopted this policy, as it would be a lot easier to track results. I have a friend who as a math graduate student would start every day by perusing the dozen or so new arXiv posts in his area and choosing one paper to read. He told me that almost every paper he read as a grad student was on the arXiv. Wouldn't a world like that be extremely convenient?

However, I have had an issue following this rule. Last year I submitted a paper to a conference and concurrently submitted a longer version to the arXiv. Unfortunately, the paper was not accepted to the conference. My coauthor and I extended the results to the point where it made sense to split the paper. Half was then submitted and accepted to another conference, and full proofs were made available through a tech report at my coauthor's institution, as he was required to do. The second half, which has also been extended, is now under submission.

I might like to post the (full) second half to the arXiv, but I do not want to duplicate the part from the previous post. I am not sure if it makes sense to merge the papers at this point either. I would also like to note on the arXiv page that that version has been extended and that part of it appears as a tech report.

What is the proper arXiv etiquette for this situation?
