Wednesday, August 25, 2010

Latest news from the unnamed-but-working-on-it theoryCS Q&A site

4 days in (public) beta, and the theoryCS Q&A site (or as someone described it, the 21st century Usenet) is chugging along. There are nearly 600 users now, with nearly 300 "active" users - people who've posted, asked a question, commented, etc. Since there isn't a way to broadcast to the user community directly, I'm taking the liberty of doing it here.

If you're enjoying using the site, please visit the meta site (click the small 'meta' on the top banner). If you don't like how the site is working, please visit the meta site. In both cases, you'll find important discussions about how the site is working and should work, and your input will shape how it evolves.
Another reminder: please use your votes liberally. In fact, if you think a question is worthy of being read, it probably deserves an upvote (and the same for an answer). Frequent voting makes the whole system run smoothly.

And now, back to your regularly scheduled theoryCS programming: some interesting recent questions for your entertainment (please comment at the question links, rather than here):

Monday, August 23, 2010

cstheory Q&A site now open to the public !

The theoryCS Q&A site, in private beta for the last seven days, is now open to the public. Please visit, ask questions, answer them, and hopefully we can make this a valuable resource for the community. There are currently over 100 questions, 300 answers, and 300+ users ! Some of the questions that I thought were quite interesting:
If you have questions about the site and the protocols for questions, please visit the meta site. There, a bunch of users are trying to flesh out what will be the policies, as well as elect temporary moderators, and so on.

Monday, August 16, 2010

cstheory q&a site now open !

After a long wait, the theoryCS q&a site is now open. It's in private beta for the next 7 days, so if you didn't commit already, you'll have to wait till next Monday.

Please visit the site, ask your questions, and answer other questions. Some points to note from experience with Mathoverflow:
  • If you're so inclined, please participate in the meta site as well. That's the place for discussions about the site itself, and is a good place to get involved and help shape the community.
  • Please vote liberally and often. Especially in the beginning, voting questions and answers up when appropriate helps encourage others to stick around and participate more. If a question seems interesting enough to read, it deserves an upvote !

Saturday, August 14, 2010

Social theoryCS...now 98% better...

Whatever you may think of the P vs NP hullabaloo over the past week, it has to be said that as a social experiment in peer review, this was an exhilarating and magnificent process to watch. There's enough chatter in the media about how this might point the way forward for peer review, but I'm skeptical of drawing too many conclusions from this one experiment with arguably the most important problem in our field.

But I'm encouraged by both the level of participation, and the enthusiasm it has generated about the field. I can't trawl through the over 800 comments to find them, but I read numerous comments about how watching this process unfold made the commenter ever more excited about getting into theoryCS. Bob Sloan made a brief cameo to point out the value of this discussion as an outreach tool.

This is a long way of coming around to the news that the theoryCS Q&A site is now at 98% commitment, which means that some time in the next few days it'll go into beta. The beta stage starts with a 'private' seven day period, in which only those who committed (and therefore have promised to answer/ask a total of 10 questions over the entire beta duration of 90 days) get to use the site. After that seven day period, the doors will open for any interested participants.

The initial seven day period will have strong administrative elements (seeding the site with questions, discussing meta issues about the scope of questions etc), and while no major issues will be resolved, you can have some influence over how the forum gets shaped if you get in early. So if you haven't committed yet, do so; otherwise you'll have to wait the 7 days to get in.

Wednesday, August 11, 2010

P vs NP: What I've learnt so far...

(ed: note. it is somewhat embarrassing to be categorized as someone "trying to unravel what is up with the claimed proof". All I've really helped with is getting the wiki set up, and I'm very impressed with the amount of work that everyone is putting into keeping it current)

Whether or not Deolalikar's proof turns out to work, I have to say that in a span of three days, I've learnt a heck of a lot about the individual components in his paper.
  • Like everyone else, I learnt my descriptive complexity back in the day, and marvelled at the fact that P, NP and PSPACE could be characterized syntactically (P is captured by first-order logic with least fixed points on ordered structures, PSPACE by partial fixed points, and NP by existential second-order logic, via Fagin's theorem). The discussions about the subtleties of the order relation in LFP have been fascinating though, and while the issue remains unresolved, there's a wealth of new(er) material that I've now been exposed to.
  • Similarly, while I was familiar with the idea that hardness could be captured as a phase transition in random k-SAT, the bizarreness of the landscape (clustering! freezing! condensation!) was quite new to me, and it's been illuminating to learn from the experts.
  • The relationship between the solution space for k-SAT and error-correcting codes is also something that I was not aware of. And it might be the source of yet another issue with the paper. 
  • A point made by Terry Tao is even more intriguing:

    If nothing else, this whole experience has highlighted a “philosophical” barrier to P != NP which is distinct from the three “hard” barriers of relativisation, natural proofs, and algebraisation, namely the difficulty in using average-case (or “global”) behaviour to separate worst-case complexity, due to the existence of basic problems (e.g. k-SAT and k-XORSAT) which are expected to have similar average case behaviour in many ways, but completely different worst case behaviour. (I guess this difficulty was well known to the experts, but it is probably good to make it more explicit.)
There's been a bit of back-and-forth on whether "too much time" is being spent on perusing this paper, and of course people are entitled to their own opinions on this matter.

My take is slightly different. While none of the "revelations" listed above are new to people in the relevant areas, they're certainly new to me, and given how unlikely it is that I'd be able to buttonhole the relevant experts one-on-one to explain these ideas to me, the discussion is serving an important purpose. In effect, it has created a virtual workshop with even more discussion than a real one !

We've spent a lot of time over the past few years bemoaning the fragmentation of theoretical computer science, and have spent many blog-hours arguing over how to make STOC a destination conference for theoreticians again. But what the discussion over this paper has done is bring together experts in very different areas to talk with each other and exchange ideas, results and the crucial "mental frameworks" that are usually impossible to share except in oral discussions.

And that, in my view, is a net plus regardless of what happens with the proof.

Tuesday, August 10, 2010

A 'polymath' home for analysis of the Deolalikar proof

In collaboration with the Polymath project (and many thanks to Terry Tao for making this happen, and putting in lots of editing work in short order), there's now a new wiki to collect various threads of the discussion on the Deolalikar P != NP paper (which has been updated). With a lot of help from Terry, I've seeded the wiki with the initial discussion points that have come up. Please continue to edit this page as new ideas come up, or as other discussions of the paper appear.

This wiki replaces the Google doc that I had originally set up: it's a lot easier to edit, and Google was struggling under the weight of all the people trying to view the file.

The Lipton/Regan posts will continue to host the main research threads. If you are so inclined, you can use this post for "meta"-discussions as is customary in polymath land. Content will be transferred over to the wiki periodically, and more discussion threads can be forked if things start getting unwieldy (the first Lipton post is pushing 117 comments already).

Monday, August 09, 2010

On the Deolalikar proof: Crowdsourcing the discussion ?

 Update: the google doc has been replaced by a wiki. Please read this post.

By now everyone knows about the new proposed proof that P $\ne$ NP by Vinay Deolalikar. Better minds than mine are on the case right now, and little snippets of questions are coming out in comments (especially on Dick Lipton's original post).

It's a little hard to keep track of all the discussions, so I thought I'd try a crowd-sourcing experiment. At this link is a google doc that contains some of the initial discussion topics being raised around the proof. It's crudely organized, with a section for each major set of issues, and paragraphs for each person's comment, with sourcing via URL.

It's editable, so here's the idea. As you see issues get raised and resolved, enter that into the doc. The following rules apply:
  • No discussions: this should be comment/resolution only ! 
  • No personal or nontechnical commentary. This is purely for discussion of the mathematics involved, and nothing else.
  • Add references (urls, or whatever) whenever possible, and quote all context so as not to cause miscommunication.
The goal is to collect the discussions in one place. Ideally, this would be some kind of polymath setup, and maybe it can be converted over to that (if someone does set it up that way, I'll add a link here).

Let's see if this works !

Thursday, July 08, 2010

TheoryOverflow: An Update

Now that I'm done with SODA (any word on number of actual submissions?), I thought I'd do a brief update on the proposed theoryCS Q&A site (currently laboring under the inelegant name of TheoryOverflow). I've had various questions from people, and have wondered about others. So rather than force you all to trawl meta.stackoverflow.com for answers, I thought I'd dish them out here.

  • I want my TO site ! When is it coming ? 

    The site is going through what is called a 'commit' stage. The StackExchange overlords (the people providing the service) have in their infinite wisdom decided that the success of such sites is predicated on people committing (with their names and email) to participating, and so we're slowly building up enough commit mojo to get into a trial stage.

  • But we have more commits than the average attendance at STOC !
    OK. Firstly, that's rude ! Secondly, there's a complicated formula by which the system determines whether we have enough commits to move into a beta stage. It's also being throttled right now at 90% of its true value because the software for making the sites is still being tested on the few sites that are already in beta.

  • At this rate (248 commits, 27% progress), we'd need 1000 commits to get to beta. That's larger than the attendance at STOC, FOCS and SODA put together !

    Say after me three times: theory conferences are all fine, and we don't have to do anything to change anything about them. Feeling better now ? So the process by which commit progress is calculated is very complicated, involving your reputation on related sites, the number of "badges" you've earned, how friendly you are with Lady Gaga, and other such mysterious parameters. So while there's some correlation with the number of commits, high-reputation users on related sites can bump up the progress meter, under the rationale that people used to Q&A sites are more likely to help this site get off the ground.

  • But my mathoverflow rep doesn't appear to count ? 
    No, it doesn't. There appear to be technical complications involved in this, because MO used an older version of the software. I personally question this approach to the commit phase, but it's not my software.

  • This is silly: why can't we all just go over to mathoverflow ?

    This is a tricky point. The MO folks have been more than welcoming to theoryCS, and they've explicitly said that they'd be happy to have us over at their site. While I personally would not mind at all if we just started using MO more regularly, I feel that we might get drowned out in the much larger universe of MO, and if we can get our own site, it might be easier to navigate and find interesting questions/answers. But it's always an option.

  • We're computer scientists, aren't we ? why can't we roll our own ?

    Um, well, ... we're THEORETICAL computer scientists, so theoretically we could. Oh wait, you actually want us to do it now ? How annoyingly practical !

    More seriously, there is actually an open source system called OSQA that's powering the new machine learning Q&A site called metaoptimize. I hadn't heard about it till a few days ago, but it's definitely something that we can consider if the theoryoverflow site appears to stall.

  • So what now ?

    For now, we (that means YOU!) continue to drum up support and commits for theoryoverflow. In terms of average number of commits/day, we're still doing quite well, and the StackExchange folks have indicated that they're likely to change the formula for commit progress based on how the sites already in beta are doing. If a few more weeks go by and things look stalled, we can revisit the matter.

Saturday, July 03, 2010

Tools every modern conference needs

(while I procrastinate on my SODA edits)

  • Crowdvine: an all-encompassing social network solution for conferences that includes schedule planning, networking, activity monitors (who's coming to my talk) etc
  • A paper discussion site (could be within crowdvine, or even something like Mark Reid's ICML discussion site)
  • A good paper submission/review manager like HotCRP (not CMT !)
  • Videolectures.net
  • Facebook/twitter/blog for official announcements. 
  • At the very least, a website with all the papers for download. If necessary, zip or torrent links for downloading proceedings (especially if videos are involved).
And no, we did none of these for SoCG 2010. But a guy can dream, no ? While I'd be shocked to see crowdvine at any theory conference (price, culture, you name it), I think videolectures.net would be a valuable resource, and many of the rest require time but are otherwise free. 

Monday, June 28, 2010

TheoryCS discussion site: A call for help.

Many of you are familiar with Mathoverflow.net (MO), the discussion site where (mostly) professional mathematicians come to discuss research problems, get questions answered by experts in the area, and have very enlightening discussions on a host of topics in mathematics.

The conversation is at a very high level. The moderators (and the community at large) do an excellent job of pruning out the posters fishing for homework answers, the vague questions with answers that any google search could return, and off topic questions that are better served by a more amateur-centric forum. What remains are questions that are at least at the advanced graduate student level, and that frequently spur open-ended discussion among the advanced researchers in the area.

It's a dream site: it has much less clutter than sci.math (or comp.theory for that matter), and because of modern tagging and filtering technology, is a lot easier to navigate than the old Usenet.

While theoryCS questions are welcome at MO, they sometimes get neglected for a lack of critical mass in the audience. Many in the theoryCS community participate in MO, but it would really be ideal to have our own discussion site that's structured along the same lines with the same intent:
to be a site for professional researchers to come together and ask/answer questions about research.
If you're a student, this could be an invaluable resource to learn about different aspects of TCS, and get to know researchers in the area. If you're an active researcher, this is a great place to post queries about topics that you might not be familiar with but need in your own work.

We're in the process of creating such a site, courtesy of Anand Kulkarni (aka polybot). Stack Exchange is the software system and the organization that maintains discussion sites like this, and has built a system to create, discuss and vet sites to ensure that they have enough critical mass to sustain themselves.

The theoretical computer science site is here. The way it works is as follows:
  1. We start with a 'define' phase with prototypical questions that are right and not-right for the site. This has been completed, with a sufficient number of questions that people have voted for.
  2. We move on to the 'commit' phase, where we are now. Here, we need commitments from people (with their actual names attached) that they will participate in the site once it goes into beta testing. We have a number of commitments so far, but we need many more in order to move to phase 3, which is
  3. Beta testing, where the site goes active and we see if we can sustain it with questions and discussions for a while. If so, 
  4. The site gets created. 
This is a great opportunity to create a site that can truly be our own, and that would be a go-to location for all aspects of theoretical computer science, whether it be algorithms, complexity theory, quantum computing, game theory, programming language theory, logic, or whatever. Tagging and filtering means you don't have to get swamped in a sea of posts on topics you don't care about, and if you've used MO, you know how easy it is to have the site only display topics that are relevant to you.

What should you do ? Go to this site, and commit to participating in the beta. Committing costs nothing - it's literally one click after you authenticate. Then all you have to do is spread the word so we get enough people to move to the beta phase.

If you're skeptical about this (or maybe have wasted too much time on comp.theory in years gone by knocking down P vs NP cranks), do go to MO and see what a theoryCS site could become.

And if you're a theoryCS blogger with your own following, please spread the word ! You all know who you are :)

p.s. there's some confusion about exactly how many people are needed to get to the next phase. The answer is that it's not based on numbers, but based on the reputation of the people committing (as measured by their activity on related sites - but not MO, sadly). Most folks in our community are unlikely to have large reputation on the related (mostly programming) websites, so we'll need a good number of people (probably in the low 100s) to get there. Hence this appeal (note that if you commit and then use the 'share' link to get someone else to sign on, your "reputation" increases, and that improves the overall progress of the site in a cascading effect).

Metrics via pullbacks

One of the things I end up having to do a lot of is ponder distance metrics. Not the nicely behaved norm-induced ones, but more bizarre entities: metrics over shapes, metrics on lattices, metrics on distributions, and the like.

There are many constructions for building metric structures on spaces for which it's not clear how to do so. One of the neatest methods is via pullbacks, exploiting the algebraic and continuous duals for vector spaces.

The basic idea is as follows: You want to build a metric on some (usually ill-behaved) vector space V. Fortunately for you, the space V* of linear functionals over V is better behaved. Even better, you can define a norm on V*. This allows you to do a little magic.

Define a function || || on V as ||v|| = sup f(v), over all f in V* with ||f|| <= 1. This is of course the "dual" norm. It can be shown that it indeed satisfies the properties of a norm. Once you have a norm, you have a metric: d(x,y) = ||x - y||. Voila !

These types of constructions are particularly useful when dealing with distributions (the Schwartz kind) and their geometric generalizations, the currents (which are a measure-theoretic way of defining surfaces). Distributions can be nasty - you can only interact with them via their linear functionals (the space of smooth functions with compact support). But this construction allows you to put nice metric structures on them.
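As a quick sanity check on this recipe (a worked discrete example of my own, nothing deep): take V to be the space of signed measures on n points, and equip V* (functions on those points) with the sup norm. The dual norm construction then recovers exactly the total variation / $\ell_1$ distance between distributions $p$ and $q$:

$$d(p,q) \;=\; \sup_{\|f\|_\infty \le 1} \sum_i f_i\,(p_i - q_i) \;=\; \sum_i |p_i - q_i| \;=\; \|p - q\|_1,$$

with the supremum attained at $f_i = \mathrm{sign}(p_i - q_i)$.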

Some examples of metrics arising in this manner:

  • The l_1 distance between probability measures (or the total variation distance)
  • The earthmover distance between probability measures (this is quite nifty)
  • The current distance (between measures, or between currents). 
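The earthmover case is concrete enough to compute directly from this dual viewpoint. Here's a minimal sketch (assuming numpy and scipy are available; the function name emd_dual and the toy example are mine) that evaluates the earthmover distance between two discrete distributions in its Kantorovich dual form: maximize the linear functional $\sum_i f_i (p_i - q_i)$ over potentials $f$ that are 1-Lipschitz with respect to the ground metric.

    import numpy as np
    from scipy.optimize import linprog

    def emd_dual(p, q, D):
        # Earthmover distance between distributions p and q on n points, via the
        # Kantorovich dual: maximize <p - q, f> over f with f_i - f_j <= D[i, j].
        n = len(p)
        rows, rhs = [], []
        for i in range(n):
            for j in range(n):
                if i != j:
                    row = np.zeros(n)
                    row[i], row[j] = 1.0, -1.0
                    rows.append(row)
                    rhs.append(D[i, j])
        # Pin f_{n-1} = 0: the objective is shift-invariant, so this loses nothing.
        bounds = [(None, None)] * (n - 1) + [(0.0, 0.0)]
        # linprog minimizes, so negate the objective <p - q, f>.
        c = -(np.asarray(p, dtype=float) - np.asarray(q, dtype=float))
        res = linprog(c, A_ub=np.array(rows), b_ub=np.array(rhs),
                      bounds=bounds, method="highs")
        return -res.fun

    # Point masses at 0 and 2 on the line: all the mass moves distance 2.
    pts = np.array([0.0, 1.0, 2.0])
    D = np.abs(pts[:, None] - pts[None, :])
    print(emd_dual([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], D))  # prints 2.0

The same problem run "backwards" - minimizing the cost of a transportation plan - is the more familiar primal form; the dual is often the handier one when all you have access to is a space of test functions.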

Sunday, June 27, 2010

And for some Sunday entertainment

(dare I say XKCD-style) Flowcharts for the life of the tenured and untenured professor. A collaborative School of Computing effort by my colleagues John Regehr and Matthew Might, and myself (who says professors can't collaborate inside a department!).

Incidentally, we also make up the vast majority of our department's blogging presence.

Friday, June 25, 2010

Bad Research As Spam

Jon Katz and Matt Welsh have both written recently about the problems of dealing with crap papers, mainly the pain of having to review them. In an unrelated event, I got into a discussion with some colleagues about the problems of "crap research", and ended up formulating a theory: viz,
bad research is like spam email
in the sense that

  1. There seems to be no way to stop it, and many economic incentives to continue it
  2. You can't stop it by diktat
  3. The only way to deal with it appears to be effective filtering. Like spam, bad research has less of an effect when it gains no attention. 
There are other similarities:
  1. We can block spam by filtering certain domains. We also tend to ignore certain kinds of conferences.
  2. We can block spam by blocking certain email addresses. We also might ignore certain researchers, or at least downweight their work after a series of bad experiences.
  3. More explicit spam blocking policies create a false-negative problem. False-negatives are also a big problem in research.
But this analogy also suggests that we shouldn't be designing strategies to eliminate bad research. We should be designing better methods to focus attention on good research, via better filtering and highlighting mechanisms (social, authoritative or otherwise). 

Personally, I think bad research is less of a problem than lots of cargo-cult research, where it looks a lot like good research is being done, but when you look closely, you realize that nothing of value has been added to the universe. Sadly, a lot of funded research is also like this. 

PS: False negatives are a highly underrated problem in research. I think we vastly overestimate our ability to judge what kinds of research will have lasting value and what potential impact a paper can have. So it's important to err on the side of being more generous to papers, rather than less. 

Wednesday, June 23, 2010

The Future of STOC

Lance has now created a blog, 'The future of STOC', where he's been posting responses from people who were asked to comment on the original proposal for modifying STOC. The responses are almost unanimously negative and make for interesting reading.


My response is linked there as a PDF: I thought I'd place the main text here, just to solicit some comments. After writing this, I've done some digging for data (which will hopefully appear in a post shortly) that brings up an interesting angle to the 'accept more papers' discussion.
My response: This is a terrible way of solving the problem, because...
There are better solutions!

It is puzzling to me that we're jumping straight to an alien (to us) conference model, when there are proven hybrid conference models that exist within the larger (conference-driven) computer science community. ICALP (theory), SIGMOD, VLDB and ICDE (the top three database conferences), ICML and NIPS (the top machine learning conferences), KDD and SDM (the main data mining conferences), SOSP (the main OS conference), and (to a degree) SIGCOMM and INFOCOM (networking) all have the following model:
  • A main research ``track'', with peer reviewed papers. Optionally, an ``industrial'' track for more applied work.
  • A set of workshops, for which topics are solicited via a call for proposals.
  • A set of tutorials, for which again topics are solicited via 5-page abstracts and reviewed.
  • Panel discussions and demos (optionally)
The conference itself is a 5-day event, with research and industrial tracks, panels, and tutorials, all blended together (workshops either bracket the conference, or are interleaved). I've been to many of the conferences listed above, and I can assert that I've met many people not originally of the community that attend because of these satellite events.

Other variants include limiting the number of actual talks, while still including all accepted papers in the proceedings  (the remainder of the papers are presented as posters). This opens up more time during the day for satellite events as well.

Note: I've begun noticing more tutorial events at STOC (STOC 2010 has a full day of these). This definitely is progress, although I believe that workshops draw more attendance. I also think it's important to solicit these from the community, rather than having the PC decide topics. Doing so both increases participation and increases the sense that the community is part of the whole conference.


Math meetings are fractured

I've never attended an AMS meeting myself, but from all accounts (and I have heard MANY), they are highly fractured. The meetings draw attendees from the spectrum of mathematical areas, but they all essentially group together in tiny miniconferences - I've heard numerous stories of multiple rooms in which the only people listening to a talk are the five other speakers and a few graduate students. This is not the way I want to see our flagship theory conference evolve.

People won't come

For better or for worse, we're used to the 'publish-attend-present' model of CS conferences, rather than the 'meet-greet-discuss' model of math/science conferences. I suspect that many people will not come to a STOC in which there is no proceedings and no real publication attached to the presentation (at least none that can go on a CV). There's nothing wrong with the model per se, but it's not something our community is comfortable with, and since other theory conferences won't go this route, I don't see how we'll ever get comfortable with it. 

Bottom Line

I applaud the idea of shaking things up in regard to the format for STOC. I just feel very strongly that we should learn from other models that exist within the computer science community, rather than making a radical shift towards a model that's part of a very different framework for how the publishing/dissemination process works. I am unconvinced that the proposed model would solve the problem of attendance, and it has a good chance of making STOC entirely irrelevant.

Social theory...

as in, doing theory socially. Three points of interest:
  1. Luca Trevisan is soliciting ideas for building a schedule/recommendation engine for FOCS using collaborative filtering. He's on a short time frame, so you need to know what you're doing, but I dare say there's an excellent ALENEX submission waiting for anyone who has the time to work on this. 
  2. Anand Kulkarni is proposing a theory overflow site, much like Math Overflow (which many of you inhabit already). I've been relatively happy with MO, and they're quite friendly towards algorithms folks (although sometimes a little confused about the difference between theoryCS and programming). But I do often tire of wading through pages and pages of unrelated questions to get to interesting ones.

    I don't know if there's enough global support for theory overflow, but I do know that MO has been a fantastic resource for research-level mathematics, and with enough participation, theory overflow could get there too. If you don't know what I'm talking about, go to mathoverflow.net. If you think that it's a waste of time, I'll mention that among the ACTIVE participants there are Terry Tao, Timothy Gowers, Richard Stanley and Bill Johnson (as in Johnson and Lindenstrauss) 
  3. Mark Reid has built a discussion site for ICML 2010 (ICML has been doing this for a few years now). Each paper at the conference gets a page, and anyone can post comments on the page. Authors can opt to get email whenever someone posts a comment, and can in this way interact with discussants. I wonder if something like this might soon become a de facto part of all conferences.

Tuesday, June 22, 2010

Case Study in Large Theory Conferences: ICALP

Luca Aceto had posted a comment on my last post about STOC, describing his experiences running ICALP as a large theory conference. I thought it was fascinating, and requested that he repost a longer version on his blog. That post is here, and I strongly encourage anyone who's been involved in this discussion to go and read it... now....

I wanted to highlight a few snippets from it that I feel reinforce a number of points that I've been arguing.
ICALP is a three-track conference, and has been so since 2005, though typically only two tracks are running in parallel at each point in time. At ICALP 2008, in addition we had 12 satellite events, including the DYNAMO training school for doctoral students
Note that ICALP had an attendance range of about 500 - where we'd like STOC to be. It fits in with the pattern I was describing: more satellite events appears to correlate with larger attendance. 

As an aside, ICALP had Peter Winkler do a masterclass on math puzzles. Frankly, if we could just hire Peter Winkler and Persi Diaconis to do lectures at every STOC, our numbers would go into the stratosphere ;)
The workshops were held the day before ICALP or during the week-end following it. They were selected by the ICALP organizers amongst a fairly large number of proposals that we received in response to a call for workshops, based on their perceived scientific quality and on their potential interest to the ICALP community.
I've said this before, but I do think that if we go the route of having workshops/tutorials, the best way to do it is how other conferences do it: have a workshops chair solicit proposals, and decide from amongst them. The workshop organizers then take care of the work of getting speakers. It will ease the burden a lot.
I firmly believe that the task of organizing a conference like ICALP should be shared amongst several people. This certainly worked for us and helped us work more cheerfully, and overcome personal problems, mishaps and periods of crisis and panic that arose during the year before the conference took place
Very true. Again, most conferences that have lots of activities have a large organizing group, with proceedings chairs, arrangements chairs, workshop chairs, tutorial chairs, and so on. Apart from the fact that people's CVs get nicely bloated with service activities, more participation at the top level can actually help with overall attendance, as well as alleviating many of Michael's concerns (although technically he was more concerned with colocation, which is an idea I like, but does take more logistical coordination). 

Monday, June 21, 2010

On acceptance rates and flagship conferences

There's been a lot of back and forth on ways of increasing attendance at STOC, and in our wonderful theory way, all of this has happened in a universe unencumbered by the presentation of actual data.

I thought I'd dig up statistics on what exactly goes on at a number of major conferences in different areas in computer science. My idea was to take some of the major areas in the field, identify their flagship conference (or one of two as the case may be), and compile statistics on acceptance rates, attendance, and general conference activities.

The areas I considered (with main conference in parentheses) were
  • databases (SIGMOD)
  • machine learning (ICML)
  • operating systems (SOSP)
  • networking (SIGCOMM)
  • architecture (ISCA)
  • graphics (SIGGRAPH)
and the results I got were interesting (all data that I compiled can be found in this spreadsheet: feel free to add other areas, or update numbers, as you see fit). Where I could, I tried to get a sense of attendance/acceptance rates from either asking people or looking at published numbers for recent years: the ACM DL has acceptance rates for many of the above. Information on conference activities were taken from the most recent year I could get data for (usually 2010 or 2009). The main points:
  1. All of the listed conferences had attendance in the 500-600 range (except ISCA, with average attendance of 400, and SIGGRAPH, with 2000+ on the research side). So they are definitely conferences with attendance that STOC would like to mimic.
  2. Acceptance rates varied, but most were below 20% (ICML being the exception at 25%). STOC is at 28% or so.
  3. Number of papers accepted varied widely (23 for SOSP, 150 for ICML). I find this particularly interesting: it would seem that attendance correlates more with the perception of being 'flagship' than the actual number of papers accepted.
  4. Most conferences had long lists of colocated workshops. The smallest number was SIGCOMM last year with 5, and others had many more. STOC had none.
  5. Tutorials were variable: some places had many (SIGGRAPH had 27) and some had none. STOC had 3.
  6. With the exception of ISCA last year, all the conferences had significant poster sessions, either consisting of all papers accepted, or as a separate track with many posters. STOC had none.
  7. The conferences all had other activities: demos, industrial tracks, works in progress or other such things (ISCA being the exception). STOC had none. 
  8. Durations varied between 4 and 6 days (including the initial day). Most had 5. STOC is 4.
To me, there are two things that stand out from this.
  1. The number of papers accepted does not appear to make a difference to the attendance. SOSP happens once every two years, and accepts 23-25 papers, and gets 500 attendees !! ICML gets a similar number of attendees with 150 papers accepted each year. 
  2. There are a TON of activities at these conferences. Indeed, I think ICALP and ESA match them in terms of level of activity, but certainly not STOC. I've been a proponent of satellite events around a conference to increase attendance, and the STOC/EC/CCC colocation does seem to have helped. I'm also intrigued by the idea of colocating SoCG with STOC. 
You may draw your own conclusions...

p.s. for the legions of readers who will start protesting that these communities are much larger than the theory community, I will merely point out that almost no one in this discussion thinks that the theory community is 300 strong: the question is more about getting the rather large theory community to show up in force for STOC.

Update: Michael Mitzenmacher has a post up listing specific logistical issues that come up with expanding the set of activities at a conference. He points out that if we decide to go to multiple satellite events (whether as separate events or whatever), we'll have to adjust to a much greater degree of organizational commitment up front, as well as no small amount of 'attitude adjustment'. For anyone who's ever attended a business meeting, this is a scary thought :)

Saturday, June 19, 2010

The Shape of Shape Analysis Research, Part III

(a brief series on shape analysis: for earlier episodes, click here)

Shape matching research in computational geometry is fundamentally distance-based. In other words, we start with a distance function, and then design algorithms to compute it, or minimize it under transformations, or approximate it, and so on.

There's an important problem with this point of view. While computing the distance between two shapes is an important tool in shape analysis, it's not the only problem. Other equally important problems include:
  • Finding a shape similar to a query shape
  • Matching pieces of shapes together
  • Organizing shapes into groups (i.e., clustering)
And so the problem with the distance-based viewpoint is that all you get at the end is an abstract metric space. You can compute d(x,y) in an appropriate amount of time (maybe), but you lack all the additional structure needed to solve these other problems efficiently. With our modern knowledge of metric embeddings, it's always possible to ask if these distances can be embedded in a more tractable space, but it turns out that for the measures of interest (Hausdorff, Frechet, earthmover), this cannot be done without incurring huge errors.

The idea of shape spaces turns this process around. Rather than starting with the distance, and trying to find a space to embed it in, shape-space based methods start with a mapping that takes a shape to a single point in a (usually curved) space, and use an induced metric (usually some kind of geodesic) as the distance.

By at least one unsourced account, this view of shape dates back to Riemann, but the modern formulation of this approach started with David Kendall, in the 70s. His idea was extremely elegant.

Consider a collection of closed simply connected regions of the plane (the shapes), each shape described by k points on its boundary. Each of these points can be described by the two coordinates (x,y), which we will write as the complex number x+iy. By a shifting transformation, we can ensure that the centroid of each shape lies at the origin. This loses one (complex) degree of freedom, yielding a k-1 dimensional complex vector.

Next, consider what it means to rotate the shape around the origin. In the complex plane, this corresponds to multiplying by the complex number z = exp(i theta). Quotienting out this multiplication via the appropriate projective identification, we can identify a shape with a single point in k-2 dimensional complex projective space. The distance between two shapes is now defined as the geodesic distance between the corresponding points in this space.

There are a few important points to note here:
  1. Each shape of k points is mapped to a single point in a k-2 dimensional space.
  2. All shapes are assumed to have the same number of points, which correspond across shapes. 
  3. The space is constructed by quotienting the original representation (the k-dimensional complex vector) by the special orthogonal group.
This last point is particularly crucial: the invariance under transformations is folded directly into the representation, rather than being something to "solve" via minimization.
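To make this concrete, the resulting distance is very easy to compute once correspondences are given. Here's a minimal sketch (assuming numpy; the function name kendall_distance and the toy example are mine): center and rescale the complex vectors, then take the arccosine of the modulus of their Hermitian inner product, which is the geodesic distance between the corresponding points of complex projective space.

    import numpy as np

    def kendall_distance(X, Y):
        # X, Y: k x 2 arrays of corresponding boundary points of two planar shapes.
        z = X[:, 0] + 1j * X[:, 1]   # encode each point (x, y) as x + iy
        w = Y[:, 0] + 1j * Y[:, 1]
        z = z - z.mean()             # quotient out translation (center at origin)
        w = w - w.mean()
        z = z / np.linalg.norm(z)    # quotient out scale
        w = w / np.linalg.norm(w)
        # |<z, w>| is unchanged when either shape is rotated (multiplied by exp(i theta));
        # its arccosine is the geodesic distance in complex projective space.
        return np.arccos(np.clip(abs(np.vdot(z, w)), 0.0, 1.0))

    # Sanity check: a square vs. a rotated, scaled and translated copy of itself.
    sq = np.array([[0., 0.], [1., 0.], [1., 1.], [0., 1.]])
    t = np.pi / 5
    R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
    print(kendall_distance(sq, 2.5 * sq @ R.T + np.array([3., -1.])))  # ~ 0.0

Note that the sketch takes the correspondence between the two point sets as given, which is exactly the limitation flagged in the critique below.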

The general program outlined by Kendall (map shapes to points on a manifold quotiented by a suitable set of transformations) has led to many other constructions, among the more notable being Bookstein's shape space and the Michor-Mumford representation for planar closed curves invariant under diffeomorphisms (which bears a strong resemblance to a summed variant of the Frechet distance). These methods have (for reasons unknown to me) taken up residence primarily in the computer vision community.

A Critique.

There is much to like about the shape space approach to shape analysis. Fundamentally, by embedding shapes in a space with structure, it gives us both a distance measure and a geometry to play with, and this is invaluable. However, there are serious limitations to the ideas developed thus far.
  • Computation: It's all very well to come up with a mathematically elegant formulation of a distance as a geodesic, but it's a lot harder to actually compute these distances. In practice, researchers often resort to heuristics with no guarantees beyond local convergence. To me, this is like building a beautiful mansion in a pit of mud: it's hard to get in and out without a lot of dirt and pain.
  • Scalability: the mathematical complexity also makes it harder to do scalable computations on shapes.
  • Global vs local features: I'll have more to say about this later, but these approaches (generally speaking) construct a global signature for a shape, which limits one's ability to do partial matching. 
  • Correspondences: The Kendall method at least requires explicit correspondences between points in each shape. Finding correspondences is one of the most annoying parts of shape analysis (and affects most methods for comparing shapes).
Next: We examine the problem of hearing shape, or how the Laplacian starts to figure in. 

Thursday, June 17, 2010

Rebooting how we publish in CS.

Dan Wallach has a thought-provoking proposal on how to reboot the CS publication process from the ground up. Read the entire proposal here.

Here's an edited version of a response I sent to him (short version: I like it !)

I think the time is ripe for this: it seems that people are getting more and more used to using the arxiv/iacr/eccc for tech reports and DBLP as a de facto list of papers, and even regularly subscribing to arxiv rss feeds to see what's new. Bibref management systems like Mendeley/citeulike would also really benefit from this.


While (like others) I'm concerned about facilitating ranking schemes too much (I personally think the h-index is an abomination, but that's a different discussion), I think that even if the only outcome of this was to have a centralized single repository for CS publications, that in itself would be a major benefit.

I'm less sure about attention/reputation mechanisms though. It's clear that one of the challenges for researchers today is the 'eyeballs problem': how to get attention to your work amidst the sea of publications. While one might argue that Google and page-rank have done a good job of this, i think that over time it's become more and more top heavy, with a few locations acquiring sticky reputation and sucking in attention, and while this might be ok for general news, it's not so for research, where more often than not, good ideas can come from less "well known" sources.

I don't think CSPub causes any additional problems in this regard - but it would seem like much more thought is needed to design *transparent* ranking schemes. While google can do what they want with their ranking scheme, and keep it as a trade secret, a public service such as CSPub should try to keep ranking methods as transparent as possible. (hack-proof ranking methods ? I know there's research on this !)

It's over !!

Yes, by far the most stress-filled SoCG I've attended is now over. I'm hoping we didn't traumatize the attendees too badly.

I apologize for the lack of posts during the conference - I just didn't have enough time to compose intelligent thoughts about the talks that I actually did manage to attend, and even Bernard's metamorphosis into Steve Jobs (at least in talk style) went unremarked upon :).

There were lots of great talks though, and hopefully as I get more time, I'll be able to mention some of them.
