Wednesday, September 30, 2009
Justice!
I can vouch for the class, since I indeed took it as a sophomore, some large number of years ago. I can also vouch for it in that I've watched the first lecture online, and the production quality is extremely high. Funnily enough, the first lecture was exactly as I remember it 20 years ago -- the script hasn't changed that much. (If you watch the lecture, you'll see the examples are quite memorable -- I honestly do remember them from when I took the class. But I won't spoil them for you here.) I'm going to watch all the lectures, and see how the class stands up after all these years. I hope you'll join in for some of the fun.
Tuesday, September 29, 2009
Blog Posts of the Day
This is interesting to me because this very issue came up in our last search (which I was leading), where we ended up making 6 offers (and got 3 acceptances). We (the hiring committee) recognized that we were making a rather significant request to have 6 simultaneous outstanding offers. We also recognized the dangers in trying to sequentialize these offers. First, there was the internal danger: the complex discussions we would have had to undertake to rank-order everyone we wanted to make an offer to (we had such a great committee, I'm sure we wouldn't have argued). And second, there was the external danger that a candidate -- who will, of course, find out they were the "second choice" -- takes the ordering as a negative signal and becomes more inclined to take another offer. One can argue whether or not a candidate should take such an ordering as a signal, or whether such a reaction is a purely emotional response. (See Mihai's post, for example, and judge for yourself in that case.) But it was clear to us that, even if no such signal was intended, there was a high risk that this would be the interpretation from the candidate's standpoint.
Mihai's post provides a solid empirical data point that we were right to have this concern; it's something I will keep in mind (and if necessary point to) in future hiring discussions. I'm glad we were able to make 6 simultaneous offers, and send the right signal to all of the candidates we made offers to.
Something not worth reading is Kevin Carey's article in the Chronicle of Higher Education, where he seems to be saying that Harvard should be doing more for undergraduates and in particular admitting more undergraduate students. It's so bad, I can't bring myself to link to it. Without judging the point of criticism, I've pointed out that his rant is pretty much devoid of an actual argument; if you care (and really, I'd suggest reading Mihai's stuff first!), you can see that I'm somehow now embroiled in an argument with him here and here.
Saturday, September 26, 2009
UC President Interview
Thursday, September 24, 2009
Extending the Sketching of Sketches Result
They consider a model where a stream of pairs (i,j) in [n]^2 arrives, giving a joint distribution (X,Y), and the goal is to determine how close the joint distribution is to the product of the marginal distributions under various metrics, which naturally corresponds to how close X and Y are to being independent. All the usual goals in the streaming setting hold (small space, small computation per new pair, one or a small number of passes, etc.). We extended one of their main results to higher dimensions (a problem they had left open): the stream is now k-dimensional vectors in [n]^k, and we wish to approximate the L_2 distance between the joint distribution and the product of the marginal distributions in a single pass. We give a randomized algorithm that gives a (1 ± ε)-approximation (with probability 1 - δ) and requires space logarithmic in n and the stream size m, and proportional to 3^k. The paper should be of interest to those who know and like the original Indyk/McGregor paper, which I assume is a significant part of the streaming community.
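For intuition, here's a toy single-pass version of the kind of sign-sketch estimator the Indyk/McGregor approach builds on, for the k = 2 base case. This is a minimal sketch under my own simplifications -- explicit random sign vectors rather than the compact 4-wise independent hash families a real streaming algorithm would use (so the space here is O(n) per copy), and plain averaging rather than median-of-means -- with all names illustrative:

```python
import numpy as np

def l2_independence_estimate(stream, n, copies=64, seed=0):
    """One pass over pairs (i, j) in [n]^2, estimating the squared L_2
    distance between the empirical joint distribution and the product
    of the empirical marginals."""
    rng = np.random.default_rng(seed)
    s = rng.choice([-1.0, 1.0], size=(copies, n))  # signs for coordinate 1
    t = rng.choice([-1.0, 1.0], size=(copies, n))  # signs for coordinate 2
    X = np.zeros(copies)   # sketches the joint distribution
    Y1 = np.zeros(copies)  # sketches the first marginal
    Y2 = np.zeros(copies)  # sketches the second marginal
    m = 0
    for i, j in stream:    # the single pass
        X += s[:, i] * t[:, j]
        Y1 += s[:, i]
        Y2 += t[:, j]
        m += 1
    # Each Z sketches (joint - product of marginals); over the random signs,
    # E[Z^2] equals the squared L_2 distance between the empirical joint and
    # the product of the empirical marginals, so averaging cuts the variance.
    Z = X / m - (Y1 / m) * (Y2 / m)
    return float(np.mean(Z ** 2))

# Independent coordinates give an estimate near 0; identical ones don't.
rng = np.random.default_rng(1)
a, b = rng.integers(0, 8, 20000), rng.integers(0, 8, 20000)
print(l2_independence_estimate(zip(a, b), 8))  # close to 0
print(l2_independence_estimate(zip(a, a), 8))  # bounded away from 0
```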
The Indyk/McGregor proof required a clever application of the Arithmetic Mean-Geometric Mean inequality. To move the proof up into higher dimensions, we end up having to use the AM-GM inequality twice, along with a clever partitioning of the objects (edges in [n]^k) to get the right constant factor 3^k out. It also seems that this is the "right" dependence on k, at least for this algorithmic approach. (A previous result by Braverman/Ostrovsky also tackled this problem, but had a 2^O(k^2) dependence on k.)
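For reference, the inequality doing the work is just the standard arithmetic mean-geometric mean bound for nonnegative reals,

$$\frac{a_1 + a_2 + \cdots + a_r}{r} \ge \left(a_1 a_2 \cdots a_r\right)^{1/r},$$

applied (twice, in our case) to bound cross terms in the variance analysis of the estimator.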
We submitted this to SODA, where it didn't get in, but seemed to be on the borderline. The reviews were on the whole quite reasonable. My take is that in this paper, we (by which, of course, one should read "Kai-Min and Zhenming") took a nice previous SODA result, presented a non-trivial extension, and really wrote it up very clearly and nicely -- I'd like to think we really clearly explained the essence of the argument and why it works. The reviewers, I think, recognized this, but on the other hand had to deal with the issue that it was just a single result. The original paper had more results, and, at the risk (or hope) of instigating commentary, I'll suggest that SODA as a whole is more interested in quantity over writing quality; a hodgepodge of lemmas and theorems stands a better chance of getting in than a single well-written idea. (The original Indyk/McGregor paper, I should be clear, has both a number of results and excellent ideas, and is not what I'm referring to in this statement.) I understood going in that this would probably be borderline for SODA; I had hoped it would make it in, but this is certainly not a case where I would complain about the decision or the review content by the PC. (One might argue that perhaps SODA should take more papers, and in particular seek out more papers of this type, but that's a different, higher-level argument.)
There are other conferences we could submit to, but the conferences with upcoming deadlines didn't seem particularly suitable. (In particular, they weren't in the US, and would have meant either extensive travel for me or visa issues for the students.) Since the result seemed to us essentially done as it was, we're submitting to a journal. But it's available now for anyone who is interested.
Tuesday, September 22, 2009
Not Teaching / Double Teaching
The reward for the work last semester is that I'm not teaching this semester. I still find I'm coming in, in part because there are a lot of administrative things that have to get done (especially the first few weeks of the semester), and in part because it's often not productive to work at home with a small toddler there. But I'm enjoying having the time available that comes from not having to teach.
The question is what to do with that time. I've had a good research summer, including finishing off a number of journal versions that had been lying around, and I certainly could just keep going forward on a number of research projects. But it feels like I've been given a few months of opportunity, and I should find a way to use it. I have a couple of ideas for books, and I feel like (after I finish off a few more dangling research projects) I should motivate myself to start one of those. I know a book is a big project, so even with this opportunity it will take some mental discipline to get started.
For those who are faculty, what would you do with a semester off of teaching (where you can't go away on sabbatical)? Anyone have any suggestions on what to do with the gift of time?
Friday, September 18, 2009
FOCS Early Registration (A Tale of 3 Web Sites)
Keep in mind there will be a special 50th FOCS Anniversary day on Saturday October 24th with talks by Dick Karp, Mihalis Yannakakis, Noga Alon, and Manuel Blum.
Be careful which FOCS 2009 page you go to. The one that Google brings up first is focs2009.org, which I won't even give a link to, since it's a squatter site, though at least it's polite enough to point to the Georgia Tech site. The second one Google brings up is the Yale site, which Dan must have set up for submissions, but it's the Georgia Tech site that has all the local arrangements info. Having just chaired STOC, I can understand why it's easier on the organization end to have one site maintained by the PC chair and another by local arrangements, but it doesn't seem like the best idea from the attendee end. Ah well, now you know where on the web to go.
Wednesday, September 16, 2009
Random Musings for the Day
Richard Lipton is taking bets on whether P = NP; well, not really, but he thinks there are "implicit odds" in how we do research suggesting the "belief" that P = NP is in the 1,000,000 to 1 range, and doesn't see the evidence for it. (I disagree, somewhat, in the comments.)
Harvard's budget cuts have substantially closed our deficit (the "crisis" is now a full year further out).
Harvard's Dean of Admissions and Financial Aid answers the all-important questions. (Do I have to read it now, if my oldest is a decade away from college?)
The unfolding story of the murder of Annie Le (see here if you're not caught up on your news) is disturbing on many levels, but in particular because I don't want science students -- and in particular female science students -- to feel unsafe in their buildings at whatever hours. I haven't heard much in the news reports about how situations like this (where I mean potentially unsafe working conditions such as the need to work in labs late, and -- although we surely don't know the story yet -- the appearance that the murder might have been done by a maladjusted male admirer) might impact people, and especially women, pursuing scientific careers. But the story made me think of it.
Tuesday, September 15, 2009
SIGGRAPH article
I was primarily involved in the "hash table construction" part of the paper. The ideas there are familiar from various works (though I don't know of another parallel implementation of cuckoo hashing); the question is how to put the pieces together for performance on this architecture. We end up using a 2-level scheme: first hash the items into (1st-level) buckets, with a limited number (512) of items per bucket. Then (2nd level) for each bucket the items are placed in parallel into a cuckoo hash table, using up to 512 threads (one per item) and fast memory for each bucket. Then everything is copied out to global memory. The comparison point, for applications, is sorting/binary search; sorting has been highly optimized for parallel processors. Generally, we come out about even on the construction times, and much better on the lookup times.
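To make the two-level scheme concrete, here's a minimal sequential sketch of it -- my own toy rendering, not the paper's GPU code, with all function names and parameters illustrative. On the GPU, the level-2 step runs in parallel, one thread per item, in fast per-bucket memory:

```python
def build_two_level_table(items, num_buckets, bucket_capacity=512):
    # Level 1: hash items into buckets of at most bucket_capacity items each.
    buckets = [[] for _ in range(num_buckets)]
    for x in items:
        buckets[hash(("level1", x)) % num_buckets].append(x)
    if any(len(b) > bucket_capacity for b in buckets):
        raise RuntimeError("bucket overflow; retry level 1 with a new seed")
    # Level 2: an independent small cuckoo hash table per bucket.
    return [cuckoo_hash(b) for b in buckets]

def cuckoo_hash(items, seed=0, max_kicks=100):
    # Cuckoo hashing with two hash functions, one item per slot; rebuild
    # with a fresh seed if an insertion chain gets too long.
    size = max(1, 2 * len(items))
    table = [None] * size
    slot = lambda which, key: hash((seed, which, key)) % size
    for x in items:
        cur, which = x, 0
        for _ in range(max_kicks):
            pos = slot(which, cur)
            if table[pos] is None:
                table[pos] = cur
                break
            table[pos], cur = cur, table[pos]        # kick out the occupant
            which = 1 if slot(0, cur) == pos else 0  # send it to its other slot
        else:
            return cuckoo_hash(items, seed + 1)      # too many kicks: rehash
    return seed, table

def contains(tables, num_buckets, x):
    # A lookup probes at most two slots in x's bucket: worst-case constant time.
    seed, table = tables[hash(("level1", x)) % num_buckets]
    return any(table[hash((seed, w, x)) % len(table)] == x for w in (0, 1))

tables = build_two_level_table(range(100000), num_buckets=256)
print(contains(tables, 256, 12345), contains(tables, 256, 100001))  # True False
```

The worst-case constant-time lookup is the point of using cuckoo hashing here: a sorted-array/binary-search baseline pays a log factor per query, which is presumably where the lookup-time win comes from.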
The cool stuff is the applications my co-authors tested with the hash table. They use it to compute real-time intersections of surfaces, and for matching of objects (buildings, etc.) to video frames. The accompanying video explains it all better than I could here (picture, thousand words, all that). Since my co-authors did all the work on the applications end, I think I'm allowed to say I'm impressed with the results -- graphics applications just look really great.
The "structure" of the paper reminds me of networking papers I've been involved with. Here's an algorithm/data structure, and here are 1-3 applications to show you what it can really do. This last part is usually a lot of work, since you have to embed the algorithm/data structure into a real system, with many moving parts beyond the algorithm/data structure you're promoting. Getting things to work is always a challenge (which is why simulation is so popular -- and, in many settings, discouraged).
I'm very excited about continuing to do more in this area, and that's due entirely to the amazing abilities of my co-authors: Dan A. Alcantara, Andrei Sharf, Fatemeh Abbasinejad, Shubhabrata Sengupta, John D. Owens, and Nina Amenta.
Monday, September 14, 2009
Colleges, Newspapers
It's not really a new theme; I've been hearing this for a while. I find myself agreeing in spirit -- I know things are going to change dramatically in education. I just find myself skeptical that anyone really knows at this point what the change will end up looking like. Here are the key paragraphs to me:
"When this happens -- be it in 10 years or 20 -- we will see a structural disintegration in the academy akin to that in newspapers now. The typical 2030 faculty will likely be a collection of adjuncts alone in their apartments, using recycled syllabuses and administering multiple-choice tests from afar."
Strangely, this is a fair description of my Harvard Extension School class, which is "recycled" from my regular Harvard class -- the lectures are taped, the assignments are pretty much the same, and the one difference is that since I can't have proctored "in-class" exams, I give multiple-choice exams over the Internet. Eerie. Maybe somebody does know what the change will end up looking like.
"Not all colleges will be similarly affected. Like the New York Times, the elite schools play a unique role in our society, and so they can probably persist with elements of their old revenue model longer than their lesser-known competitors. Schools with state funding will be as immune as their budgets. But within the next 40 years, the majority of brick-and-mortar universities will probably find partnerships with other kinds of services, or close their doors."
It's nice to hear that I'm probably safe. (Harvard's still considered an elite school, even after the endowment drop, right?) I just need the current system to last another 30 years or so. But, as usual, this is something for those starting or thinking about a PhD to ponder...
Thursday, September 10, 2009
Wednesday, September 09, 2009
Research Labs "vs." Academia
I spent two+ years at Digital Systems Research Center out of graduate school, and I loved it. It was great training (Andrei Broder was, as I always rave, a great mentor to me there, but there were many other more senior scientists there who were very helpful to me as well), it really cemented my interest in connecting systems and theory, and my experience matched, to a great extent, the positives mentioned in Danah Boyd's blog entry: freedom to pursue what I wanted, a great density of great people, many chances to interact with students, and so on. All without the hassle of grant-writing.
If I hadn't also liked being a professor, I would have gone back. But I liked being a professor, and ended up staying at Harvard. I don't think I'd say one is better or worse than the other. I think it depends a lot on the person, and for many people, both are possible answers.
The one looming aspect of the research lab (perhaps underestimated in the linked blog post) is the issue of job security. I moved to Harvard just after Digital got bought by Compaq, later bought by HP, later... well, the Systems Research Center just doesn't exist. Similarly, IBM and AT&T labs have gone through massive changes over the years. Microsoft seems like a safe bet in terms of a stable research lab environment for the next decade or so, but it's hard to prophesy beyond that. Now, the "insecurity" of the corporate environment is maybe not such a big deal -- people from SRC for the most part just moved on to different jobs, at Microsoft or Google or Yahoo or HP or a startup or academia or wherever, and continued to be very successful. But I do know from firsthand and secondhand experience that it can be an unpleasant, even if only temporary, external disruption if your company situation changes significantly.
I hadn't thought about the research lab vs. academia issue much in recent years, since as a family we're quite happy in our present location, and Boston hasn't really been a hub for research labs until recently. Now, I suppose, there are more options. While I don't imagine leaving academia any time soon, I have been spending some time this summer at the Microsoft New England lab, and have been really enjoying it. But more on that will be a topic for another post.
Tuesday, September 08, 2009
ESA 2009
ESA is unusual in that it has an Engineering and Applications Track, so you actually see some papers with graphs and numerical results from implementations. I only point this out because I almost collapsed in shock when, leafing through the papers, I saw some actual performance plots (in algorithms papers????) -- but then I recalled this feature of ESA.
I think there's something at ESA for everyone; sadly, I'm flying back and missing a number of the talks of papers I'd like to see, including some hash-based data structure talks:
Rina Panigrahy and Eric Lehman. 3.5-Way Cuckoo Hashing for the Price of 2
Johannes B Hreinsson, Morten Krøyer and Rasmus Pagh. Storing a Compressed Function with Constant Time Access
Martin Aumüller, Martin Dietzfelbinger and Michael Rink. Experimental variations of a theoretically good retrieval data structure
A paper outside my usual scope that caught my eye:
Erik D. Demaine, MohammadTaghi Hajiaghayi and Dániel Marx. Minimizing Movement: Fixed-Parameter Tractability
Fixed-parameter tractability seems like a great direction for getting efficient algorithms for hard problems; the paper reminds me that I could see adding a lecture on it to my undergraduate algorithms class.
The title that got my attention most was:
Andrew McGregor, Krzysztof Onak and Rina Panigrahy. The Oil Searching Problem
I just wanted to know what the oil searching problem was from the title. It's a competitive ratio/exploration problem generalizing the standard cow-path problem: you've got n wells, each with oil at a different depth; how do you drill to find oil with as little total drilling as possible? As the authors point out, this can model research -- you've got n problems that will require different unknown amounts of work to solve....
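For anyone who hasn't seen the cow-path problem, here's the classic doubling strategy for the basic two-ray version that the paper generalizes -- context only, not the paper's algorithm, and the framing (a known target distance and side) is just for illustration:

```python
def cow_path_doubling(target_dist, target_side):
    # Search two rays from the origin for a target at distance target_dist
    # (>= 1) on ray target_side (0 or 1): sweep 1 unit one way, then 2 the
    # other, then 4, 8, ..., returning to the origin between sweeps.
    # Total distance walked is at most 9 * target_dist, the classic
    # 9-competitive guarantee for deterministic strategies.
    traveled, step, side = 0.0, 1.0, 0
    while not (side == target_side and step >= target_dist):
        traveled += 2 * step  # out and back, nothing found
        step *= 2
        side ^= 1
    return traveled + target_dist  # the final sweep reaches the target

print(cow_path_doubling(10.0, 1))  # 72.0, within the 9 * 10 bound
```

The oil searching problem replaces the two rays with n wells, each explored by drilling deeper.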
Award-winning papers are:
Best student paper: Heidi Gebauer. Disproof of the Neighborhood Conjecture with Implications to SAT
Best paper: Christoph Dürr, Flavio Guiñez and Martín Matamala. Reconstructing 3-colored grids from horizontal and vertical projections is NP-hard
Before heading to the airport, I got to see Erik Demaine's talk on Algorithms Meet Art, Puzzles, and Magic. It's an interesting and entertaining talk, though I wish he had a little more time at the end. I recommend it if it comes to a town near you.
On the plus side, ESA still gives out a hefty LNCS paper book, so I'll be able to read about what I miss on the way home, at the cost of dragging some extra heft around various airports.
Monday, September 07, 2009
Controversy at ESA
1) I was talking with some others this afternoon about awards. Why aren't more conferences giving test of time awards (particularly in theory)? An award for the best paper from 10 years ago at the conference seems like a wonderful thing to have regularly.
Then we started talking about negative awards. Besides giving a best paper award at the conference, what if we gave a worst paper? Funnily enough, it didn't seem like such a negative. "Congratulations, you did just the right amount of work to get your paper over the bubble to get in!" How bad could you feel about that? Perhaps a worst talk award, in contrast, would inspire people to prepare better talks.
2) ESA had issues this year with deadlines -- they had to set their submission deadline a week after ICALP decisions, but still leave enough time for the other associated ALGO workshops to have their deadlines after and satisfy publication constraints for the proceedings. This didn't give a lot of time for the PC decisions. I know I had similar issues chairing STOC, as many other conferences tried to set their deadlines after our decisions would be announced.
Why don't we get a unified theory calendar, and have all deadlines preset a year in advance? (This was essentially Thore Husfeldt's idea, if I've interpreted him right.) Stick it on a website/Wiki somewhere, so everyone can see what's going on in advance and plan appropriately? Of course, such advance planning and coordination seems impossible (until we actually get organized and do it).
3) A related issue -- ICALP vs. ESA. Does one dominate the other in your mind? (Because of deadlines, ESA gets and accepts a lot of ICALP rejects.) Maybe we can get a European version of the SODA vs. FOCS/STOC debate going.
4) Amos Fiat, showing the slide at the business meeting on what percentage of papers in each "category" were accepted: "The higher the percentage, the less the PC knew about that topic." Discuss. (Once those SODA reviews come out, you can discuss in that context as well.)
ESA Talk and Paper
One open question that was "solved" between the time of the paper and the talk is that a group of us now have tight bounds on the "thresholds" for cuckoo hashing in the case of 3 or more choices, with one item per bucket. Hopefully we'll have a writeup available soon, but I guess I'm "announcing" the result.
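While the writeup isn't ready, here's a quick way to see such a threshold empirically -- a toy offline experiment of my own, not our proof technique, with all names and parameters illustrative. An assignment of items to buckets (d choices per item, one item per bucket) exists exactly when the item-bucket bipartite graph has a matching saturating the items, so we can just test random instances at various loads; for d = 3 the threshold sits near 0.9179:

```python
import random, sys
sys.setrecursionlimit(100000)  # augmenting paths can recurse deeply

def placeable(n_items, n_buckets, d=3, seed=0):
    # Can every item be assigned one of its d random buckets, with at most
    # one item per bucket? Standard augmenting-path bipartite matching.
    rng = random.Random(seed)
    choices = [rng.sample(range(n_buckets), d) for _ in range(n_items)]
    owner = [-1] * n_buckets  # owner[b] = item currently holding bucket b

    def augment(item, visited):
        for b in choices[item]:
            if b not in visited:
                visited.add(b)
                if owner[b] == -1 or augment(owner[b], visited):
                    owner[b] = item
                    return True
        return False

    return all(augment(i, set()) for i in range(n_items))

# Crude scan around the d = 3 threshold:
n = 2000
for load in (0.88, 0.90, 0.92, 0.94):
    ok = sum(placeable(int(load * n), n, seed=s) for s in range(5))
    print(f"load {load:.2f}: {ok}/5 random instances fully placeable")
```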
Friday, September 04, 2009
Wednesday, September 02, 2009
ESA Plans
Tuesday, September 01, 2009
ITA, Polaris
About the time I was there, they were embarking on a new project -- a whole new flight reservation system for Air Canada, codenamed Polaris. I wondered what had happened to that project, and when we might see the results. So I was disappointed when news finally came out that the rollout had been suspended on the Air Canada side.
Under the assumption that their tech on this project would be at least a fraction as awesome as their previous systems, and because I'd like local tech companies to do well, I hope they find another customer. If anyone has inside info to post anonymously (hopefully of the form "We're doing just fine, thanks.") please feel free...