Thursday, January 31, 2008

Quantum Algorithms for Computational Geometry

You'd think the title of this post is a joke paper title, but it actually isn't. There are now at least two papers (the second one even uses the word 'revisited' in the title!) on quantum algorithms for geometric problems.

Neither paper (and they basically admit this themselves) really pushes the envelope in the QC dimension. The first one, by Sadakane, Sugawara and Tokuyama, uses Grover's quantum search algorithm as a subroutine to solve a number of search problems in geometry, many of them reduced to the problem of finding the lowest point in the upper envelope of an arrangement of halfplanes. The second, by Bahadur, Durr, Lafaye and Kulkarni, uses a quantum algorithm for element distinctness based on a random walk method by Ambainis to solve some of the above problems. They also show lower bounds for some of these problems. One interesting result is an n^{2/3} lower bound, and an n log n upper bound (!), for 3SUM.

I'm not sure what to make of these results: apart from the fact that I have a hard time seeing a need for quantum algorithms for geometric problems, I don't know whether these are anything more than (insert model of computation) algorithm for (insert name of problem) type results (not that such results are bad, but they are not always interesting).

Monday, January 28, 2008

Plotting implicit functions: a bleg..

Does anyone know of good tools for plotting implicit functions ? Ideally, I'd like something that could plot functions of the form f(x,y) = 0, and even contours of the form f(x,y) = c.

(for those who aren't hip with web jargon, a bleg is a beg on a blog ;)

Important heuristics: Alternating optimization

The Kleinberg/Tardos algorithms textbook has a nice chapter on local search heuristics. As has been discussed earlier, learning heuristics as part of an algorithms class is important so that we know what to do when a problem is NP-hard and approximations are either provably hard or elusive.

However, perhaps one of the most important heuristics used in practice is not discussed, and I have yet to find a source that examines its power fully. This heuristic is "alternating minimization", and it appears "in the wild" in two wildly popular methods: the k-means algorithm in clustering and the Iterated Closest Point (ICP) algorithm in model registration.

Suppose you have a cost function D defined over the product of domains P and Q. For concreteness, let's say that P is the space of all assignments of points to clusters, and Q is the space of all assignments of centers to clusters. The standard k-means cost function asks for an assignment of points to clusters, and an assignment of centers to clusters, so that the sum of squared distances of points to their cluster centers is minimized.

In general, such an optimization is hard to perform. However, it is often the case that for a fixed p in P, we can efficiently minimize D(p, q) over all q in Q, and vice versa. Continuing the example above, using Euclidean distance as the underlying distance, it's easy to see that for a fixed set of points in a cluster, the minimizing center is their centroid. Moreover, for a fixed set of centroids, the best assignment is a straight nearest-neighbour assignment.

This is the idea behind alternating minimization. Starting with arbitrary p0 and q0, we repeat the two-step procedure:
  1. p_i = arg min D(p, q_{i-1})
  2. q_i = arg min D(p_i, q)
in the hope that it will converge to the optimal solution.
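
To make the alternation concrete, here is a minimal sketch of the procedure for the k-means cost function described above (this is just Lloyd's algorithm); the function name, the initialization, and the stopping rule are arbitrary choices for illustration.

```python
import numpy as np

def alternating_kmeans(points, k, iters=100, seed=0):
    """Alternating minimization for the k-means cost: fix the centers and optimize
    the assignment (step 1), then fix the assignment and optimize the centers (step 2)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]  # arbitrary q_0
    for _ in range(iters):
        # Step 1: p_i = arg min_p D(p, q_{i-1}) -- assign each point to its nearest center.
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # Step 2: q_i = arg min_q D(p_i, q) -- move each center to its cluster's centroid.
        new_centers = np.array([points[assign == j].mean(axis=0) if np.any(assign == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # the cost (a nonnegative potential) has stopped decreasing
        centers = new_centers
    return assign, centers
```

Each iteration can only decrease the sum of squared distances, which is exactly the potential-function argument used below to establish convergence.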

Now there are three questions that one usually has to answer for an alternating minimization. The first one is relatively straightforward:


Does it converge ? In general, it's often fairly easy to write down a potential function that decreases strictly with each iteration and is always positive. Thus, convergence can often be established for this procedure.

Does it converge to the optimal solution ? In general, no. This procedure can often get stuck in local minima. However (and this is one of the more attractive aspects of this heuristic), Csiszar and Tusnady showed that as long as a certain "5-point" property holds (a simple inequality involving iterates of the procedure), the process will converge to the optimal solution. In their paper, this result is used to show convergence when P and Q are convex sets of distributions, and D is the Kullback-Leibler distance.

Another useful example where the procedure converges is when P and Q are convex sets, and D computes the distance between the two sets. In that case, each iterative step finds the point in one convex set that is closest to a given point in the other.

A variant of the above question is: does it converge to a close-to-optimal solution ? Again, in general, one cannot say anything useful about this. However, for k-means, recent work by Arthur and Vassilvitskii, and Ostrovsky, Rabani, Schulman and Swamy, has shown that choosing start points carefully can yield provable bounds on the quality.

Does it converge in polynomial time ? Perhaps the most important question about this heuristic is whether it converges in polynomial time. For k-means, it can be shown that there are examples that force the heuristic to take superpolynomial time to converge. However, recent results in smoothed complexity have shown that if the input is perturbed ever so slightly, then the iterative procedure converges in polynomial time (with high probability). In a sense, one can view this result as explaining the success of the method in practice.

Alternating minimization appears in many domains, and is a common trick for "solving" complex optimizations. It should definitely be taught wherever heuristics are discussed.

Tuesday, January 22, 2008

SODA 2008: Summaries

I've written before about the perils of blogging at the conference: another problem is that if you don't blog, people come by and start asking you when the posts are going to appear.

Before you read further, do check out posts by DavidE (II), Michael Mitzenmacher (I, II), and a "stranger in a strange land" take by Hal Daume, my colleague at Utah and an ML/NLP person whose blog you should definitely follow if you're interested in such topics.

There's been a lot of work on clustering in high dimensional spaces, and I want to say a lot more about this, especially the "abstract framework" results of Kumar, Sabharwal and Sen, but that will have to wait for another post. For now, I want to highlight the paper, "Clustering for Metric and Non-Metric distance measures" by Ackermann, Blomer and Sohler. Essentially what they do is take the framework for approximate linear-time clustering in high dimensions, and remove the need for either symmetry or the triangle inequality in the distance measure, replacing them by a set of sampling conditions that have a more statistical feel to them. Doing this allows them to cluster with more general distance functions, including the relative entropy or Kullback-Leibler divergence. I'm a sucker for results that illustrate "what's going on", especially in the realm of approximate high-D geometry, and this is a great example of attempting to get to the core structural properties needed to make an algorithm work.

Outtakes:
  • After many years, I actually missed the business meeting this time, so no drinking games or updates. From what I hear, Austin is the first choice for the venue in 2010 (we're in NYC for 2009). Claire Mathieu is the PC chair for SODA 2009.
  • Overall, I really enjoyed SODA this year: somehow I felt that there were far too many interesting papers (too many for me to attend the talks, that is), and the papers were of really high quality.

Monday, January 21, 2008

SODA 2008: Day 1

I've been terribly remiss in blogging, because I've actually been trying to do some work (!). At any rate, do read David's take on Day 1.

The first session I attended was on embeddings (which was placed parallel to the geometry session: NOT A GOOD IDEA). The first talk was by Edo Liberty, on work with Nir Ailon on "Fast Dimension Reduction Using Rademacher Series on Dual BCH Codes".

The Johnson-Lindenstrauss lemma, which tells us that any set of n points in a d-dimensional l_2 space "approximately" lives in a space of dimension k = O(log n), has had a huge effect on data analysis, as a tool for reducing the dimensionality of a set of points. Although the construction is very simple (project each point using a matrix populated with random entries), it doesn't scale, because the matrix required is dense, and so is hard to maintain for large high-dimensional data sets. Also, the transformation itself takes time O(dk) per point, and is therefore expensive if d and k are large.
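
For reference, here's a minimal sketch of the dense construction that this line of work is trying to speed up; the constant in the target dimension is a rough choice for illustration, not the tight bound.

```python
import numpy as np

def dense_jl(points, eps=0.5, seed=0):
    """Project n points in d dimensions down to k = O(log n / eps^2) dimensions
    with a dense Gaussian matrix; pairwise l_2 distances are preserved to within
    a (1 +/- eps) factor with high probability."""
    n, d = points.shape
    k = int(np.ceil(4 * np.log(n) / eps ** 2))  # rough constant, for illustration
    A = np.random.default_rng(seed).normal(size=(d, k)) / np.sqrt(k)
    return points @ A  # O(dk) time per point and a dense d x k matrix: the bottleneck
```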

Earlier work by Ailon and Chazelle showed that the JL transform can be implemented in time O(min(d log d, k^3)), which improves the "trivial" O(dk) bound for certain values of k (as a function of d). The paper presented here improves the embedding speed for a much larger range of k, and draws on a number of tools from Banach spaces and functional analysis, as well as using some coding theory to "derandomize" the construction. It's a good example of practice (how do we implement the JL transform efficiently ?) driving new and nontrivial theory results.

Saturday, January 19, 2008

ALENEX/ANALCO 2008: Invited talks

I'm in San Francisco for the SIAM-ACM Symposium on Discrete Algorithms (SODA), and as is tradition, Saturday is the day for ALENEX (the workshop on experimental algorithms) and ANALCO (the workshop on analytic combinatorics).

ALENEX:

The ALENEX invited talk today was on routing problems. Rolf Mohring from TU Berlin presented a pair of case studies of routing problems in the "wild".

The following video explains the first case study:


Actually, the problem he was looking at was a tad simpler: you have containers coming in at Hamburg port, and you have automated robot vehicles that ferry the containers from the loading dock to their temporary storage locations. The problem is to create routes for the vehicles (on the fly) so that they reach their destination at a reasonable time without colliding with each other.

The actual solution involved successive shortest paths on a "time-expanded" flow graph, which in their case was a relatively simple grid graph. The time expansion refers to the fact that an "edge" is occupied between two time instants, and of course travel is blocked on an edge when it's occupied.
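
Here's a minimal sketch of what such a time-expanded graph looks like; the unit transit times and the set of already-occupied edge/time slots are simplifying assumptions on my part, and the real system is considerably more involved.

```python
import networkx as nx

def time_expanded_graph(grid_graph, horizon, occupied=frozenset()):
    """Node (v, t) means 'vehicle is at v at time t'. Traversing an edge takes one
    time step (a simplification), and an edge is unusable at time t if it appears
    in `occupied` as (u, v, t). A route is then a shortest path from (source, 0)
    to (target, t) for some t <= horizon; routing vehicles one after another with
    successive shortest paths blocks out the slots that each accepted route uses."""
    G = nx.DiGraph()
    for t in range(horizon):
        for v in grid_graph.nodes:
            G.add_edge((v, t), (v, t + 1))  # waiting in place is always allowed
        for u, v in grid_graph.edges:
            if (u, v, t) not in occupied:
                G.add_edge((u, t), (v, t + 1))
            if (v, u, t) not in occupied:
                G.add_edge((v, t), (u, t + 1))
    return G
```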

The second application, which in my mind was more impressive, was an instance of large-scale optimization to redesign the Berlin subway system. Perhaps the most impressive aspect of this work was that it's actually being used ! The latest incarnation of the Berlin subway schedule uses the schedule designed by their scheme, and the speaker showed a neat 4 minute Numb3rs-style video at the end targeting the layman that explained their work.

This is a tremendous achievement, both technically and politically. And I can't but wonder whether it's something about Germany itself, the home of classic engineering, that allows optimizations designed by researchers in a university to actually be implemented on a subway system. I'm probably wrong here, in that there must be numerous examples of success stories of OR in the US as well, but it does make me wonder.

Analysis: As examples of "applying algorithms in the real-world", the talk was great. As an example of the intricacies of doing "applied algorithms", it was less impressive: most of the "compromises with the real-world" were the same kinds of modifications to theoretical problems that one might expect to see in papers at ALENEX, and there was little takeaway to be had or lessons to be learned: I'd have loved to hear about how exactly they were able to convince the Berlin subway authority to implement their schemes.

ANALCO:
This was either planned, or a happy coincidence, but the ANALCO invited speaker was Don Knuth, just a few days after his 70th birthday. The title of his talk was 'Puzzling problems', and as befits a Knuth talk, it was a random collection of cute problems in combinatorics. He had promised 5 such, with an exponentially increasing amount of time spent on each successive problem, but he only got through three of them. David's already described them in more detail here; all I can say is that it was a typical Knuthian talk :)

And yes, there were talks ! One talk that I'll draw attention to is the work by Coleman and Wirth on the feedback arc set problem on tournaments. Given a directed graph in which, for each pair of vertices, there's a directed edge one way or the other (i.e., a tournament), can we delete the fewest edges so that the resulting graph is a DAG ? One important application of this problem is aggregating rankings. Think of all the non-bribed figure skating judges at a competition, each creating a ranking of the performers. How do we aggregate these rankings into a "consensus" ranking ?

What is interesting about this problem is that it relates closely to sorting algorithms. In a sense, you can think of the rank aggregation problem as attempting to "sort" the participants into a total order, much like we have with a sorting problem. An earlier paper by Ailon, Charikar and Newman showed that a quick-sort like heuristic actually achieves a 3-approximation to the optimal solution to the feedback arc set problem, and other local heuristics for feedback arc set can also be related to sorting algorithms (although the analog of insertion sort performs quite badly).
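
To make the analogy concrete, here's a minimal sketch of that quicksort-like heuristic (often called KwikSort). The `beats[u][v]` adjacency structure, meaning the tournament has the edge u → v, and the function name are assumptions made for illustration; `vertices` is assumed to be a list.

```python
import random

def pivot_order(vertices, beats, rng=random.Random(0)):
    """Quicksort-like heuristic for feedback arc set on a tournament: pick a random
    pivot, put the vertices that beat the pivot before it and the ones it beats
    after it, and recurse -- exactly how quicksort partitions around a pivot key.
    `beats[u][v]` is True iff the tournament has the edge u -> v (assumed format)."""
    if len(vertices) <= 1:
        return list(vertices)
    pivot = rng.choice(vertices)
    before = [v for v in vertices if v != pivot and beats[v][pivot]]
    after = [v for v in vertices if v != pivot and beats[pivot][v]]
    return pivot_order(before, beats, rng) + [pivot] + pivot_order(after, beats, rng)
```

The edges that point backwards in the returned order form the feedback arc set; the Ailon-Charikar-Newman analysis shows this is a 3-approximation in expectation.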

Mergesort has not been translated to this framework: it's not obvious (though not hopeless) how to reformulate mergesort as an algorithm for feedback arc set, and indeed one open question from the earlier Ailon et al paper is an analysis of the behaviour of a mergesort-like method.

That's all for today, folks ! Tomorrow the action starts in earnest...

After notes:
  • The hotel wireless works in the lecture rooms (!). It's also free. This must be a first...
  • The location is great. The hotel is on Van Ness and there are tons of restaurants (and Peets) within a block, as well as a drugstore and a Whole Foods.
  • The ALENEX business meeting was the shortest business meeting in recorded history: it took all of 5 minutes. I commend the organizers for their good taste, and can only hope that SODA emulates them.

Thursday, January 17, 2008

A Post-Doc opening

I'll post this on the usual forums shortly, but thought I'd place it here since SODA is coming up.
I'm looking to hire a post-doctoral researcher to work with me at the School of Computing. The position is initially for one year, with the possibility of extension for another year based on mutual consent. Applicants must have expertise in algorithms and computational geometry, and preferably should have experience with high-dimensional geometry, approximations, and large data algorithms (external memory/streaming/etc). Some familiarity with differential geometry is a definite plus.

For more on my research, visit my webpage. To apply, please send me a CV, a statement, and three letters of reference. Feel free to contact me if you have questions about the position.
To add to the above: if you think you might be interested, and will be at SODA, drop me a note and we'll try to meet up.

Wednesday, January 16, 2008

Readable math

Bill Gasarch asks about math books you can actually read. This reminded me of my post from over 3 years ago, "Books that read like a thriller". I'd update that list with
  • Information theory, by Cover and Thomas
  • Randomized Algorithms (Motwani-Raghavan): it's an old book, but even when I read it today, I realize just how well it flows.
  • Convex Optimization (Boyd/Vandenberghe): there's just something about the typesetting :)
  • Computational Complexity (Arora/Barak)
  • Convex Analysis (Rockafellar): A model of precision and clarity
  • (almost, but not quite in the pantheon): Approximation algorithms, (Vazirani)

Two surveys/sites on string matching and text processing.

1. A survey on string algorithms and data structures by Paolo Ferragina.
2. A paper surveying the theory and practice of compressed text algorithms, by Ferragina, Gonzalez, Navarro, and Venturini.
3. The "Pizza and Chili" site for test data and sources for compressed text indexing (associated with the paper above).

Tuesday, January 15, 2008

The CACM is dead ! Long live the CACM

Readers of this blog will know that I love to HATE the CACM, the embodiment of all that is wrong in our IT-obsessed, SCIENCE-ignoring culture. Well, the main source of my angst is now gone ! vanished ! replaced by something that <shudder> resembles an actual scientific magazine. Here's the story:

Way back in 2005 (before Web 2.0 ! Prehistoric!), the ACM realized that something needed to be done about the CACM. As Peter Denning tells it,
When David Patterson was president of ACM, many researchers told him they thought CACM had never regained its vaunted glory of the 1970s. Patterson set up a committee to review the current model and propose ways to recharge its content and scope
This committee was chaired by Moshe Vardi, who is now the editor-in-chief (EIC) of the new and improved CACM. He recounts a story that is familiar to all of us:
While the format envisioned in the early 1980s was that of a content-rich magazine, the reduced role of the editorial advisory board combined with a relatively small professional staff meant that most of the content came from submitted articles. Over the years those articles have evolved to be strongly slanted toward Management Information Systems. Over time, a significant segment of the ACM membership lost interest in the publication.
The committee was working off the model of Science, as a fast turnaround-high-prestige magazine that showcased work of broad interest, without necessarily being too technical in focus. The main conclusions of the committee were to leave the more "practice" sides of computing to Queue magazine, and focus on a more research and news structure. The new CACM has the following basic outline:
  • News: The news section will have a very distinct “voice,” covering research and practice in computing on a global scale.
  • Practice: CACM will offer coverage of cutting-edge, emerging technology issues.
  • Breakthrough Research: The goal is to bring research articles, covering the broad spectrum of computing research, back to CACM. This will be a combined effort of conference and program chairs (ACM as well as non-ACM) and the CACM editorial board to select the best research material coming out of conferences and share that information with the wide readership of CACM. Each selected research paper will be adapted to the broad CACM readership and will be preceded by a short “Technical Perspective,” giving readers an overview of the important points and significance of the research.
  • Refereed Articles: CACM will continue to publish, as it has since its early days, peer reviewed articles.
  • Opinions and Views: The final component in the content model of the new CACM is a section dedicated to opinions and views, typically of a nontechnical nature.
Of course, the middle two bullets are the new ones, and hopefully will drive CACM back to its roots as a technical magazine. What's interesting to me is that as Peter Denning tells it, almost this exact same restructuring was first proposed in 1983, and floundered after a visit to the offices of Science, where the ACM folks realized that a true research-driven magazine required far more technical manpower than the ACM of the time was willing to commit. Moshe Vardi indicates plans to expand the staff of CACM in order to make sure that the new incarnation can be effective in this mode.

Another important aspect of the new incarnation is a complete graphic design overhaul:
There is also a need for a complete redesign of CACM’s Web site, with the goal of creating a dynamic, rich Web site that is much more than an online store of the magazine’s content. The aim is to think of CACM as consisting of a print publication, a rich Web site, and email channel to readers.
And so we now have the 50th edition of CACM, in its new form. It's extremely unfair to judge a major revamp like this after one issue, but let's see how it looks anyway.
  • The design: It's pretty slick, with a web-based contents and navigation mechanism inside what is called a "Published Web format" designed by a company called Texterity. The feel is PDF-like, but on the web. There are all kinds of nice features for extracting interesting web links from pages, or from sections. Overall, quite snazzy (but then I'm not a professional graphic designer, so what do I know). I will add that given the general 30 year lag between technology in the "real world" and on the ACM sites, this is a ginormously humongous improvement. The interface does seem a little slow to respond to things like page turns and menu options; I'm using the actual PDF for quick reading and text cut-and-pasting.
  • General content: Given the 50-year anniversary, there are many retrospectives, as well as articles by each of the past CACM EICs. These paint a very interesting picture of the rise and fall of CACM, especially in the period when Peter Denning was in charge. He tells of how budget cuts and restructuring led to the decision to remove research content from it. There are perspectives from well-known luminaries in the field, including Jon Bentley, Rodney Brooks, and Jeannette Wing, among others. There's a longer article by Gordon Bell on "a theory of the evolution of a computer".
  • Research content: There are two main "breakthrough research" articles. The first is on MapReduce, the Jeffrey Dean/Sanjay Ghemawat algorithmic design paradigm that runs much of Google's internals, with an introduction by David Patterson. The second is on an old friend, locality sensitive hashing (in a new incarnation by Alex Andoni and Piotr Indyk), and the introduction is by Bernard Chazelle, which basically means you all should IMMEDIATELY stop reading this and go read what he wrote, gems like, "Sharp measure concentrations and easy spectral predictions are the foodstuffs on which science feasts."

    Both of these are excellent choices for "broad interest" articles. MapReduce has revolutionized the way we think (or should think) about large-scale computing, and there's even been some attempt at an algorithmic model around it. Nearest neighbour search is one of the most basic operations in any kind of data analysis, and I know for a fact that many people actually use LSH for their data mining problems. More of this, please !
  • The opinion section: Not much to remark upon in this issue: definitely nothing that would satisfy a focus group attendee's wish for "blood on the pages". But again, time will tell.
For years, we've all been complaining about the CACM, and the new revamp definitely deserves our attention. It'll take some time before we can tell if the new version of CACM is truly better, but the signs are looking good.

Saturday, January 12, 2008

Core Sets As Grids

This is another post in my ongoing series of missives from the algorithms seminar I ran last semester (yes I know, I'm late: I have a good excuse though, and it currently weighs 6 lb 11oz).

One of the most exciting developments in the area of high dimensional geometry was the introduction of the idea of core sets. The basic premise is this: suppose we have a geometric optimization problem on a set of points P of size n (Amen!) that can be solved exactly in time p(n) (where p might be some relatively unpleasant polynomial in n). Suppose we can select a set of points S (not necessarily contained in P) of size g(1/epsilon) (the "core-set") such that the value of the optimal solution on S is close (i.e., within a 1+epsilon factor) to the value on P. Then we can run our expensive algorithm on S, rather than on P. The new running time is n (typically) to compute S, plus p(g(1/epsilon)) to obtain the approximate solution. What we've done here is construct a linear-time approximation scheme for our optimization.

There's a bit of nice magic here: we are essentially claiming that a set of points with size independent of the input can suffice to give a good approximation to the desired answer.

There are many ways of understanding the real power of core sets, and when they exist: perhaps the most surprising result is that for certain high dimensional problems, it is possible to find a core-set that has size independent of both the input size and the input dimension, thus circumventing the famed curse of dimensionality. But that's not what I want to discuss here. Rather, I want to present the view of core sets as a kind of generalized grid: I find this viewpoint less "magical" and often useful when trying to figure out when a core-set might actually exist.

Sparsifying via a grid:
Suppose we have a large collection of points in the plane, and we wish to "shrink" the input to some manageable size. The first thing we can try is to lay down a grid, with side length R, and snap each point to the center of the grid cell it's in. If n is large compared to R, we'd expect many points to lie in each cell. If the diameter of the point set is D, then we need (D/R)^2 grid cells.

There are two problems with this: firstly, D could be much larger than n. In fact, since D is a function of the coordinates of the input, it could be exponentially larger than n ! Secondly, snapping each point to the center of its grid cell incurs an error of up to R/√2, and so R needs to be very small.

Consider the problem of computing the minimum enclosing ball of a collection of points. The radius of this ball is within a constant factor of D by the triangle inequality. This means that if we're willing to tolerate a multiplicative error in our bounds, we can perturb each point by epsilon*D, while maintaining an epsilon-approximation. Here, setting R = epsilon' D suffices, and we only need O(1/epsilon^2) cells. Voila ! a core-set.
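
Here's a minimal sketch of this grid-snapping step in the plane; the bounding-box diagonal stands in for the diameter (which is fine up to a √2 factor), and the function name is my own.

```python
import numpy as np

def grid_coreset(points, eps):
    """Snap each point to the center of a grid cell of side R = eps * D. Each point
    moves by at most R/sqrt(2), and only O(1/eps^2) cells can be occupied, so the
    cell centers form a core-set for the minimum enclosing ball."""
    lo = points.min(axis=0)
    D = np.linalg.norm(points.max(axis=0) - lo)  # bounding-box diagonal ~ diameter
    R = eps * D
    cells = np.floor((points - lo) / R)
    return (np.unique(cells, axis=0) + 0.5) * R + lo  # one representative per occupied cell
```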

Of course, we're not going to be this lucky all the time: for example, if we want to compute the width (the smallest distance between parallel lines that bound the point set), then the diameter has no relevance: the width of a point set can be zero even if its diameter is large (if all points are on a line). In that case, how would we choose a grid size that would work for us ?

Approximating all directions:
The answer, as it turns out, is to try and approximate the "diameter" in all directions. More precisely, we can define the extent of a point set along a certain direction as the diameter of the projection along that direction. Having done so, the property we desire of our "grid" is that it approximates this extent function along each direction. For many interesting optimization problems, the extents suffice to compute the optimal solution, and so approximating the extents suffices to approximate the solution.

The easiest way of approximating extents would be to compute the convex hull of the point set. This might not be a sparse representation in general. A better answer is to place a grid, and pick out all non-empty grid cells along the boundary (actually, even picking the extreme cells along each vertical direction suffices).

Notice that by concerning ourselves only with extents, we have essentially eliminated the problem of large diameters (because we can always scale the point set). What remains is the problem of variation in the extents. If a point set has a very small extent along a direction (say the direction that defines its width), then in order to capture that extent correctly, we'd need a very fine grid resolution.

Exploiting fatness:
The problem here appears to arise from the fact that a point set can be skinny. The notion of fatness in computational geometry is a very powerful one: there are many ways of defining it, but in all cases the measure captures the degree to which a convex shape is "rounded" versus "skinny" (one definition is the ratio between the radii of the minimum enclosing ball and the largest inscribed ball). If a point set is alpha-fat, then up to a (constant) factor alpha it's round, and therefore has comparable extents in all directions.

Performing an affine transform can make a point set alpha-fat (for any constant alpha). Although an affine transform will play havoc with distances, it preserves ratios: in particular, the ratio between exact and approximate extents.

And with this, we are done. We transform the point set so that it's alpha-fat, and choose a grid size that's a function of alpha and epsilon (after scaling the point set so that the diameter is 1). Then we snap all points to grid cells as before, and pick up the highest and lowest grid cells in each vertical column. What we get is a point set that captures the important characteristics of the input, but whose size depends solely on the error parameters: in other words, a core-set.
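
Here's a minimal sketch of that last step in the plane, assuming the point set has already been made fat and scaled so that its diameter is roughly 1 (those preprocessing steps are omitted). For simplicity the sketch keeps the extreme input points in each vertical column rather than the snapped cell centers, which has the same effect up to the grid resolution.

```python
import numpy as np

def extent_coreset(points, eps):
    """Assuming `points` is already alpha-fat and scaled to diameter ~1: bucket the
    points into vertical columns of width eps and keep the lowest and highest point
    in each column. The survivors approximate the extent in every direction, and
    their number depends only on eps (roughly 2/eps points), not on the input size."""
    cols = np.floor(points[:, 0] / eps).astype(int)
    keep = {}
    for c, p in zip(cols, points):
        lo, hi = keep.get(c, (p, p))
        if p[1] < lo[1]:
            lo = p
        if p[1] > hi[1]:
            hi = p
        keep[c] = (lo, hi)
    return np.unique(np.array([q for pair in keep.values() for q in pair]), axis=0)
```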

Extensions:
What makes this idea particularly powerful is that it can be extended. For example, we can construct core sets for the "extents" of a set of linear functions by going to the dual space. Similarly, we can construct core sets for collections of polynomials by using the linearization trick and then going to the dual space (in higher dimensions). But ultimately, it boils down to constructing a clever grid.

Note: most of this exposition comes from a reading of Geometric Approximation via Coresets, by Pankaj Agarwal, Sariel Har-Peled and Kasturi Varadarajan.

Thursday, January 10, 2008

Happy Birthday, Don Knuth !

Today is Don Knuth's birthday, and by way of tribute, there are posts from David, Luca and Scott that commemorate him. Among the many obsessions of Knuth is one that went into the new editions of TACP: typesetting the names of cited authors in their native language. This meant a laborious process of converting names into their original scripts and then doing a Unicode translation. I was at Stanford for some of this, and did my (tiny) share of translations in Hindi (and even once in Tamil!).

But what I wanted to highlight is a very innocuous result from Vol 2 of TACP that's both well known, elementary to prove, and provides a key tool for much of the work on streaming that came much later.
Can we pick a random element from a set of items whose cardinality we do not know ?
In this setting, we can examine an item and decide whether to retain it or discard it, but at the end of the process, we must hold in our hands an element chosen uniformly at random.

This is the kind of problem you might encounter on a Google/Microsoft interview: it's simple to state, and so is the answer, but it requires at least some "lateral" thinking. Without further ado, here's the answer (so stop right here if you want to puzzle it out yourself)

Pick an item and store it. Pick the next one, and replace the first one with it with probability 1/2. Pick the third one, and do a replace with probability 1/3, and so on. At the end, the item you've stored has a probability of 1/n of being any particular element.
This is of course a streaming algorithm. You can imagine the items arriving one by one, and using a local counter and a single word of storage, you can execute the algorithm. The procedure also generalizes to maintaining a random sample of size k: in general, the technique is called reservoir sampling, and it has spawned many generalizations.
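
Here is a minimal sketch of the procedure, together with its size-k generalization (for k = 1 this is exactly the 1/i replacement rule described above):

```python
import random

def reservoir_sample(stream, k=1, rng=random.Random(0)):
    """Maintain a uniform random sample of k items from a stream of unknown length,
    using one pass, a counter, and k words of storage. Item number i (1-indexed)
    enters the reservoir with probability k/i, replacing a uniformly chosen slot."""
    reservoir = []
    for i, item in enumerate(stream, start=1):
        if i <= k:
            reservoir.append(item)  # the first k items fill the reservoir
        elif rng.random() < k / i:
            reservoir[rng.randrange(k)] = item
    return reservoir
```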

There are at least three different ways of arguing the correctness of this procedure - another reason why I find it so elegant.

1. Suppose the sampling procedure is correct for the first k items. Now when item k+1 appears, it becomes the new sample with probability 1/(k+1). Every prior item currently has a probability of 1/k of being picked, and therefore remains the chosen sample item with probability (k/(k+1)) * (1/k) = 1/(k+1). Hence, by induction, the procedure works.

Like most induction proofs, this argument leaves something wanting. Here's a less rigorous, but in my mind more appealing argument:

2. Suppose we pick the kth item with probability p_k. Let an adversary suddenly halt the stream at this step. If p_k is not equal to 1/k, then the last item is not picked with a uniform probability over the current stream. So in some sense, the algorithm has no choice, since it doesn't know when the stream ends.

3. A more muscular proof of the "let's roll up our sleeves and crank things out" variety, goes as follows: Let p_i be the probability that the i^th item is picked for the sample, replacing the current sample element. The probability that i survives is

p_i (1 - p_{i+1})(1 - p_{i+2}) ... (1 - p_n), where n is the total number of items.

Since these numbers must be equal for all i, we can equate any two of these, in particular the terms for i and i+1, which yields the expression

p_i (1 - p_{i+1}) = p_{i+1}, or equivalently p_{i+1} = p_i / (1 + p_i).

Some simple telescoping later, once we observe that p_1 = 1, gives the desired probabilities.

CS researchers and the practitioners of computer science tend to view each other warily from a distance, each side convinced that the other has absolutely no idea of what "real CS" is about. Don Knuth straddled both worlds effortlessly, gaining respect from 15 year old hackers and 50 year old researchers alike. And that's the most tremendous feat of all.

Update: More tributes, by Bill Gasarch and the Quantum Pontiff.

Monday, January 07, 2008

Levi Conant Prize for Hoory, Linial and Wigderson

The joint meetings of the AMS and MAA are going on right now, and among the prizes being awarded is the Levi L. Conant Prize for the best expository article in the Notices or the Bulletin of the AMS in the last 5 years. One of the two awardees is the group consisting of Shlomo Hoory, Nati Linial and Avi Wigderson, for their article 'Expander graphs and their applications' in the Bulletin of the AMS (here's the related course).

Congratulations ! On a side note, I'm truly impressed by the number of prizes awarded for articles that are expository or targeted at the general public. I don't think that the mere creation of prizes encourages people to write such articles; rather, the existence of such prizes is a post facto demonstration of the intrinsic value attributed to exposition in the mathematics community.

Thursday, January 03, 2008

A call for information and help

In response to Muthu's query about invited speakers at conferences, Iftah Gamzu at Tel Aviv University set up a very nice page containing all the relevant information. He also has the next generation of the 'Accepted papers at theory conferences' page originally maintained by Anupam Gupta, as well as a page with upcoming dates (conference/submission) for most theory conferences.

Iftah sent me a request: he's missing information on speakers/talks for many conference-year combinations, and was hoping for some help from the community. Here's what's missing:
  • Invited talks at
    • STOC 2000.
    • RANDOM-APPROX 2003 and 2005.
    • SoCG 2001 and 2004.
    • IPCO 2007 summer school.
  • Titles of the talks
    • CCC 2001.
    • Gene Myers, Christos Papadimitriou, and Ehud Shapiro at ESA APPROX 2002.
    • Marino Zerial at ESA 2005.
    • Leslie Lamport, and Butler Lampson at PODC 2003.
    • Elias Koutsoupias at SPAA PODC 2005.
    • Robin Thomas at STACS 2004.
    • Daniel Lewin at SPAA 2000.
    • Andrew Chien at SPAA 2002.
    • Adi Shamir at ICALP 2005.
  • Name of the speaker of
    • "Discrete Tomography: Reconstruction under Periodicity Constraints" at ICALP 2002.
    • "Program Debugging and Validation Using Semantic Approximations and Partial Specifications" at ICALP 2002.
  • Can someone confirm that there were no invited talks at
    • FOCS 2004, CCC 2004, and SPAA 2004?
If anyone reading this has the relevant information, they can either post it in the comments, or email him directly.

Monday, December 31, 2007

Comte de Buffon, and a happy new year !

My personal resolution for the new year is that I will actually try to write down a larger fraction of the blog posts that are floating around in my head, especially the many that were born while I was sitting in my algorithms seminar.

The NYT has a travel feature on visiting the birthplace of Comte de Buffon, a contemporary and competitor to Carolus Linnaeus, who by this account played Hooke to Linnaeus' Newton. Of course, I (and probably most of my readers) know Buffon not for his work in biology and the natural world, but for the famous problem that bears his name and by some accounts gave rise to the area of geometric probability.

The Buffon needle problem is this: Suppose we drop a needle of length l onto a grid with lines spaced d units apart. What is the probability that the needle touches a grid line ? (let's assume for now that l < d, in which case the answer is 2l/(πd); if l > d, the expression is more involved).

Buffon's needle problem is a textbook example of the area of geometric probability: doing probability theory on entities defined geometrically. This is a harder thing than one might imagine: as Bertrand demonstrated much later on, the very notion of a uniform random sample can be called into question. He showed this by posing a simple question:

Draw an equilateral triangle and circumscribe a circle around it. Pick a chord at random. What is the probability that the chord length is longer than the side of the triangle ?

It turned out that the "Pick a chord at random" step leads to all kinds of problems. Bertrand presented three different equally reasonable ways of picking a chord, and showed that each of the three ways led to a different probability. At the time, this was viewed as a paradox: it seemed strange that what appeared to be a well defined question would lead to three different answers. Of course, the modern view is that there is no paradox, merely a different choice of measure space to work with in each case.
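
A quick Monte Carlo sketch makes the "paradox" tangible: the three classical ways of picking a chord of a unit circle at random (random endpoints, random point on a random radius, random midpoint in the disk) give three different answers, empirically close to 1/3, 1/2 and 1/4.

```python
import numpy as np

def bertrand(n=100_000, seed=0):
    """Estimate P(chord is longer than the inscribed triangle's side, sqrt(3)) for a
    unit circle, under three reasonable ways of 'picking a chord at random'."""
    rng = np.random.default_rng(seed)
    side = np.sqrt(3)

    # 1. Random endpoints: two uniform points on the circle.
    a, b = rng.uniform(0, 2 * np.pi, (2, n))
    p1 = np.mean(2 * np.sin(np.abs(a - b) / 2) > side)

    # 2. Random radius: midpoint uniform along a uniformly chosen radius.
    r = rng.uniform(0, 1, n)
    p2 = np.mean(2 * np.sqrt(1 - r**2) > side)

    # 3. Random midpoint: midpoint uniform in the disk.
    d = np.sqrt(rng.uniform(0, 1, n))  # distance of a uniform point from the center
    p3 = np.mean(2 * np.sqrt(1 - d**2) > side)

    return p1, p2, p3  # roughly 1/3, 1/2, 1/4
```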

But if that's the case, is there a "natural" choice of measure ? One elegant way of thinking about this was to think of this as a physical process that must obey certain transformation invariants. This leads inexorably to notions like the Haar measure: can we assign a measure to subsets of a group that is invariant over left (or right) group actions ?

The Haar measure has practical utility for anyone working in geometry. Consider the problem of choosing a random rotation (this comes up when doing Johnson-Lindenstrauss embeddings). In 2 dimensions, this is easy, since the space of rotations is the unit circle. However, in three dimensions or higher, it gets a bit trickier to define the correct measure to sample with respect to. Since rotations form a group, we really need a measure that is invariant under the group action, and a (unique up to scaling) Haar measure is guaranteed to exist in this case, yielding the desired measure to sample from (it's actually fairly easy to generate a random rotation algorithmically: the easiest way is to populate a matrix with independent normally distributed random variables, and then orthogonalize it).
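
Here's a minimal sketch of that recipe; the sign correction makes the output Haar-distributed rather than biased by the QR convention.

```python
import numpy as np

def random_rotation(d, seed=0):
    """Sample a rotation matrix from (approximately) the Haar measure on SO(d):
    fill a d x d matrix with independent standard normals, orthogonalize via QR,
    and fix signs so the result doesn't depend on the QR sign convention."""
    rng = np.random.default_rng(seed)
    Q, R = np.linalg.qr(rng.normal(size=(d, d)))
    Q = Q * np.sign(np.diag(R))   # scale columns: makes the factorization unique (Haar on O(d))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]        # flip one column to land in SO(d), i.e. a rotation
    return Q
```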

But it all goes back to a needle, a grid, and a frustrated naturalist. Happy new year !

Wednesday, December 26, 2007

DARPA Mathematical challenges

Via Ars Mathematica and Not Even Wrong, a DARPA-issued list of challenges in mathematics for the 21st century. Some that puzzled me:
  • Computational Duality: Duality in mathematics has been a profound tool for theoretical understanding. Can it be extended to develop principled computational techniques where duality and geometry are the basis for novel algorithms?

    I'm not sure what this is trying to say, or whether I'm reading it wrong, because the story of linear programming, primal dual schemes, and the Lagrangian is the story of using "duality and geometry as the basis for novel algorithms"
  • What are the Physical Consequences of Perelman’s Proof of Thurston’s Geometrization Theorem?
    Can profound theoretical advances in understanding three-dimensions be applied to construct and manipulate structures across scales to fabricate novel materials?

    Thurston's geometrization conjecture talks about the structure of 3-manifolds: i.e 3 dimensional surfaces living in higher dimensional spaces ? What kinds of materials could be fabricated using this ?
  • Computation at Scale: How can we develop asymptotics for a world with massively many degrees of freedom?

    This sounds like there's a nice computational question lurking somewhere, but I'm not quite sure where.
Nice to see algorithmic origami and self-assembly also mentioned. I am particularly intrigued by the reference to the geometry of genome spaces.

Tuesday, December 11, 2007

On the dominance of the Ivies..

Via Cosmic Variance:

Exhibit A:

“One thing we all must worry about — I certainly do — is the federal support for scientific research. And are we all going to be chasing increasingly scarce dollars?” says Drew Gilpin Faust, Harvard’s new president.

Not that Faust seems worried about Harvard or other top-tier research schools. “They’re going to be—we hope, we trust, we assume—the survivors in this race,” she says. As for the many lesser universities likely to lose market share, she adds, they would be wise “to really emphasize social science or humanities and have science endeavors that are not as ambitious” as those of Harvard and its peers.

Exhibit B:
Mario Capecchi: 2007 Nobel Laureate, Ph.D Harvard (1967). Associate Prof. Biochemistry (Harvard) (1969-1971)
Moved to U. Utah (1973).

Through a series of bold experiments begun in the 1980s, Capecchi demonstrated that he could alter any gene in a mouse cell by replacing it with a modified version. At the time, scientists were skeptical that such altered DNA could be targeted to a particular gene. But Capecchi was not to be deterred. Indeed, his studies demonstrated that it is possible to replace an intact, functional gene with a modified version that can zero in on the corresponding DNA in the chromosome.
And finally:

Some 50 universities are located in the Boston area. Rather than collaboration, Capecchi felt that the thousands of researchers were working in isolation on projects that promised "immediate gratification." As he explained, "Everyone is so aware of what everyone else is doing. 'What's new?' was asked every day. That limits you to short-term returns, posing questions that you know can be answered in six months."

In contrast, the University of Utah in Salt Lake City offered "a relaxed atmosphere, where you could work on projects whose outcome may take 10 years. The relative isolation tends to make you more focused on the biological question you're working on.

"It was a good choice," said Capecchi of his decision to relocate to the U of U in 1973.

Friday, December 07, 2007

VLDB dogfight, and peer review

I guess this had to happen sooner or later, given how easy it is to publish a critique. I'm only surprised that more people aren't doing it.

Via Daniel Lemire comes a link to a page by John Wu where he rakes a VLDB 2006 paper over the coals and, in doing so, mounts a serious criticism of the review process that led to the paper being accepted. I don't have a dog in this fight, and know next to nothing about the area, but the criticisms are prima facie plausible, even if mitigated by the fact that he's the "injured party" in this argument. One idea that he proposes, and that Daniel endorses, is this:
Our first recommendation is for the program committee to assign one member to be responsible for a paper. To encourage committee members to take this responsibility seriously, the accepted paper should be printed with the name of the responsible person. Many journal articles are printed with a footnote to the effect of “Accepted on the recommendation of So-and-So.” Similar footnote could be printed on the final copy of the conference proceedings.
It's an interesting idea. I've heard this being proposed before, and apart from the fact that papers can often slip in without strong champions, I don't see too much that's wrong with it.

Neighborly polytopes and compressed sensing

I ran a seminar on approximate high dimensional geometry this semester, and there are many posts queued up based on interesting observations from the papers that were presented. This post is about polytopes, compressed sensing, and lower bounds for convex hulls.

Compressed sensing is a fascinating topic that (depending on how you look at it) is influenced by approximation theory, sketching techniques, compression, and geometry. Posts by Terry Tao and Igor Carron are good places to learn more about this, and the paper repository at Rice is another great resource. What I wanted to talk about in this post however, is an interesting result by Donoho and Tanner that connects back to structures that geometers are intimately familiar with, as well as helping me understand the nature of convex hull lower bounds.

Let's briefly sketch the motivating problem. Suppose you're given a signal s that you want to express in terms of linear combinations of basis functions drawn from a dictionary. A key idea in compressed sensing is that the dictionary itself is redundant, in that there are multiple ways of representing a signal via linear combinations of basis functions. A useful analogy is that of barycentric coordinates. We can represent a point in the plane uniquely via a linear combination of three non-degenerate points (i.e a simplex), but this stops being true if we have more than three points that form the "basis".

Given that there are multiple representations, which is the best one ? The next important idea is that of sparsity: namely, that the number of coefficients needed is very small compared to the size of the dictionary. Again, returning to our analogy, it would be as if the dictionary had 100 points in it, and all we needed to determine was a representation using a linear combination of three points.

So one rendition of the compressed sensing problem is: Suppose there's a signal that you interact with via a 'measurement matrix' (which means you supply a vector of coefficients, and get the value of the inner product of the signal coefficients with the measurement coefficients; each row of the matrix is one such measurement). You retain the output values y; formally, you can write this as y = Ax, where x is the unknown signal vector. Can you find an x with the fewest number of nonzero coefficients (i.e can you minimize the l0 norm) that satisfies this equation ?

This problem is NP-hard; intuitively, you can encode SUBSET SUM in this problem, by setting up an appropriate y vector. Suppose however that we relax the optimization, and rather than asking to minimize the l0 norm, we ask to minimize the l_1 norm of the target vector x ? It turns out that this problem can be solved via linear programming.
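
Here's a minimal sketch of that l_1 relaxation as a linear program, using the standard split x = u - v with u, v >= 0; scipy's linprog and the function name are used purely for illustration.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 subject to Ax = y as a linear program:
    write x = u - v with u, v >= 0, so the objective sum(u) + sum(v)
    equals ||x||_1 at the optimum (where min(u_i, v_i) = 0)."""
    m, n = A.shape
    c = np.ones(2 * n)              # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])       # equality constraint: A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=[(0, None)] * (2 * n))
    u, v = res.x[:n], res.x[n:]
    return u - v
```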

Under what conditions then can the l_1 solution yield the l_0 solution ? Suppose that we could reconstruct the vector x uniquely from y. In that case, it doesn't really matter what cost function we use: if there's only one solution, then we'll find it either way.

This is where neighborly polytopes enter the picture. A polytope in R^d is said to be m-neighborly if every set of m vertices spans a face of the polytope. Note that the simplex is always d-neighborly: essentially this definition tries to capture the idea of a "simplicial core" of a polytope: crudely put, an m-neighborly polytope has an m-simplex encoded within it somewhere.

If we think of the measurement y as being a point reconstructed from m coefficients, then if y lies on the m-simplex, there are unique coefficients we can assign to the vertices of the simplex to reconstruct it. Donoho and Tanner showed that this is essentially sufficient: namely that the polytope AT^d (where T^d is the d-simplex) is m-neighborly if and only if there's a unique reconstruction of y (implying that the l_1 optimization yields the desired sparse vector).

This is quite an elegant result in its own right: further, they generalize this to a "robust" variant, where the polytope might be approximately m-neighborly, in which case they show that with high probability a unique solution can be reconstructed (this latter result resembles the Johnson-Lindenstrauss theorem). What's interesting about these results is that they connect a very old concept in geometry (neighborly polytopes) to a reconstruction algorithm in compressed sensing.

A side effect of understanding neighborly polytopes from this perspective is that I finally saw why cyclic polytopes "work" as a lower bound for estimating the size of a convex hull. We can ask what the largest value of m is for which some d-dimensional polytope is m-neighborly. It turns out that the answer is m = ⌊d/2⌋, and the example is the cyclic polytope.

Why is this interesting ? If we go back to the intuition of m-neighborly polytopes containing an m-dimensional simplicial core, it becomes clear. All vertices of a simplex lie on the convex hull. If a polytope is m-neighborly, it should come as no surprise that its convex hull has roughly n^m faces. We thus recover the n^{⌊d/2⌋} lower bound on the size of a convex hull in d dimensions.

Thursday, December 06, 2007

Leibniz Prize

Via an anonymous commenter comes the news that Susanne Albers is one of the 2008 Leibniz Prize winners. This is the most prestigious science award in Germany (and that's all of science, not just computer science). Going back over the list of past winners, it seems (and everything's in German, so I could be wrong here) that the last theory prize winner was Gunter Ziegler, back in 2001. Susanne Albers, for those who don't know, works in online algorithms, approximations, and algorithmic game theory.

From the Leibniz Prize page:
The Leibniz Programme, established in 1985, aims to improve the working conditions of outstanding scientists and academics, expand their research opportunities, relieve them of administrative tasks, and help them employ particularly qualified young researchers. Up to ten prizes are awarded annually with a maximum of €2.5 million per award. The prize is awarded from a slate of nominations put forward by third parties. The prizewinner is selected by the Joint Committee on the basis of a recommendation from the Nominations Committee for the Leibniz Programme.
The Leibniz Prize is administered by the DFG, which according to Wikipedia translates to the German Research Foundation, which would make it the equivalent of the NSF, and the Leibniz Prize roughly comparable to the NSF's Waterman Prize.

Update: here's the announcement (in German)
