Friday, September 16, 2011

Sampling a discrete distribution

The following cute question came up at lunch today (all credit goes to Aaron Archer).


Informal challenge: You are given a discrete distribution with n possible outputs (and you know the probability pi of each output), and you want to sample from the distribution. You can use a primitive that returns a random real number in [0,1].

This was the way the problem came to me, and the goal was to find a very fast practical solution. Think of n as around 106 and use a regular machine as your machine model (real number means a "double" or something like that).


Formal challenge: To formalize this completely for a bounded-precision computer, we can say that we work on a w-bit machine and probabilities have w-bit precision. The distribution is given by n integers xi, and these values sum to exactly 2w. Output "i" should be produced with probability exactly xi / 2w. You are given a primitive that produces a uniformly distributed w-bit number.

The challenge is to get O(1) worst-case time. (NB: some fancy data structures required.)

Thursday, July 28, 2011

International Olympiad, Day 2

I suspect I got a lot of bad karma this day (I'm the evil mind behind "Crocodile" and "Elephant".) Congratulations to the winners! See here for final results.

Problem Crocodile. You have a weighted undirected graph with a start node and k designated target nodes. Two players play a full-information game, taking turns:
  1. Player 1 moves on the nodes of the graph, starting at the designated start node, and aiming to reach a target node. In each turn, she can traverse any edge out of her current node [except the blocked edge / see below], incurring a cost equal to the weight of the edge.

  2. Player 2 can block one edge at every turn. When an edge is blocked, Player 1 cannot traverse it in the next turn. An edge can be blocked multiple times, and any past edge is "unblocked" when a new edge is blocked.
Find the minimum budget B such that Player 1 can reach a target node with cost ≤ B, regardless of what Play 2 does. Running time: O~(n+m).

Problem Elephants. You have n elephants on the line at given coordinates. The elephants make n moves: in each unit of time, some elephant moves to another position. You have cameras that can photograph any interval of length L of the line. After each move, output the minimum number of cameras needed to phtograph all elephants in the current configuration.

Running time: subquadratic in n.

Problem Parrots. You are given a message of M bytes (0..255) to transmit over a weird channel. The channel can carry elements between 1 and R, but elements don't arrive in order (they are permuted adversarially). Send the message, minimizing the number L of elements sent across the channel.

Running time: polynomial. (Yes, this is an easy problem for a trained theorist.)

Monday, July 25, 2011

International Olympiad, Day 1

The problems for Day 1 of the IOI have been posted (in many languages). Exciting! Here are the executive summaries. (Though I am on the International Scientific Committee (ISC), I do not know all the problems, so my appologies if I'm misrepresenting some of them. In particular, the running times may easily be wrong, since I'm trying to solve the problems as I write this post.)


The usual rules apply: if you're the first to post a solution in the comments, you get eternal glory. (If you post annonymously, claim your glory some time during the next eternity.)

Problem Fountain. Consider an undirected graph with costs (all costs are distinct) and the following traversal procedure. If we arrived at node v on edge e, continue on the cheapest edge different from e. If the node has degree 1, go back on e. A walk following these rules is called a "valid walk."

You also have a prescribed target node, and an integer k. Count the number of valid walks of exactly k edges which end at the prescribed target node. Running time O~(n+m) ["O~" means "drop logs"]

Problem Race. You have a tree with integer weights on the edges. Find a path with as few edges as possible and total length exaclty X. Running time O(nlg n) [log^2 may be easier.]

Problem RiceHub. N rice fields are placed at integer coordinates on the line. The rice must be gathered at some hub, which must also be at an integer coordinate on the line (to be determined). Each field produces one truck-load of rice, and driving the truck a distance d costs exactly d. You have a hard budget constraint of B. Find the best location for the hub, maximizing the amount of rice that can be gathered in the budget constraint.

Running time: O~(N).

Friday, July 22, 2011

Balkan Olympiad (Day 2)

Now is time for the problems on day 2 (of 2). See here for day 1. Feel free to post solutions in the comments (you get eternal glory if you're the first to post one) and discuss anything related to the problems.


Day2–Problem TimeIsMoney. You have a graph with two costs (A and B) on each edge. Find a spanning tree that minimizes (its total A-cost) * (its total B-cost) [that's "times" in the middle]. Desired running time: polynomial. Sharper bounds are possible (I think I get O(n2m log)) but this is hard enough. To be entirely fair, the contestants just need to find an algorithm, not to prove it runs in poly-time, which may be easier (but I'm writing for a theory audience so consider yourself challenged).

Day2–Problem Trapezoid. Consider two horizontal lines, and n trapezoids stretching from one line to the other. The proverbial picture worth 1000 words:
Find the maximal set of nonintersecting trapezoids. Running time: O(nlg n).

Day 2–Problem Compare. Alice gets a number a, and Bob gets a number b. Both are integers in {0, ..., 4095}. Bob's goal is to compare b to a (and output "greater than", "less than" or "equal"). Charlie is helping them. Alice can send Charlie a set A ⊂ {1, ..., 10240} (intuitively, think of Alice marking some bits in an array of 10Kbits). Bob can ask Charlie whether some x is in A or not (think of Bob as querying some bits of the bit vector). The goal is to minimize (for worst-case a and b)
T=|A| + the number of queries made by Bob
Desired solution: We know how to get T=9. In the Olympiad, T=10 was enough for full score, but we left the problem open-ended, so T=9 would've earned you 109 points.

Somewhat unusually for computer olympiads, in this problem we could test contestant solutions exhaustively: we ran their code for all 40962 choices for (a,b) and really computed the worst-case T.

Tuesday, July 12, 2011

Balkan Olympiad

The Balkan Olympiad is over, and I am slowly recovering from the experience of chairing the Scientific Commitee. For your enjoyment, I will post summaries of the competition problems. Perhaps this post should be titled "Are you smarter than a high school student?"


Day 1 – Problem 2Circles. You are given a convex polygon. Find the largest R such that two circles of radius R can be placed inside the polygon without overlapping. Target running time: O~(N). I think this problem illustrates the rather significant difference between coming up with an algorithm on paper and actually implementing it (only 4 contestants scored nonzero; the committee says ``Sorry!'').

Day 1– Problem Decryption. We define the following pseudorandom number generator:
  • initialize R[0], R[1], R[2] with random values in {0,..., 255}
  • let R[n] = R[n-3] XOR R[n-2].
We also define the following cypher:
  • let π be a random permutation of {0,...,255}
  • to encrypt the i-th byte of the message, output π(Message[i] XOR R[i])
You have to implement a chosen plaintext attack. You get a device implementing this cypher. You may feed it a message of at most 320 bytes (you give the device one byte at a time, and observe the encyption immediately). Your goal is to recover the secret keys (R[0..2], and π).

Day 1–Problem Medians. Given an array A[1..n] of integers, we define the prefix medians M[0..(n-1)/2] as M[k] = median(A[1..2k+1]).

The problem is: given M, recover A. O~(n) running time.


PS: This was by far the best prepared Committee that I've been on, and I think we conclusively proved that, no matter how much work you do before the Olympiad, there's still a lot left to do during the Olympiad itself. Many thanks to everybody who volunteered their time. I wish I could do more to express my gratitude, but for lack of other ideas, it is my pleasure to acknowledge the committee here:
  • Marius Andrei - Senior Software Engineer, Google Inc.
  • Radu Ștefan - Researcher, Technische Universiteit Eindhoven
  • Dana Lica - "I.L. Caragiale" High School, Ploiești
  • Cosmin Negrușeri - Senior Software Engineer, Google Inc.
  • Mugurel Ionuț Andreica - PhD, Assist., Polytechnic University, Bucharest
  • Cosmin Tutunaru- Student, Babes-Bolyai University, Cluj-Napoca
  • Mihai Damaschin - Student, Bucharest University
  • Gheorghe Cosmin - Student, MIT
  • Bogdan Tătăroiu - Student, Cambridge University,
  • Filip Buruiană - Student, Polytechnic University, Bucharest
  • Robert Hasna - Student, Bucharest University
  • Marius Dragus - Student, Colgate University NY
  • Aleksandar and Andreja Ilic -- University of Nis, Serbia; external problem submitters, who were not intimidated by the fact that I only posted the problem call in Romanian :)

Friday, April 22, 2011

Please submit...

  1. Papers to MASSIVE 2011. This will be a SoCG satellite workshop (think Paris in June...). Publication here does not hinder publication anywhere else.

  2. Algorithmic problems for the Balkan Olympiad (BOI 2011). I am chairing the Scientific Committee, and we need your help in putting together an interesting set of problems. If your problem is selected for the competition, you will be invited to be a member of the Scientific Committee, and you get a free trip to the Olympiad if you accept (think Bistrița in July...). Anybody can submit problems, but I think the free trip only applies if you're starting somewhere in Romania and you are a Romanian/EU(?) citizen. Hence the call for problems is in Romanian; sorry.

  3. Comments to other blogs. This blog is now on full moderation after I had to delete a batch of comments that did not score particularly high on the sanity scale. Don't worry, the main point of this is not censorship; nasty but coherent comments will continue to be accepted.

Friday, November 19, 2010

Complexity Theory

If you're a student thinking of going into complexity theory, take a good look around and ask yourself: "Do I really want to be in a field with this amount of groupthink?" [1,2,3,4,5, and last but certainly not least 6]


Back in the day, I was myself saved from a career in complexity by Omer Reingold's logspace connectivity.