Friday, April 27, 2007

A Consensus Closures JSR Proposal

I had set aside work on the closures prototype for a couple of months to write a JSR proposal that represents a consensus among the folks thinking about this area. You can find it at http://www.javac.info/consensus-closures-jsr.html. One of the things I learned is that unanimous agreement is rarely possible. There are those who feel that nothing should be done to the Java programming language, and such people will not be swayed by simple but powerful additions. Our latest JSR proposal comes as close to achieving consensus as I believe possible. All but one of the authors of the three widely-discussed closures proposals have agreed to support it.

The purpose of the JSR proposal is to define the problems to be solved and circumscribe the permitted solution space. It doesn't mandate a particular solution, though it does offer the Closures for Java specification as an example of a solution to many (but not all) of the problems. This should not be surprising, as that spec was written specifically in an attempt to satisfy the requirements. Still, the spec is a work in progress.

So what is next? I hope we'll have some active discussion at JavaOne about where to go from here.

Thursday, March 29, 2007

Closures for Organizing Your Code

Much of the discussion of Closures in Java has been about they way they affect public APIs. But there is another aspect that is just as important: the way closures affect private APIs between parts of your program. Closures often enable a tremendous simplification of a program design compared to what would be required in their absence. The following describes my implementation of a graph algorithm for computing the Jeffersonians of a graph using algorithm K from Knuth's The Art of Computer Programming, volume 4B, section 7.5.7.

As you may be aware the set of Jeffersonians of a graph is best computed using a complex recursive algorithm. Although recursive algorithms can be translated into algorithms without using recursion (Java without recursion remains Turing-complete), the recursive version of the algorithm is much shorter and easier to understand. We're lucky to be living in an age in which virtually all programming languages support recursion. Though details of the implementation are not important, my implementation went something like this:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, result);
    return result;
}

The idea is that the recursive part of the algorithm can pass around the collection into which the result will be placed, and every Jeffersonian that is found will be placed into the collection:

private void findAllJeffersoniansInternal(
Graph g, Collection<Jeffersonian> result) {
// complex recursive algorithm here
Jeffersonian found = ...;
result.add(found);
// more complex recursion here
}

One pot of coffee and an all-nighter later I had this working like a charm. The next day my tech lead asked me to add an API element that determines whether or not a graph has a Jeffersonian or not. That was easy:

public boolean hasJeffersonian(Graph g) {
    return findAllJeffersonians(g).size() != 0;
}

This didn't pass code review. The problem is that this new method is to be used in the inner loop of Google's über-secret application that will take over the world. Never mind that. The problem is performance. Determining whether or not a graph has a Jeffersonian can be done in linear time, but enumerating all of them requires quadratic time (or worse). But my implementation does it the hard way. By then it was Friday afternoon and I really wanted to head home for a glass of wine, so I did what any self-loathing software engineer would do: I cut and pasted the complex recursive code in findAllJeffersoniansInternal into hasJeffersonianInternal and added a boolean return value (true when a Jeffersonian was found). Then I added logic to short-circuit the rest of the algorithm once a Jeffersonian had been found at any step. The code was messy but workable, and I had it passing tests in less than an hour. The code duplication left me somewhat uncomfortable, but the two methods were different enough that merging them would have been hard. I considered adding a second flag so I could have one body of code to do both versions, but I decided to leave that refactoring until Monday.

Something very strange happened over the weekend, though. On Monday my pointy-haired boss told me there was both good news and bad news, and asked which I wanted first. Knowing how these jokes work (the second one always trumps the first) I asked for the bad news first. The bad news was that my machine had crashed, losing all of my work from Friday. Including my implementation of hasJeffersonian. The good news was that my machine had been replaced with a brand new one, a fast new 40-core workstation, and it came with JDK7 preinstalled. I had been using JDK6 before, so I was eager to try the new Java language features.

Taking a fresh look at the problem of writing hasJeffersonian, I decided to refactor the original program to pass a closure instead of a collection:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, { Jeffersonian j => result.add(j); });
    return result;
}

private void findAllJeffersoniansInternal(
Graph g, {Jeffersonian => void} foundJeffersonian) {
// complex recursive algorithm here
Jeffersonian found = ...;
foundJeffersonian.invoke(found);
// more complex recursion here
}

Then I realized I could use the nicer syntax allowed for passing a closure to a method:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(Jeffersonian j : g) {
        result.add(j);
    }
    return result;
}

Solving the second problem was then trivial:

public boolean hasJeffersonian(Graph g) {
    findAllJeffersoniansInternal(Jeffersonian j : g) {
return true;
} return false; }

That was the entire implementation. I had a strange sense of elation, but I couldn't quite tell why. I could no longer remember why the problem was so messy on Friday. This refactoring seemed trivial, and this code was so clear. What made it so hard before?

Then I woke up. It's 2007, not 2009. JDK7 is barely a gleam in the eye of Sun. My machine is only dual-core. Consensus on closures is elusive. As far as I can tell, there isn't any such thing as a graph's Jeffersonian, or a Google plan to take over the world. It's Monday morning, and I have to figure out how to merge two almost-copies of a big recursive algorithm.

But on the bright side, my boss is a really nice guy.