Thursday, March 29, 2007

Closures for Organizing Your Code

Much of the discussion of closures in Java has been about the way they affect public APIs. But there is another aspect that is just as important: the way closures affect the private APIs between parts of your program. Closures often enable a tremendous simplification of a program design compared to what would be required in their absence. The following describes my implementation of a graph algorithm for computing the Jeffersonians of a graph using Algorithm K from Knuth's The Art of Computer Programming, volume 4B, section 7.5.7.

As you may be aware, the set of Jeffersonians of a graph is best computed using a complex recursive algorithm. Although recursive algorithms can be translated into algorithms that don't use recursion (Java without recursion remains Turing-complete), the recursive version of the algorithm is much shorter and easier to understand. We're lucky to be living in an age in which virtually all programming languages support recursion. Though the details of the implementation are not important, it went something like this:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, result);
    return result;
}

The idea is that the recursive part of the algorithm can pass around the collection into which the result will be placed, and every Jeffersonian that is found will be placed into the collection:

private void findAllJeffersoniansInternal(
        Graph g, Collection<Jeffersonian> result) {
    // complex recursive algorithm here
    Jeffersonian found = ...;
    result.add(found);
    // more complex recursion here
}

One pot of coffee and an all-nighter later I had this working like a charm. The next day my tech lead asked me to add an API element that determines whether or not a graph has a Jeffersonian. That was easy:

public boolean hasJeffersonian(Graph g) {
    return findAllJeffersonians(g).size() != 0;
}

This didn't pass code review. The problem is that this new method is to be used in the inner loop of Google's über-secret application that will take over the world. Never mind that. The problem is performance. Determining whether or not a graph has a Jeffersonian can be done in linear time, but enumerating all of them requires quadratic time (or worse). But my implementation does it the hard way. By then it was Friday afternoon and I really wanted to head home for a glass of wine, so I did what any self-loathing software engineer would do: I cut and pasted the complex recursive code in findAllJeffersoniansInternal into hasJeffersonianInternal and added a boolean return value (true when a Jeffersonian was found). Then I added logic to short-circuit the rest of the algorithm once a Jeffersonian had been found at any step. The code was messy but workable, and I had it passing tests in less than an hour. The code duplication left me somewhat uncomfortable, but the two methods were different enough that merging them would have been hard. I considered adding a second flag so I could have one body of code to do both versions, but I decided to leave that refactoring until Monday.
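
The duplicated method was shaped roughly like this (a sketch only, with the recursive body elided just as in the examples above):

public boolean hasJeffersonian(Graph g) {
    return hasJeffersonianInternal(g);
}

private boolean hasJeffersonianInternal(Graph g) {
    // complex recursive algorithm here, copied from findAllJeffersoniansInternal
    Jeffersonian found = ...;
    if (found != null) {
        return true;   // short-circuit: one Jeffersonian is enough
    }
    // more complex recursion here, with each recursive call checked so that
    // the rest of the algorithm is skipped once a Jeffersonian turns up
    return false;
}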

Something very strange happened over the weekend, though. On Monday my pointy-haired boss told me there was both good news and bad news, and asked which I wanted first. Knowing how these jokes work (the second one always trumps the first) I asked for the bad news first. The bad news was that my machine had crashed, losing all of my work from Friday. Including my implementation of hasJeffersonian. The good news was that my machine had been replaced with a brand new one, a fast new 40-core workstation, and it came with JDK7 preinstalled. I had been using JDK6 before, so I was eager to try the new Java language features.

Taking a fresh look at the problem of writing hasJeffersonian, I decided to refactor the original program to pass a closure instead of a collection:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, { Jeffersonian j => result.add(j); });
    return result;
}

private void findAllJeffersoniansInternal(
        Graph g, {Jeffersonian => void} foundJeffersonian) {
    // complex recursive algorithm here
    Jeffersonian found = ...;
    foundJeffersonian.invoke(found);
    // more complex recursion here
}

Then I realized I could use the nicer syntax allowed for passing a closure to a method:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(Jeffersonian j : g) {
        result.add(j);
    }
    return result;
}

Solving the second problem was then trivial:

public boolean hasJeffersonian(Graph g) {
    findAllJeffersoniansInternal(Jeffersonian j : g) {
        return true;
    }
    return false;
}

That was the entire implementation. I had a strange sense of elation, but I couldn't quite tell why. I could no longer remember why the problem was so messy on Friday. This refactoring seemed trivial, and this code was so clear. What made it so hard before?

Then I woke up. It's 2007, not 2009. JDK7 is barely a gleam in the eye of Sun. My machine is only dual-core. Consensus on closures is elusive. As far as I can tell, there isn't any such thing as a graph's Jeffersonian, or a Google plan to take over the world. It's Monday morning, and I have to figure out how to merge two almost-copies of a big recursive algorithm.

But on the bright side, my boss is a really nice guy.

Friday, March 16, 2007

A Compact Object Comparator

Every now and then a problem arises where the right solution would be to impose an arbitrary total ordering on a collection of objects. The simplest example of this is when you need to synchronize on more than one object, all at the same time, to maintain some consistency condition across those objects. Using closures, you might invoke a utility method like this:

Locks.withLocks(lock1, lock2) {
    // code protected by both locks
}

To avoid deadlock, every piece of code that locks the same set of locks should do so in the same order. Rather than forcing all callers of the withLocks method to worry about getting them in the right order, the implementation of withLocks can sort the incoming locks. Then the caller can just pass the locks in arbitrary order, knowing that they will be locked "in the right order". It doesn't actually matter what order we sort them in, as long as we always get the same order for the same objects. The implementation of withLocks can use Collections.sort to sort the incoming locks, but java.util.concurrent.locks.Lock is not naturally comparable, so we need to pass an appropriate comparator to sort. We need a java.util.Comparator<Lock>, but a java.util.Comparator<Object> would work just as well. Let's specify, and then implement, a suitable comparator. Here is what we need:

/**
 * Returns a comparator that imposes a complete order on all objects.
 * Each invocation of this method may yield a distinct comparator,
 * or may yield the same comparator.
 */
public Comparator<Object> totalOrder() { ... }
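
Before implementing the comparator, here is a rough sketch of how withLocks might use it. (A sketch only: it uses a plain Runnable where the closures proposal would accept a block, and everything in it other than totalOrder is hypothetical.)

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.locks.Lock;

public class Locks {
    // Sketch: sort the locks into a consistent global order, acquire them in
    // that order, run the body, and release in reverse order of acquisition.
    public static void withLocks(Runnable body, Lock... locks) {
        List<Lock> ordered = new ArrayList<Lock>(Arrays.asList(locks));
        Collections.sort(ordered, totalOrder());
        List<Lock> held = new ArrayList<Lock>();
        try {
            for (Lock lock : ordered) {
                lock.lock();
                held.add(lock);
            }
            body.run();
        } finally {
            for (int i = held.size() - 1; i >= 0; i--) {
                held.get(i).unlock();
            }
        }
    }

    static Comparator<Object> totalOrder() { ... }  // as specified above
}

Tracking the locks actually acquired means the finally block releases only what was locked, even if a lock() call fails partway through.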

How are we going to implement totalOrder? One idea is to assign a long value (a nonce) to each object as it is first seen. That would look something like this:

public Comparator<Object> totalOrder() { return new TotalOrder(); }
private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    public int compare(Object o1, Object o2) {
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    }
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        }
        return nonce;
    }
}

There are two major problems with this approach. First, it causes object retention. Objects whose space would otherwise be recovered by the garbage collector are retained because they are reachable as keys in the codes map. We can't fix this by simply using a WeakHashMap; without the identity semantics of IdentityHashMap the technique doesn't work. We really need WeakIdentityHashMap for this, but no such class exists in the JDK yet. Fortunately, "crazy" Bob Lee has come to the rescue with an implementation of this concept inside the recently open-sourced Guice dependency injection framework. I think this belongs in the JDK, and now is the time to propose it for JDK7.
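
The concept itself is easy to sketch, though: a map key that refers to its referent weakly but hashes and compares by identity. (The following is only an illustration of the idea, not Guice's actual code.)

import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Illustrative sketch: a weak key whose hashCode and equals are based on the
// referent's identity, so a map of these keys behaves like an IdentityHashMap
// without retaining its keys.
class WeakIdentityKey<T> extends WeakReference<T> {
    private final int hash;

    WeakIdentityKey(T referent, ReferenceQueue<? super T> queue) {
        super(referent, queue);
        this.hash = System.identityHashCode(referent);
    }

    @Override public int hashCode() {
        return hash;
    }

    @Override public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof WeakIdentityKey)) return false;
        Object mine = get();
        Object theirs = ((WeakIdentityKey<?>) other).get();
        return mine != null && mine == theirs;   // identity, not equals()
    }
}

A WeakIdentityHashMap built on such keys would poll the reference queue and remove entries whose referents have been collected.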

The other problem with this implementation is that it takes up too much space: every call to the compare method may create one or two Long nonces and add entries to the map for objects it has not seen before.

Another idea for implementing this utility is to sort the objects based on their identity hash codes. Identity hash codes are well distributed, almost like random numbers. This approach keeps no state, so it is naturally thread-safe, and it would look something like this:

private class TotalOrder implements Comparator<Object> {
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        return (i1<i2) ? -1 : (i1==i2) ? 0 : 1;
    }
}

This is much more compact than the previous approach. But because identity hash codes are not guaranteed to be unique, it occasionally treats two distinct objects as equal, so it does not quite impose a complete order.

We can get the best of both worlds - a space-efficient comparator and a complete order - by combining the two approaches:

private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        }
        return nonce;
    }
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        if (i1 != i2) return (i1<i2) ? -1 : 1;
        // identity hash codes collided: fall back to the nonce scheme
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    }
}

By the way, if you haven't already checked it out, see "crazy" Bob Lee's Guice dependency injection framework. We use it extensively at Google. By really taking advantage of recent language features such as generics and annotations, the Guice framework is very flexible and yet much simpler than existing frameworks. Throw away your XML and write your Java code in Java!

Thanks to "crazy" Bob Lee for contributing the Guice framework, and for reviewing this essay.