Neal Gafter's blog: March 2007

Thursday, March 29, 2007

Closures for Organizing Your Code

Much of the discussion of Closures in Java has been about they way they affect public APIs. But there is another aspect that is just as important: the way closures affect private APIs between parts of your program. Closures often enable a tremendous simplification of a program design compared to what would be required in their absence. The following describes my implementation of a graph algorithm for computing the Jeffersonians of a graph using algorithm K from Knuth's The Art of Computer Programming, volume 4B, section 7.5.7.

As you may be aware the set of Jeffersonians of a graph is best computed using a complex recursive algorithm. Although recursive algorithms can be translated into algorithms without using recursion (Java without recursion remains Turing-complete), the recursive version of the algorithm is much shorter and easier to understand. We're lucky to be living in an age in which virtually all programming languages support recursion. Though details of the implementation are not important, my implementation went something like this:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, result);
    return result;
}

The idea is that the recursive part of the algorithm can pass around the collection into which the result will be placed, and every Jeffersonian that is found will be placed into the collection:

private void findAllJeffersoniansInternal(
        Graph g, Collection<Jeffersonian> result) {
    // complex recursive algorithm here
    Jeffersonian found = ...;
    result.add(found);
    // more complex recursion here
}

One pot of coffee and an all-nighter later I had this working like a charm. The next day my tech lead asked me to add an API element that determines whether or not a graph has a Jeffersonian or not. That was easy:

public boolean hasJeffersonian(Graph g) {
    return findAllJeffersonians(g).size() != 0;
}

This didn't pass code review. The problem is that this new method is to be used in the inner loop of Google's über-secret application that will take over the world. Never mind that. The problem is performance. Determining whether or not a graph has a Jeffersonian can be done in linear time, but enumerating all of them requires quadratic time (or worse). But my implementation does it the hard way. By then it was Friday afternoon and I really wanted to head home for a glass of wine, so I did what any self-loathing software engineer would do: I cut and pasted the complex recursive code in findAllJeffersoniansInternal into hasJeffersonianInternal and added a boolean return value (true when a Jeffersonian was found). Then I added logic to short-circuit the rest of the algorithm once a Jeffersonian had been found at any step. The code was messy but workable, and I had it passing tests in less than an hour. The code duplication left me somewhat uncomfortable, but the two methods were different enough that merging them would have been hard. I considered adding a second flag so I could have one body of code to do both versions, but I decided to leave that refactoring until Monday.

Something very strange happened over the weekend, though. On Monday my pointy-haired boss told me there was both good news and bad news, and asked which I wanted first. Knowing how these jokes work (the second one always trumps the first) I asked for the bad news first. The bad news was that my machine had crashed, losing all of my work from Friday. Including my implementation of hasJeffersonian. The good news was that my machine had been replaced with a brand new one, a fast new 40-core workstation, and it came with JDK7 preinstalled. I had been using JDK6 before, so I was eager to try the new Java language features.

Taking a fresh look at the problem of writing hasJeffersonian, I decided to refactor the original program to pass a closure instead of a collection:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(g, { Jeffersonian j => result.add(j); });
    return result;
}

private void findAllJeffersoniansInternal(
        Graph g, {Jeffersonian => void} foundJeffersonian) {
    // complex recursive algorithm here
    Jeffersonian found = ...;
    foundJeffersonian.invoke(found);
    // more complex recursion here
}

Then I realized I could use the nicer syntax allowed for passing a closure to a method:

public Collection<Jeffersonian> findAllJeffersonians(Graph g) {
    Collection<Jeffersonian> result = new ArrayList<Jeffersonian>();
    findAllJeffersoniansInternal(Jeffersonian j : g) {
        result.add(j);
    }
    return result;
}

Solving the second problem was then trivial:

public boolean hasJeffersonian(Graph g) {
    findAllJeffersoniansInternal(Jeffersonian j : g) {
        return true;
    }
    return false;
}

That was the entire implementation. I had a strange sense of elation, but I couldn't quite tell why. I could no longer remember why the problem was so messy on Friday. This refactoring seemed trivial, and this code was so clear. What made it so hard before?

Then I woke up. It's 2007, not 2009. JDK7 is barely a gleam in the eye of Sun. My machine is only dual-core. Consensus on closures is elusive. As far as I can tell, there isn't any such thing as a graph's Jeffersonian, or a Google plan to take over the world. It's Monday morning, and I have to figure out how to merge two almost-copies of a big recursive algorithm.

But on the bright side, my boss is a really nice guy.

Friday, March 16, 2007

A Compact Object Comparator

Every now and then a problem arises where the right solution would be to impose an arbitrary total ordering on a collection of objects. The simplest example of this is when you need to sychronize on more than one object, all at the same time, to maintain some consistency condition across those objects. Using Closures, you might invoke a utility method like this:

Locks.withLocks(lock1, lock2) {
    // code protected by both locks
}

To avoid deadlock, every piece of code that locks the same set of locks should do so in the same order. Rather than forcing all callers of the withLocks method to worry about getting them in the right order, the implementation of withLocks can sort the incoming locks. Then the caller can just pass the locks in arbitrary order, knowing that they will be locked "in the right order". It doesn't actually matter what order we sort them in, as long as we always get the same order for the same objects. The implementation of withLocks can use Collections.sort to sort the incoming locks, but java.util.concurrent.locks.Lock is not naturally comparable, so we need to pass an appropriate comparator to sort. We need a java.util.Comparator<Lock>, but a java.util.Comparator<Object> would work just as well. Let's specify, and then implement, a suitable comparator. Here is what we need:

/**
 * Returns a comparator that imposes a complete order on all objects.
 * Each invocation of this method may yield a distinct comparator,
 * or may yield the same comparator.
 */
public Comparator<Object> totalOrder() { ... }

How are we going to do this? One idea is to create an assignment of long values to each object, as needed. That would look something like this:

public Comparator<Object> totalOrder() { return new TotalOrder(); }
private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    public int compare(Object o1, Object o2) {
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    }
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        }
        return nonce;
    }
}

There are two major problems with this approach. First, it causes object retention. Objects whose space would otherwise be recovered by the garbage collector are retained because they are reachable as keys in the codes map. We can't fix this by simply using a WeakHashMap; without the identity semantics of IdentityHashMap the technique doesn't work. We really need WeakIdentityHashMap for this, but no such class exists in the JDK yet. Fortunately, "crazy" Bob Lee has come to the rescue with an implementation of this concept inside the recently open-sourced Guice dependency injection framework. I think this belongs in the JDK, and now is the time to propose it for JDK7.

The other problem with this implementation is that this utility takes up too much space. In general, every time you call the compare method one or two objects might be created and added to the map.

Another idea for implementing this utility is to sort the objects based on their identity hash code. Identity hash codes are well distributed, almost like random numbers. That is naturally thread-safe, and would look something like this:

private class TotalOrder implements Comparator<Object> {
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        return (i1<i2) ? -1 : (i1==i2) ? 0 : 1;
    }
}

This is much more compact than the previous approach. But because identity has codes are not guaranteed to be unique, it occasionally treats two distinct objects as equal.

We can get the best of both worlds - a space-efficient comparator and a complete order - by combining the two approaches:

private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        }
        return nonce;
    }
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        if (i1 != i2) return (i1<i2) ? -1 : 1;
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    }
}

By the way, if you haven't already checked it out, see "crazy" Bob Lee's Guice dependency injection framework. We use it extensively at Google. By really taking advantage of recent language features such as generics and annotations, the Guice framework is very flexible and yet much simpler than existing frameworks. Throw away your XML and write your Java code in Java!

thanks to "crazy" Bob Lee for contributing the Guice framework, and for reviewing this essay.

Thursday, March 08, 2007

On The Expressive Power of Programming Languages

There are at least three separate proposals recently put forward in the space of "Closures for Java." Among the criteria for evaluating the proposals, I'd like to discuss two: conciseness (possibly phrased as convenience) of code using the construct, and expressiveness. Conciseness is pretty obvious, and you can compare the proposals on this measure by writing snippets of code that do basically the same thing as each other, but written using existing constructs and then each of the proposed constructs. How many characters, or tokens, does it take to write the code? By the measure of conciseness, shorter is better.

Unfortunately, the authors of the various proposals don't appear to be using a common meaning for "expressiveness" or "expressive power." Consequently, we often end up talking at cross-purposes when comparing the proposals. Some people appear to treat "expressiveness" and "conciseness" as synonyms, but to me these have completely different meanings. Expressiveness is a bit harder to measure, but in some ways more important at this stage of the discussion. See Matthias Felleisen's, On the Expressive Power of Programming Languages, 3rd European Symposium on Programming, Copenhagen, Denmark, 1990, http://citeseer.ist.psu.edu/felleisen90expressive.html, for one attempt to formally capture the meaning of expressiveness.

In my mind, a language construct is expressive if it enables you to write (and use) an API that can't be written (and used) without the construct. In the context of the Closures for Java proposed language extension, control abstraction APIs are the kind of thing that don't seem to be supported by the competing proposals. You don't see the proposals compared side-by-side on this measure because this is something only supported by one proposal. Programmers who have become accustomed to programming with closures find them very useful for factoring out common code in ways that are not currently possible in Java. See, for example, http://www.joelonsoftware.com/items/2006/08/01.html, http://ivan.truemesh.com/archives/000637.html, http://www.talios.com/dear_java_i_need_closure.htm, and http://blog.moertel.com/articles/2005/08/30/closures-and-the-professional-programmer. These kinds of uses might not occur to you if you're mainly a Java programmer, because Java doesn't reward you for thinking this way. But this is another example of expressive power.

I'm not particularly attached to one syntax or another for closures. I don't mean to say that syntax isn't important. Anyone who knows the story of variance and wildcards knows how much I value a good surface syntax. Our proposal describes a particular syntax not because we believe it is the best possible syntax, but because it is hard to write a specific proposal without some syntax. Ultimately, I hope the closures issue becomes a JSR and the expert group takes its time to decide what surface syntax is best. But I believe that the expressiveness of the Closures for Java proposal is the most important reason to consider doing anything in this space at all. If it is just a matter of a slightly more concise syntax, I'm not sure it is worth the trouble.

Monday, March 05, 2007

Java Closures versus MouseListener

The Closures for Java proposal simplifies the code for many purposes where anonymous class instance creation expressions are currently used. When the anonymous class's supertype is an interface with a single abstract method, a closure can be used directly. But if the supertype is a class, like java.util.TimerTask, or has more than one method, like java.awt.event.MouseListener, then you can't use a closure directly. You can still use an anonymous inner class directly, as always, but there are ways of using closures that may be more convenient. For TimerTask, a client can be written this way

void printHelloAfterDelay(java.util.Timer timer, long delay) {
    timer.schedule(TimerTask.of({ => System.out.println("Hello"); }), delay);
}

if we add the following utility method to TimerTask:

public TimerTask of(final Runnable block) {
    class ClosureTimerTask extends TimerTask {
        public void run() { block.run(); }
    }
    return new ClosureTimerTask();
}

Similarly, Peter von der Ahé showed me how to use the builder pattern, along with closures, to simplify handling mouse events:

void addSomeActions(java.awt.Component foo) {
    foo.addMouseListener(new MouseListenerBuilder() 
        .setMouseClicked({ MouseEvent e => System.out.println("Mouse clicked"); })
        .setMouseReleased({ MouseEvent e => System.out.println("Mouse released"); })
        .setMouseEntered({ MouseEvent e => System.out.println("Mouse entered " + e.getComponent()); }));
}

The implementation of MouseListenerBuilder is left as an exercise to the reader.

Neal Gafter's blog