Sunday, October 28, 2007

Java Closures: first prototype

I've finally had some time to make progress on a prototype of closures. If you want to see what an API looks like, you can compare Doug Lea's jsr166y fork-join framework to the same API ported to use the language features of the prototype.

If you want to try it, you can download an executable version of the prototype here. Make sure a JDK6 version of java and javac are on your path. This is binary-licensed under the JRL, but if a JSR is created I expect to license it under GPLv2. There are a few small test cases included.

This prototype supports

  • the BGGA function type syntax
  • closure literals
  • the closure conversion
  • the null type, disjunctive types, and exception transparency
  • definite assignment
  • Unreachable and completion transparency.
  • Catching multiple exceptions at once like catch(X1|X2 ex) { ...
    [not closely related to closures but the implementation was simple once closures are there]

This prototype does not yet support

  • a closure using a mutated variable from the enclosing scope
  • nonlocal control-flow (break, return, and continue)
  • the control invocation statement and loop abstractions

I'm intentionally distributing it before these features are available. The idea is that people can try this version, and compare it to the next version with these features working.

Separately, I'm working on a set of smaller language extensions for JDK7, some of which interact very nicely with Closures. For example, "extension methods" enable you to have the effect of adding methods to existing interfaces (e.g. adding "each", "filter", etc to Collection) without breaking backward compatibility. I'll write more about these over the next few days.

This is still rough around the edges, but any feedback you have is most welcome.

Tuesday, July 31, 2007

Internal Versus External Iterators

In the "Gang Of Four" Patterns book's discussion of the Iterator pattern, we read (page 260):

Who controls the iteration? A fundamental issue is deciding which party controls the iteration, the iterator or the client that uses the iterator. When the client controls the iteration, the iterator is called an external iterator (C++ and Java), and when the iterator controls it, the iterator is an internal iterator (Lisp and functional languages). Clients that use an external iterator must advance the traversal and request the next element explicitly from the iterator. In contrast, the client hands an internal iterator an operation to perform, and the iterator applies that operation to every element in the aggregate.

External iterators are more flexible than internal iterators. It's easy to compare two collections for equality with an external iterator, for example, but it's practically impossible with internal iterators. Internal iterators are especially weak in a language like C++ that does not provide anonymous functions, closures, or continuations like Smalltalk and CLOS. But on the other hand, internal iterators are easier to use, because they define the iteration logic for you.

To make this very concrete, one might define a collection-like interface using external iterators like this:

public interface ExternalIterable<T> {
    ExternalIterator<T> iterator();
}
public interface ExternalIterator<T> {
    T next();
    boolean hasNext();
}

On the other hand, using internal iterators one might define an interface something like this:

public interface InternalIterable<T> {
    void iterate(Function<T> closure);
}
public interface Function<T> {
    void invoke(T t);
}

Languages with well-integrated support for closures (such as Scala, Smalltalk, and Ruby) usually provide support for looping over their collections using internal iterators - they are, after all, easier to use in most cases - while other object-oriented languages (such as C++, Java, and C#) tend to use external iterators. Without well-integrated language support for closures, internal iterators would be too painful to use effectively. For that reason, the Java collection framework uses external iterators. But once we have closures in the language, wouldn't it be worth reversing that decision?

The answer is no, and it isn't just because it would be an incompatible change to an existing interface. As discussed above, external iterators are more flexible for some clients. The simpler code that clients can write using internal iterators is already achieved in many clients (of external iterators) due to the previous addition of the for-each loop in JDK5. For the remaining clients, simple library methods can bridge the gap between internal and external iterators. See, for example, the "eachEntry" method for iterating over the entries of a map, discussed in my earlier postings on closures. To see how easy the conversion is, here is the code to convert from an external iterator to an internal one:

    public <T> InternalIterable<T> internalize(final ExternalIterable<T> ext) {
        return new InternalIterable<T>() {
            public void iterate(Function<T> closure) {
                for (ExternalIterator<T> it = ext.iterator(); it.hasNext(); ) {
                    closure.invoke(it.next());
                }
            }
        };
    }

Iteration using internal iterators is often much easier to implement, because the iterator implementation doesn't have to explicitly store and manage the state of the iteration. Much of the complexity in the implementation of the iterators for Java's HashMap and TreeMap (and their Set cousins) would simply vanish if the iterators were internal. For that reason, it is interesting to see if it is possible to have the iterator implemented internally, but exposed to the client externally, by writing a utility method that converts between the two iterable interfaces. This is the reverse of the conversion above. How easy this is to implement depends on the features of your programming language.

C# provides a "yield return" construct that helps provide the convenience of implementing internal iterators and the flexibility of using external iterators. But it is not quite powerful enough to bridge the gap between them. See notes from Cyrus Najmabadi's attempt to do so. Neither are simple (local) byte-code rewriting systems such as Aviad Ben Dov's Yielder Framework for Java. You can do it using continuations, coroutines, or fibers. But Java doesn't have them.

You can solve the problem in Java by resorting to the use of a separate thread to simulate coroutines. The result is messy and expensive, as each converted external iterator requires its own thread. Here is my implementation; can you do better?