Sunday, September 17, 2006

Failure of Imagination in Language Design

Failure of imagination - that is, failure to make language constructs orthogonal because you aren't quite sure how people will need to use them together - is one of the most common and severe errors you can make in programming language design. It is an error that is often impossible to correct after the fact. There is an approach to programming language design that you can take to maximize your opportunities to err in this way: consider the interactions of individual language features and decide on a case-by-case basis if the interaction will be allowed or not (or even what the semantics should be) based on whether or not you can find use cases and which semantics best fit those use cases.

This is an approach favored by expert API designers, because the domain of API design is best suited to this approach and these designers expect their expertise to carry from one domain to the other. In an API, the components of the API fit together in particular and well-defined patterns due to the typing and subtyping relationships between the elements; the job of the API designer is to ensure that the patterns you can build with the API elements satisfy a set of use cases.

In a programming language, on the other hand, the elements that are used to assemble programs are ideally orthogonal and independent. Programmers can assemble the elements in arbitrary ways, often unexpected but surprisingly useful ways. The "patterns" movement for example records some of these ideas by creating a common terminology for newly discovered ways of assembling programs from primitives.

To be sure, use cases also play a very important role in language design, but that role is a completely different one than the kind of role that they play in API design. In API design, satisfying the requirements of the use cases is a sufficient condition for completeness. In language design, it is a necessary condition.

Wednesday, September 13, 2006

Nominal Closures for Java (version 0.2)

This post describes a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

This revision of the closures proposal eliminates function types, introduces the Unreachable type, and gives related reachability rules. A closure's "return" type is now called its result type. The abbreviated invocation syntax now takes any statement, not just a block, to make nesting work nicely.

Gilad Bracha, Neal Gafter, James Gosling, Peter von der Ahé

Modern programming languages provide a mixture of primitives for composing programs. C#, Javascript, Ruby, Scala, and Smalltalk (to name just a few) have direct language support for delayed-execution blocks of code, called closures. A proposal for closures is working its way through the C++ standards committees as well. Closures provide a natural way to express some kinds of abstraction that are currently quite awkward to express in Java. For programming in the small, closures allow one to abstract an algorithm over a piece of code; that is, they allow one to more easily extract the common parts of two almost-identical pieces of code. For programming in the large, closures support APIs that express an algorithm abstracted over some computational aspect of the algorithm. This proposal outlines a specification of closures for Java without introducing function types.

1. Closure Literals

We introduce a syntactic form for constructing a closure value:

Expression3
Closure
Closure
FormalParameters { BlockStatementsopt Expressionopt }

Example

We can write a closure that adds two to its argument like this:

interface IntFunction {
int invoke(int i);
}
IntFunction plus2 = (int x){ x+2 };

2. The type of a closure

A closure literal has a "closure type" that exists transiently at compile time. It is converted to some object type at compile-time; See Closure Conversion for details. It is a compile-time error if a closure is not subject to a closure conversion.

The type of a closure is inferred from its form as follows:

The argument types of a closure are the types of the declared arguments.

If the body of the closure ends with an expression, the result type of the closure is the type of that expression; otherwise if the closure can complete normally, the result type is void; otherwise the result type is Unreachable.

The set of thrown types of a closure are those checked exception types thrown by the body.

Example

The following illustrates a closure being assigned to a variable of a compatible object type.

interface VoidIntBlock<throws E> {
void invoke(int i) throws E;
}
VoidIntBlock<InterruptedException> sleeper = (int t) { Thread.sleep(t); };

3. synchronized parameters

We allow the qualifier synchronized to be used on formal method arguments. When a closure is passed to such a parameter, the closure conversion (see Closure Conversion) is allowed to close over its context: the closure may use non-final local variables from enclosing scopes, and may use control-transfer statements such as break, continue, and return that transfer control outside of the body of the closure. An overriding method declaration may only add (i.e. may not remove) the synchronized qualfier to formal parameters compared to a method it overrides; an abstract method that overrides one whose parameter is declared synchronized must itself declare the parameter synchronized. If a closure is passed to a parameter that is not declared synchronized, then the only local variables from enclosing scopes it may access are those that are marked final.

Note: this qualifier is necessary to enable API writers to distinguish synchronous from asynchronous receivers of closures. For compatibility with existing APIs, most of which are asynchronous, the default is not allowing closing over the context. The constraint on overriders preserves the substitutability of subtypes: if a closure is allowed when passed to the superclass method, it must be allowed when passed to the subclass method. A consequence of the overriding constraint is that no existing overridable method can be retrofitted to accept synchronous closures without breaking source compatibility. Static methods such as AccessController.doPrivileged can be retrofitted. The synchronized keyword isn't an ideal choice, but it is the best we've found so far.

Names that are in scope where a closure is defined may be referenced within the closure. It is a compile-time error to access a local variable declared outside the closure from within a closure unless the closure is passed to a parameter that is declared synchronized or the variable was declared final.

Note: a local variable that is referenced within a closure is in scope in the body of the closure. Consequently the lifetime of objects referenced by such variables may extend beyond the time that the block containing the local declaration completes.

4. Closure conversion

A closure literal may be assigned to a variable or parameter of any compatible object type by the closure conversion:

There is a closure conversion from every closure of type T to every interface type that has a single method m with signature U such that T is compatible with U. A closure of type T is compatible with a method U iff all of the following hold:

  • Either
    • The result type of T is either the same as the return type of U or
    • There is an assignment conversion from the result type of T to the return type of U; or
    • the result type of T is Unreachable.
  • T and U have the same number of arguments.
  • For each corresponding argument position in the argument list of T and U, both argument types are the same.
  • Every exception type in the throws of T is a subtype of some exception type in the throws of U.

If the closure is not passed to a parameter that was declared synchronized then

  • it is a compile-time error if the closure being converted contains a break, continue or return statement whose execution would result in a transfer of control outside the closure
  • it is a compile-time error if the closure being converted refers to a non-final local variable or parameter declared in an enclosing scope.

The closure conversion to a non-synchronized parameter applies only to closures that obey the same restrictions that apply to local and anonymous classes. The motivation for this is to help catch inadvertent use of non-local control flow in situations where it would be inappropriate. Examples would be when the closure is passed to another thread, or stored in a data structure to be invoked at a later time when the method invocation in which the closure originated no longer exists.

Note: The closure conversion occurs at compile-time, not at runtime. This enables javac to allocate only one object, rather than both a closure and an anonymous class instance.

We are considering an additional qualifier on non-final local variables that allows a closure to access the variable.

Example

We can use the existing Executor framework to run a closure in the background:

 void sayHello(java.util.concurrent.Executor ex) {
ex.execute((){ System.out.println("hello"); });
}

5. Exception type parameters

To support exception transparency, we add a new type of generic formal type argument to receive a set of thrown types. [This deserves a more formal treatment] What follows is an example of how the proposal would be used to write an exception-transparent version of a method that locks and then unlocks a java.util.concurrent.Lock, before and after a user-provided block of code:

interface Block<throws E> {
void invoke() throws E;
}
public static <throws E extends Exception>
void withLock(Lock lock, Block<E> block) throws E {
try {
lock.lock();
block.invoke();
} finally {
lock.unlock();
}
}

This can be used as follows:

withLock(lock, (){
System.out.println("hello");
});

This uses a newly introduced form of generic type parameter. The type parameters E are declared to be used in throws clauses. If the extends clause is omitted, a type parameter declared with the throws keyword is considered to extend Exception (instead of Object, which is the default for other type parameters). This type parameter accepts multiple exception types. For example, if you invoke it with a block that can throw IOException or NumberFormatException the type parameter E would be IOException|NumberFormatException. In those rare cases that you want to use explicit type parameters with multiple thrown types, the keyword throws is required in the invocation, like this:

Locks.<throws IOException|NumberFormatException>withLock(lock) {
whatever();
}

You can think of this kind of type parameter accepting disjunction, "or" types. That is to allow it to match the exception signature of a closure that throws any number of different checked exceptions. If the block throws no exceptions, the type parameter would be the type null.

Type parameters declared this way can be used only in a throws clause or as a type argument to a generic method, interface, or class whose type argument was declared this way too.

6. Non-local control flow

One purpose for closures is to allow a programmer to refactor common code into a shared utility, with the difference between the use sites being abstracted into a closure. The code to be abstracted sometimes contains a break, continue, or return statement. This need not be an obstacle to the transformation. A break or continue statement appearing within a closure may transfer to any matching enclosing statement provided the target of the transfer is in the same innermost ClassBody.

A return statement always returns from the nearest enclosing method or constructor.

If a break statement is executed that would transfer control out of a statement that is no longer executing, or is executing in another thread, the VM throws a new unchecked exception, UnmatchedNonlocalTransfer. Similarly, an UnmatchedNonlocalTransfer is thrown when a continue statement attempts to complete a loop iteration that is not executing in the current thread. Finally, an UnmatchedNonlocalTransfer is thrown when a return statement is executed if the method invocation to which the return statement would transfer control is not on the stack of the current thread.

7. Definite assignment

The body of a closure may not assign to a final variable declared outside the closure.

A closure expression does not affect the DA/DU status of any variables it names.

8. The type Unreachable

We add the non-instantiable type java.lang.Unreachable, corresponding to the standard type-theoretic bottom. Values of this type appear only at points in the program that are formally unreachable. This is necessary to allow transparency for closures that do not return normally. Unreachable is a subtype of every type (even primitive types). No other type is a subtype of Unreachable. It is a compile-time error to convert null to Unreachable. It is an error to cast to the type Unreachable.

Example

The following illustrates a closure being assigned to a variable of the correct type.

interface NullaryFunction<T, throws E> {
T invoke() throws E;
}
NullaryFunction<Unreachable,null> thrower = () { throw new AssertionError(); };

8. Unreachable statements

An expression statement in which the expression is of type Unreachable cannot complete normally.

10. Abbreviated invocation syntax.

A new invocation statement syntax is added to make closures usable for control abstraction:

AbbreviatedInvocationStatement
Primary ( ExpressionList ) Statement
AbbreviatedInvocationStatement
Primary ( Formals ) Statement
AbbreviatedInvocationStatement
Primary ( Formals : ExpressionList ) Statement

This syntax is a shorthand for the following statement:

Primary ( ExpressionList, ( Formals ) { Statement } );

This syntax makes some kinds of closure-receiving APIs more convenient to use to compose statements.

Note: There is some question of the correct order in the rewriting. On the one hand, the closure seems most natural in the last position when not using the abbreviated syntax. On the other hand, that doesn't work well with varargs methods. Which is best remains an open issue.

We could use the shorthand to write our previous example this way

withLock(lock) {
System.out.println("hello");
}

This is not an expression form for a very good reason: it looks like a statement, and we expect it to be used most commonly as a statement for the purpose of writing APIs that abstract patterns of control. If it were an expression form, an invocation like this would require a trailing semicolon after the close curly brace of a controlled block. Forgetting the semicolon would probably be a common source of error. The convenience of this abbreviated syntax is evident for both synchronous (e.g. withLock) and asynchronous (e.g. Executor.execute) use cases. Another example of its use would be a an API that closes a java.io.Closeable after a user-supplied block of code:

class OneArgBlock<T, throws E> {
    void invoke(T t) throws E;
}
<T extends java.io.Closeable, throws E>
void closeAtEnd(OneArgBlock<? super T,E> block, T t) throws E {
    try {
        block.invoke();
    } finally {
        try { t.close(); } catch (IOException ex) {}
    }
}

We could use the shorthand with this API to close a number of streams at the end of a block of code:

closeAtEnd(FileReader in : makeReader()) closeAtEnd(FileWriter out : makeWriter()) {
// code using in and out
}

Acknowledgments

The authors would like to thank the following people whose discussions and insight have helped us craft, refine, and improve this proposal:

Lars Bak, Cedric Beust, Joshua Bloch, Martin Buchholz, Danny Coward, Erik Ernst, Christian Plesner Hansen, Kohsuke Kawaguchi, Doug Lea, "crazy" Bob Lee, Martin Odersky, Tim Peierls, John Rose, Ken Russell, Mads Torgersen, Jan Vitek, and Dave Yost.