What's the point of Closures?
A number of people have asked me: What's the point of closures? Can't you accomplish pretty much the same thing with the existing language constructs? What does it really buy you?
If you haven't programmed using closures, and you've gotten used to the Java idioms for the past ten years, it might be hard to see what closures really buy you. This is my attempt to give you a glimpse of that, by way of an extended example.
The problem
Suppose you are working with an application that maintains a list of documents and, for each individual in some set, a list of document annotations. For the sake of this example, suppose the two lists are parallel. That is, if you take the list of annotations from an individual, its values correspond elementwise with the elements of the document list. To make this more concrete:
    class Document { ... }
    class DocAnnotation { ... }
    class Person { ... }
    class Documents {
        static List<Document> allDocuments();
    }
    class Persons {
        static Set<Person> allPersons();
        static Person GEORGE_W_BUSH = ...;
    }
    class DocAnnotations {
        static Map<Person,List<DocAnnotation>> allAnnotations = ...;
    }
Now, you might ask a question such as: Has George Bush annotated any documents mentioning Iraq as secret? In this hypothetical application, you might code that up something like this:
    boolean bushMarkedAnyIraqDocsSecret() {
        Iterator<Document> docI = Documents.allDocuments().iterator();
        Iterator<DocAnnotation> annI =
            DocAnnotations.allAnnotations.get(Persons.GEORGE_W_BUSH).iterator();
        // Walk the two parallel lists in lock step.
        while (docI.hasNext() && annI.hasNext()) {
            Document doc = docI.next();
            DocAnnotation docAnn = annI.next();
            if (doc.mentions("Iraq") && docAnn.marked("secret")) {
                return true;
            }
        }
        return false;
    }
We could abstract this over what we're looking for in the document, whose annotations we're looking for, and what kind of annotation we're looking for:
    boolean personAnnotatedOnKeyword(Person person, String ann, String key) {
        Iterator<Document> docI = Documents.allDocuments().iterator();
        Iterator<DocAnnotation> annI =
            DocAnnotations.allAnnotations.get(person).iterator();
        while (docI.hasNext() && annI.hasNext()) {
            Document doc = docI.next();
            DocAnnotation docAnn = annI.next();
            if (doc.mentions(key) && docAnn.marked(ann)) {
                return true;
            }
        }
        return false;
    }
Abstracting in this way allows us to avoid repeating this loop throughout the code, if the code frequently needs to ask this kind of question. Abstracting common code is good because, among other things, it reduces the number of things we need to change when we refactor. For example, if we were to change the representation of a person's annotations to be a Map<Document,DocAnnotation> instead of List<DocAnnotation>, or make a DocAnnotation contain a reference to the Document, we would have to change this kind of loop everywhere it appears. So having a single place where the loop is written is a good thing, as there are fewer places in the code that depend on the exact representation of the data.
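To make the refactoring point concrete, here is a rough sketch of what that single helper might look like if a person's annotations were stored as a map keyed by document. The Doc and Ann classes, the sample data, and the method signature are hypothetical stand-ins for the elided classes above, not part of the application described here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class MapRepresentationSketch {
    // Hypothetical stand-ins for the elided Document and DocAnnotation classes.
    static final class Doc {
        private final String text;
        Doc(String text) { this.text = text; }
        boolean mentions(String key) { return text.contains(key); }
    }

    static final class Ann {
        private final String label;
        Ann(String label) { this.label = label; }
        boolean marked(String ann) { return label.equals(ann); }
    }

    // Assumed alternative representation: one person's annotations keyed by document.
    // Only this method has to know that the data is a Map rather than parallel lists.
    static boolean annotatedOnKeyword(Map<Doc, Ann> annotations, String ann, String key) {
        for (Map.Entry<Doc, Ann> entry : annotations.entrySet()) {
            if (entry.getKey().mentions(key) && entry.getValue().marked(ann)) {
                return true;
            }
        }
        return false;
    }

    // Small made-up data set, for demonstration only.
    static Map<Doc, Ann> sampleAnnotations() {
        Map<Doc, Ann> m = new LinkedHashMap<Doc, Ann>();
        m.put(new Doc("Budget summary"), new Ann("public"));
        m.put(new Doc("Iraq briefing"), new Ann("secret"));
        return m;
    }

    public static void main(String[] args) {
        Map<Doc, Ann> annotations = sampleAnnotations();
        System.out.println(annotatedOnKeyword(annotations, "secret", "Iraq"));   // true
        System.out.println(annotatedOnKeyword(annotations, "secret", "Budget")); // false
    }
}
```

Callers that go through the helper are unaffected by the change of representation; callers that had copied the parallel-list loop would each need rewriting.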
Abstracting the loop in JDK5
The next step is to try to abstract the loop itself. Java provides some convenient looping constructs, including the recently introduced for-each loop with its Iterable interface. We can use that, but we need to introduce a type over which we're iterating:
    class DocAndAnnotation {
        final Document doc;
        final DocAnnotation docAnn;
        DocAndAnnotation(Document doc, DocAnnotation docAnn) {
            this.doc = doc;
            this.docAnn = docAnn;
        }
    }
Now we can provide looping support by writing the loop only once in the code, like this:
    Collection<DocAndAnnotation> docsWithAnnotations(Person person) {
        Iterator<Document> docI = Documents.allDocuments().iterator();
        Iterator<DocAnnotation> annI =
            DocAnnotations.allAnnotations.get(person).iterator();
        List<DocAndAnnotation> result = new ArrayList<DocAndAnnotation>();
        while (docI.hasNext() && annI.hasNext()) {
            Document doc = docI.next();
            DocAnnotation docAnn = annI.next();
            result.add(new DocAndAnnotation(doc, docAnn));
        }
        return result;
    }
This allows us to write the original loop like this:
    boolean personAnnotatedOnKeyword(Person person, String ann, String key) {
        for (DocAndAnnotation docAndAnn : docsWithAnnotations(person)) {
            if (docAndAnn.doc.mentions(key) && docAndAnn.docAnn.marked(ann)) {
                return true;
            }
        }
        return false;
    }
So far so good, but this solution has an unfortunate feature: it constructs the entire list even if the answer can be found by looking at only the first annotated document. We can do slightly better by constructing a lazy iterator instead of an eager list. We do that by rewriting docsWithAnnotations as follows:
    Iterable<DocAndAnnotation> docsWithAnnotations(final Person person) {
        return new Iterable<DocAndAnnotation>() {
            public Iterator<DocAndAnnotation> iterator() {
                // Create the underlying iterators on each call so that the
                // Iterable can be traversed more than once.
                final Iterator<Document> docI = Documents.allDocuments().iterator();
                final Iterator<DocAnnotation> annI =
                    DocAnnotations.allAnnotations.get(person).iterator();
                return new Iterator<DocAndAnnotation>() {
                    public boolean hasNext() {
                        return docI.hasNext() && annI.hasNext();
                    }
                    public DocAndAnnotation next() {
                        return new DocAndAnnotation(docI.next(), annI.next());
                    }
                    public void remove() {
                        throw new UnsupportedOperationException();
                    }
                };
            }
        };
    }
Now, without changing personAnnotatedOnKeyword, the same loop works without the overhead of building the entire List<DocAndAnnotation>.
This program does, however, produce a large number of small, transient garbage objects. The Iterator, Iterable, and DocAndAnnotation objects are all short-lived and simply exist to convey data from one part of the program to another. This is not necessarily a problem; HotSpot has a number of garbage-collection algorithms that are good at allocating and reclaiming short-lived objects. But in some applications it could contribute toward a performance bottleneck.
Can the looping code be refactored to avoid all these small allocations? The answer is yes, and there is a standard idiom for doing that in Java. The idea is to turn the control structure of the loop inside-out. Rather than having personAnnotatedOnKeyword perform the iteration, we have a library method perform the iteration and pass the values to a snippet of code provided by personAnnotatedOnKeyword. That would look something like this:
    interface WithDocumentAndAnnotation {
        void doIt(Document doc, DocAnnotation docAnn);
    }
    void docsWithAnnotations(
            Person person, WithDocumentAndAnnotation body) {
        Iterator<Document> docI = Documents.allDocuments().iterator();
        Iterator<DocAnnotation> annI =
            DocAnnotations.allAnnotations.get(person).iterator();
        while (docI.hasNext() && annI.hasNext()) {
            Document doc = docI.next();
            DocAnnotation docAnn = annI.next();
            body.doIt(doc, docAnn);
        }
    }
A client can now iterate through a person's annotations by providing a snippet of code in the form of a class that implements the interface:
    boolean personAnnotatedOnKeyword(
            Person person, final String ann, final String key) {
        class MyBody implements WithDocumentAndAnnotation {
            boolean result = false;
            public void doIt(Document doc, DocAnnotation docAnn) {
                if (doc.mentions(key) && docAnn.marked(ann)) {
                    result = true;
                }
            }
        }
        MyBody body = new MyBody();
        docsWithAnnotations(person, body);
        return body.result;
    }
This solves the transient memory allocation problem (if it even was a problem), but this version of the program again unnecessarily iterates through all of the documents even if it needs to iterate through only a few to compute its result. We can fix that by modifying the WithDocumentAndAnnotation interface's method to return a boolean that indicates whether iteration should continue or the loop should abort. This may enable many of the old clients of the docsWithAnnotations method to migrate to the new API, but it may not satisfy the needs of all of them. Presumably we could modify the interface further as we discovered different patterns of control flow used by clients that iterate through documents and their annotations. We leave the details as an exercise for the reader.
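For readers who would rather not do the exercise, one way the boolean-returning variant might look is sketched below. To keep it self-contained, documents and annotations are represented as plain strings, and the names PairVisitor, forEachPair, and annotatedOnKeyword are made up for this illustration. The visited count returned by forEachPair exists only so the early exit can be observed.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class EarlyExitSketch {
    /** Hypothetical variant of WithDocumentAndAnnotation: return false to stop the loop. */
    interface PairVisitor<A, B> {
        boolean visit(A first, B second);
    }

    /** Walks two parallel lists in lock step; returns how many pairs were visited. */
    static <A, B> int forEachPair(List<A> as, List<B> bs, PairVisitor<A, B> body) {
        Iterator<A> aI = as.iterator();
        Iterator<B> bI = bs.iterator();
        int visited = 0;
        while (aI.hasNext() && bI.hasNext()) {
            visited++;
            if (!body.visit(aI.next(), bI.next())) {
                break; // the body asked us to stop
            }
        }
        return visited;
    }

    /** Strings stand in for Document (its text) and DocAnnotation (its label). */
    static boolean annotatedOnKeyword(List<String> docs, List<String> anns,
                                      final String ann, final String key) {
        final boolean[] found = { false }; // mutable cell the anonymous class can write to
        forEachPair(docs, anns, new PairVisitor<String, String>() {
            public boolean visit(String doc, String docAnn) {
                if (doc.contains(key) && docAnn.equals(ann)) {
                    found[0] = true;
                    return false; // answer known, abort the iteration
                }
                return true; // keep going
            }
        });
        return found[0];
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("Iraq memo", "budget", "Iraq report");
        List<String> anns = Arrays.asList("secret", "public", "secret");
        System.out.println(annotatedOnKeyword(docs, anns, "secret", "Iraq"));
    }
}
```

Note that the client still has to smuggle its result out through a mutable cell, and the library had to anticipate this one kind of early exit in its interface.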
Abstracting the loop with Closures
Closures provide a somewhat more convenient way to abstract the loop:
    void docsWithAnnotations(Person person,
                             void(Document,DocAnnotation) block) {
        Iterator<Document> docI = Documents.allDocuments().iterator();
        Iterator<DocAnnotation> annI =
            DocAnnotations.allAnnotations.get(person).iterator();
        while (docI.hasNext() && annI.hasNext()) {
            Document doc = docI.next();
            DocAnnotation docAnn = annI.next();
            block(doc, docAnn);
        }
    }
This looks almost the same as our last version using JDK5, except that it does not require the introduction of the WithDocumentAndAnnotation interface. Let's see what the client looks like:
    boolean personAnnotatedOnKeyword(
            Person person, final String ann, final String key) {
        docsWithAnnotations(person, (Document doc, DocAnnotation docAnn) {
            if (doc.mentions(key) && docAnn.marked(ann)) {
                return personAnnotatedOnKeyword: true;
            }
        });
        return false;
    }
That's it. If we adopt the modifications to the proposal suggested in the Further Ideas section, the client would look something like this:
    boolean personAnnotatedOnKeyword(
            Person person, final String ann, final String key) {
        docsWithAnnotations(person) (Document doc, DocAnnotation docAnn) {
            if (doc.mentions(key) && docAnn.marked(ann)) {
                return true;
            }
        }
        return false;
    }
The only transient objects created in this version are the closure object itself and the two iterators used in the implementation of docsWithAnnotations. As you can see, the caller of docsWithAnnotations can use control-flow operations like return without the implementation of docsWithAnnotations having to anticipate what kinds of control-flow will be required.
Now, in the interest of full disclosure I should mention that the return statement within the closure is likely to be implemented under the covers using exceptions. That's one more transient object we should count. As for the performance of using an exception in this way, I'm told by HotSpot engineers that an exception used this way is largely optimized away by the VM if found in performance-critical code. The most expensive part of exception handling by far is capturing the stack trace when creating the exception, and we would want to create exceptions for this purpose without stack traces.
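To give a feel for what such an implementation might do under the covers, here is a hand-written approximation in plain Java. This is only a guess at the general shape, not what a closures implementation would actually generate. The names Found, PairBody, and forEachPair are invented for the sketch, and strings again stand in for documents and annotations. The exception overrides fillInStackTrace so that no stack trace is captured when it is created.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class NonLocalExitSketch {
    /** Control-flow-only exception: skips the costly stack-trace capture. */
    static final class Found extends RuntimeException {
        @Override
        public synchronized Throwable fillInStackTrace() {
            return this; // deliberately do not record a stack trace
        }
    }

    /** Hypothetical callback with no provision for early exit. */
    interface PairBody<A, B> {
        void run(A first, B second);
    }

    /** The library loop knows nothing about how clients may want to leave it. */
    static <A, B> void forEachPair(List<A> as, List<B> bs, PairBody<A, B> body) {
        Iterator<A> aI = as.iterator();
        Iterator<B> bI = bs.iterator();
        while (aI.hasNext() && bI.hasNext()) {
            body.run(aI.next(), bI.next());
        }
    }

    static boolean annotatedOnKeyword(List<String> docs, List<String> anns,
                                      final String ann, final String key) {
        final Found signal = new Found(); // unique to this invocation
        try {
            forEachPair(docs, anns, new PairBody<String, String>() {
                public void run(String doc, String docAnn) {
                    if (doc.contains(key) && docAnn.equals(ann)) {
                        throw signal; // emulates "return true" from the enclosing method
                    }
                }
            });
        } catch (Found f) {
            if (f != signal) {
                throw f; // belongs to some other invocation, pass it along
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("budget", "Iraq report");
        List<String> anns = Arrays.asList("public", "secret");
        System.out.println(annotatedOnKeyword(docs, anns, "secret", "Iraq"));
    }
}
```

Comparing the caught exception against the one created for this invocation keeps nested or recursive uses from intercepting each other's exits.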
That really is it. All of the design iterations we went through to handle variations on this problem in JDK5 simply don't arise as issues. That's the beauty of closures: they allow you to easily abstract away aspects of code that would otherwise require complex contortions.
A slight update on my current feelings about the draft spec. I think we are likely to drop the special syntax for a nonlocal return, and have a different syntax for returning from a closure. This is hinted at in the Further Ideas section. "return" would mean return from the enclosing method or function, and if you're inside a closure you would write a statement something like "^ expression;" to return a value from the closure. Most value-returning closures are unlikely to need this syntax, since they can often be expressed using the "(args) : expression" form.