Showing posts with label recursion. Show all posts
Showing posts with label recursion. Show all posts

Monday, November 8, 2010

There and Back Again - A Developer's Recursion Tale

Recursive functions are so 2009. All the cool kids are now scrambling to convert those legacy recursive functions into stack based iteration. Why would you ever need this, you ask? Maybe you’ve heard of my friend “StackOverflowError”, he likes to hang out on the JVM. Or maybe you’re a browser kinda guy and you’ve seen your alert messages say “Stack Overflow”. Or worst case (like me), you’re supporting IE6 and you finally figured out that the “Browser shows a blank screen” defect you can’t reproduce is IE’s way of overflowing (oh yes it really is). Well, stack based iteration is a way to leave your nice recursive function the way it is, yet still avoid the overflow errors. You just have to keep track of the function stack frames yourself by adding a little boilerplate around your method (and even that can be cleaned up fairly easily).

So here is the tale of taking the road now commonly traveled. Learning to program imperatively, writing some beautiful recursive functions, and then ripping it out again because of platform limitations. There and Back Again – A Developer’s Tale of Recursion...


Read the full article over on the Canoo blog. And if you want to be kind and vote, then head over to DZone first.

Friday, September 4, 2009

Y-Combinator in Groovy

File under uselessly cool (for now at least).

A Y-Combinator in Groovy!

def Y = { le -> ({ f -> f(f) })({ f -> le { x -> f(f)(x) } }) }
Update: @mrdillon was kind enough to point out that my original Y used recursion. His revision is better.

And to prove it works, you can define a recursive factorial function without using recursion in the language.
def factorial = Y { fac ->
{ n -> n <= 2 ? n : n * fac(n - 1) }
}

assert 2432902008176640000 == factorial(20G)
The one-liner closure in the middle is the meat of the method definition and is a classic (i.e. horribly inefficient, known deficient) factorial implementation. I wouldn't write code this way but it makes a nice terse example when written in one line.

So is this useless? Not at all! But getting to an explanation of its use requires just a few examples.

Consider a recursive function that determines if an element is a member of a list. I guess you should also consider that List#contains(Object) does not exist. Here is a first draft of such function:
def isMember(element, list) {
if (!list) false
else
if (list.head() == element) true
else isMember(element, list.tail())
}

assert isMember(2, [1, 2, 3])
Let's just step through it briefly.

if (!list) false
... a simple terminating condition, an empty list never contains the element

if (list.head() == element) true
...if the first item in the list is the element then yes, we found it!

else isMember(element, list.tail())
....otherwise, check all the other items in the list for the same condition

Over time, the isMember will be recursively applied to each element in the list, and call stacks will build up in memory until the element is found, the list is fully checked, or you run out of stack space.

The problem with isMember is the redundancy. Notice how 'element' is always passed as a parameter? The value never changes over the life of the function application, so we shouldn't bother passing it every time. We have closures, so why don't we just 'close over' the value of element and remove the redundant data passing?

We can do it, but because of Groovy's limited recursion support, the implementation is kinda ugly. What I'd like to do is create a recursive closure that captures element once, but Groovy forces me into declaring an inner looping function on one line and defining it on another:
def isMember(element, list) {
def inner
inner = { lat ->
if (!lat) false
else
if (lat.head() == element) true
else inner(lat.tail())
}
inner(list)
}

assert isMember(2, [1, 2, 3])
This is where Y comes into play. I want a recursive closure and that is what Y does! Check out how Y allows me to define a recursive function locally within a method. No more split on declaration/definition.
def isMember = { element, list -> {
def inner = Y { inner ->
{ lat ->
if (!lat) false
else
if (lat.head() == element) true
else inner(lat.tail())
}
}
inner(list)
}

assert isMember(2, [1, 2, 3])
It's not uselessly cool any more is it? Oh wait, without tail call optimization on the JVM you will surely run out of stack space with a nasty StackOverflowException by writing code like this. So don't do it!

I guess it's pretty useless after all.

And if you're looking for a fun weekend project, why don't you do a different one of the Rosetta Code challenges
that are missing for Groovy: http://rosettacode.org/wiki/Tasks_not_implemented_in_Groovy

Tschüs!

Wednesday, July 30, 2008

Morphisms for the Masses

No, not polymorphism. That's one we might consider unlearning. What I'm calling the “morphisms” refers to several types of recursive functions working with lists. Yeah, it's a neologism; that's the type of artistic license that comes with self publishing. But seriously, the morphisms aren't that hard, they just aren't often presented in a learnable way. And learning them gives you a more general way to talk about types of recursive functions, which can help you better explain your abstractions and designs. Let's get started.

Catamorphism
The coolest part of this is that you get to use Greek letters. κατα means “downwards”, as in “catastrophe”, and μορφή means “form” or “shape”. Put them together and you have catamorphism. Apply them to computer science and you have a recursive function that starts with a broad shape (a big old list) and funnels it into a single output (like an integer). The size() or length() method on a list can be a catamorphism. These functions are sometimes called “fold”. They take a type signature of List <Type A> into <Type B>. Or in Java:

int length(List list);
Which might be an implementation of the more general:
T length(List<?> list);
In F# (and I believe OCaml), the type signature for a catamorphism is:
'a list -> 'b
Read these type signatures by looking at the far right element first ('b). That is the return type. Everything else is a parameter. So “tick a” means a generic argument and “tick b” means a generic return type. The type signature of the List.length function in the F# core library is:
('a list -> int)
Phew. We're done with the types. Let's look at one more common example of a catamorphism: the sum function. Take a guess as to the type signature of sum(List)...
int list -> int
Yup, take a list of integers and fold it into a single integer result. And here is a naïve implementation:
let rec sum list =
match list with
| [] -> 0
| head :: tail -> head + (sum tail)
sum takes a list and returns 0 if the list is empty. Otherwise, it splits the list into a head and a tail, and adds the head onto the sum of the tail. Easy Peasy.

Anamorphism
This is the opposite of catamorphism. Some people like to say ”category dual” instead of opposite; you'll sound smarter if you do too. The Greek involved here is ανα, meaning “upward”, and μορφή meaning “shape”. Take a single input and expand it upwards into a wide list. This one is frequently called “unfold”, and a typical Java signature for an anamorphism might be:
List<Integer> first(Integer);
Which would presumably be a function to return a list of the first n numbers between 1 and the parameter. The same type signature in F# is:
int -> int list
That is, a function that takes in integer and produces a list of integers. In general, anamorphisms have the type signature:
'a -> 'b list
A single value is used to generate a list of other values, possibly of a different type. Let's take a look at an anamorphism that generates a list of the first n integers greater than 0, that is, a function with the signature int -> int list:
let rec first max =
match max with
| 1 -> [1]
| x -> x :: first (x-1)
First off, this isn't tail recursive! No bother... the point is to see that a list of integers is generated by building a cons list using the supplied value “max”.

Hylomorphism
Hey, we're done with the basics! If you understand the previous two types then you'll understand this one: a hylomorphism is the composition of an anamorphism and a catamorphism. The fun Greek part is ὑλο (hylo) meaning “wood”, which won't help you remember the term one bit. So putting an anamorphism ( 'a -> 'b list) together with a catamorphism ('b list -> 'c) results in a type signature of 'a -> 'b. Combine an unfold with a fold and you'll have a function that takes a value and returns a value, and hidden in the middle is a recursive function that builds then reduces cons lists. Using our previous examples of first and sum, we can put them together into a simple hylomorphism that sums the first n elements of the natural number scale:
let sumFirst max =
sum (first max)
This example explains the “composition” part of composing the anamorphism and the catamorphism, but it doesn't quite explain the recursive nature of a hylomorphism. See, a hylomorphism is going to build up a list and then perform an operation on each element of the list. Factorial is the archetypal hylomorphism:
let rec factorial x =
if x = 1 then 1
else x * (factorial (x-1))
You'll notice this is not tail recursive. At runtime, the recursive factorial calls build up a list of all the factorial results before applying the * multiplication. You might visualize the processing steps like this:
factorial 4 =
4 * (factorial 3) =
4 * 3 * (factorial 2) =
4 * 3 * 2 * (factorial 1) =
4 * 3 * 2 * 1 =
4 * 3 * 2 =
4 * 6 =
24
This uses a lot of memory and a lot of stack frames! But it doesn't have to be so. Some language compilers will recognize a hylomorphism and short cut it into a less resource intensive and tail-call like structure, applying the multiplication as it becomes available, not waiting until a large list is built up:
factorial 4 =
4 * (factorial 3) =
12 * (factorial 2) =
24 * (factorial 1) =
24 * 1 =
24
Visually, this method of executing the hylomorphism takes less steps, but the point is to eliminate the intermediate list and the resources associated with building it. This shortcut has a name: deforestation. And the compiler that does it is the Glasgow Haskell Compiler. Cool.

And now there's just one last morphism to cover...

Metamorphism
A metamorphism is the categorical dual of a hylomorphism: a catamorphism followed by an anamorphism. There's not a lot to say on this one... it doesn't help you understand deforestation and I don't even have the Greek translation for it. Nonetheless, I'm obligated to cover it for completeness.
A metamorphism is going to compose a catamorphism ('a list -> 'b) with an anamorphism ('b -> 'c list) to create a function of type ('a list -> 'c list). The previous examples can't be composed into a good metamorphism, so consider the function which turns a list of strings into a list of characters...
wordlist_to_chars ["The";"quick";"brown";"fox"]
producing...
['T';'h';'e';'q';'u';'i';'c';'k';'b';'r';'o';'w';'n';'f';'o';'x']
Would be defined as a function composition:
let wordlist_to_chars list =
chars (concat list)
where the concat function has the signature “string list -> string”, chars has the signature “string -> char list”, leaving wordlist_to_chars to have the composed signature “string list -> char list”. Implementing concat and chars as recursive functions is left as an exercise to the reader.

That it... you've seen some morphisms, some deforestation, and some cool Greek letters. Are you ready for paramorphisms, apomorphisms, and "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire”? I'm not, but let me know how it goes.

Friday, May 9, 2008

Functional Calisthenics

An overview of an essay from The ThoughtWorks Anthology caused quite a hoo-ha recently. (That's a cross between a brouhaha and a hoopla, see the Reddit comments and this great response).

Regardless of how you feel about the original article, it's safe to say that practicing the basics of a newly learned skill will improve your ability at it. And you should probably practice with some principles in mind or you might just be perfecting bad habits. As my guitar teacher used to say, "You can learn a bad habit in a week that takes you two weeks to unlearn." I still suck at guitar, but it seemed like sage advice when I was 12.

I'm not trying to learn OO programming, I get enough of that at my day job. But I am trying to learn functional programming. What would functional calisthenics look like? If you're trying to learn functional programming, what are some of the basic principles to try to apply? Here's what I'm trying:

Pick an interesting but small problem to work on. Mathematical problems seem to work well, like these. Problems tied to databases and filesystems don't seem to work as well.

Pick a language that supports first class functions and recursion (to some extent, at least).

And here is my list of principles to follow... please feel free to make your own suggestions:

  • Immutable Objects - Don't allow an object's state to change after it is created.
  • No State - Don't track state, either within an object or globally.
  • No side effects - Don't allow functions to perform side effects.
  • Favor Recursion - Favor recursive algorithms over iterative ones.
  • Favor Higher Order Functions - Favor passing collaborating functions as parameters, returning functions from other functions, or composing new functions out of existing ones.
I have no idea if these principles are really the ones you want to follow, so I figured I'd try this out myself.

My problem: Maximum Segment Sum (MSS). Take a list of integers; the MSS is the maximum of the sums of any number of adjacent integer. I stole the problem from a cool paper I found when trying to figure out what Squiggol meant. For this list of integers:

31 -41 59 26 -53 58 97 -93 -23 84

the MSS is 187. That is the sum of the segment that omits the first two and last three elements of the list.

The language: I picked Groovy because that is my pet right now, but be warned that closure recursion within a script doesn't work well. I ended up having to define a Class to contain my functions. Meh.

The solution:

mss = max º sum* º segs

Simple right? Maybe not. Let's break it down. To solve this we need to generate a list of all possible adjacent values from the integer list, which we'll call "all possible segments", or just segs. Then we need to sum each of those segments, and then we need to find the highest value among those sums. So this:

mss = max º sum* º segs

reading left to right is "the mss is the max of the sum function mapped across the segments". The * is the map function, which in Groovy is Collection.collect(Closure). Call sum on each segment and take the max. The tough part of this is producing all possible segments from the integer list.

The keys to producing the segments are the functions inits() and tails(). inits() returns all the initial segments of a list in order of increasing length:
inits.[a1, a2, ..., an] = [[a1], [a1, a2], ..., [a1, a2, ..., an]]
or, in Groovy
inits([1, 2, 3]) == [[1], [1,2], [1,2,3]]
Tails() returns all the trailing segments of a list in order of decreasing length:
tails.[a1, a2, ..., an] = [[a1, a2, ..., an], [a2, ...an], ..., [an]]
or, in Groovy
tails([1, 2, 3]) == [[1,2,3], [2,3], [3]]
So if you want to produce all the possible segments of the list, all you need to do is map the tails function across the result of the inits function, flattening the result:

segs = flatten º tails* º inits

Let's get Groovy.

At the highest level, the solution is easily expressed: mss = max º sum* º segs
def solve = { list ->
segs(list).collect { it.sum() }.max()
}
assert 187 == solve([31,-41,59,26,-53,58,97,-93,-23,84]
Generate a list of all possible segments from the initial list, loop over each sub-list summing the list contents, and then take the max value.

segs is also easily expressed: segs = flatten º tails* º inits
def segs = { list ->
flatten(inits(list).collect { tails(it) })
}
assert segs([1,2,3]) == [[1], [1, 2], [2], [1, 2, 3], [2, 3], [3]]
Take the inits of the list, and loop across each entry collecting the tails of the entry. Then flatten the whole thing.

inits was simple:
def inits = { list ->
(0..list.size()-1).collect {
list[0..it]
}
}
assert inits([1,2,3]) == [[1], [1, 2], [1, 2, 3]]
This is going to loop once for every element in the list, returning a sub-list from element 0 to 'count' on each loop.

Tails has a very similar implementation:
def tails = { list ->
(0..list.size()-1).collect {
list[it..list.size()-1]
}
}
assert tails([1,2,3]) == [[1,2,3],[2,3],[3]]
This will loop once for every element, returning a sub-list from element 'count' to the end on each loop.

Flatten is where things get interesting. Collection.flatten() does us no good, because we want to reduce the result into a list of lists one element deep, not just flatten the lists into integers. My first iteration used inject() to accumulate the sum, but that seemed to much like tracking state. My interim solution was a function that just walked a nested list, flattening to 1 level. I got this working but realized it could be abstracted more by creating a higher order function.

What I needed first was a function that determined whether an element in a list was a list of primitives (in which case it was a leaf in my tree) or whether it was a list of more lists (in which case it needed flattening). This wasn't too hard:
def isPrimitiveList = {
if (!it) return true
if (!(it instanceof Collection)) return false
if (it.head() instanceof Collection) {
if (it.head().head() instanceof Collection) return false
}
true
}
assert isPrimitiveList([[1], [2]])
assert false == isPrimitiveList([[[1]], [[2]]])
If the parameter is empty or null, simply return true. (Remember, Groovy considers an empty list to evaluate to false). If the parameter isn't a collection then return false too. Otherwise, return true if the head of the head is not a collection (it is a list, but the first element is a primitive).

Then we just define a generic traversal function that will take isPrimitiveList as a parameter, stopping the traversal if that function evaluates to true when passed the current node:
def traverse = {isLeaf, collection ->
if (isLeaf(collection)) return collection
return traverse(isLeaf,
traverse(isLeaf, collection.head()) +
traverse(isLeaf, collection.tail()))
}
If the element is a leaf already, then just return the element. Otherwise, return the results of traversing the head plus the results of traversing the tail. Here you see recursion (the function calls itself) and a higher order function (the function accepts another function (isLeaf) as a parameter). From here it is simple to define a flatten function:
def flatten = traverse.curry(isPrimitiveList)
assert flatten([[1],[2],[3]]) == [[1], [2], [3]]
assert flatten([[[1]],[[2]],[[3]]]) == [[1], [2], [3]]
Curry is the worst kept secret of Groovy. If you're not familiar, it simply wraps an existing closure in another one, binding the first parameter to the passed argument. flatten() is now a traverse() function where the isLeaf closure is always the isPrimitiveList function. There's more to curry than that but it's not worth rehashing.

And there you have it, an MSS solver with only immutable objects, no state, no side effects, recursion, and a higher order function. Took me about 6 hours. Without a doubt, I definitely know the Groovy Collection functions a lot better. It was also interesting to see how my first revision was much more imperative, and over time it was whittled down to something much more functional. I felt this exercise, given the constraints stated, was worthwhile. So whether you hate the idea of functional/object calisthenics or not, this helped me. Does anyone out there have other functional principles that should be included in future exercises?

Full source (and tests) available for download.

Saturday, February 9, 2008

Continuations Passing Style Explained III: Groovy

Update: Like many people, I've been grappling with trying to understand continuations. In these posts, I tried to reduce continuations to their absolute essence. But in doing so, I may have missed the mark widely. I might have reduced and twisted them to the point where they aren't continuations at all, but something else entirely. As the joke goes, "Why isn't an elephant small, round, and white? Because then it would be an aspirin." Maybe all that is left of this post is a strange clever trick/ugly hack. Sorry if I've added to your confusion and misinformation... it was fun to write.

Learning a new language opens your mind to new ways of solving old problems, and so I found myself staring at my Java IDE this afternoon thinking, "What I really need is continuation passing style and recursion." To fully convince myself, I came home and spent Friday night writing a continuation based solution to my problem four different times: in Java 4 (no generics), in Java 5 (with generics), in Java 7 (using Neal Gafter's closures prototype), and in Groovy.

This is Part III of the post, in which I attempt to show how continuation passing style can be used in Groovy. If you're new to continuations you might read Part I for a broader background on the topic and to see the Java 4 and 5 examples. If you're a continuation expert, you might do the same, but then leave a slew of corrections to the post in the comments section. Part II might be of interest of you're keen to see the closures prototype in action.

So a continuation is a function that represents "the rest of the computation given a point in the computation". Continuation passing style is, roughly, passing a function into another function which calls the first function in the return statement. In the previous post, I explained how a method to find the first element of a list and a continuation can be combined to emulate the standard foreach-loop. They can also be combined to emulate a for-every-other loop, or a for-multiples-of-3-loop, or a for-first-matching, or anything you want. That's the power of moving the flow control to the continuation and out of the language (foreach) or enclosing code (Strategy/Template Patterns).

The original Java 5 examples combined an interface representing a function (Continuation) and another function (findFirst) which finds the first element of a list and calls the continuation, passing that first element and the rest of the list without that first element:

interface Continuation<T, U> {
U call(T first, List<T> rest);
}

<T, U> U findFirst(List<T> list, Continuation<T, U> continuation) {
if (list.isEmpty()) return continuation.call(null, list);
T first = list.get(0);
return continuation.call(first, list.subList(1, list.size()));
}
If you want a foreach-loop then you make sure your continuation recursively calls back findFirst on the rest of the list. If you want a for-every-other loop then you have the continuation remove the first element of the "rest" parameter and then call back findFirst recursively. Any iteration strategy is allowed because it is left up to the calling code to specify it. In this way we can define an evens ( ) method which will generate a list of the even numbers from a source list, which normally requires a foreach-loop:
List<Integer> evens(final List<Integer> numbers) {
return findFirst(numbers, new Continuation<Integer, List<Integer>>(){
public List<Integer> call(Integer first, List<Integer> rest) {
if (first == null) return Collections.emptyList(); //no more elements
List<Integer> result = new ArrayList<Integer>();
if (first % 2 == 0) result.add(first); //match!
result.addAll(evens(rest));
return result; //process rest of list
}
});
}
You can also define an exists( ) method that would normally require a for-first-loop:
Boolean exists(final Integer needle, List<Integer> haystack) {
return findFirst(haystack, new Continuation<Integer, Boolean>(){
public Boolean call(Integer first, List<Integer> rest) {
if (first == null) return false; //never found
if (first.equals(needle)) return true; //match!
return exists(needle, rest); //process rest of list
}
});
}
I won't go into detail because I want to spend more time on the Groovy version, which theoretically should be simpler and easier to read. It's original is more fully explained in Part I.

So in Groovy there is no need to declare the Continuation interface or create an anonymous inner class. The findFirst function can simply declare that it accepts a closure:
def findFirst = { list, continuation ->
if (list.isEmpty()) return continuation.call(null, list);
def first = list.get(0);
def rest = list.subList(1, list.size())
return continuation.call(first, rest);
}
If the list passed to findFirst is empty, then the continuation is invoked with null to signal the end of the computation. Otherwise, the first element and the rest of the list are passed to the "continuation" for final processing. For a foreach-loop, the continuation will do something and the call back findFirst with the rest of the list as a parameter. For a find-first-matching-loop, the continuation will simply return true on a matching element and end the looping right there.

Have a look at the evens( ) implementation:
def evens = {numbers ->
return findFirst(numbers, { first, rest ->
if (first == null) return []; //no more elements
def result = [];
if (first % 2 == 0) result.add(first); //match!
result.addAll(evens(rest));
return result; //process rest of list
}
);
}
What's missing is the anonymous inner class. It's been replaced by a closure declaration. The exists( ) method becomes similarly slimmed:
def exists = { needle, haystack ->
return findFirst(haystack, {Integer first, List rest ->
if (first == null) return false; //never found
if (first.equals(needle)) return true; //match!
return exists(needle, rest); //process rest of list
}
);
}
Et voila. You have used continuation passing style to define your own flow control. One function worries about how elements in a list are found, and another function worries about what to do with those elements.

The Groovy version almost seems like a let down after the previous Java 7 post. If you understand recursion then there just isn't much "to get"... whereas the Java 7 version required a bullet list of explanations on how to declare a closure. Interesting...

And remember, without tail call optimization, it would be risky to deploy this code. You'll run out of memory quickly.

Full executable source is available at: http://www.hamletdarcy.com/files/2008/continuations.zip (requires Junit 4 to run).

Thanks!

Continuations Passing Style (poorly) Explained II: Java 7 BGGA Closures Prototype

Update: Like many people, I've been grappling with trying to understand continuations. In these posts, I tried to reduce continuations to their absolute essence. But in doing so, I may have missed the mark widely. I might have reduced and twisted them to the point where they aren't continuations at all, but something else entirely. As the joke goes, "Why isn't an elephant small, round, and white? Because then it would be an aspirin." Maybe all that is left of this post is a strange clever trick/ugly hack. Sorry if I've added to your confusion and misinformation... it was fun to write.

Update #2: This post is pretty much a disaster. I mangled the return statements within the closures because I didn't understand the non-local return "return" keyword.

Learning a new language opens your mind to new ways of solving old problems, and so I found myself staring at my Java IDE this afternoon thinking, "What I really need is continuation passing style and recursion." To fully convince myself, I came home and spent Friday night writing a continuation passing style solution to my problem four different times: in Java 4 (no generics), in Java 5 (with generics), in Java 7 (using Neal Gafter's closures prototype, and in Groovy.

This is Part II of the post, in which I attempt to show how continuation passing style can be used in Java 7 with the current BGGA closures prototype. If you're new to the style you might read Part I for a broader background on the topic and to see the Java 4 and 5 examples. If you're a continuation expert, you might do the same, but then leave a slew of corrections to the post in the comments section. Part III details the Groovy version, and is available at http://hamletdarcy.blogspot.com/2008/02/continuations-explained-iii-groovy.html.

So a continuation is a function that represents "the rest of the computation given a point in the computation". Continuation passing style is, roughly, passing a function into another function which calls the passed function as the return statement. In a sense, the passed function "receives" the result of method call. In the previous post, I explained how a method to find the first element of a list and a collector can be combined to emulate the standard foreach-loop. They can also be combined to emulate a for-every-other loop, or a for-multiples-of-3-loop, or a for-first-matching, or anything you want. That's the power of moving the flow control to the continuation and out of the language (foreach) or enclosing code (Strategy/Template Patterns).

The original Java 5 examples combined an interface representing a function (Continuation) and another function (findFirst) which finds the first element of a list and calls the continuation, passing that first element and the rest of the list without that first element:

interface Continuation<T, U> {
U call(T first, List<T> rest);
}

<T, U> U findFirst(List<T> list, Continuation<T, U> continuation) {
if (list.isEmpty()) return continuation.call(null, list);
T first = list.get(0);
return continuation.call(first, list.subList(1, list.size()));
}

If you want a foreach-loop then you make sure your continuation recursively calls back findFirst on the rest of the list. If you want a for-every-other loop then you have the continuation remove the first element of the "rest" parameter and then call back findFirst recursively. Any iteration strategy is allowed because it is left up to the calling code to specify it. In this way we can define an evens( ) method which will generate a list of the even numbers from a source list, which normally requires a foreach-loop:
List<Integer> evens(final List<Integer> numbers) {
return findFirst(numbers, new Continuation<Integer, List<Integer>>(){
public List<Integer> call(Integer first, List<Integer> rest) {
if (first == null) return Collections.emptyList(); //no more elements
List<Integer> result = new ArrayList<Integer>();
if (first % 2 == 0) result.add(first); //match!
result.addAll(evens(rest));
return result; //process rest of list
}
});
}

You can also define an exists( ) method that would normally require a for-first-loop:
Boolean exists(final Integer needle, List<Integer> haystack) {
return findFirst(haystack, new Continuation<Integer, Boolean>(){
public Boolean call(Integer first, List<Integer> rest) {
if (first == null) return false; //never found
if (first.equals(needle)) return true; //match!
return exists(needle, rest); //process rest of list
}
});
}

I won't go into detail because I want to spend more time on the Java 7 version, which theoretically should be simpler and easier to read. It's also more fully explained in Part I.

So in Java 7 there is no need to declare the Continuation interface or create an anonymous inner class. The findFirst function can simply declare that it accepts a closure:
<T, U> U findFirst(List<T> list, {T, List<T>=>U} continuation) {
if (list.isEmpty()) return continuation.invoke(null, list);
T first = list.get(0);
return continuation.invoke(first, list.subList(1, list.size()));
}

So reading left to right...

  • <T, U>... this is a function that requires two generic types: T and U

  • U... this function will return an object of type U

  • findFirst... this function is named "findFirst"

  • List<T> list... the first parameter is a List containing elements of type T

  • {T, List<T>=>U} continuation.... ooohh, the fun begins. The second parameter is a function, declared using the curly braces. The parameters to this function are a T object and a List of T objects. This function returns an object of type U, and the function is named "continuation". The => thing separates the parameter list from the return type.

Other than that, there is no difference in implementation of findFirst in Java 5 vs. Java 7. If the list passed to findFirst is empty, then the continuation is invoked with null to signal the end of the computation. Otherwise, the first element and the rest of the list are passed to "continuation" for final processing. For a foreach-loop, the continuation will do something and the call back findFirst with the rest of the list as a parameter. For a find-first-matching-loop, the continuation will simply return true on a matching element and end the looping right there.

Have a look at the evens( ) implementation:
List<Integer> evens(final List<Integer> numbers) {
return findFirst(numbers, { Integer first, List<Integer> rest =>
List<Integer> result;
if (first == null) {
result = Collections.emptyList(); //no more elements
} else {
result = new ArrayList<Integer>();
if (first % 2 == 0) result.add(first); //match!
result.addAll(evens(rest));
}
result //process rest of list
}
);
}

What's missing is the anonymous inner class. It's been replaced by { Integer first, List<Integer> rest =>... removing the class declaration and the method definition reduces the lines of code by 2! (That's by two, not by half). But why is the semi-colon missing off the last line? The original version I wrote used the "return" keyword, which doesn't behave the same way inside and outside of a closure. If you want to return a value from a closure, then it must be the last line of the closure, have no return statement, and lack a semi-colon. This means you can't return from multiple places within a closure, and you can't easily move code in and out of closures without some IDE help. I don't feel bad about getting it wrong the first time.

The exists( ) method becomes similarly changed:
Boolean exists(final Integer needle, List<Integer> haystack) {
return findFirst(haystack, { Integer first, List<Integer> rest =>
boolean result = false;
if (first == null) result = false; //never found
else if (first.equals(needle)) result = true; //match!
else result = exists(needle, rest); //process rest of list
result
}
);
}

Et voila. You have used a collector to define your own flow control. One function worries about how elements in a list are found, and another function worries about what to do with those elements.

The purpose of this post is to answer how to do this in Java 7, not whether you should do this. But I can't resist providing a couple thoughts...

  • I wrote this code with only a basic understanding of recursion
  • ...but I love closures in Groovy, so I already comfortable using closures
  • The generics involved were not difficult to write
  • ...but at work, I'm the guy people go to with generics questions (sometimes I wonder why)
  • With minimum effort, a Java developer could learn functional programming
  • ...but the same can be said of generics, which many Java developers refuse to adopt. I certainly got a lot wrong about continuations in this series of posts.
  • The first revision of the code used the return statement, which does not do what I thought it would do
  • ... maybe using "return" for a non-local return by default isn't the right default
  • Finally, without tail call optimization (http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4726340), it would be risky to deploy this code. You'll run out of memory quickly.

After doing this work, I'm impressed with how easy it was for me, personally, to adopt the new closures features. And this isn't the simplest use of closures. I'm also surprised how easy it was to get core concepts wrong, like what "return" does and how continuations work. Some people are asking the question of whether a functional style has any place within Java. I've started to think more about this... and those critics might have a point. It's not entry level code for many beginning programmers. They have my sympathy. But that seems like an issue that businesses can address within a development team rather that within a language feature. After all, beginner programmers will go to extraordinary lengths to do the wrong thing. If you don't want your developers using a language feature then make a policy to forbid it. Easier said then done, though.

This blog post probably won't descend into a closures debate simply because my readership hovers near zero most days... thanks for listening all the same.

Full executable source is available at: http://www.hamletdarcy.com/files/2008/continuations.zip (requires Junit 4 to run). Have fun!

Friday, February 8, 2008

Continuation Passing Style Explained I: Java 5

Update: Like many people, I've been grappling with trying to understand continuations. In these posts, I tried to reduce continuations to their absolute essence. But in doing so, I may have missed the mark widely. I might have reduced and twisted them to the point where they aren't continuations at all, but something else entirely. As the joke goes, "Why isn't an elephant small, round, and white? Because then it would be an aspirin." Maybe all that is left of this post is a strange clever trick/ugly hack. Sorry if I've added to your confusion and misinformation... it was fun to write.

Learning a new language opens your mind to new ways of solving old problems, and so I found myself staring at my Java IDE this afternoon thinking, "What I really need is continuations and recursion." To fully convince myself I came home and spent Friday night writing a continuation based solution to my problem four different times: in Java 4 (no generics), in Java 5 (with generics), in Java 7 (using Neal Gafter's closures prototype), and in Groovy.

In this post I'll try to explain continuation passing style using Java 4 and Java 5 examples. Part II will show how the code changes with the latest closures prototype code, and Part III demonstrates the idea in Groovy.

I run across the scenario that made me think of continuation passing style fairly frequently. It's sort of a variation on the "Hole in the middle" problem described over on Enfranchised Mind. But it's really more of a "Function at the end" approach. I had reams of boilerplate foreach-loop code and just a few lines different right in the inner most loop. A simple, scaled back example is to consider writing a few functions to operator on numeric List data structures, where you need to write an exists( ) method to check for the presence of an element and an evens( ) method to filter all the odd numbers out. Your first revision will probably look something like this:

boolean exists(Integer needle, List haystack) {
for (Object element : haystack) {
if (needle.equals(element)) return true;
}
return false; // nothing found
}

List evens(List numbers) {
List result = new ArrayList();
for (Object element : numbers) {
Integer x = (Integer)element;
if ((x % 2) == 0) result.add(x);
}
return result;
}

You only have to write a couple of these functions before you realize that the list iteration methods are repeated across all of them. And the Template Method and Strategy Pattern don't exactly help here because it's hard to replicate the return statements within the functions. Some iterations stop after a few loops while others iterate over every element. Others might even perform an action on every other element. You'd never write all these loops into the language (foreach, for-first, for-every-other, for-multiples-of-3, etc.)

Enter continuations. A continuation is a function that represents the "rest of the computation given a point in the computation". Naively put, break an algorithm into two parts, have the first part call the second part in the return statement, and then the second part is your continuation. Perhaps you want to count all the files in a set of directories. You might create one function that finds all your directories ( f( ) ) and another function that counts files in a directory ( g( ) ). If you pass g( ) to f( ) as a parameter, and you code f( ) to do it's work and then call g( ), then you have composed a new function and g( ) is the continuation. Most examples you'll find on the Net have to do with tree traversal or web frameworks. The Little Schemer explains it much more simply, and I'm personally sticking with the simpler explanation. Now, full continuations are more complex than this, and I've actually just described a "continuation passing style" and not all continuations. True continuations carry a lot of rules about stack frames and variable scoping that I'm not going to touch.

Now, hopefully your language allows you to easily assign the continuation to a variable and pass it to methods, but perhaps not. You might be stuck with Anonymous Inner Classes, but that's alright. We can piece together a simple version of the evens( ) and exists( ) methods using continuations to see how they work.

The first thing to do is declare a function type:
interface Continuation {
Object call(Integer first, List rest);
}

The argument list may seem odd, but I'll explain. I've been taught to think of a foreach-loop as doing something for each element of a list. Another (better?) way to think of a foreach-loop is doing something for the first element of a list, and then repeating that something on the rest of the list until there are no more elements. That's just recursion. So if we want to emulate a foreach-loop, all we really need is a findFirst( ) method and a continuation that can do something for the first element and then call findFirst( ) again on the remainder of the list.
Object findFirst(List numbers, Continuation continuation) {
if (numbers.isEmpty()) return continuation.call(null, numbers);

Integer first = (Integer)numbers.get(0);
return continuation.call(first, numbers.subList(1, numbers.size()));
}

Basically, if the findFirst method is given an empty list, then our continuation is going to get called with null, signaling the end of the computation. Otherwise, we're going to pop the first element off the head of the list, and hand that element to the continuation along with the rest of the list. In this way you can implement an exists( ) method like so:
Boolean exists(final Integer needle, List haystack) {
return (Boolean) findFirst(haystack, new Continuation(){
public Object call(Integer first, List rest) {
if (first == null) return false; //never found
if (first.equals(needle)) return true; //match!
return exists(needle, rest); //process rest of list
}
});
}

The exists( ) method is looking for a needle (Integer) in a haystack (List). It uses the findFirst method to find the first element in haystack and passes it a continuation which knows how to proceed with the computation. In this case, if the continuation sees that a null is passed then it just returns false because no matching element was ever found. Otherwise, if the element is a match the continuation returns true, control is passed back to findFirst, back out to exists( ), and a false is received on the calling code. Finally, if there is no match, exists( ) is recursively called, but this time the list does not contain the first element.

The evens( ) method is similarly implemented in terms of the findFirst method:
List evens(final List numbers) {
return (List) findFirst(numbers, new Continuation(){
public Object call(Integer first, List rest) {
if (first == null) return Collections.emptyList(); //no more elements
List result = new ArrayList();
if (first % 2 == 0) result.add(first); //match!
result.addAll(evens(rest));
return result; //process rest of list
}
});
}
The continuation looks at the first element, if it is even then it adds the element to it's result set. Then the continuation recursively calls evens( ) on the rest of the list. In the end, you're left with a list of even numbers. The important distinction between the exists( ) and evens( ) methods is the way that both contain unique flow control. Being able to modify how your code returns or accumulates values is an important benefit of using continuation passing style in this way. Also being able to abstract out the control flow allows you to reuse that flow in different contexts. It would be difficult to use the Template or Strategy pattern and have one strategy stop the computation returning a boolean and another continue the computation accumulating Integers without repeating the code the iterates over lists. This example creates more code than it removes and introduces a functional style that is foreign to most Java code bases. Both good reasons not to use it. Other languages are different. Part II covers how the Java 7 prototype will improve this pattern, and Part III shows how Groovy would simplify it. But before moving on, let's consider how Java 5 Generics will make this code look.

I, for one, prefer this pattern with generics. Mostly because it makes me feel smarter than those poor suckers stuck on Java 1.4. We all have our own motives in life.
interface Continuation<T, U> {
U call(T first, List<T> rest);
}

<T, U> U findFirst(List<T> list, Continuation<T, U> continuation) {
if (list.isEmpty()) return continuation.call(null, list);
T first = list.get(0);
return continuation.call(first, list.subList(1, list.size()));
}

The Continuation is declared as a function which takes a T and a List of T's, and returns a U. Think forward and see how exists( ) will take an Integer and a List of Integers, and return a Boolean. Evens( ) will take the same parameters but return a List of Integers.

The implementation of findFirst is really no different, except that it now scares off novice Java developers. Its parameters are a list of T's and a continuation that accepts T's and returns U's. The benefit here is that Continuation interface can now be used type safely in many more scenarios than the previous version.

As far as the implementations of exists( ) and evens( ) goes, they really don't change at all except to pick up some more angle brackets:
Boolean exists(final Integer needle, List<Integer> haystack) {
return findFirst(haystack, new Continuation<Integer, Boolean>(){
public Boolean call(Integer first, List<Integer> rest) {
if (first == null) return false; //never found
if (first.equals(needle)) return true; //match!
return exists(needle, rest); //process rest of list
}
});
}

List<Integer> evens(final List<Integer> numbers) {
return findFirst(numbers, new Continuation<Integer, List<Integer>>(){
public List<Integer> call(Integer first, List<Integer> rest) {
if (first == null) return Collections.emptyList(); //no more elements
List<Integer> result = new ArrayList<Integer>();
if (first % 2 == 0) result.add(first); //match!
result.addAll(evens(rest));
return result; //process rest of list
}
});
}
I love generics, but I'm at a bit of a loss trying to explain how this version is better. Maybe there is a message in that. And maybe the Java 7 version will fare better.

Next up is the Java 7 Prototype version of this, along with the Groovy version.

Full executable source is available at: http://www.hamletdarcy.com/files/2008/continuations.zip (requires Junit 4 to run). Have fun!

Saturday, January 13, 2007

Quine Hunt 2007

A few weeks ago a co-worker was kind enough to introduce me to the idea of a quine. A quine is a program that produces its complete source code as its only output. Note that programs which take input are not considered quines (so no reading your source).

For fun, I challenge the readers here to produce their own quine: a shorter quine in Java, or a quine of any length in any other programming language. Hint: I think you can use printf to make this shorter. And if you search the Internet for your answer then you are a CHEATER. A Cheater I say!

Here was the quine my co-worker produced:

public class Replicant { private static final String quote = new String(new byte[]{34}); private static final String self ="public class Replicant { private static final String quote = new String(new byte[]{34}); private static final String self =; public static void main(String[] args){ System.out.println( self.substring(0,123) + quote + self + quote + self.substring(123) ); }}"; public static void main(String[] args){ System.out.println( self.substring(0,123) + quote + self + quote + self.substring(123) ); }}

The lack of whitespace is not a formatting error... whitespace in the quine is just extra garbage to try to work around. For a warm-up, I took this basic quine format and removed unnecessary keywords to produce a shorter version. It was definitely an exercise in escape characters and substring methods!

class Replicant { static String quote = new String(new byte[]{34}); static String self ="class Replicant { static String quote = new String(new byte[]{34}); static String self =; public static void main(String[] args){ System.out.println( self.substring(0,88) + quote + self + quote + self.substring(88) ); }}"; public static void main(String[] args){ System.out.println( self.substring(0,88) + quote + self + quote + self.substring(88) ); }}

Just for fun I tried to do this in Perl (it's a fun language, right?). I went round and round on this digging myself deeper and deeper holes. I finally had my a-ha moment when I realized that variable declarations in Perl include the $ sign, which in turn requires an escape character. The Java format of declaring variables will not work in a Perl quine, and a Perl quine needs to be essentially a functional program. I make the slash() and quote() functions and it worked awesome. Here is my Perl quine:

sub slash{chr(92);}sub quote{chr(34);}sub my_substr{substr(self(), $_[0], $_[1]);}sub self{"sub slash{chr(92);}sub quote{chr(34);}sub my_substr{substr(self(), \$_[0], \$_[1]);}sub self{;}print(my_substr(0, 91) . quote() . my_substr(0, 67) . slash() . my_substr(67, 7) . slash() . my_substr(74, 164) . quote() . my_substr(91, 900));";}print(my_substr(0, 91) . quote() . my_substr(0, 67) . slash() . my_substr(67, 7) . slash() . my_substr(74, 164) . quote() . my_substr(91, 900));

It felt so great to accomplish this, especially when the solution turned out to be functional programming. C'mon, let's see your quine. I'll give the shortest Java quine 100 Linden dollars (Lord knows I don't need any SecondLife currency!).