Tuesday, January 16, 2007

Primate Parts

Recently, Chris Lamb and friends wrote about their experience adding a feature to javac, a pastime that has become slightly more popular now that javac's sources have been opened under the GPL. He found something strange:

"Anyway, it turns out that the javacc [sic] code is messy. Really really messy. But it’s the source of great amusement though, not only from the scary amount of no-op casts, misleading indenting and undocumented functions, but the lexical token for the ‘@‘ symbol is ‘MONKEYS_AT‘. No, we have no idea either."
I responded with a conditional promise to tell him the story:
"Actually, the indentation is consistent if you have your tabs set at 8 spaces, where God intended them.
"There’s a story behind MONKEYS_AT, and if you know it this little piece of code is a funny inside joke. But if you want me to tell you, you’ll have to take back your assertion that javac’s code is messy and tell me that it’s a work of art."
Chris quickly buckled to the pressure, responding by email:
"Yes, it is true - when set it to 8 spaces, it seems to look a bit better. What I mean to say is, it's now a work of art and any blemishes are my fault. :)

"Anyway, yes, my friends and I would really like to know this story behind the naming of the token though -- we found it at about 3AM whilst hacking on the javac code and it put us off our stride somewhat. ^_^"
Here's the formerly untold story of MONKEYS_AT.

During the development of the JDK5 language features, the Sun team had regular meetings with a team from Denmark that was designing and implementing the variance feature (since renamed wildcards, which is a longer and more interesting story). You can find the team member names in the paper describing the work. The Danish team was led by Mads Torgersen, and we all enjoyed a number of evenings chatting over beer. During one such session, I was discussing the work I was doing to implement annotations, and I mentioned that, unlike the "#" character, which seems to have many names, there didn't seem to be any alternative names for the "@" character. Mads told us that in Denmark there are a number of names for this character, including the archaic "monkey's ass", which refers to the character's resemblance to the rear end of a monkey. We all thought this was hilarious, but perhaps a bit too risqué to put in corporate-developed and publicly visible sources. But it was just too funny to leave out. Thus I came up with the pun MONKEYS_AT. To this day that little inside joke in the sources reminds me of the team and the time we spent together.
http://www.gafter.com/~neal/p1055.jpg
Mads, by the way, is the one on the left. ;-)
There you have it: a disinterested observer, inclined to believe otherwise, comes to appreciate the beauty and humor of javac.

Tuesday, January 09, 2007

MethodNamesInPieces

In Smalltalk, the name of a method being invoked is interleaved with the arguments passed to the method. Consequently it is difficult to confuse the order of arguments. In Java, on the other hand, when you invoke a method that accepts three integers it is easy to get the order wrong. The compiler has no way to detect the problem, so APIs must be carefully designed with the artificial constraint that one should avoid "too many" arguments of "compatible" types.

In the context of closures, Smalltalk's syntax allows "built-in" statement forms such as if-then-else to be expressed as ordinary method calls. When we were putting together the original version of the closures proposal, James Gosling suggested this idea to support do-while and if-else style syntax for user-defined control abstraction methods, something that was mentioned in the further ideas section. We placed this issue on the back burner once we found a nice syntax that works for many of the control-invocation use cases, but a recently submitted comment by Stefan Schulz on my blog reminded me of it. His use case is that he'd like to be able to write an API that allows him to refactor this

public String toString() {
    StringBuilder sb = new StringBuilder("[");
    boolean first = true;
    for (String s : someCollection) {
        if (first) {
            first = false;
        } else {
            sb.append(", ");
        }
        sb.append(s);
    }
    return sb.append("]").toString();
}

into this

public String toString() {
    StringBuilder sb = new StringBuilder("[");
    for each(String s : someCollection) {
        sb.append(s);
    } inBetween {
        sb.append(", ");
    }
    return sb.append("]").toString();
}

Presumably, the API method would be defined something like this:

<T> void for each(Iterable<T> it, {T=>void} body) inBetween({=>void} between) {
    boolean first = true;
    for (T t : it) {
        if (first) {
            first = false;
        } else {
            between.invoke();
        }
        body.invoke(t);
    }
}
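
For comparison, here is a rough sketch of the same abstraction written as an ordinary method, with no multi-part name. It is not the proposed syntax: it assumes a Java version with lambdas and uses java.util.function.Consumer and Runnable as stand-ins for the {T=>void} and {=>void} function types. The class name EachInBetween and the methods eachInBetween and join are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper class; the names here are illustrative only.
final class EachInBetween {
    // Runs body on every element, and between before every element except the first.
    static <T> void eachInBetween(Iterable<T> it, Consumer<? super T> body, Runnable between) {
        boolean first = true;
        for (T t : it) {
            if (first) {
                first = false;
            } else {
                between.run();
            }
            body.accept(t);
        }
    }

    // Mirrors the toString() example, taking the collection as a parameter.
    static String join(List<String> items) {
        StringBuilder sb = new StringBuilder("[");
        eachInBetween(items, s -> sb.append(s), () -> sb.append(", "));
        return sb.append("]").toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("a", "b", "c"))); // prints [a, b, c]
    }
}
```

The call site shows what the multi-part name would buy: with both blocks passed positionally, nothing at the point of use says which one is the loop body and which one runs in between.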

A related advantage of the Smalltalk syntax is that operator overloading comes almost for free. If operator overloading is on the table for JDK7, perhaps we can kill two birds with one stone, by making the name before the first argument optional:

static BigDecimal (BigDecimal left) plus (BigDecimal right) {
    return left.add(right);
}
static BigDecimal (BigDecimal left) times (BigDecimal right) {
    return left.multiply(right);
}

This would allow you to write code like this:

static BigDecimal f(BigDecimal x, BigDecimal y, BigDecimal z) {
    return (x) plus ((y) times (z));
}

It's probably a small step from here to allowing arbitrary symbols as operator names and eliding some parens. I don't think anything is required in the VM, as we can encode these method names using some non-identifier character in the VM signature. For example, the above methods could be translated in the VM to methods with the names "each~~inBetween", "~plus", and "~times" (the number of tilde characters is the number of arguments before "parts" of the name in the method signature).
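
To make the encoding concrete, here is a minimal sketch of how a compiler might build such a name. It assumes the rule is exactly as described above: each piece of the name is followed by one tilde per argument that precedes the next piece, and arguments after the last piece are not marked. The class PiecewiseNames, the Piece type, and the mangle method are hypothetical and are not part of javac or of any proposal.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the name encoding described in the text.
final class PiecewiseNames {
    // One piece of a method name plus the number of arguments that follow it.
    // An empty name models an omitted first piece, as in "(left) plus (right)".
    static final class Piece {
        final String name;
        final int argCount;

        Piece(String name, int argCount) {
            this.name = name;
            this.argCount = argCount;
        }
    }

    static String mangle(List<Piece> pieces) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces.size(); i++) {
            Piece p = pieces.get(i);
            sb.append(p.name);
            // Tildes only count arguments that come before a later piece,
            // so the arguments after the last piece add nothing.
            if (i < pieces.size() - 1) {
                for (int k = 0; k < p.argCount; k++) {
                    sb.append('~');
                }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // for each(it, body) inBetween(between)
        System.out.println(mangle(Arrays.asList(new Piece("each", 2), new Piece("inBetween", 1))));
        // (left) plus (right)
        System.out.println(mangle(Arrays.asList(new Piece("", 1), new Piece("plus", 1))));
    }
}
```

Under these assumptions the two calls in main produce "each~~inBetween" and "~plus", matching the names given above.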

There are difficult syntax issues (for example, the each-inBetween example can also be parsed as two separate statements) and I'm not sure I would recommend any of this, but I wanted to share the idea.