Sunday, February 08, 2015

Getting rid of compareTo for ==

NOTE: This article is meant as a prelude to a discussion on the mailing list about a possible removal of the general compareTo path for the equality operator.


As many may know, Groovy has quite complicated logic for the == operator. Simplified: it calls equals, unless the left side implements Comparable, in which case compareTo is used... well, roughly...

To illustrate the logic:

class A implements Comparable {
    boolean equals(Object o) { false }
    int compareTo(Object o) { 0 }
}
def xa = new A()
def ya = new A()
assert xa == xa && ya == ya  // referential identity override
assert !xa.equals(ya)        // direct call to equals
assert xa.compareTo(ya) == 0 // direct call to compareTo
assert xa == ya              // ignores equals, since A implements Comparable

class B implements Comparable {
    boolean equals(Object o) { false }
    int compareTo(Object o) { 0 }
}
def xb = new B()
assert !xa.equals(xb) && !xb.equals(xa)
assert xa.compareTo(xb) == 0 && xb.compareTo(xa) == 0
assert xa != xb // ignores equals as well as Comparable

assert 1 == 1L // compare primitive int and long
assert !(1.0G.equals(1.00G))
assert 1.0G==1.00G // compare BigDecimals with differing scale
assert 1.0G==1l // compare primitive long with BigDecimal
assert 1G==1.0G // compare BigInteger and BigDecimal
assert 1!=new Object() // compare primitive with incompatible instances

In Java you know that this operator allows you, for example, to compare an int and a long; in that case the int is widened to a long and the numeric values are compared. Similar things happen for the other primitives. Since Java 5 the operator even allows you to compare a primitive int and an Integer by using autoboxing. Where the equality operator in Java fails is when you compare, for example, a Long and an Integer. Fail in the sense that it does not do the same as for the primitive counterparts.

Now in Groovy the equality operator traditionally had to handle the boxed versions as if they were not boxed. This is because in versions of Groovy before 1.8 every variable declared with a primitive type actually used the boxed version. Only in 1.8 did I introduce actual primitives, but the ability of the equality operator to compare, for example, an Integer and a Long stayed. It had to stay, because we don't only compare those; we also have those 1-char Strings that are supposed to be equal to Characters, the GString logic and of course the BigInteger and BigDecimal logic.
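
A small illustration of that difference (runs as a Groovy script): Java-style equals distinguishes the boxed types, while Groovy's == still compares the numeric values.

Integer i = 1
Long l = 1L
assert !i.equals(l) // an Integer never equals a Long
assert i == l       // Groovy compares the numeric values, as with the primitives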

BigDecimal now does something that is not really advised when implementing the Comparable interface: its equals returns false for cases that compareTo sees as equal. "1.0" and "1.00" are such a case. They are not equal, because their scales differ, even though either value can be projected onto the other's scale without precision loss to do an actual comparison.

Since people also do things like `1==new Object()`, and since this is not supposed to throw a ClassCastException even though the compareTo method would do exactly that here, we also had to add special logic that takes the compareTo path only if the right-side type is a subtype of the left-side type.

This causes all kinds of confusion. My suggestion is to remove the compareTo path.

Instead I suggest adding a special path for BigDecimal to handle the equals problem. This should remove a lot of confusion in the future.
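
To make the proposal more concrete, here is a rough sketch (a hypothetical helper method, not the actual runtime code) of what the operator logic could look like without the general compareTo path; the numeric comparison leans on Groovy's NumberMath runtime class:

import org.codehaus.groovy.runtime.typehandling.NumberMath

boolean proposedEquals(Object left, Object right) {
    if (left == null || right == null) return left == null && right == null
    if (left.is(right)) return true // the referential identity override stays
    if (left instanceof Number && right instanceof Number) {
        // value-based comparison covers Integer vs Long as well as
        // BigDecimals of differing scale
        return NumberMath.compareTo((Number) left, (Number) right) == 0
    }
    return left.equals(right) // no compareTo path for general Comparables anymore
}

assert proposedEquals(1.0G, 1.00G)
assert proposedEquals(1, 1L)
assert !proposedEquals(1, new Object()) // no ClassCastException either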

Now this will of course have more impact than some may think. Obviously classes implementing Comparable may now behave differently, and especially custom Number implementations may do so. So it is a loss of a feature to some extent. But if the usage of a feature causes more problems than the abilities it provides, then we have to rethink it, and I think that is the case here. My intended change would also change the behavior of the program above. The referential identity override would stay, but "assert xa==ya" would then fail, since equals always returns false. Conversely, if equals always returned true, "assert xa!=xb" would fail, since before the operator did not call equals and now it does.

Monday, January 12, 2015

Indy and CompileStatic as tag team to defeat array access times

Micro Benchmarks are Evil

They are evil, aren't they? You test a very specialized case that may have no relation to your everyday application at all. But they can show some weaknesses here and there. Whether those are relevant is a different question, most often answered with "they are not". Still, there are sometimes cases a language can improve upon, and one such case in Groovy is array access.

Array access in Groovy

For those not aware of it: Groovy does array access differently than Java. Groovy allows the usage of a negative array index, in which case we count backwards from the end of the array. So -1 always denotes the last element and -array.length the first. Going below -array.length will again result in an ArrayIndexOutOfBoundsException.
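
A quick demonstration of that index handling (assumed behaviour, written as a small Groovy script):

int[] a = [10, 20, 30]
assert a[-1] == 30        // the last element
assert a[-a.length] == 10 // counting back to the first element
// a[-(a.length + 1)] would throw an ArrayIndexOutOfBoundsException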

Benchmarking a little

To measure the extent of the problem I am using the little benchmark named fannkuch, based on the Groovy version of fannkuch from the Alioth shootout. Since I know Groovy will not perform very well on this even with primitive optimizations, I don't expect too much from checking it with non-static code. For those not knowing what primitive optimizations are: the compiler generates an alternative bytecode execution path based on primitives, under the assumption that there are no meta class changes affecting primitives. To ensure this assumption is legal, guards are used.

fannkuch microbenchmark times in ms (JDK8_u25):
primopts Groovy: 12889.2358718 +2718.9594152 / -787.6735898
static Groovy:    3325.5838752  +270.7189528 / -266.2819292
Java:              561          +85          / -65

Which means even Groovy with primitive optimizations is slower by a factor of 23. Switching to @CompileStatic makes things look better, but there is still close to a factor of 6.

Analyzing the results

Analyzing the generated bytecode shows that the @CompileStatic version is not doing anything strange compared to Java; only the array access parts are done differently, using BytecodeInterface8 methods to access the arrays. The primopts version on the other hand shows that besides the BytecodeInterface8 method usage there is also dynamic access to the arrays. This of course means bad times, since beating primitives on the JVM is difficult with code that cannot handle primitives all that well... like, for example, reflection.

So my next step was to try whether invokedynamic can improve the situation. It may at first look strange to use invokedynamic in statically compiled code for something as static as this. We know all the types at compile time, so a plain method call should be faster than any fancy thing invokedynamic could do, right? Wrong. Or I should say: it depends. What we can do here is provide a very short path for the optimistic case of the array index being positive. In the original code this is done with a try-catch. But in terms of the MethodHandles used by invokedynamic we can instead use a guard that checks the index for a positive value. MethodHandles do also provide an exception-catching guard of sorts, but it has issues in terms of performance and how far the code can be optimized. In total the guard version has the big advantage of doing something the JVM would do anyway, potentially letting it just remove the second check and making the first check very, very cheap. The fallback of course is still as complex as before, and no real speed improvement is to be expected there. Another part that deserves consideration is that with invokedynamic a static (constant) call site plays in a completely different league than a mutable call site. Thanks to Java 8 lambdas a lot of performance optimization effort has gone into making static call sites fast, and we have one here.
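
To illustrate the idea, here is a hedged sketch (not the actual PR code; class and method names are made up) of how such a guarded handle can be built with the MethodHandles API: a plain array load for non-negative indexes, and a fallback implementing the negative index handling:

import java.lang.invoke.MethodHandles
import java.lang.invoke.MethodType

class ArrayAccess {
    // guard: the fast path applies whenever the index is non-negative
    static boolean nonNegative(int[] array, int index) { index >= 0 }
    // fallback: full Groovy semantics, negative indexes count from the end
    static int normalisedGet(int[] array, int index) {
        array[index < 0 ? index + array.length : index]
    }
}

def lookup   = MethodHandles.lookup()
Class intArr = ([] as int[]).class
def fast     = MethodHandles.arrayElementGetter(intArr) // plain JVM array load
def guard    = lookup.findStatic(ArrayAccess, 'nonNegative',
                   MethodType.methodType(Boolean.TYPE, intArr, Integer.TYPE))
def slow     = lookup.findStatic(ArrayAccess, 'normalisedGet',
                   MethodType.methodType(Integer.TYPE, intArr, Integer.TYPE))
def getAt    = MethodHandles.guardWithTest(guard, fast, slow)

int[] a = [10, 20, 30]
assert getAt.invokeWithArguments(a, 1)  == 20 // fast path
assert getAt.invokeWithArguments(a, -1) == 30 // fallback path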

New results

This then resulted in PR #587 and updated times in our table:
primopts Groovy:          12889.2358718 +2718.9594152 / -787.6735898
static Groovy:             3325.5838752  +270.7189528 / -266.2819292
static Groovy with indy:    878.0258219  +328.9714071 / -134.1179639
Java:                       561          +85          / -65

This indicates a mere slowdown of 57% now. I think this is a great improvement... And while it would have been nice to actually be on par with Java here, I assume that can only be done by using Java's plain array access logic in the end. A slowdown like this is something I have already found to occur if you, for example, check a boolean in an if. So I doubt there is much more room for improvement.

After GROOVY-7249 and GROOVY-7251 we can also look forward to improvements for indy as well as for the normal primitive optimizations.

I will write a new blog post with the results once those are implemented.

Tuesday, December 09, 2014

A deeper look at default methods conflicts and proxies

So what are default interface methods?

(All my examples will use a syntax very near to Java.)

In essence they are a way to add methods to an interface, without requiring the implementing class to define the implementation. Example:

interface A {
    default int foo() { return 1 }
}
class B implements A {}
assert new B().foo() == 1
The default keyword is used here to start the method declaration and acts as a flag for the resulting method. B will not have its own implementation of foo, yet I will still be able to call foo() through an instance of B.

All fine?

Well... what happens if there are conflicts? Example:
interface A {
    default int foo() { return 1 }
}
interface B {
    default int foo() { return 2 }
}
class C implements A, B {}
This results in "error: class C inherits unrelated defaults for foo() from types A and B" when you compile it in Java. The problem is easily solved by writing a foo method in C and then calling the implementation you want with A.super.foo() or B.super.foo().
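
A short sketch of that resolution, in the same Java-near syntax as the examples above:

class C implements A, B {
    public int foo() { return A.super.foo(); } // pick A's default explicitly
}
assert new C().foo() == 1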

And that's the point where most tutorials end.

I would like to go further. The "promise" was interface evolution, by which I understand that you can add methods to interfaces without having to worry too much about implementing classes not working anymore. So let me first describe the situation we are really coming from:
interface A {
    void foo()
}
class B implements A {
    void foo() { System.out.println("B.foo"); }
}
Let us assume A comes from a library and B is your code implementing the library interface. B is then supposed to be used by the library, which will call foo, resulting in B.foo being printed.
Now imagine the library gets a new version and A is changed to this:
interface A {
    void foo()
    void bar()
}
And because your B is compiled against an older version of the library, you don't have an implementation of bar(). As long as bar() is not called, there won't be a problem. But of course the method was added for a purpose, so the library will call it, resulting in a java.lang.AbstractMethodError. The "evolution" part of Java 8 default methods now is that you can make this method a default method:
interface A {
    void foo()
    default void bar() { System.out.println("A.bar"); }
}
Now the library code can call bar on a B and fall back to the bar in A, thus avoiding the AbstractMethodError.

But I mentioned conflicts. For this we make the example slightly bigger:
 interface A{}
interface B{}
class C implements A,B{}
Again we say A comes from a library, which for simplicity we call A as well. Let us also say B comes from library B, and C is your code that happens to implement the interfaces from both libraries, maybe to produce some kind of adapter. That's the starting situation. Now both libraries take a liking to Java 8 and add default methods to their interfaces:
interface A {
    default int foo() { return 1 }
}
interface B {
    default int foo() { return 2 }
}
Remember, C would now not compile anymore, but since you compiled against the older versions, C stays precompiled, so there is no chance for javac to complain here. But what happens to the code in library A calling A.foo? Well... that would be: "java.lang.IncompatibleClassChangeError: Conflicting default methods: A.foo B.foo". And as far as I know, there is no way around this. That's where the evolution fails.

Another problem case is Java's reflective Proxy. There you create a class implementing a series of interfaces by providing an invocation handler, which itself does not implement those interfaces. All method calls then end up in an invoke method, with a Method instance describing the method that is supposed to be called. The problem in this environment is that you cannot call the default method at all. To call a default method reflectively you need an instance, but the proxy itself is that instance, and the proxy will just delegate such calls back to the invoke method. So effectively you have no instance to call the default method on. Meaning, Proxy is essentially broken with plain reflection...

In theory there is a workaround with MethodHandles. This blog describes how: https://rmannibucau.wordpress.com/2014/03/27/java-8-default-interface-methods-and-jdk-dynamic-proxies But the text there is really incomplete without the code in the second comment. To call a default method you need to do an invokespecial. The standard invokevirtual Java does to call normal instance methods will result in the errors I mentioned already. Invokespecial allows bypassing inheritance to some extent and targeting a specific implementation in a specific class. It is, for example, used for super based method calls. But the usage of this instruction is restricted: the call must be done from the same class... which is not accessible to us. The second comment in the blog of rmannibucau now uses reflection to access a hidden constructor of Lookup, which allows for private access. That means all handles produced from that lookup object will be treated as if they come from that class and not from outside. This allows calling private methods (hence private access), but also access to invokespecial (unreflectSpecial ensures that).
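
Here is a rough, self-contained sketch of that workaround (assuming a Java 8 runtime and a Groovy version that accepts default methods in interfaces; the interface and its names are just for illustration):

import java.lang.invoke.MethodHandles
import java.lang.reflect.InvocationHandler
import java.lang.reflect.Method
import java.lang.reflect.Proxy

interface Greeter {
    default String greet() { 'hello from the default method' }
}

// the package-private Lookup constructor gives us "private access" in the
// declaring interface, which is what unreflectSpecial/invokespecial needs
def lookupCtor = MethodHandles.Lookup.getDeclaredConstructor(Class, Integer.TYPE)
lookupCtor.accessible = true

InvocationHandler handler = { Object proxy, Method method, Object[] args ->
    if (method.isDefault()) {
        def lookup = lookupCtor.newInstance(method.declaringClass, MethodHandles.Lookup.PRIVATE)
        return lookup.unreflectSpecial(method, method.declaringClass)
                     .bindTo(proxy)
                     .invokeWithArguments(args ?: new Object[0])
    }
    throw new UnsupportedOperationException(method.name)
}

def proxy = (Greeter) Proxy.newProxyInstance(Greeter.classLoader, [Greeter] as Class[], handler)
assert proxy.greet() == 'hello from the default method'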

But what if you have a security manager that does not allow this private access? Simply allowing the access would be a problem, since with that logic you could call anything that is supposed to be private. MethodHandles do the security check only once, when the lookup object is created, and afterwards the SecurityManager has no further say. So it really has no other choice than to forbid creating such a lookup, does it? The only way your proxy can work then is by redirecting the call to a proxied instance. If it is not your intention to do this for all calls, then you lose.

So what do we do then? Produce our own proxy class at runtime? Well... ignoring that generating classes like this easily leads to class loader related problems, the very same security manager can easily prevent that as well. After all, you have to define your class somewhere.

I guess in total, there is no sure way for the Proxy version to work properly. And the conflict case is obviously also not covered.

I would say that if default methods had been intended as a safe way to evolve interfaces, then they are a failure. Everything is fine as long as you stay with a single interface. With multiple interfaces things can still go fine if the interfaces are all from the same library. But if you have to bridge interfaces of different libraries, you can expect trouble. For me this means I cannot add a default method to a library interface without giving that change the same consideration as a normal interface method. That means it is to be seen as a breaking change. That's a -1 on maintenance.

So in general I actually cannot advise using them to evolve existing interfaces unless you really, really have thought it through.

Tuesday, November 25, 2014

A Joint Compiler for Groovy and Java using the Processing API?

This year we had a Google Summer of Code project (actually two) with the goal of writing a stubless joint compiler for Groovy using the javac API, or at least finding out whether it can be done. The idea was to have two-way communication between the compilers, adapting the AST to what each compiler needs. This would allow compiling both languages at once in a single pass, without creating a lot of potentially unused stub files with potential errors in them as well.

Well, since that did not work out particularly well, I started with a simpler approach leveraging the Java annotation processing API. It is surely no secret that @SupportedAnnotationTypes("*") causes the annotation processor it describes to be applied to all classes javac is going to compile. Interesting is how javac behaves if a class cannot be resolved: in that case the symbol gets an error marking, but you can still get information about it.
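
A minimal sketch of the kind of catch-all processor meant here (hypothetical class, written in Groovy): with "*" javac hands every root element of the compilation to process(), whether its symbols resolve or not.

import javax.annotation.processing.AbstractProcessor
import javax.annotation.processing.RoundEnvironment
import javax.annotation.processing.SupportedAnnotationTypes
import javax.annotation.processing.SupportedSourceVersion
import javax.lang.model.SourceVersion
import javax.lang.model.element.TypeElement
import javax.tools.Diagnostic

@SupportedAnnotationTypes('*')
@SupportedSourceVersion(SourceVersion.RELEASE_8)
class CatchAllProcessor extends AbstractProcessor {
    @Override
    boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        roundEnv.rootElements.each { element ->
            // this is where one could translate the javac element model into
            // ClassNode dummies for the Groovy compiler
            processingEnv.messager.printMessage(Diagnostic.Kind.NOTE, "seen: $element")
        }
        return false // don't claim any annotations
    }
}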

So I thought it could be a good idea to just use what javac offers in the processing API to produce a bunch of ClassNode instances our Groovy compiler understands, then compile the Groovy code first and later use the produced class files to compile the Java code in a second javac run. The big advantages are: no stubs, and Java can see the effects of AST transformations applied in Groovy.

Simple tests showed the approach working well. I just used dummies for the missing classes and let the Groovy compiler fill them in later. This worked well until I made a bigger test using the Groovy build itself... and spent hours working my way through the sparse documentation of that API. When I tried out the final version I found the big flaw in this approach...

Let us assume we have a "package foo; import x.y.*; class FromJava extends FromGroovy {}" where FromJava is Java code and FromGroovy is Groovy code. While I can know the full name of FromJava as foo.FromJava, and while javac is so kind to tell me that name, we have a big problem with FromGroovy. FromGroovy could have the full name x.y.FromGroovy or foo.FromGroovy. Since javac cannot resolve the missing FromGroovy class, all I will get is a plain FromGroovy. On the Groovy compiler side, on the other hand, I only have the fully qualified name. And since the plain name is not enough to find the correct class by its full name, I would need the package and imports to maybe build some kind of lookup myself. But the processing API does not provide information about imports. And that's where this approach gets its burial.

So it is either using javac-internal APIs to get the needed information, or no joint compilation.

But using the javac-internal API is something I wanted to avoid for this approach. For one, it is an internal API and as such not really meant for outside use, and for number two... the API is complex and difficult to work through. Pairing up with someone who knows javac very well I could probably write a proper joint compiler within a few days. But that's not an option here.


Tuesday, January 28, 2014

What class duplication is and how it happens

From time to time we get a question on the lists that turns out to be related to class duplication. In short, class duplication is the problem of having two classes of the same name that are not equal. And that gives all sorts of problems.

In a command-line Java application you usually don't have that sort of problem, because there is only one significant class loader. There are several loaders there too - like the bootstrap loader and the application loader - but what you usually care about are the classes given to the JVM by the class path, and those are loaded by the application loader. In my time with Groovy I really had to fight some ugly class loader problems that go beyond mere duplication. They are sometimes so difficult to debug that I count them among the worst kinds of bugs you can actually have. But here we concentrate on class duplication.

Some Basics

First of all you have to imagine that all class loaders together form a tree. The application and bootstrap loaders are at the top, forming the root; any other created loader is a node or leaf in that tree. Every class loader has a parent class loader, to which it is supposed to delegate loadClass calls. If the parent doesn't know the class and is not able to create it, a ClassNotFoundException is thrown and caught by the child node that requested the class. This child node then has the opportunity to create the class itself, or to throw the exception again. In the worst case this goes down to the node that made the original request, which may ultimately throw a ClassNotFoundException at the user.

The class loader creating the class is called the defining loader. If you have a Class object, you can ask it for its class loader and you will get the defining class loader. For example in Groovy, if you are executing a script in source form from the command line, then this.getClass().getClassLoader() will return an instance of InnerLoader or GroovyClassLoader. I have to mention that if you don't set the parent, the parent might be null if you request it with classLoader.getParent(), but that does not mean there is no parent. Instead the parent is then the bootstrap class loader. It depends on the implementation, though, whether null is used for that.

Class loader constraints

Class loader constraints ensure the well-behaved interplay of libraries and Java applications. They are described for example in http://docs.oracle.com/javase/specs/jvms/se7/html/jvms-5.html#jvms-5.3.4 but I will try to put this in a bit less complicated and mathematical language. Basically, in the JVM a class as a Class object is not the same as the class you have in source code. Instead you have an object essentially defined by a pair of name and loader. A different name with the same loader means a different Class object. A different defining loader with the same name also means a different Class object (class duplication!). For loadClass calls the constraints basically translate to this:
  • A class loader that returned the Class object c for a given name String n, has to always return the same c (referential identity) for a name equal to n (equals)
  • A class loader has to ask the parent to load a class first
The first point goes beyond the defining loader. Of course it should always hold that
c.getClassLoader().loadClass(c.getName()) == c
but also
 Class c1 = loader.loadClass("Foo")
Class c2 = loader.loadClass("Foo")
assert c1==c2
at any time, even if c1.getClassLoader()!=loader.

Class duplication example

Trouble comes in environments with a complex class loader setup. But before we get into that, let me try to illustrate the problem a bit:
 // loader1 will be able to load the class Foo from a jar
def loader1 = new URLClassLoader(...)
// loader2 will be able to load a class of the same name from the same jar
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Foo")
// loader1 is the defining loader for Foo in class1
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// class 1 and class 2 are not the same !
assert class1!=class2

In this example we have the loaders loader1 and loader2, each of which can load the class named Foo from a jar. This is not a violation of the constraints I mentioned above. And this example alone does not yet illustrate the full scope of the problem.

When Foo is not Foo

Imagine you have written Java code like this:
public class Bar {
    public void foo(Foo f) {}
}
The important part here is that loading the class Bar requires loading the class Foo as well, since Bar depends on Foo. The loader used to load Foo will be the one that defined Bar; that means the defining loader for Foo will be either a parent of the loader for Bar, or the loader for Bar itself. Let us now come back to our class duplication example from before, but slightly modified:
 // loader1  and loader2 will be able to load the classes
// Bar and Foo from a jar
def loader1 = new URLClassLoader(...)
def loader2 = new URLClassLoader(...)

def class1 = loader1.loadClass("Bar")
// loader1 is the defining loader for Bar in class1 and for Foo
assert class1.classLoader == loader1

def class2 = loader2.loadClass("Foo")
// loader2 is the defining loader for Foo in class2
assert class2.classLoader == loader2

// create a Bar instance
def bar = class1.newInstance()
// create a Foo instance
def foo = class2.newInstance()
// call Bar#foo(Foo)
bar.foo(foo) // Exception!!
The last line here fails because the Foo we give as an argument in the method call is no Foo for the Bar in bar. The Foo known to Bar is the one with the defining loader loader1, while the Foo we pass in has the defining loader loader2. This is not limited to method calls; setting fields or even casts show the same behavior. In case of a cast Groovy may then report something like this: GroovyCastException: Cannot cast object 'Foo@1234abcd' with class 'Foo' to class 'Foo'

This is not a problem with Groovy (or Java); it is a problem of your class loader setup.

Diagnose and Solve

Of course a simple test like
foo.getClass().getName().equals("Foo") && Foo.class != foo.getClass()
can already give a hint of a class duplication problem, since the condition is only true if foo is an instance of some class named Foo, but not of the Foo we reference here. One program that can shed some light on the structure is this:
def printLoader(Class c) {
    def loader = c.classLoader
    while (loader != null) {
        println "class loader is an instance of ${loader.getClass()} called $loader"
        loader = loader.parent
    }
    println "<bootstrap loader>"
}
If applied to foo.getClass() and Foo.class you can compare the outputs and should see that at least the first line differs. The fix is more easily said than done: only a loader common to both sides should define Foo. Either that has to be done by introducing a new class loader, or an existing class loader that takes URLs has to handle the jar containing Foo (and all its dependencies).
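
A hedged sketch of that fix (the jar path is made up for illustration): one common loader defines Foo and Bar, and the two other loaders just delegate to it as their parent.

// the common parent is the only loader that defines Foo and Bar
def sharedJar = [new URL('file:libs/foo-bar.jar')] as URL[]
def common    = new URLClassLoader(sharedJar)
def loader1   = new URLClassLoader([] as URL[], common) // delegates to common
def loader2   = new URLClassLoader([] as URL[], common) // delegates to common

// both sides now see the very same Class object
assert loader1.loadClass("Foo") == loader2.loadClass("Foo")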

Sunday, October 14, 2012

Open Blocks and MOP 2

In my last post I described how owner, delegate and this are needed in open blocks and how builders and resolve strategies change the behaviour of an open block. I also stated that this situation is not very satisfying for the next MOP.

The question then is whether there is a deeper principle we can use. The reason many do not see a groovy.lang.Closure as a closure in the more functional sense is partially that it resolves names dynamically. Dynamically in the sense that a closure in the functional sense, once created at runtime, has all names resolved, while we on the other hand resolve the names on demand. I will call the imaginary part doing this the "dynamic resolver". This dynamic resolver is currently either the owner or the delegate or a mix of the two. But the important part is that we have something that resolves the name for us.

Now I had the following idea for resolving the implicit this part, consisting of two pieces... Part one is that I want each sub block to share the same resolver unless a new resolver is set; newly created Closures would then get that resolver. The standard resolver would resolve everything against the surrounding class. And if I say "standard" and "new", this means you can set a new resolver, which all sub blocks will then share. The question is whether we can still cover all the bases with this approach and whether it is actually better than the old way.

Not-nested Builder

Let us assume we have a builder that captures each method call; thus, if the Closure delegate is set in delegate-only mode, we will never call the surrounding class. This is already very much the resolver idea. But why was this not enough in the past? Because we may want to go to the class as well, either after or before the delegate is asked. In my idea the resolver would have to do that. To be able to do this, though, we also need a resolver for the class, which I declare to be always available through the API, so it can be called from a custom resolver as well. This way we have the default OWNER_ONLY and can create DELEGATE_FIRST, DELEGATE_ONLY and OWNER_FIRST quite easily.

Open Sub Blocks

For the usage of an open block inside a Closure the idea says we just reference the resolver from the owner. By default this realizes OWNER_ONLY/OWNER_FIRST. Just like today we would, in a sense, go to the owner and resolve the call there in a further step, which may end up in a delegate or another owner to continue from there. The difference with the approach here is that we don't actually go to the owner. Instead we give the resolver to our new Closure, and the Closure's behaviour is then defined by it. If this sub-Closure has no delegate set, then OWNER_ONLY and OWNER_FIRST behave the same as in the old way, since without a delegate we don't go that path at all. If the delegate is set we have a nested builder, which I will handle later. So for now it is important to note that regardless of what the parent Closure has set as resolver, we effectively realize OWNER_ONLY/OWNER_FIRST.

Nested Builder

So going back to the sub-Closure with a delegate set, we first have to note that a builder of some kind will set that delegate, meaning we can set a new resolver here as well. Now with OWNER_ONLY we would not realize any builder, since the builder would never be called. With OWNER_FIRST we ask first the parent and then maybe the delegate, depending on whether the earlier request got a response or not. If we set a resolver that first asks the old resolver and then the builder, we get this strategy. With DELEGATE_ONLY we have a builder that does not ask the parent at all; here a resolver that resolves everything against the builder only realizes this. DELEGATE_FIRST means we first ask the builder and then the parent; here it is clear that a resolver that first asks the builder and then the old resolver gives us that as well.

To Self

TO_SELF is the only strategy I have not mentioned yet. I see no use for this strategy in terms of builders, but it can be realized too, by a resolver that resolves against the Closure class itself, ignoring owner and delegate of course.

Differences

The most obvious difference between this way and the old way is that instead of potentially going up the tree of Closures to the surrounding class and in the worst case going back down again, we have a chain of resolvers, their number depending on the number of nested builders. The other obvious difference is that instead of depending on a predefined strategy and a combination of owner and delegate, we get rid of the owner completely and have a resolver instead of the delegate.

Further differences come of course with the details of the resolver. For the MOP2 the basis should be something that answers the request for a method call with a method, not with the result of the method call. And it should answer whether the method call is allowed to be cached or not. With the current way of piping everything through the MOP methods on Closure those two goals are impossible. A general meta class, as for a normal class, is not enough here, since we have to handle the "implicit this" differently. Anything I came up with so far looked quite complicated: complex as a diagram and difficult to explain too. With this concept I can say that everything goes to the resolver and be done with it. I think that is easier to understand.

For the resolver itself the open question is whether it should simply be an object on which we call the MOP methods (now returning methods), comparable to today, or whether it should be an explicit resolver thingy, maybe even a general MOP2 element.

I guess you can call this a draft so far ;)

Monday, October 08, 2012

Owner, Delegate and (implicit) this in an Open Block

Many of course know groovy.lang.Closure; they know about delegates and such, but maybe not so many know why these things exist.

What is an Open Block?

That is basically what is represented at runtime as groovy.lang.Closure, or Closure for short. Please note that I don't write closure (lowercase), since an open block can be a kind of closure but is less limited than one. Open blocks are no lambda expressions either, since they can contain 0-n statements. For more syntactical information please go to http://groovy.codehaus.org/Closures.

The captured call

http://groovy.codehaus.org/Builders shows some examples of what builders are; essentially they are hierarchical structures with a capturing ability. You don't really need to nest one open block into another to get that, of course. But those builders are exactly the reason why we have owner, delegate and "this".

To explain this in more detail, let us start with a normal Java block
{ foo () }
You notice this is a simple call to a method named foo that is supposed to be defined somewhere outside of the code part we are looking at. For example foo might be defined in the same class that contains this code block. It is clear that foo() is equal to this.foo() here. That is what I call having an "implicit this". There are of course languages that don't have that; Smalltalk and JavaScript come to mind. In Java "this" and the "implicit this" always refer to the enclosing class.

"Implicit this" in Groovy

Groovy now does this a bit differently for open blocks. In Groovy the implicit this is like a reference to the capturing mechanism built into the open block, realized by the Groovy MOP, for the sake of builders. Therefore such a call will in Groovy be resolved against the owner, the delegate or "this". Actually, Groovy is the only programming language I know in which "this" and the "implicit this" are essentially different. I know of variants with differing types for them in other languages, and many don't even have an "implicit this"... but if they have one, it normally means the two are aligned.

Coming from wanting to support the Groovy builder structure, it is clear we want some kind of capturing, and thus it is clear that we cannot simply do the call on the class outside. On the other hand, assume you have an XML builder that is supposed to turn all calls into XML tags. How would you distinguish a programmer wanting to produce <foo/> (whatever sense that may make) from one wanting to call a method from the class, for example to increase a counter, prepare some state or something similar? Led by this thought we decided to make the "implicit this" different from the explicit one.

People knowing the pre-1.0 history of Groovy a bit can tell that in the early days a "this" in such a block referred to the groovy.lang.Closure instance. This was later changed into having to pass the builder instance around and making "this" and "implicit this" equal. Neither version was what we wanted. The first version conflicts with the Java style, making that a quite phony structure. The second version is even more phony, and it made builders absolutely not a nice experience. The compromise solution was then to have the explicit "this" bypass the MOP of the groovy.lang.Closure part and call into the surrounding class directly, while the "implicit this" goes through the groovy.lang.Closure MOP part. Ignoring the owner and differing resolve strategies, the MOP at this point is simply: look if a delegate is set, and if it is, try calling the method on the delegate. If the call succeeds, we are done; if not, fall back to "this".
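
A small runnable illustration of exactly that fallback: foo() is found on the delegate, bar() is not and falls through to the surrounding script.

class MyDelegate {
    def foo() { 'delegate foo' }
}
def bar() { 'script bar' }

def block = { [foo(), bar()] }
block.delegate = new MyDelegate()
block.resolveStrategy = Closure.DELEGATE_FIRST // set explicitly; the default strategy is discussed below
assert block() == ['delegate foo', 'script bar']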

Open Blocks interacting with Builders

Having a delegate we can already realize many builders quite easily. But the devil is in the details. Assume we use our xml builder like this:
xml.outerElement {
  10.times { innerElement () }
}
This is supposed to produce an element named outerElement, containing an <innerElement/> part 10 times. You may ask why this is difficult. There is one thing about builders you have to know: for each element of the hierarchy, that means for each Closure, you have to set a delegate. You can do this only if you are capturing the method call. But since we capture only calls with "implicit this", the times call will not be captured; it is a qualified call through the usage of the number 10. Still we want to reach the builder delegate "xml" from within the block given to the times call. Since we cannot set the delegate for that one from the builder, we need a different MOP here. And this is the point where "owner" comes into play. The "owner" is the "structure" containing/owning our Closure; that is either a class or another Closure. So in the example above the Closure in the outerElement call is the owner of the Closure in the times call. We then change our MOP to not simply fall back to "this", but to fall back to the "owner" instead. Then our innerElement() call will at first be resolved against the delegate that times set. But since times did not set one, it will go to the owner, which is the Closure given to the outerElement call. That Closure has a delegate set through our xml builder. With that we then create our <innerElement/>, as we want it.
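
The same shape can be tried with the standard MarkupBuilder: the closure given to times gets no delegate from times, so innerElement() travels via its owner (the outer closure) to the builder.

import groovy.xml.MarkupBuilder

def writer = new StringWriter()
def xml = new MarkupBuilder(writer)
xml.outerElement {
    3.times { innerElement() }
}
assert writer.toString().count('<innerElement') == 3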

This means all four parts, owner, delegate, this and implicit this, are required elements for the Groovy MOP.

Resolving Strategies

In the part above I stated the delegate is resolved against first. That is actually not always right; it depends on what the builder sets as strategy, and the default in Groovy is to resolve against the owner first. If you think back to how "this" at first referred to the Closure itself and not the surrounding class, it should be clear that this kind of strategy is a leftover from back then. Because without it, you would not be able to call any method from the class if you have a capture-them-all builder, like an xml builder usually is. So the reason that this is the default is historic. There have been heated debates about what the default should be, and there are multiple ways, all with pros and cons, but this default was the result back then. Later we added a set of strategies you can use... DELEGATE_FIRST to look first at the delegate and then at the owner, DELEGATE_ONLY to stop after the delegate, OWNER_FIRST the default, OWNER_ONLY to stop resolving the call after looking at the owner, and there is TO_SELF as well, which resolves the call against the Closure itself only. Again I have to add a detail to the MOP here: even though they are called OWNER_FIRST and DELEGATE_FIRST, the first thing we do is try to resolve the call against the Closure itself. That makes the default MOP a 3-step procedure: try to resolve the method against the Closure instance, then the owner, then the delegate. It is similar for DELEGATE_FIRST and the "only" variants.
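
A minimal demonstration of the strategies: both the script (the owner) and the delegate define foo(), and the strategy decides which one wins.

def foo() { 'owner' }
class SomeDelegate {
    def foo() { 'delegate' }
}

def c = { foo() }
c.delegate = new SomeDelegate()

assert c() == 'owner'                      // OWNER_FIRST is the default
c.resolveStrategy = Closure.DELEGATE_FIRST
assert c() == 'delegate'
c.resolveStrategy = Closure.DELEGATE_ONLY
assert c() == 'delegate'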

At this point you may also have noticed a practical difference between DELEGATE_FIRST and OWNER_FIRST. If you nest an all-capturing builder inside another all-capturing builder and you use owner first, then the method call will be trapped by the outer builder. If you use delegate first, the inner builder will trap it instead. Which one is better depends on your use case.


Static Type Checking

With Groovy 2.0, Groovy also offers an optional static type checker that allows static type checking for a subset of Groovy programs. Things like builders are highly dynamic structures and difficult to check statically. Languages like Kotlin and Scala have a problem here. Sure, you can write builders in them, but when it comes to nested builders you have to be more verbose and pass the builder around all the time, similar to the early stages in Groovy. And I don't even want to mention that for an html builder, for example, you have to define methods for every element somewhere. Considering xml and its practically infinite set of elements, you get into trouble here, even if you ignore owner, delegate, this and implicit this as well as resolving strategies.

In Groovy++ the "solution" was a kind of mixed mode that simply doesn't fail compilation if a method is not found in the normal static context. In fact that was always one of the points of conflict between the Groovy team and Alex. We, and that especially includes me, found that ignoring a missing method defies the purpose of static compilation and that you lose most of its benefits. The only remaining one actually is that everything else runs near Java speed. But static type safety is completely lost.

The idea Alex did not come up with is a helper annotation called @DelegatesTo, from Peter Niederwieser. The idea is to mark the Closure parameter of a builder method with that annotation to tell the compiler what kind of delegate the method will use, possibly including the resolve strategy. We already have a framework for "static type checker plugins", allowing you to hook into the type checker and influence how method calls are resolved. We hope that with a combination of both we can even solve cases like the xml builder. Of course this is current development and targeted for Groovy 2.1. We have to see what Cedric comes up with in the end, but what we discussed so far sounded promising and would finally solve a longstanding problem.
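
A hedged sketch of how such an annotation could be used (the spec class is made up; this is roughly the shape the feature later took, not necessarily the final design):

import groovy.transform.TypeChecked

class MailSpec {
    void from(String address) { /* remember sender */ }
    void to(String address)   { /* remember recipient */ }
}

// the annotation tells the static type checker which type the closure body
// will be resolved against
void mail(@DelegatesTo(MailSpec) Closure spec) {
    def m = new MailSpec()
    spec.delegate = m
    spec.resolveStrategy = Closure.DELEGATE_FIRST
    spec()
}

@TypeChecked
void sendIt() {
    mail {
        from 'me@example.com' // checked against MailSpec instead of being an error
        to   'you@example.com'
    }
}
sendIt()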

Groovy 3

The next major version, Groovy 3.0, will come in 2013 and include a new MOP, which still has to take its full shape. Implementation-wise, the way owner, delegate and (implicit) this are used together with the resolving strategy actually poses quite a problem for an efficient implementation. We don't want to force users to use @DelegatesTo; that annotation is only a helper for static compilation. The problem stems from the fact that we have a quite long method resolving process here: we may have to go through a longer chain of Closure objects and test each time whether a method is present or not. And if the delegate is changed later on, caching becomes almost impossible. This is a problem for the current implementation, for the invokedynamic port and for anything in Groovy 3 as well, as long as the capabilities are supposed to stay the same. Probably I will be using a series of SwitchPoints to solve this; I have yet to see whether that really gives me the desired speed. But it is not only speed that matters to me here for Groovy 3. One goal in Groovy 3 is to make the MOP easier, and this kind of mechanism is not easy. I hope that with the help of the community I will be able to solve this problem as well.