Friday, March 16, 2007

A Compact Object Comparator

Every now and then a problem arises where the right solution would be to impose an arbitrary total ordering on a collection of objects. The simplest example of this is when you need to sychronize on more than one object, all at the same time, to maintain some consistency condition across those objects. Using Closures, you might invoke a utility method like this:

Locks.withLocks(lock1, lock2) {
    // code protected by both locks
}

To avoid deadlock, every piece of code that locks the same set of locks should do so in the same order. Rather than forcing all callers of the withLocks method to worry about getting them in the right order, the implementation of withLocks can sort the incoming locks. Then the caller can just pass the locks in arbitrary order, knowing that they will be locked "in the right order". It doesn't actually matter what order we sort them in, as long as we always get the same order for the same objects. The implementation of withLocks can use Collections.sort to sort the incoming locks, but java.util.concurrent.locks.Lock is not naturally comparable, so we need to pass an appropriate comparator to sort. We need a java.util.Comparator<Lock>, but a java.util.Comparator<Object> would work just as well. Let's specify, and then implement, a suitable comparator. Here is what we need:

/**
 * Returns a comparator that imposes a complete order on all objects.
 * Each invocation of this method may yield a distinct comparator,
 * or may yield the same comparator.
 */
public Comparator<Object> totalOrder() { ... }

How are we going to do this? One idea is to create an assignment of long values to each object, as needed. That would look something like this:

public Comparator<Object> totalOrder() { return new TotalOrder(); }
private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    public int compare(Object o1, Object o2) {
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    }
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        }
        return nonce;
    }
}

There are two major problems with this approach. First, it causes object retention. Objects whose space would otherwise be recovered by the garbage collector are retained because they are reachable as keys in the codes map. We can't fix this by simply using a WeakHashMap; without the identity semantics of IdentityHashMap the technique doesn't work. We really need WeakIdentityHashMap for this, but no such class exists in the JDK yet. Fortunately, "crazy" Bob Lee has come to the rescue with an implementation of this concept inside the recently open-sourced Guice dependency injection framework. I think this belongs in the JDK, and now is the time to propose it for JDK7.

The other problem with this implementation is that this utility takes up too much space. In general, every time you call the compare method one or two objects might be created and added to the map.

Another idea for implementing this utility is to sort the objects based on their identity hash code. Identity hash codes are well distributed, almost like random numbers. That is naturally thread-safe, and would look something like this:

private class TotalOrder implements Comparator<Object> {
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        return (i1<i2) ? -1 : (i1==i2) ? 0 : 1;
    }
}

This is much more compact than the previous approach. But because identity has codes are not guaranteed to be unique, it occasionally treats two distinct objects as equal.

We can get the best of both worlds - a space-efficient comparator and a complete order - by combining the two approaches:

private class TotalOrder implements Comparator<Object> {
    long nextNonce = 1;
    Map<Object,Long> codes = new IdentityHashMap<Object,Long>();
    synchronized Long getNonce(Object o) {
        Long nonce = codes.get(o);
        if (nonce == null) {
            nonce = nextNonce++;
            codes.put(o, nonce);
        }
        return nonce;
    }
    public int compare(Object o1, Object o2) {
        if (o1==o2) return 0;
        int i1 = System.identityHashCode(o1);
        int i2 = System.identityHashCode(o2);
        if (i1 != i2) return (i1<i2) ? -1 : 1;
        Long l1 = getNonce(o1);
        Long l2 = getNonce(o2);
        return l1.compareTo(l2);
    }
}

By the way, if you haven't already checked it out, see "crazy" Bob Lee's Guice dependency injection framework. We use it extensively at Google. By really taking advantage of recent language features such as generics and annotations, the Guice framework is very flexible and yet much simpler than existing frameworks. Throw away your XML and write your Java code in Java!

thanks to "crazy" Bob Lee for contributing the Guice framework, and for reviewing this essay.

6 comments:

Anonymous said...

I wrote down something similar last year. Only using a map from identity hash code to list of weak references. I guess it slightly leaks over time.

I suspect it might be better to just go for a global lock the times identity hash code collide. Although that's probably dangerous if you go for nesting multiple locks.

Anonymous said...

Why not require locks to be naturally comparable with Java7?

Ricky Clarkson said...

With a syntax for method literals, such as FCM, but with a little currying, it would be possible to write a builder for comparators, so that actually creating a comparator looks like:

Collections.sort(nameList,builder.by(String#charAt(0).then(String#length()));

That would sort a list of Strings by their first character, then if the first character was the same, it would sort by the length. I suppose one could make it complete by adding .by(System#identityHashCode(Object)), except for this quote from this blog:

"But because identity has codes are not guaranteed to be unique, it occasionally treats two distinct objects as equal."

I thought that in current implementations, identity hashCodes were unique for existing objects, but could be reused after garbage collection; because they are just an integer version of the reference to the object.

Roberto Tyley said...

I thought that in current implementations, identity hashCodes were unique for existing objects, but could be reused after garbage collection; because they are just an integer version of the reference to the object.

I don't think you get a guarantee of that - for instance, as 'hashcode' is a 32-bit integer value, the pigeonhole principle would rule it out on a 64-bit JVM running with more than 2^32 object instances!

hlovatt said...

Wouldn't it be easier to add to J7 longIdentityHashCode that returned the same as identityHashCode on 32 bit systems but gauranteed uniqueness on 64 bit systems. And whilst making changes change the default toString method to use longIdentityHashCode.

Neal Gafter said...

No, Howard, having such core behavior guaranteed but only on on some platforms is worse than not having it at all.