Showing posts with label concurrency. Show all posts

Friday, February 19, 2010

Testing Asynchronous Code with GPars Dataflows

In my last post I showed how to use JConch 1.2 to unit test asynchronous code. It provides a locking/barrier mechanism that lets your unit test gracefully wait for off-thread events without sleeping or polling. Whee!

As a small example, here is how the TestCoordinator would be used to test a mouse click event that happens on a different thread:

TestCoordinator coord = new TestCoordinator()

MyComponent component = new MyComponent()
component.addActionListener({ ActionEvent e ->
    assert "click" == e.actionCommand
    coord.finishTest()
} as ActionListener)

component.click()
coord.delayTestFinish(1, TimeUnit.SECONDS)

If you are living in a Groovy world, then you should consider using GPars Dataflow variables instead of the JConch TestCoordinator. The advantage of Dataflows is that you get a similar thread-coordination API plus a variable-capture API, which lets you move assertion statements to the end of your unit test where they belong. If you are looking for a clean, minimalist approach to capturing off-thread events, then dataflows might be exactly what you need:
import java.awt.event.*
import javax.swing.SwingUtilities
import groovyx.gpars.dataflow.DataFlowVariable

def event = new DataFlowVariable()

MyComponent component = new MyComponent()
component.addActionListener({ ActionEvent e ->
    event << e
} as ActionListener)

component.click()

ActionEvent e = event.val
assert e.source != null
assert e.actionCommand == "click"

A DataFlowVariable is a little like a Java Future crossed with an immutable, thread-safe object. It has an underlying value that can be set once, and only once, and any attempt to read that value blocks the caller until the value is available. In this example, the value is set in the event handler by the "event << e" line, and read by the "event.val" line, which blocks until a value arrives. (Remember, in Groovy the .val property access is translated into a getVal() method call.)
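If you want to play with the same write-once, block-on-read semantics using nothing but the JDK, a CompletableFuture (Java 8+, so well after this post was written) behaves similarly. This is only an analogy to illustrate the idea, not how GPars implements it:

```java
import java.util.concurrent.CompletableFuture;

public class DataflowSketch {
    public static void main(String[] args) throws Exception {
        // A CompletableFuture can stand in for a DataFlowVariable:
        // complete() sets the value once; get() blocks until it is set.
        CompletableFuture<String> event = new CompletableFuture<>();

        // The "event << e" side, running on another thread.
        new Thread(() -> event.complete("click")).start();

        // The "event.val" side: blocks the caller until the value arrives.
        String command = event.get();
        System.out.println(command);  // prints "click"
    }
}
```

One difference worth noting: a second complete() call on a CompletableFuture is silently ignored, whereas a DataFlowVariable treats re-binding as an error.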

The advantage of using Dataflows in testing, compared to the raw Java primitives, is a nicer thread-synchronization API than java.util.concurrent provides. The advantage compared to JConch is that you get to move your assertions back to the end of the unit test. William Wake described a format for unit tests called Arrange-Act-Assert, and it remains one of the best guidelines for writing clear and understandable unit tests. Assertions belong at the end of your test and arrangement code at the beginning (note how many mock object frameworks subvert this ordering!). Capturing a variable is a good way to move the assertions to the end of the method, but in most frameworks doing so requires copious amounts of accidentally complex code.

The downside of using Dataflows is that you once again have to set an @Timeout value on your unit test in the event that you never get a value (otherwise your tests will hang). OH WAIT, that is totally wrong. The GPars authors did provide a timeout value on DataFlowVariable#getVal():
ActionEvent e = event.getVal(4, TimeUnit.SECONDS)
I had you there for a second though, right? Be warned, if the timeout value is exceeded then getVal() returns null, it does not throw an exception.

The only downside to using DataFlow concurrency that I can see is that you actually need to understand a little bit about DataFlow concurrency. Who would have thought? DataFlowVariable is just the tip of the iceberg when it comes to GPars DataFlow support, and is not even the primary use case. The Dataflow User Guide is a good read and will explain a lot more about the concepts and GPars' implementation.

Feel free to try this yourself. You don't even need to download any jars or change your project files. Just add the correct Grapes to the top of your unit test:
@Grab(group='org.codehaus.gpars', module='gpars', version='0.9')
@GrabResolver(name='jboss', root='http://repository.jboss.org/maven2/')
If you need more help, then check out the full example.

Happy Testing!

Tuesday, February 9, 2010

Asynchronous Unit Test Coordination with JConch 1.2

Slowly but surely, the JConch Java Concurrency Library is becoming a depot for multithreaded and asynchronous testing on the Java platform. First there was SerialExecutorService, allowing you to test with a Java ExecutorService that never starts threads (example, javadoc). Then there was assertSynchronized, allowing you to make easy unit test assertions about your object's synchronization policy (javadoc, example).

And now JConch 1.2 offers TestCoordinator: a tool for testing asynchronous code and multi-threaded callbacks. The TestCoordinator solves the problem of how to properly wait for asynchronous method calls without littering your unit tests with wait/join calls or CyclicBarrier/CountDownLatch API calls. In this regard, it provides a useful unit testing abstraction over the great Java 5 concurrency primitives.

Note: If you'd like to skip all the prose then go straight to the example and unit tests that are part of JConch 1.2.

Writing good unit tests requires the same input as writing good production code: practice, dedication, and about 10,000 hours of experience. Almost any component framework that requires some sort of event listener invariably has a unit test that looks like this:

MyComponent component = new MyComponent()

ActionEvent event
component.addActionListener({ ActionEvent e ->
    event = e
} as ActionListener)

component.click()
Thread.sleep(1000)

assert event.source != null
assert "click" == event.actionCommand
OK, usually it's in Java and not Groovy, but you get the point. Life's too short to write Java at home. The format is simple: 1) create a component with a listener that captures the input, 2) activate the component, and 3) put the thread to sleep for a while and hope the event fires before your assertions run. The obvious problem is that you end up with either slow or intermittently failing unit tests depending on how long you sleep (and no, there isn't a perfect sleep value that avoids both).

In the Java 4 days this was "solved" by monkeying with Object#wait and Object#notify method calls. Ewww. Java 5 introduced CyclicBarrier and CountDownLatch, which solved the problem nicely, albeit in a primitive way. It's an improvement, but we can do better than this:
@Test(timeout=5000)
public void testClick() {
    CyclicBarrier gate = new CyclicBarrier(2)

    MyComponent component = new MyComponent()

    component.addActionListener({ ActionEvent e ->
        assert "click" == e.actionCommand
        gate.await()
    } as ActionListener)

    component.click()
    gate.await()
}
This approach lets us move the assertions inside the callback, because we're assured that either the callback will run or the test will fail with a timeout (the @Test(timeout=5000) bit). It is simpler and hides some complexity, but a bunch of primitives still distract the reader from the core content of the test... what is that (2) parameter on the barrier constructor? Is the timeout value clear, or is it hidden within a method annotation? And just what does await() mean? There is a lot of primitiveness involved here, and that is what TestCoordinator abstracts away:
import jconch.testing.TestCoordinator

TestCoordinator coord = new TestCoordinator()

MyComponent component = new MyComponent()
component.addActionListener({ ActionEvent e ->
    assert "click" == e.actionCommand
    coord.finishTest()
} as ActionListener)

component.click()
coord.delayTestFinish(1, TimeUnit.SECONDS)
As you can see, TestCoordinator acts a lot like a barrier or latch, but without all the low-level API. When you've activated your class under test, you call delayTestFinish(...); the unit test then waits until the listener calls finishTest(). If finishTest() has already been called (a frequent possibility in multithreaded systems), then delayTestFinish simply proceeds. If the timeout expires, the test fails. No funny timeout annotation. No funny API. Just a simple coordinator. You want to delay a test until a callback has finished, and the API is designed around exactly that nomenclature.

There are two other options to delay as well:
coord.delayTestFinish(1000) // milliseconds
coord.delayTestFinish() // requires @Timeout value!
For those of you coming from GWT... yes, this is almost exactly the API of GWTTestCase. Use what works, I say.
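To see why this abstraction is so thin yet so useful, here is a minimal sketch of how a coordinator like this could be built on a plain CountDownLatch. This is my own illustration, not JConch's actual implementation, and SimpleCoordinator is a made-up name:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Sketch of a TestCoordinator-like class built on CountDownLatch.
class SimpleCoordinator {
    private final CountDownLatch latch = new CountDownLatch(1);

    // Called from the listener/callback thread.
    void finishTest() {
        latch.countDown();
    }

    // Called from the test thread; fails by throwing on timeout.
    // If finishTest() already ran, await() returns immediately.
    void delayTestFinish(long timeout, TimeUnit unit) throws InterruptedException {
        if (!latch.await(timeout, unit)) {
            throw new AssertionError("callback never arrived");
        }
    }
}

public class CoordinatorDemo {
    public static void main(String[] args) throws Exception {
        SimpleCoordinator coord = new SimpleCoordinator();
        new Thread(coord::finishTest).start();       // simulated async callback
        coord.delayTestFinish(1, TimeUnit.SECONDS);  // proceeds once finishTest runs
        System.out.println("test finished");
    }
}
```

The value of the real TestCoordinator is exactly this: the latch arithmetic and timeout handling are hidden behind names that describe testing intent.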

For more info on TestCoordinator, check out the UnitTest and the example. Happy Testing!

Saturday, December 12, 2009

Java 5 + GPars: Throttling Action Processing

An interesting question came up on the GPars mailing list today: In a system that generates events, what is the best way to throttle back event processing to one event per second? I thought about an answer... then thought some more... and finally decided to write it all up in this blog post. The example uses Groovy and GPars, but it is easily adapted to a generic Java solution. Don't let the actors scare you! (or the lack of semi-colons, for that matter).

The example is the classic "Sleeping Barber" problem (I hadn't heard of it either). Basically, there is a barbershop. The barber is asleep. Customers walk into the waiting room periodically, and the barber wakes up to give each of them a haircut. When he's done he returns to his slumber. It's a lesson in reaction: something is asleep, then awakens to do some work, then returns to sleep.

The GPars docs provide a decent Actor based solution to this problem: there is a waiting room and a barber, and both are actors. When the barber is free, a customer from the waiting room is moved into the barber's chair. The barber and waiting room communicate via actor messages. But what if our barber is a bit of a diva, and no matter how busy the shop gets, he wants to give one haircut every 15 minutes and never any more (otherwise, he might get burnt out you see). That is the throttling problem: how do you make sure events are processed (a haircut is given) no more than x number of times in a given time period?

My solution: keep a work queue, and have a scheduled executor pull work off the queue at a specified interval. Java 5 gives you all the tools to do this without resorting to busy waiting, polling, or writing scheduling code. The classes you need to know about are ArrayBlockingQueue and ScheduledThreadPoolExecutor.

ArrayBlockingQueue is a FIFO (First-In-First-Out) queue that supports blocking instead of busy waiting. When you take() an item from an ABQ, the call blocks until an item is available... no polling or sleeping to check whether an item is ready. Just call take() and your code won't proceed until there is an element available.
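That blocking behavior is easy to see in a tiny JDK-only demo (TakeDemo is a name I made up for illustration):

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class TakeDemo {
    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(10);

        // Producer: puts an item in the queue after a short delay.
        new Thread(() -> {
            try {
                Thread.sleep(100);
                queue.put("haircut");
            } catch (InterruptedException ignored) { }
        }).start();

        // take() parks this thread until the producer's put() completes;
        // no sleep/poll loop needed on the consumer side.
        System.out.println("got: " + queue.take());
    }
}
```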

The ScheduledThreadPoolExecutor supports executing both Runnable and Callable objects at a fixed interval. If you're looking to execute the same task every 1 second then STPE is what you need... Timer, for all intents and purposes, has been deprecated.

So here's the barber that just won't stand to be over-worked... setting it all up we need customers, a barber, and a waiting room:

class Customer {
    String name
}

// waiting room can't hold more than 12!
def waitingRoom = new ArrayBlockingQueue(12)

def barber = Executors.newScheduledThreadPool(1)
barber.scheduleAtFixedRate({
    println 'Barber: Next customer please!'
    Customer customer = waitingRoom.take()
    println "${customer.name} gets a haircut at ${new Date().format('H:m:s')}"
} as Runnable, 0, 15, TimeUnit.MINUTES)


Customer is a simple bean; nothing interesting there. The waiting room is an ArrayBlockingQueue that fills with customers who need a haircut. And the barber is an executor service with a scheduled task to give haircuts. The number of threads in the scheduled thread pool is 1 because there is only one barber. The barber takes customers from the waiting room and cuts their hair once every 15 minutes. The call to waitingRoom.take() is blocking... if a customer is ready then he is serviced immediately, and if not, the call blocks until someone is available. One thing to note: the waitingRoom has a capacity of 12... if a 13th customer is added, the calling code will either block until there is room or throw an exception. There is an API call for either behavior.
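The three full-queue behaviors come from three different BlockingQueue methods: add() throws, offer() returns false, and put() blocks. A quick sketch (class name is hypothetical):

```java
import java.util.concurrent.ArrayBlockingQueue;

public class FullQueueDemo {
    public static void main(String[] args) {
        ArrayBlockingQueue<String> room = new ArrayBlockingQueue<>(1);
        room.add("Jerry");                       // room is now full

        System.out.println(room.offer("Phil"));  // false: offer() just reports failure

        try {
            room.add("Phil");                    // add() throws when the queue is full
        } catch (IllegalStateException e) {
            System.out.println("add threw: " + e.getMessage());
        }

        // room.put("Phil") would block right here until space opened up
    }
}
```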

So how do customers get into the waiting room? That's where GPars actors come in. The barber shop is a "reactor" in GPars terminology. Messages can be sent to the barbershop ("Enter" the waiting room), and the reactor adds the customer to the waiting room. A maître d' of sorts. Here it is in action:
def barberShop = new PooledActorGroup().reactor { message ->
    switch (message) {
        case Enter:
            println "${message.customer.name} waits for a haircut..."
            waitingRoom.add(message.customer)
            break
    }
}

class Enter {
    Customer customer
}

barberShop << new Enter(customer: new Customer(name: 'Jerry'))
barberShop << new Enter(customer: new Customer(name: 'Phil'))
barberShop << new Enter(customer: new Customer(name: 'Bob'))
barberShop << new Enter(customer: new Customer(name: 'Ron'))

The barberShop is built from a PooledActorGroup, a GPars object, and creating the actor just means passing a closure to the reactor() method of that group. The closure, or actor, responds to Enter messages by adding the customer to the waitingRoom. At the bottom you see the nice << syntax for posting events to the actor.

So there you have it. There are many ways to do this, but I think the Java 5 Concurrency libraries are some of the best options. I'd be interested to hear other ideas too. Now go give those hippies some haircuts!

Sunday, August 23, 2009

Exploring Immutability

When did immutable objects become idiomatic Java? I suppose it depends on your frame of reference and what your experience brings to coding. It may not be idiomatic Java to you. In fact, Java Beans and fluent interfaces seem to rely on in-place side effects and mutability to function properly. Exactly what place do immutable objects have in the Java world?

The immutability of String surprised me when I was learning Java. One of the first programs I wrote assumed String#replace did an in-place edit of the String, one of the first debugger sessions I conducted clearly showed me that String#replace was not editing the String, and one of the first Sun Forum questions I posted was about String#replace being broken in Java. BROKEN, I say! This represents the one time when an answer to a Sun Forum question clearly solved the poster's problem without a lengthy chain of "please post your code" responses. The point is that immutability has been with Java forever. And now that the days of Object Pools and persistent performance concerns are behind us, it's time to embrace immutability in our own code.
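The behavior that confused me back then is easy to demonstrate: replace() leaves the original String untouched and hands back a new one.

```java
public class ReplaceDemo {
    public static void main(String[] args) {
        String original = "broken";
        String replaced = original.replace('o', '0');

        System.out.println(original);  // still "broken": no in-place edit happened
        System.out.println(replaced);  // "br0ken": a brand-new String
    }
}
```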

The benefits of immutability are many and well-documented. Effective Java (either edition) is a good start. I'll touch the highlights with the example of a Person object:

class Person {
    def firstName
    def lastName
}
HA! Tricked you, it's Groovy. The Groovy compiler gives the Person class two methods for each property defined:
public void setFirstName(Object)
public Object getFirstName()
...
The problem with the class (as the JVM sees it) is that many threads may access the same Person instance and see it in different states; it is difficult to share this object correctly across threads. The problem with the Java source is the cruft of writing getters and setters. The problem with the API of Person is that working effectively with the object requires many calls to setX and setY mutators just to get simple interactions working. The problem with maintaining this code is that the object is somewhat non-deterministic and difficult to reason about. Groovy solves only one of these issues: the source-level cruft. Sure, you don't have to write out getters and setters, but that is the least important problem to solve.

Fluent APIs solve the issue of munging up your source code with repetitive calls to setX and setY. Instead of following the bean convention of setters being void methods, have all your setters return a 'this' reference:
class Person {

    def firstName
    def lastName

    def setFirstName(firstName) {
        this.firstName = firstName
        this
    }

    def setLastName(lastName) {
        this.lastName = lastName
        this
    }
}
Now you have a nicer API to work with at the expense of not having a standard Java bean. Remember the Java language change proposal to make void methods always implicitly return a 'this' reference? I feel like I'm the only guy wanting that in Project Coin. Oh well. Here is what consuming the API now looks like:
def person = new Person()
        .setFirstName('hamlet')
        .setLastName('darcy')
assert 'hamlet darcy' == "$person.firstName $person.lastName"
Groovy has a solution to this same problem: the with block. All objects have a with method, and inside its closure parameter, method and property references are resolved against the object that with was invoked on. Visual Basic had this feature and I always missed it in Java:
def person = new Person()
person.with {
    firstName = 'hamlet'
    lastName = 'darcy'
}
assert 'hamlet darcy' == "$person.firstName $person.lastName"
So both Groovy and Java have an answer for the boilerplate involved in writing and consuming these APIs. But again, boilerplate isn't the big issue with mutable objects: non-determinism and thread safety are. To help solve those, you can use immutable objects.

Can we get nice, fluent interfaces in Java without sacrificing immutability? The go-to solution in Java is the Builder idiom. Make your Person object immutable (all final fields) and its constructor private, then define a mutable Builder class that wraps the constructor in a better API. In the end, creating Person instances looks like this:
def person = new PersonBuilder()
        .setFirstName('hamlet')
        .setLastName('darcy')
        .build()
assert 'hamlet darcy' == "$person.firstName $person.lastName"
You instantiate the Builder, and the Builder instantiates the Person. This is nice... I guess, maybe. Remember the idea of a "meaningless abstraction"? PersonBuilder is definitely a meaningless abstraction. It does not exist to solve any of the problems the system was designed to solve; it exists solely to solve a problem in the implementation. Suspect. Very suspect.

The Builder has by far the longest implementation, and the DRY principle is glaringly violated in the duplicate fields:
class Person {
    private final def firstName
    private final def lastName

    Person(firstName, lastName) {
        this.firstName = firstName
        this.lastName = lastName
    }
}

class PersonBuilder {
    def firstName
    def lastName

    def build() {
        new Person(firstName, lastName)
    }

    def setFirstName(firstName) {
        this.firstName = firstName
        this
    }

    def setLastName(lastName) {
        this.lastName = lastName
        this
    }
}
It should be said, though, that this idiom does address the determinism and concurrency issues with mutability. Person can now be shared freely across threads, cached safely, and behaves in a more deterministic manner.

The best approach has not yet been mentioned. Simply create an immutable object without a builder, and also provide a nice fluent API around it. The calling code can look exactly the same as the previous fluent API example:
def person = new Person()
        .setFirstName('hamlet')
        .setLastName('darcy')
assert 'hamlet darcy' == "$person.firstName $person.lastName"
You can do this by slightly tweaking the fluent API example. Don't return a 'this' reference from setters; return an entirely new object instead. And don't worry about all those little objects you're creating and tossing away.
class Person {

    private final def firstName
    private final def lastName

    Person() {
        this(null, null)
    }

    Person(firstName, lastName) {
        this.firstName = firstName
        this.lastName = lastName
    }

    def setFirstName(firstName) {
        new Person(firstName, lastName)
    }

    def setLastName(lastName) {
        new Person(firstName, lastName)
    }
}
Is a Person instance threadsafe? Yes. It is immutable.
Is a Person instance deterministic? Yes. No race conditions here.
Is a Person instance API easy to consume? Yup, it has a fluent API.
Does the Person class introduce a meaningless abstraction? Nope, and good thing!
Does the Person class require an exotic programming language to implement? No, this works fine in Java.
Does the Person implementation require tons of boilerplate? Maybe. The good part is that every implementation of a setter is exactly the same. The bad part is that every implementation of a setter is exactly the same. Not very DRY, but it could be automated in Groovy at runtime with a little metaprogramming.

That's it, immutable exploration over. And what place should immutability have in the Java world? It should be front and center. I hope I've shown that immutability and good APIs do not need to conflict with each other. You can get both with a tiny bit of extra work.

Monday, January 19, 2009

Test Driven Synchronization Policies with assertSynchronized


As you're surely aware, JConch 1.1 was released yesterday... it includes a tool I've used over the last two years at work to test drive synchronization policies in Java code: assertSynchronized. Even if you're not practicing TDD, assertSynchronized has proven to be a great aid in replicating concurrency defects and verifying that fixes work correctly. And it's great to have test coverage on synchronization policies at refactoring time. I, and several team members, have genuinely found this assert useful, and this post walks you through how it works.


Here's the idea behind assertSynchronized:

  1. Wrap code snippets that should be thread safe in Callable objects

  2. Create another Callable that returns all your code snippets as a Collection

  3. assertSynchronized then runs your code snippets a whole bunch of times across several threads and makes sure there are no exceptions


This is not deterministic but is effective. My main concern is quickly replicating concurrency defects, which this assertion enables.


For clarity, let's name and describe a few things before seeing a code sample. Each (hopefully) synchronized code snippet is a "task". The Callable that produces the task list is a "task factory". assertSynchronized makes 1000 passes at the task factory. In each pass it gets the task list and queues up as many tasks as it can on multiple threads. When the queue is full, it releases the tasks and hopes for concurrency related exceptions. assertSynchronized moves on to the next iteration when all the tasks from the task factory are complete. Any errors are accumulated and reported at the end with an AssertionError taking the form of "java.lang.AssertionError: An exception was raised running the synchronization test. The test failed x out of y times. Last known error:" followed by a stack trace. There are zero dependencies on a testing framework, so it works in JUnit or TestNG.


So in practice, what does this look like? Answer: an orgy of generics and anonymous class cruft. Consider the case of ArrayList, whose add(E) method is not synchronized. Calling ArrayList#add across multiple threads eventually breaks on my machine. The Groovy version later is much terser syntactically, but the Java version, although workable, might be a little shocking:


Assert.assertSynchronized(
    new Callable<List<Callable<Void>>>() {
        public List<Callable<Void>> call() throws Exception {
            final List<Object> unsafeObject = new ArrayList<Object>();
            final Callable<Void> unsafeInvocation = new Callable<Void>() {
                public Void call() throws Exception {
                    for (int x = 0; x < 1000; x++) {
                        unsafeObject.add(new Object());
                    }
                    return null;
                }
            };
            return Arrays.asList(
                unsafeInvocation,
                unsafeInvocation
            );
        }
    }
);

Starting from the bottom up... this attempts to add 2000 objects to an ArrayList on two different threads: 1000 on each thread. And assertSynchronized will do this all 1000 times by default. This fails roughly 40 out of 1000 times on my machine. The key to understanding assertSynchronized is to know that the task factory will be invoked 1000 times, so shared state needs to be instantiated within the task factory and not within the individual task.


A good IDE makes all these anonymous classes easy to write, but it's hard to deny that the Groovy version from the JConch examples is more elegant:


Assert.assertSynchronized(
    {
        def target = new Vector() // fails if this is an ArrayList
        return [
            { 1000.times { target.add(new Object()) } } as Callable,
            { 1000.times { target.add(new Object()) } } as Callable,
        ]
    } as Callable
)

So the Twin Cities OTUG group has started a concurrency SIG. I added assertSynchronized to JConch in preparation for this; hopefully it helps you out. Consider joining the discussion here. I'd be glad to flesh out JConch's concurrent unit testing support more, so email me any suggestions or leave a comment. I'd love to see what other people are doing for concurrent testing!

Wednesday, September 24, 2008

Declarative Synchronization with Java 5's ReadWriteLock


Don't be scared by the long post! It's only long because of the verbose Java syntax in the code samples... my apologies in advance.



Java 5 included some great concurrency primitives. One of these is ReadWriteLock, implemented by ReentrantReadWriteLock. It's one object with two locks: a read lock which can be held simultaneously by multiple threads, and a write lock which is exclusive. Concurrency is increased because many threads can read a shared resource while only one thread may write to it. The traditional alternative, the synchronized keyword, provides an exclusive lock no matter what the context. The Javadocs are pretty interesting and worth a read.



Using the thing is quite simple. If you need a read or write lock, you request it and call lock()... making sure to unlock it in a finally block. Imagine a silly class providing String data to callers, backed by a HashMap, which can be refreshed. Getting data (reading) can be done by many clients... but refreshing (writing) should be performed exclusively by a single thread with no read locks held.


import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.concurrent.locks.ReadWriteLock;

public class ResourceProvider {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<String, String> data = new HashMap<String, String>();

    public String getResource(String key) throws Exception {
        lock.readLock().lock();
        try {
            return data.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void refresh() throws Exception {
        lock.writeLock().lock();
        try {
            //reload the resources into memory
        } finally {
            lock.writeLock().unlock();
        }
    }
}


Is this a good use of ReentrantReadWriteLock? Well, this ResourceProvider class is not immutable, so right off the bat I'm unimpressed with the design. Also, using the synchronized keyword would have been a lot simpler. I would ask, "Is there profiler data to support the additional complexity of the new lock?" Premature optimization and all that... Plus, the lock-finally-unlock idiom is concerning; it looks kinda easy to forget a step in there. Imagine a class with 10 read methods and one refresh method. That'd be a lot of lock-finally-unlocks! Maybe there is some guidance from the JDK...



You know what class I love? AccessController. Its API requires you to give it an Action, which it executes on your behalf. Kinda like how in Groovy you'd just give a closure to an object to execute rather than explicitly get its state. This is the Hollywood Principle: "Don't call us, we'll call you". Don't expose an iterator, expose a method that accepts a closure. Don't expose JVM security privileges, expose a method that accepts a PrivilegedAction. And, in the case of ReadWriteLock, don't expose the locks to be locked and unlocked by the callers; instead, let the caller pass you a function object to execute within the context of either a read or write lock. This is easily written and named ResourceController in this example:


import java.util.concurrent.Callable;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.concurrent.locks.ReadWriteLock;

public class ResourceController {

    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    public <T> T withReadLock(Callable<T> block) throws Exception {
        lock.readLock().lock();
        try {
            return block.call();
        } finally {
            lock.readLock().unlock();
        }
    }

    public <T> T withWriteLock(Callable<T> block) throws Exception {
        lock.writeLock().lock();
        try {
            return block.call();
        } finally {
            lock.writeLock().unlock();
        }
    }
}


This hides the complexity of ReentrantReadWriteLock from callers... the only thing they need to know about is the Callable interface (which they should know already). And you'll never forget to unlock the lock, nor will crummy lock-finally-unlock boilerplate litter your codebase. Nice separation of concerns... you get to show off your mad generics skillz... blah, blah, blah. Whether the calling code takes a complexity/syntax-cruft hit is debatable:

public String getResource(final String key) throws Exception {
    return controller.withReadLock(new Callable<String>() {
        public String call() throws Exception {
            return data.get(key);
        }
    });
}

public void refresh() throws Exception {
    controller.withWriteLock(new Callable<Void>() {
        public Void call() throws Exception {
            //reload the resources into memory
            return null;
        }
    });
}


Hmmm. This is just way more dense than the finally blocks in the first example. There's a lot more to take in! There are four lines of boilerplate supporting each one-line method call. The advantage of this strategy is that the compiler checks your correctness: you can't "forget" to create an anonymous class the way you might forget a finally block. Closures would help here, if they were available. They are in Groovy, and the code cleans up considerably... the "as Callable" bit is there to satisfy the ResourceController written in Java:
public String getResource(final String key) throws Exception {
    return controller.withReadLock({
        return data.get(key)
    } as Callable);
}

public void refresh() throws Exception {
    controller.withWriteLock({
        //reload the resources into memory
        return null;
    } as Callable);
}


Wouldn't it be nice if you could remove all that boilerplate and just declare your intent with annotations? This version of the ResourceProvider is the cleanest of the bunch... all the locking is moved out of the class and replaced with two annotations: @WithReadLock and @WithWriteLock:


public class DefaultResourceProvider implements ResourceProvider {

    private final Map<String, String> data = new HashMap<String, String>();

    @WithReadLock
    public String getResource(String key) throws Exception {
        return data.get(key);
    }

    @WithWriteLock
    public void refresh() throws Exception {
        // reload the settings
    }
}


The code to make all this work is fairly simple. You need to declare the two annotations and then create a wrapper class used with a Proxy that processes and applies the annotations at runtime. It sounds worse to implement than it is.


First step, create the annotations so they can be applied to methods and read back at runtime:


@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface WithReadLock { }

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface WithWriteLock { }


The second step is to create an InvocationHandler that wraps your object. When a method is called on the proxy, the handler's invoke() method runs. It needs to examine the target (the annotated object) and check whether a @WithReadLock or @WithWriteLock annotation is present... if one is, it executes the method with the appropriate lock held...


import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.concurrent.locks.ReadWriteLock;

public class ResourceController implements InvocationHandler {

    private final Object target;
    private final ReadWriteLock lock = new ReentrantReadWriteLock();

    ResourceController(Object target) {
        this.target = target;
    }

    public Object invoke(Object proxy, Method method, Object[] args) {
        try {
            //try to find the method on the target
            final Method targetMethod = target.getClass().getMethod(
                    method.getName(),
                    method.getParameterTypes());

            if (targetMethod.isAnnotationPresent(WithReadLock.class)) {
                lock.readLock().lock();
                try {
                    return targetMethod.invoke(target, args);
                } finally {
                    lock.readLock().unlock();
                }
            } else if (targetMethod.isAnnotationPresent(WithWriteLock.class)) {
                lock.writeLock().lock();
                try {
                    return targetMethod.invoke(target, args);
                } finally {
                    lock.writeLock().unlock();
                }
            } else {
                return targetMethod.invoke(target, args);
            }
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }
}


There are a couple downsides to this approach. For one, instantiating a proxied version of ResourceProvider is just nasty:


InvocationHandler handler = new ResourceController(
new DefaultResourceProvider());

ResourceProvider provider = (ResourceProvider) Proxy.newProxyInstance(
DefaultResourceProvider.class.getClassLoader(),
[ ResourceProvider.class ] as Class[],
handler);


Yuck! But did you notice how ResourceProvider got renamed? It is now called DefaultResourceProvider and implements ResourceProvider. This is because only interfaces can be proxied. Not ideal. Groovy allows you to wrap non-final classes in a variety of ways (ProxyGenerator, Proxy-o-Matic, and invokeMethod), but it was not obvious to me how to cleanly process these annotations with those approaches. Anyway, you're not seriously looking at applying Groovy to a concurrency hot spot so performance-sensitive that a ReadWriteLock is even needed, are you? A last option to consider is Spring AOP to proxy the thing... but locking is an implementation detail and hardly a cross-cutting concern. It seems a very bad fit for this behavior.
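One way to tame the instantiation is a small static factory that hides the Proxy.newProxyInstance noise. This helper is hypothetical (it is not part of the original post), but it only uses the standard reflection API:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

// Hypothetical convenience factory: wraps the Proxy.newProxyInstance
// incantation behind one static method, so callers only name the
// interface and the handler.
public final class Proxies {
    private Proxies() {}

    public static <T> T wrap(Class<T> iface, InvocationHandler handler) {
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] { iface }, handler));
    }
}
```

The two-step dance above then collapses to something like `ResourceProvider provider = Proxies.wrap(ResourceProvider.class, new ResourceController(new DefaultResourceProvider()));` — still a proxy, just less ceremony at each call site.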



But is it even a good idea to separate the locking code from the implementation this way? It is both possible and easy to create a ResourceProvider that is unusable and does not fulfill the contract it's meant to provide. Without being proxied, the object is downright dangerous to your system. Do you want your system and compiler to allow you to create an object in this state? I'm very hesitant to say yes. The JPA transactional annotations are great because your framework generally handles enforcing them. And web service annotations work great for the same reason. But there is no framework that enforces @WithReadLock. It just seems incredibly dangerous, and I have to recommend the middle approach: requiring an anonymous inner class and hiding the ReentrantReadWriteLock behind a ResourceController abstraction. Sorry the syntax is so verbose, but it seems the right design decision.



So there you have it: Declarative Synchronization with Java 5's ReadWriteLock which I don't even recommend using. Do you feel slightly cheated?


Source and tests available

Sunday, September 14, 2008

Temporal Coupling in Java and Some Solutions

Temporal Coupling.

I first came across the term in Thomas and Hunt's The Pragmatic Programmer, where they encourage readers to consider "...the role of time as a design element of the software itself." Hmph. Most of the chapter is about designing processes for parallelization, not APIs, but the end of the chapter shows some code comparing C's strtok and Java's StringTokenizer. Reviewing this made me thankful that C++ is well behind me except for the odd bit of JNI code...

So StringTokenizer has hasMoreTokens and nextToken methods on it (the moral equivalent of hasNext and next), so callers can easily iterate over the pieces of a string. In the strtok days, there was only one function: strtok, and you passed it a string the first time you called it and NULL every other time to get the next token.

char input[] = "Hello World";
char* hello = strtok(input, " ");
char* world = strtok(NULL, " ");
I invite you to contemplate how the one, global strtok method works when accessed from multiple threads. So to the Prag guys, avoiding temporal coupling meant designing for concurrency. Strtok could never, ever work in a multi-threaded world because of the global state hidden behind method invocations. Whereas objects like StringTokenizer and Iterators work because their state is local to the object.

Bob Martin's Clean Code carries on the torch of avoiding Temporal Coupling, but with a new suggested fix. Consider an object with an execute() and getErrors() method (example from this comment). Which order should these methods be called in? Well, the naming makes that fairly clear: first you execute and then you get the error messages. What happens if you call getErrors before execute? Should it perform the calculation or throw an exception? There is no single answer. Bob's solution is the API equivalent of the bucket brigade:
final class Computation {
public Future<Computation> execute() {
// perform some computation, collecting errors, and returning a future
}

public static Collection<String> getErrors(Future<Computation> future) {
// get the error list out of the computation
}
}
execute() returns a Future of the result... this is a handle to the result of the computation, which may or may not be calculated yet. The getErrors method now requires one of these handles on its parameter list. This API encodes the temporal coupling of the two methods statically: getErrors requires as input the output of the execute method. With this API it's very hard to invoke the methods out of order.
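A runnable sketch of that shape, with the computation itself invented purely for illustration, might look like this (using an ExecutorService to produce the Future):

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Sketch following the snippet above: execute() hands back a Future,
// and getErrors() demands that Future as its input, so the call order
// is enforced by the type system rather than by documentation.
final class Computation {
    private final List<String> errors = new ArrayList<>();

    public Future<Computation> execute(ExecutorService pool) {
        return pool.submit(() -> {
            // perform some computation, collecting errors along the way
            errors.add("sample error");
            return this;
        });
    }

    // You cannot ask for errors without first holding the handle
    // that only execute() can produce.
    public static Collection<String> getErrors(Future<Computation> future) {
        try {
            return future.get().errors;  // blocks until the computation finishes
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Note that Future.get() also establishes the happens-before edge, so reading the error list after get() returns is safe even though it was populated on a pool thread.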

So what about Iterator?

It certainly wasn't temporal coupling to the Prag Guys in 1999, seeing as they used an iterator-like API as a motivating and correct example. But clearly, calling iterator.next() repeatedly will eventually result in an exception. There is state hidden behind it in the form of a cursor to the recordset. If an iterator is accessed by multiple threads then all callers of hasNext/next must share the same synchronization policy or risk having the iterator advanced after the hasNext check was performed but before next is called. This looks like temporal coupling to me, and it looks like a violation of the Design for Concurrency tip from the Prag programmer. So in 2008 we should probably start looking for ways to avoid this API.

Except that Effective Java's 2nd Edition (pub: 2008) recommends this API in some cases; disagree with Mssr. Bloch at your own peril. He spins the API as a way to avoid unnecessary use of checked exceptions. Checked exceptions just produce nasty code:
try {
obj.action(args);
} catch (SomeException e) {
//handle exception
}
The checking API of providing a hasNext or isValid allows you to change a checked exception into a runtime exception, because your callers have presumably validated that the correct preconditions hold before invoking the method:
if (obj.isActionAllowed(args)) {
obj.action(args);
}
And Bloch is careful to point out that this API is not appropriate in multi-threaded scenarios where the state of the object may change between invocations of the guard check. So either you're single threaded or your object is immutable. Yet another reason to prefer immutability. Now, how would you implement an immutable iterator? You're probably better off using an immutable list. Anyway...

Assuming we need iterators and their ilk, what can we do instead?

Sentinel values are the common approach: return null, or "", or NaN. But in the end, undermining your type system just isn't smart. If you broadcast to the world that you return a String, then reserving one particular String as an error condition is just mean. Someone, somewhere is going to forget to perform the check. And it will be me and I'll do it about once a month. NullPointerExceptions are just another variation of this, but instead of using one type to represent two different conditions ("" is error, while all other strings are not), you're using two types without declaring it to your compiler (String method returns null or String). Ricky Clarkson's Optional Values in Java contains a good discussion of this (including the comments). Returning an empty list is a great way to signal that no results were found. Returning an empty list to indicate an error occurred is a bad use of sentinels.

And now is the time when we come to the Option type.

F# and OCaml don't have null* (simplification alert); they have an Option type. An Option can either have a value or not. So a function returning a String or null in Java would instead return Option of String, and you have to ask the Option object whether or not it has a value before using it. The compiler doesn't allow null values to masquerade as Strings, and it won't allow you to reference an optional String until you check that it is available. The result is an end to NullPointerExceptions. That I can live with. Does the API become more complicated?
Option<String> option = iterator.next();
if (option.hasValue()) {
//do something with value
}
Iterator gets simpler because it loses the hasNext function. It's also no longer a concurrency issue to share iterators between threads. Each invocation of next() is atomic and there is no time window between hasNext and next where the internal state can change. The calling code is really no different... you either make an unsafe check before grabbing an item or you grab an item and then make a safe check. The latter seems better.
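For instance, here is what an Option-returning iterator could look like in Java, sketched with java.util.Optional (which arrived years after this post) standing in for the Option type discussed here:

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch: next() is a single atomic operation, so there is no
// check-then-act window between a hasNext() call and a next() call.
public final class OptionIterator<T> {
    private final List<T> items;
    private final AtomicInteger cursor = new AtomicInteger(0);

    public OptionIterator(List<T> items) {
        this.items = items;
    }

    public Optional<T> next() {
        int i = cursor.getAndIncrement();  // each caller claims a distinct index
        return i < items.size() ? Optional.of(items.get(i)) : Optional.empty();
    }
}
```

Two threads calling next() concurrently can never receive the same element, and neither can ever see the cursor move between a check and a fetch, because there is no separate check.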

The calling code could be cleaned up, of course. But that would require adding pattern matching to Java, which I suspect is unlikely. But the F# code to perform the checks and handle None vs. Some is concise and not alien once you get used to it:
let printName name =
match name with
| (fname, Some(lname)) -> printfn "Hello %A %A" fname lname
| (fname, None) -> printfn "Hello %A" fname

printName ("Sonny" , Some("Bono"))
printName ("Cher", None)
But even with an odd calling syntax from Java, concurrency issues facing the industry make the Option type a welcome addition to codebases. The implementation can be quite simple:
public final class Option<T> {

    private final T value;

    private Option(T value) {
        this.value = value;
    }

    public boolean hasValue() {
        return value != null;
    }

    public T get() {
        if (value == null) throw new IllegalStateException("Option not valued");
        return value;
    }

    public static <T> Option<T> Some(T value) {
        if (value == null) throw new IllegalArgumentException("Some() requires a value; use None()");
        return new Option<T>(value);
    }

    public static <T> Option<T> None() {
        return new Option<T>(null);
    }
}
And here is a usage example:
final class StringUtils {
public static Option<String> stringToOption(String input) {
if (input == null) return Option.None();
else return Option.Some(input);
}
}
Functional Java has an Option type, documentation here. And this is called Maybe in other languages (Haskell, for one). And I should credit Reinier Zwitserloot (best... name... ever...) with getting the ball rolling with his initial comment.

Sorry for the length!

Tuesday, August 12, 2008

Java Fork/Join + Groovy

This is an executable blog post. Download the post here and run it using Groovy with jsr166y.jar in your classpath. This was created and presented for the Groovy Users of Minnesota group.

Introduction

Brian Goetz on the need for better Java support for parallelism: "The Java language has had support for threads and concurrency from day 1; the language includes synchronization primitives such as synchronized and volatile, and the class library includes classes such as Thread. However, the concurrency primitives that were sensible in 1995 reflected the hardware reality of the time: most commercially available systems provided no parallelism at all, and even the most expensive systems provided only limited parallelism. In those days, threads were used primarily for expressing asynchrony, not concurrency, and as a result, these mechanisms were generally adequate to the demands of the time."

"As multiprocessor systems started to become cheaper, more applications needed to exploit the hardware parallelism they provided, and programmers found that writing concurrent programs using the low-level primitives provided by the language and class library was difficult and error-prone."

"As we enter the many-core era, we will need to find finer-grained parallelism or risk keeping processors idle even though there is plenty of work to do.... Java 7 will include a framework for representing a certain class of finer-grained parallel algorithms: the fork-join framework."

About Fork/Join
Fork/Join for Java is a framework developed by Doug Lea, based on his original paper from June 2000. Fork/Join algorithms are parallel versions of divide-and-conquer algorithms. If a task is small, then it is calculated sequentially on a single thread. If a task is large, then it is divided into several parts and each part is added to a queue for later computation. Queued tasks are free to divide again, be worked on by any worker thread, or queue up waiting if the maximum number of threads has been reached. When all the pieces have been divided up and calculated, then a final result is calculated off the partial results of the distributed pieces. "Fork" refers to the division of tasks, and "Join" refers to the merging of results.

Fork/Join is similar to MapReduce in that they are both algorithms for parallelizing tasks. One difference is that Fork/Join tasks know how to subdivide themselves if too large, whereas MapReduce algorithms typically divide up all the work into portions before the algorithm starts.

The cleverness of fork/join lies in its worker scheduling method called "work stealing". Each worker thread maintains a double ended queue of tasks. Any forks of a task get pushed onto the head of that thread's deque, not back onto the general threadpool. Workers execute tasks in youngest first order. When a worker has no more tasks to run it attempts to steal a task from another worker by grabbing the tail of the other thread's deque. When a worker thread has no work and fails to steal any from others, it backs off.

One advantage of work stealing is reduced contention, because stealers take tasks from the opposite end of the deque than workers. Another advantage occurs because recursive divide-and-conquer algorithms generate larger tasks early. Older, stolen tasks provide larger units of work, leading to further recursive division.
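The deque discipline is easy to picture with a toy sketch (this is an illustration of the idea, not the real jsr166y data structure):

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

// Toy illustration of work stealing: the owning worker pushes and pops
// at the head of its own deque (youngest task first), while thieves
// steal from the tail (oldest, and typically largest, task first).
final class WorkerQueue<T> {
    private final Deque<T> deque = new ConcurrentLinkedDeque<>();

    void push(T task) { deque.addFirst(task); }    // owner: forked task lands at head
    T pop()           { return deque.pollFirst(); } // owner: take youngest task
    T steal()         { return deque.pollLast(); }  // thief: take oldest task
}
```

Because the owner works the head and thieves work the tail, the two only contend when the deque is nearly empty.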

Fork/Join is being developed by the JSR expert group using the name jsr166y. The main JSR 166 was included in Java 5 and the java.util.concurrent classes.

Useful Links
This file is a guide to using Groovy with Java Fork/Join (JSR 166y).
JSR Home: http://jcp.org/en/jsr/detail?id=166
Concurrency Interest: http://gee.cs.oswego.edu/dl/concurrency-interest/
Wiki: http://artisans-serverintellect-com.si-eioswww6.com/default.asp?W1
JSR API: http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166ydocs/
Downloadable Jar (Tested with Java 6): http://gee.cs.oswego.edu/dl/jsr166/dist/jsr166y.jar Original Paper: http://gee.cs.oswego.edu/dl/papers/fj.pdf
Brian Goetz on Fork Join:
Part 1: http://www.ibm.com/developerworks/java/library/j-jtp11137.html
Part 2: http://www.ibm.com/developerworks/java/library/j-jtp03048.html

Table of Contents

import jsr166y.forkjoin.ForkJoinPool
import jsr166y.forkjoin.ForkJoinExecutor
import jsr166y.forkjoin.RecursiveAction
import jsr166y.forkjoin.ParallelArray
import jsr166y.forkjoin.Ops.Predicate
import jsr166y.forkjoin.Ops.Procedure
import jsr166y.forkjoin.Ops.ObjectToDouble
import jsr166y.forkjoin.Ops.DoubleReducer
import jsr166y.forkjoin.ParallelDoubleArray.SummaryStatistics
A ForkJoinPool is a host for a group of ForkJoinWorkerThreads that perform ForkJoinTasks. ForkJoinPool does not implement the Java ExecutorService interface because it only executes ForkJoinTasks, not arbitrary Runnables.

This code produces a Fork/Join pool with 10 workers
ForkJoinExecutor pool = new ForkJoinPool(poolSize: 10);
Everyone loves the fibonacci sequence. It will be the basis for a few examples showing how the core of Fork/Join works. Here is a sequential implementation:
def fib
fib = {n ->
if (n <= 1) return n
else return fib(n - 1) + fib(n - 2)
}

println "Sequential fib: ${fib(15)}"
// Output: Sequential fib: 610
In order to use Fibonacci with Fork/Join, we need to model the algorithm as an object that can both solve the problem and subdivide the problem (that is, both conquer and divide). This class can do just that. The solve() method simply invokes the sequential fibonacci(), while the fork() method returns a two element array representing parts which can be joined together later.
public class FibonacciProblem {
int n;

def solve() { fibonacci(n) }

def fork() {
return [new FibonacciProblem(n: n - 1), new FibonacciProblem(n: n - 2)]
}

def fibonacci(n) {
if (n <= 1) return n
else return fibonacci(n - 1) + fibonacci(n - 2)
}
}
Once the problem is modeled, we can wrap it in a Fork/Join Task that controls when to fork and when to solve (when to divide and when to conquer). The task also needs to hold the result in mutable state so that it can be queried later.
public class FibonacciTask extends RecursiveAction {
FibonacciProblem problem
int result

@Override protected void compute() {
if (problem.n < 4)
result = problem.solve()
else {
def subproblems = problem.fork()
def subTasks = subproblems.collect { new FibonacciTask(problem: it) }
forkJoin(subTasks)
result = subTasks.collect { it.result }.sum()
}
}
}
In the above example, notice how any problem less than 4 steps deep is simply solved inline; otherwise the problem is forked into multiple parts and re-queued using the forkJoin method. forkJoin is a convenience for separate fork() and join() calls, adding a list of subtasks to the deque.

One problem with modeling the fibonacci algorithm this way is that the algorithm is spread out across two classes: the *Problem and the *Task. The FibonacciProblem.fibonacci() method is complete, in that it uses the + operator to join its two results in its recursive final statement. But if the problem is split, then that + sign is represented in the FibonacciTask at the point where sum() is called. This is a duplication of logic! Perhaps it would be better to model the Problem and the Task as one entity... but then that entity might be overloaded because it has both task and problem functions.

Regardless, we can now throw a new FibonacciProblem on the framework and watch it compute across threads. Now, much larger results can be calculated without running out of stack space. The result should also be calculated faster. Disclaimer: this is an admittedly naive implementation of fib.
def task = new FibonacciTask(problem: new FibonacciProblem(n: 15))
pool.invoke(task)
println "Forked fib: $task.result"
// Output Forked fib: 610
So far, it's fair to make two observations: modeling a problem as a recursive fork/join task is hard, and using Groovy to do it offers little value. Luckily, JSR166y offers some higher level abstractions so that we don't have to deal with the low level tasks and actions directly, and these higher level constructs are where Groovy closures pay dividends. Many of the divide-and-conquer algorithms that fit a fork/join framework operate on array and list types. Because of this, it is possible to build up a large and useful library of functions and types to hide the messy details of the fork/join tasks.

The following examples are all going to operate on a "Student" type. We'll generate some random test data for 1000 students. This usage of adding methods to java.util.Random is probably not the best use case for meta-programming but it sure is cute.
class Student {
int id;
int graduationYear;
double gpa; // 4 point scale
double adjustedGpa; // 5 point scale
}

Random.metaClass.randomYear = { 2000 + delegate.nextInt(11) }
Random.metaClass.randomGPA = { delegate.nextInt(41) / 10 }
Random rand = new Random();

def allStudents = (1..1000).collect {
new Student(id: it, graduationYear: rand.randomYear(), gpa: rand.randomGPA())
}
One of the core helper classes is the ParallelArray, which is an array which maintains an F/J executor in order to provide parallel aggregate operations. Creating one is pretty easy but requires a reference to the executor pool:
ParallelArray students = new ParallelArray(pool, allStudents as Student[])
Filtering selects a subset of the elements from the array. This is like a parallel version of Groovy's Collection#findAll(Closure). Note: The closure is not executed when withFilter() is called. It is lazily executed as needed, which in this case is when size() is called. Each invocation of the closure will potentially occur on a different thread. The return type is itself a functional array, so many of these methods may be chained, as you'll see later.
def graduates = students
.withFilter({ it.graduationYear <= 2008 } as Predicate)

println "${graduates.size()} of the ${students.size()} students have graduated."
// Output: 831 of the 1000 students have graduated.
Application applies a void function to each element in the array, modifying the array in the process. It is a bit like Groovy's List#collect(Closure) except that it operates on the array itself and does not construct a new result. Again, each invocation of the closure potentially occurs on a different thread. However, in this case the closure is executed at the time apply() is invoked, not lazily.
students
.apply({ it.adjustedGpa = it.gpa * 1.25 } as Procedure)

def s = students.get(0)
println "Student $s.id has GPA $s.gpa of 4.0 and $s.adjustedGpa of 5.0"
// Output: Student 1 has GPA 2.6 of 4.0 and 3.25 of 5.0
Mapping converts elements from the original array type to a new type, exactly as you would expect Groovy's List#collect(Closure) to work. In this case, each closure invocation happens on a different thread and the invocation occurs lazily as needed. In this particular example, however, each closure executes on the same thread as the script. Why? The sequentially() method returns the ParallelArray as an Iterable without parallel evaluation. Oops! Be careful when mixing Collection and ParallelArray functions.
def allGpas = students
.withMapping({ it.gpa } as ObjectToDouble)

def perfectScores = allGpas.sequentially().inject(0, {acc, item ->
if (item == 4.0) return acc + 1
else return acc
})

println "$perfectScores students have a 4.0 average"
// Output: 24 students have a 4.0 average
Reduce returns a reduction of the ParallelArray. The Groovy analog is Collection#inject(Object, Closure). In this example you see how easy it is to chain a bunch of aggregate operators together. Again, each closure is invoked on a different thread as needed.
double bestGpa = students
.withFilter({ it.graduationYear == 2008 } as Predicate)
.withMapping({ it.gpa } as ObjectToDouble)
.reduce({previous, current -> Math.max(previous, current) } as DoubleReducer, 0)

println "The best GPA is $bestGpa"
//Output: The best GPA is 4.0
Summarization is a convenience for all the common aggregate operators, like size(), min(), max(), indexOfMin(), indexOfMax(), sum(), and average().
SummaryStatistics summary = students
.withFilter({ it.graduationYear == 2008 } as Predicate)
.withMapping({ it.gpa } as ObjectToDouble)
.summary()

ParallelArray.metaClass.getAt = { x -> delegate.get(x) } //add [] operators for convenience

def worstStudent = students[summary.indexOfMin()]
def bestStudent = students[summary.indexOfMax()]
println "Worst student: $worstStudent.id had GPA $worstStudent.gpa"
println "Best student: $bestStudent.id had GPA $bestStudent.gpa"
println "Average GPA: ${summary.average()}"
Summary
The Fork/Join ParallelArray type and libraries are very cool, even if they lack some of the conveniences expected from Groovy libraries. The tasks and actions, on the other hand, were much harder to grasp and work with. Defining an algorithm recursively is one thing to learn, but then having to figure out how to fork the work is another, more difficult problem.

There's also the issue of the verbose types the framework uses to represent function types. The public class Ops, for instance, is a convenience class that defines 132 different function types. Ouch, this is too much to remember, and it seems like you could use generics to define 3 generic function types rather than the 132 shown in the docs. I shudder to think how hard this would be to write without an IDE with auto-import and type discovery. Hooray for static typing tools.

Working with the ParallelArray class was a breeze from Groovy, given that the IDE helps you cast closures to the correct type. I can't imagine enjoying using the framework from Java with the proliferation of anonymous inner classes that it would require. I hesitate to say that the BGGA closure proposal would be that much better... the example closures fit in one line of code now, but adding generics and type definitions would probably push these to multiple lines, at which point it might become a mess to read. A version of the framework was made compatible with the BGGA proposal, but recent signs point to no closures in Java 7. Hooray for Groovy for supporting this today.

The ParallelArray class is wonderfully useful, but where are ParallelList and ParallelMap? Tying the library to an array type limits its usefulness, and pretty much rules out infinite lists (I know, I know, it's an unbearable loss). I'd like to see these two types added to the library before release.

Also, why the inconsistent naming convention? The reduce function is called reduce and apply is called apply, but map is called withMapping and filter is called withFilter. This is another area I humbly submit for cleanup before release.

None of this solves the issue of mutable state within a multi-threaded environment. I heard second-hand that deadlock issues doomed the Cilk project, which F/J is partly based on. Tasks are mutable by nature because the result is filled in at a later date, and it is left up to the developer to code defensively to avoid deadlocks, which pushes the developer back down into the assembly language of concurrency: synchronized and volatile. Is a parallelization framework not based on immutability doomed to failure?

Also, my intent was to use Terracotta to create a distributed Fork/Join executor. But the executors lack the fine grain task control that distributed systems would need. In the case of a failing fork, there is no way for the executor to recall the forked problem piece and reassign it to a different worker or add it back into the deque. The CommonJ framework seems up to the task, which is what Terracotta recommended using on their forum and docs. Is a parallelization framework not based on distributed tasks doomed to failure?

AND ANOTHER THING! I easily downloaded the framework, browsed the docs, and started using it with Java 6. Why does this need to be in the JDK? With BEA's CommonJ and Functional Java there seems to be movement in this space from many sources. I have no idea why F/J needs to be part of the JDK rather than just another 3rd party library.

I didn't mean for the gripes section to get so long, and it might be unfair to criticize a project that is still in the formalization process. It's a cool project and it's great to see some functional programming coming to the JDK. The ExecutorServices were a giant leap forward when included in Java 5, and perhaps the finished version of F/J will be the same thing for Java 7!

Thursday, April 3, 2008

10 Best IDEA Inspections You're Not Using

Let's clarify. By "Best" I mean the ones I like. By "You're Not Using" I mean they aren't enabled by default. By "Inspections" I mean those little code warnings that IDEA gives you, which can be configured under Settings (Ctrl+Alt+S) Errors (6). You have gone and edited your inspection settings, haven't you?

Anyway, here's my list of my most favorite sweet awesome inspections.

1. Illegal package dependencies
Got a "common" package where StringUtils sits? So you've told everyone not to create dependencies out of common back into your product specific packages, yet they still do it, don't they? This inspection allows you to define package dependencies that are allowed or not allowed, and you'll receive warnings when you violate them. So common.StringUtils can be referenced from product.MyClass, but referencing product.MyClass from common.StringUtils creates an error. This is actually enabled by default, but no hierarchy rules are defined. Learning about IDEA scopes is a good thing... go play with this.

2. 'this' reference escaped in object construction
Letting a reference to 'this' escape inside your constructor opens the door to allowing other objects to start using you before your constructor has completed. If you have an immutable object, whose member fields are all set within its constructor, and you let a 'this' reference escape... then it is entirely possible that some other object will see some of those private final fields with null values. This is not good for concurrency (yet sometimes not easily avoided).
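A contrived example makes the hazard concrete (the class and callback here are invented for illustration):

```java
import java.util.function.Consumer;

// Invented example: the constructor hands out 'this' before the final
// field is written, so the callback observes the object half-built.
final class Account {
    private final String owner;

    Account(String owner, Consumer<Account> onOpen) {
        onOpen.accept(this);  // 'this' escapes here...
        this.owner = owner;   // ...before the final field is assigned
    }

    String getOwner() { return owner; }
}
```

Even in a single thread, the callback sees owner as null; with multiple threads, another thread holding the escaped reference could see the null indefinitely, despite the field being final. The fix is to register listeners after the constructor returns, for example from a static factory method.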

3. Field accessed in both synchronized and unsynchronized contexts
Partial synchronization leads to inconsistent data structures. Sometimes your field is updated from multiple threads correctly and sometimes it isn't. As far as I know, partial synchronization has no real world uses and is almost always a mistake. This inspection is a life saver for a condition that is easily overlooked.
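A minimal invented example of what the inspection flags:

```java
// Invented example: count is written under the lock but read without
// it, so a reader thread has no visibility guarantee and may see a
// stale value. This is the "field accessed in both synchronized and
// unsynchronized contexts" smell.
final class Counter {
    private int count;

    public synchronized void increment() { count++; }

    // BAD: unsynchronized read of a field also written under synchronization.
    public int current() { return count; }

    // The fix: hold the same lock on every access to the field.
    public synchronized int currentSafe() { return count; }
}
```

On a single thread both reads return the same number, which is exactly why this bug is so easy to miss in testing.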

4.
non private field accessed in synchronized context
This is another case of partial synchronization. Maybe you've synchronized perfectly within your class... yet if you let field references escape out of your synchronized object then it's likely that the callers will not update that field in a way consistent with your synchronization policy. Simply put: there's no guarantee that external writes to fields will be seen within your object. Partial synchronization leads to inconsistent data structures, as said earlier.

5. Synchronization on 'this' and 'synchronized' method
Hey, it's a two-fer! These two separate inspections are closely related. Some say it's a style issue, but I feel strongly that 'this' should not be part of your synchronization policy. It effectively makes your synchronization policy public, rather than private and encapsulated, and it opens the door for denial of service attacks on your object. In my opinion, synchronization on 'this' and synchronized methods (which are the same thing, really) should be avoided, and this inspection can help you clean that up.
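The usual fix the inspection nudges you toward is a private lock object, sketched here with invented names:

```java
// Sketch: a private final lock object keeps the synchronization policy
// encapsulated. No outside caller can synchronize on your instance and
// block your methods, because your methods never lock on 'this'.
final class SafeRegistry {
    private final Object lock = new Object();
    private int entries;

    void add()  { synchronized (lock) { entries++; } }
    int size()  { synchronized (lock) { return entries; } }
}
```

Compare with a synchronized method: any client holding a reference to the object could do `synchronized (registry) { ... }` forever and starve every caller.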

6. return of collection or array field
Everyone goes through phases where you are introduced to a new idea, fall in love with it, overuse it, and then finally back off from using it so extensively. With immutable objects, I'm still waiting for the phase where I learn not to use them so much. So far it hasn't happened. Well, returning a collection or array out of a public method is a great way to take your nicely designed immutable object and let anyone at all monkey with its state. Collections.unmodifiable* is your friend. Too bad that in legacy codebases returning arrays is so ubiquitous that turning on this inspection produces more noise than signal.
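A small invented example of the safe pattern:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Invented example: hand out a read-only view so callers cannot mutate
// the internal list out from under the object.
final class Roster {
    private final List<String> names = new ArrayList<>();

    void add(String name) { names.add(name); }

    List<String> getNames() {
        return Collections.unmodifiableList(names);  // mutation attempts throw
    }
}
```

Callers can still read and iterate, but any add() or remove() on the returned view throws UnsupportedOperationException instead of silently corrupting the object's state.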

7. call to 'Thread.run()'
I do this all the time. Thread.run() either runs the enclosed Runnable once on the current thread, or it does nothing (if the Thread was not constructed with a Runnable). You want Thread.start(). This issue has never made it into production code for me; it's more of a code-time "oh yeah, oops" check.

8. expression.equals("literal") rather than "literal".equals(expression)
The first example might result in a NullPointerException, the second never will. Simple and useful.
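To make it concrete (a trivial invented example):

```java
// "literal".equals(expr) is null-safe; expr.equals("literal") throws
// a NullPointerException when expr is null.
public final class LiteralFirst {
    public static boolean matches(String expr) {
        return "literal".equals(expr);  // never throws, even for null
    }
}
```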

9. equals method does not check class of parameter
I open Effective Java every time I have to write an equals() method. And when I'm done I flip pages so I can write a proper hashCode(). Shouldn't it be easier than this? Anyway, this inspection has kicked in a couple times when I wrote bad equals() methods. For me, it's worth turning on.

10. method may be static
I saved the best for last. This is listed under the category "Performance Issues" in IDEA, but I consider it a design issue. If a private method can be made static, then go ahead and make it static. But realize it is a code smell. Once your object contains a handful of static private methods you should start wondering what they're all doing there. If your object contains a bunch of private static methods, then how cohesive is it really? Chances are you have a new class waiting to be extracted. This is the gist of emergent design: adhere to principles and refactor as necessary.

So what about public methods? What in the world is a public method doing on your object that has no dependency on any fields within the object? This is certainly a code smell. The problem is that the "auto-fix" for the inspection is to apply the static keyword. Nooooo. That's not what you want to do. A public method without any dependency on object state can't possibly be part of an object that has one clearly stated charter. It's just not cohesive and should be placed somewhere else.

So: if the method is private, accept the auto-fix, but if the method is public then don't. The irony is that I've never actually received this warning on a public method I've written. If you're doing test driven development then I don't quite see how this scenario would arise, barring a lapse in judgment. But not everyone is doing that and there are exceptions to every rule.
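A hypothetical before-and-after of that extraction (the class and method names are invented for illustration): the stateless helper leaves the object and lands in a class of its own.

```java
// After the refactoring: InvoicePrinter keeps only what depends on its state...
class InvoicePrinter {
    private final String header;

    InvoicePrinter(String header) { this.header = header; }

    String print(double amount) {
        return header + ": " + MoneyFormat.format(amount);
    }
}

// ...while the state-free method, which the inspection flagged as
// "may be static", moves to a home where it is actually cohesive.
class MoneyFormat {
    static String format(double amount) {
        return String.format(java.util.Locale.US, "$%.2f", amount);
    }
}
```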

Well, that's 10 tips. I hear posts with top 10 lists generate more traffic, so let the pageview flood begin!

Monday, January 7, 2008

'this' is not the synchronization you're looking for

I'm no expert on multi-threaded applications. But I am quite good at juggling and making balloon animals. So if this post about Java synchronization strikes you as amateur, misguided, or wrong, then you have every right to call me a clown in the comments. That's what they're there for.

I suspect I'm not unique among Java developers in how I was introduced to concurrency issues. It started with a bug report, this time from some terribly misguided attempt at using threads to increase performance. Ugh. I worked out the easiest solution, which was to add method synchronization. It worked, that's good, so what's the issue? Liveness is the issue. Liveness and information hiding.

Consider this simple example of a class with a synchronized method:

class Counter {
    private int count = 0

    public synchronized void countUp() {
        (1..10000).each {
            count++
        }
    }
}
In Groovy the statement (1..10000).each {...} will perform something 10,000 times, so this method atomically increases the counter to 10000. (Also, the Java semi-colons aren't needed). So what's this about liveness? Consider the naive usage by two different threads:
def counter = new Counter()

//start a 2nd thread that synchronizes on counter
new Thread({
    synchronized (counter) {
        //pretend to do some work
        Thread.sleep(5000)
    }
}).start()

//try to count up on this thread
counter.countUp()
The problem is that the synchronized method from the first example and the synchronized statement in the second example are using the same lock (or "monitor"). If the worker thread that sleeps for 5 seconds acquires the lock first then it will starve the thread trying to call countUp(). Sometimes the code here completes in 1 second, and sometimes it takes 6. Probably not the type of error you wanted to expose when you originally added the synchronized keyword. So how does this work?

A synchronized method uses the lock associated with the "this" reference. So any synchronized method:
synchronized void foo() {
    //do something
}
Can just as equally be written like this:
void foo() {
    synchronized (this) {
        //do something
    }
}
Using this approach it becomes pretty obvious that anyone with a reference to the object could create a synchronized block on that object, and thus interfere with the liveness of the method call. Any time you use method synchronization you are exposing your synchronization policy publicly. Is how your object is synchronized really something you want publicly visible? I'd say not. This is surely an implementation detail. And just as you hide object state by avoiding public fields, I suggest you hide your synchronization policy by avoiding method synchronization. This is simple information hiding, but I overlooked it for years.

So what do I do now? I synchronize on an internal field like so:
class Counter {
    private int count = 0
    private final Object lock = new Object()

    public void countUpSafely() {
        synchronized (lock) {
            (1..10000).each {
                count++
            }
        }
    }
}
The additional benefit is that your synchronization token can now carry with it an intent revealing name. Trust me, the maintenance programmers in the future will thank you. And if you ever want to find the usage of the object's lock, then the private variable is there for the searching. And that means you're free to change the implementation knowing that users of the code won't be affected.

I'm not sure who said it, but a reviewer of Brian Goetz's Java Concurrency in Practice said that, after reading the book, he was convinced that the only person in the world that really did understand concurrency was Goetz himself. It's an amazing book, worth several reads, and I echo the reviewer's thoughts entirely.

Now it's your turn to sound off. I'll give you a start. "Yo clown, stick to your circus skills... [insert concurrency edge case here]." And for those wanting to tinker with the code, the entire Groovy script is available here. Thanks!

Tuesday, September 18, 2007

Java Performance Myths

I recently looked at the Java Performance Myths presentation that Cliff Click gave at JavaOne. Some of the results were not surprising: the final keyword does not help performance, and autoboxing is nearly free. But other results were more interesting...

1. Try Catch Blocks are free - Ah, the smug feeling associated with quoting Joshua Bloch's Effective Java #39, Use Exceptions Only for Exceptional Conditions. This Effective Java entry says to use exceptions only when an exception has truly occurred. The rationale behind this is twofold: exception handling on JVMs is slow, and the code produced is obscure and confusing. It looks like the first reason is no longer true, but I'd still say that using exceptions to exit loops is poor code because it does not reveal intentions the way normal exiting of loops does.
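To make the intent argument concrete, here is a sketch of the anti-pattern next to the idiomatic loop (both methods are my own invented examples):

```java
// The anti-pattern Bloch warns about: abusing an exception as a loop exit.
class LoopExit {
    static int sumWithException(int[] values) {
        int sum = 0;
        try {
            int i = 0;
            while (true) {
                sum += values[i++];   // "exits" via ArrayIndexOutOfBoundsException
            }
        } catch (ArrayIndexOutOfBoundsException done) {
            return sum;               // the loop's end condition is completely hidden
        }
    }

    static int sumIdiomatically(int[] values) {
        int sum = 0;
        for (int v : values) {        // the loop's end is stated, not thrown
            sum += v;
        }
        return sum;
    }
}
```

Even if the JIT makes both equally fast, only the second version tells the reader when and why the loop stops.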

2. Synchronization is cheaper than bugs - In Java Concurrency in Practice, Brian Goetz describes a proper synchronization policy as one that manages concurrent access to an object's state without violating any invariants or postconditions of that object. The speed at which critical sections can be entered and exited really shouldn't affect this policy. While I'm glad that the speed of synchronization has been increased, I don't see how this increase would affect software design. Originally, this myth seemed a lot more important than I think it is.

3. Garbage Collection Works, Use Object Pools sparingly - Garbage collection on lightweight and medium weight objects (think Hashtable) is now faster than if object pools and caching were used. Only under rare circumstances should we be using something like the flyweight pattern. This is what Neal Ford and Stuart Halloway have been saying for some time... many of the GoF design patterns are recipes for Java and C++ that don't have relevance in other languages. And now some of the patterns don't have relevance in Java.

4. Enum values() calls are expensive - The enum data type in Java 5 was supposed to be syntactic sugar and hold no performance implications. It looks like if-blocks and for-loops invoke enum.values() under the covers, and this method holds a significant performance hit when called repeatedly. This can be avoided by moving calls to enum.values() out of the loop. Of course, profiling data should support this move before any decision to optimize is made (Effective Java 37 - Optimize Judiciously).
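A sketch of that hoisting (the enum and class are invented for illustration): values() clones its backing array on every call, so cache the result once if profiling shows the calls matter.

```java
// Hoisting values() out of a hot loop: each call to values() clones the
// backing array, so cache it once and reuse it.
class Flags {
    enum Color { RED, GREEN, BLUE }

    // one clone, made at class-load time, reused thereafter
    private static final Color[] COLORS = Color.values();

    static int countLookups(int iterations) {
        int seen = 0;
        for (int i = 0; i < iterations; i++) {
            for (Color c : COLORS) {   // no per-iteration values() call
                seen++;
            }
        }
        return seen;
    }
}
```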

Check out the original Java Performance Myths presentation for more info!