Tuesday, May 29, 2007

Parameterizing Test Methods - Revisited

In my last post on refactoring and unit tests, Steve Freeman noted in one of his blog comments ..

I can't see the point in the data provider. Why don't you just call the method three times with different arguments?


In the process of refactoring, I had used TestNG's parameterized test method feature to decouple test data from test methods. While invoking a test method multiple times with different parameters gives me the same functional result, parameterized test methods provide great scalability and a neat separation between the test logic and the test data suite. A complex business method may involve lots of complicated calculations and conditionals - a unit test needs to exercise all of those conditionals to achieve full coverage. This may call for lots of data combinations, which, if handled directly through separate invocations of the same test method, results in a real explosion of test cases.

TestNG's feature of parameterizing test methods through DataProviders provides a great way to DRY up the test methods. I can keep the test data combinations either in testng.xml or in separate factory methods with specific signatures, which allows huge extensibility. The test method can then take parameters, along with an annotation that specifies the data source. Take the following example from the earlier post :


public class AccruedInterestCalculatorTest extends MockControl {
  //..
  @DataProvider(name = "test1")
  public Object[][] createAccruedInterestCalculationData() {
    return new Object[][] {
      {20, BigDecimal.valueOf(0.5), BigDecimal.valueOf(1000), BigDecimal.valueOf(30)},
      {20, BigDecimal.valueOf(0.5), BigDecimal.valueOf(2000), BigDecimal.valueOf(60)},
      {30, BigDecimal.valueOf(0.5), BigDecimal.valueOf(2000), BigDecimal.valueOf(90)},
    };
  }

  //..

  // the test method now accepts parameters
  //..
  @Test(dataProvider = "test1")
  public void normalAccruedInterest(int days,
    BigDecimal rate, BigDecimal principal, BigDecimal interestGold) {
    new CalculatorScaffold()
      .on(principal)
      .forDays(days)
      .at(rate)
      .calculate()
      .andCheck(interestGold);
  }
  //..
}



The calculation of the accrued interest is an extremely complicated business method which needs to consider lots of alternate routes of computation and contextual data. A data set that tests the calculation effectively would mean repeating the test method invocation many times over. Instead, the parameterization technique allows me to declare all the varying components as parameters of the test method, with the data coming from the method createAccruedInterestCalculationData(), suitably annotated with a data source name. As a quick sanity check of the first row above : 20/365 rounded up to scale 2 is 0.06, and 1000 * 0.06 * 0.5 = 30, the expected interest.

Making the test method parameterized has the additional benefit of decoupling the data set from the logic - the entire data set can be populated by non-programmers as well, typically the domain guys, who can enrich the test coverage simply by putting additional data points into the factory method that generates the data. We have actually used this technique in a real-life project, where domain experts took part in enriching parameterized unit tests by filling in data in the provider methods.

There's actually more to it. The data provider method can also fetch data from more complicated sources - a database, an XML document etc. - or generate it with logic of its own. All of this can be encapsulated in the DataProvider methods and fed transparently to the test methods, so the logic of test data generation stays completely outside the test method. I found this feature of TestNG a real cracker ..
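
As an illustration - a sketch only, with a made-up file name and row format, and assuming the usual java.io and java.util imports - a provider that pulls its cases from a CSV file could look like this :


// sketch: a DataProvider reading cases from a hypothetical accrual_cases.csv,
// one "days,rate,principal,expectedInterest" row per line
@DataProvider(name = "csvData")
public Object[][] createAccruedInterestDataFromCsv() throws IOException {
  List<Object[]> rows = new ArrayList<Object[]>();
  BufferedReader reader = new BufferedReader(new FileReader("accrual_cases.csv"));
  try {
    String line;
    while ((line = reader.readLine()) != null) {
      String[] fields = line.split(",");
      rows.add(new Object[] {
        Integer.valueOf(fields[0]),   // days
        new BigDecimal(fields[1]),    // rate
        new BigDecimal(fields[2]),    // principal
        new BigDecimal(fields[3])     // expected interest
      });
    }
  } finally {
    reader.close();
  }
  return rows.toArray(new Object[rows.size()][]);
}
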

Sunday, May 20, 2007

Refactoring Unit Test Methods to speak the Domain

I do TDD, but I do not start writing unit tests before writing production code. It's not that I didn't try - I gave it an honest attempt for quite some time before I gave up. Somehow the approach of writing tests first does not seem intuitive to me. At the least, I need to figure out the collaborators and the initial subset of public contracts before my TestNG plugin kicks in. Since then, it's all a game of collaborative ping-pong between my production code and test code. Having said that, I honestly believe that an exhaustive unit test suite is no less important than the code itself. I take every pain to ensure that my unit tests are well organized, properly refactored and follow all the principles of good programming that I espouse while writing the actual production code.

Refactoring is one practice that I preach, teach and encourage vigorously among my teammates. The same goes for unit tests - if I strive to make my production code speak the language of the domain, so should my unit tests. This post is about a similar refactoring exercise to make unit tests speak the DSL. At the end of this effort of vigorous iterations and cycles of refactoring, we could achieve a well-engineered unit test suite, which can probably be enhanced by the domain guys with a minimum of programming knowledge.

The Class Under Test

The following is a fragment of a simplified version of AccruedInterestCalculator, a domain class which calculates the interest accrued for a client over a period of time. For brevity, the class carries only very simple domain logic. The idea is to develop a fluent unit test suite for this class that speaks the domain language, through an iterative process of refactoring. I will be using an artillery consisting of a state-of-the-art unit testing framework (TestNG), a dynamic mocking framework (EasyMock) and the most powerful weapon in programming - merciless refactoring.


public class AccruedInterestCalculator {
  //..
  //..
  private IAccruedDaysCalculator accruedDaysCalculator;
  private IInterestRateCalculator interestRateCalculator;

  public final BigDecimal calculateAccruedInterest(final BigDecimal principal)
    throws NoInterestAccruedException {
    int days = accruedDaysCalculator.calculateAccruedDays();
    if (days == 0) {
      throw new NoInterestAccruedException("Zero accrual days for principal " + principal);
    }
    BigDecimal rate = interestRateCalculator.calculateRate();
    BigDecimal years = BigDecimal.valueOf(days).divide(BigDecimal.valueOf(365), 2, RoundingMode.UP);
    return principal.multiply(years).multiply(rate);
  }

  AccruedInterestCalculator setAccruedDaysCalculator(IAccruedDaysCalculator accruedDaysCalculator) {
    this.accruedDaysCalculator = accruedDaysCalculator;
    return this;
  }

  AccruedInterestCalculator setInterestRateCalculator(IInterestRateCalculator interestRateCalculator) {
    this.interestRateCalculator = interestRateCalculator;
    return this;
  }
}



and the two collaborators ..


public interface IAccruedDaysCalculator {
  int calculateAccruedDays();
}

public interface IInterestRateCalculator {
  BigDecimal calculateRate();
}



Mocks are useful, but Noisy ..

I am a big fan of using mocks for unit testing - EasyMock is my choice of dynamic mocking framework. Mocks provide the most seamless way of handling collaborators while writing unit tests for a class. However, I often find mocks introducing lots of boilerplate code which needs to be repeated for every test method that I write ..


public class AccruedInterestCalculatorTest {
  protected IMocksControl mockControl;

  @BeforeMethod
  public final void setup() {
    mockControl = EasyMock.createControl();
  }

  @Test
  public void normalAccruedInterestCalculation() {
    IAccruedDaysCalculator acalc = mockControl.createMock(IAccruedDaysCalculator.class);
    IInterestRateCalculator icalc = mockControl.createMock(IInterestRateCalculator.class);

    expect(acalc.calculateAccruedDays()).andReturn(2000);
    expect(icalc.calculateRate()).andReturn(BigDecimal.valueOf(0.5));

    mockControl.replay();

    BigDecimal interest =
      new AccruedInterestCalculator()
        .setAccruedDaysCalculator(acalc)
        .setInterestRateCalculator(icalc)
        .calculateAccruedInterest(BigDecimal.valueOf(2000.00));

    mockControl.verify();
    // 2000 days -> 2000/365 = 5.48 (scale 2, rounded up); 2000 * 5.48 * 0.5 = 5480
    assert interest.compareTo(BigDecimal.valueOf(5480)) == 0;
  }
}



Refactoring! Refactoring!

Have a look at the above test method - the mock framework setup and controls pollute the actual business logic that I would like to test. Surely not a very domain-friendly approach. As Howard has pointed out in one of his NFJS writings, we can abstract away the mock stuff into separate methods within the test class, or, better still, into a separate MockControl class altogether. This makes the actual test class lighter and frees it of some of the noise. Here is the snapshot after one round of refactoring ..


// mock controls delegated to class MockControl
public class AccruedInterestCalculatorTest extends MockControl {

  @BeforeMethod
  public final void setup() {
    mockControl = EasyMock.createControl();
  }

  @Test
  public void normalAccruedInterestCalculation() {
    IAccruedDaysCalculator acalc = newMock(IAccruedDaysCalculator.class);
    IInterestRateCalculator icalc = newMock(IInterestRateCalculator.class);

    expect(acalc.calculateAccruedDays()).andReturn(2000);
    expect(icalc.calculateRate()).andReturn(BigDecimal.valueOf(0.5));

    replay();

    BigDecimal interest =
      new AccruedInterestCalculator()
        .setAccruedDaysCalculator(acalc)
        .setInterestRateCalculator(icalc)
        .calculateAccruedInterest(BigDecimal.valueOf(2000.00));

    verify();
    // 2000 days -> 2000/365 = 5.48 (scale 2, rounded up); 2000 * 5.48 * 0.5 = 5480
    assert interest.compareTo(BigDecimal.valueOf(5480)) == 0;
  }
}



and the new MockControl class ..


public abstract class MockControl {
  protected IMocksControl mockControl;

  protected final <T> T newMock(Class<T> mockClass) {
    return mockControl.createMock(mockClass);
  }

  protected final void replay() {
    mockControl.replay();
  }

  protected final void verify() {
    mockControl.verify();
  }
}



The test class AccruedInterestCalculatorTest is now cleaner, and the test method carries less baggage from the guts of the mock calls. But it is still not sufficiently close to speaking the domain language. One of the litmus tests we often apply to check the domain-friendliness of unit tests is to ask a domain guy to explain the test methods (the ones annotated with @Test) and, if possible, to enhance them. With the remaining litter of mock creation and training still around, this version will definitely smell to them. The domain guy in this case nods a big NO, Eclipse kicks in, and we start the next level of refactoring.

One thing strikes me - each test method is a composition of the following steps :

  1. create mocks

  2. setup mocks

  3. replay

  4. do the actual stuff

  5. verify


And refactoring provides me with the ideal way to scaffold all of these behind fluent interfaces for the contract that we plan to test. How about a scaffolding class that encapsulates these steps? I keep the scaffold as a private inner class and localize all the mockery in one place, while the scaffold exposes fluent interfaces to be used by the test methods. Here is what we have after a couple more rounds of iterative refactoring ..


public class AccruedInterestCalculatorTest extends MockControl {
  private IAccruedDaysCalculator acalc;
  private IInterestRateCalculator icalc;

  @BeforeMethod
  public final void setup() {
    mockControl = EasyMock.createControl();
    acalc = newMock(IAccruedDaysCalculator.class);
    icalc = newMock(IInterestRateCalculator.class);
  }

  // the scaffold class encapsulating all mock methods
  private class CalculatorScaffold {
    private BigDecimal principal;
    private int days;
    private BigDecimal rate;
    private BigDecimal interest;

    CalculatorScaffold on(BigDecimal principal) {
      this.principal = principal;
      return this;
    }

    CalculatorScaffold forDays(int days) {
      this.days = days;
      return this;
    }

    CalculatorScaffold at(BigDecimal rate) {
      this.rate = rate;
      return this;
    }

    CalculatorScaffold calculate() {
      expect(acalc.calculateAccruedDays()).andReturn(days);
      expect(icalc.calculateRate()).andReturn(rate);

      replay();

      interest =
        new AccruedInterestCalculator()
        .setAccruedDaysCalculator(acalc)
        .setInterestRateCalculator(icalc)
        .calculateAccruedInterest(principal);

      verify();
      return this;
    }

    void andCheck(BigDecimal interestGold) {
      assert interest.compareTo(interestGold) == 0;
    }
  }

  //..
  //.. the actual test methods

  @Test
  public void normalAccruedInterest() {
    new CalculatorScaffold()
      .on(BigDecimal.valueOf(1000.00))
      .forDays(20)
      .at(BigDecimal.valueOf(0.5))
      .calculate()
      .andCheck(BigDecimal.valueOf(30.0));
  }

  @Test
  public void normalAccruedInterestOnHigherPrincipal() {
    new CalculatorScaffold()
      .on(BigDecimal.valueOf(2000.00))
      .forDays(20)
      .at(BigDecimal.valueOf(0.5))
      .calculate()
      .andCheck(BigDecimal.valueOf(60.0));
  }

  //..
  //.. other methods
}



The test methods of this class now look much closer to the domain, with all mock controls refactored away into the scaffold class. Do you think the domain guys will be able to add more methods to enrich the coverage? The accrued interest calculation is actually quite complicated, with more collaborators and more domain logic than what I have painted here. Hence it is quite natural that we will need to enrich the test cases with more possibilities and more coverage. Instead of writing separate methods that work on the various combinations of days, rate and principal (and other factors in reality), why not DRY them up with parameterized tests?

Parameterized Tests in TestNG

TestNG provides a great feature for parameterizing test methods through the DataProvider annotation. DRY up your test methods with parameters .. here is what it looks like in our case ..


public class AccruedInterestCalculatorTest extends MockControl {
  private IAccruedDaysCalculator acalc;
  private IInterestRateCalculator icalc;

  @BeforeMethod
  public final void setup() {
    mockControl = EasyMock.createControl();
    acalc = newMock(IAccruedDaysCalculator.class);
    icalc = newMock(IInterestRateCalculator.class);
  }

  @DataProvider(name = "test1")
  public Object[][] createAccruedInterestCalculationData() {
    return new Object[][] {
      {20, BigDecimal.valueOf(0.5), BigDecimal.valueOf(1000), BigDecimal.valueOf(30)},
      {20, BigDecimal.valueOf(0.5), BigDecimal.valueOf(2000), BigDecimal.valueOf(60)},
      {30, BigDecimal.valueOf(0.5), BigDecimal.valueOf(2000), BigDecimal.valueOf(90)},
    };
  }

  private class CalculatorScaffold {
    //..
    //.. same as above
  }

  // the test method now accepts parameters
  //..
  @Test(dataProvider = "test1")
  public void normalAccruedInterest(int days,
    BigDecimal rate, BigDecimal principal, BigDecimal interestGold) {
    new CalculatorScaffold()
      .on(principal)
      .forDays(days)
      .at(rate)
      .calculate()
      .andCheck(interestGold);
  }

  // calculate() throws for zero accrual days, so andCheck() below is never reached
  @Test(expectedExceptions = {NoInterestAccruedException.class})
  public void zeroAccruedDays() {
    new CalculatorScaffold()
      .on(BigDecimal.valueOf(2000))
      .forDays(0)
      .at(BigDecimal.valueOf(0.5))
      .calculate()
      .andCheck(BigDecimal.valueOf(90));
  }
}



The test methods look DRY, speak the domain language, and no longer have to administer the mock controls. All test conditions and results are now completely decoupled from the test methods and fed to them through the annotated provider methods. Now the domain guy is happy!

Thursday, May 10, 2007

XML - Not for Human Consumption

I have never been a big fan of XML as a language for human interaction. I always felt that XML is too noisy for human comprehension and never a pleasing sight for the eyes. In an earlier post I had pitched the idea of using Lisp s-expressions as executable XML, instead of the plethora of angle brackets polluting your comprehension. XML is meant for machine interpretation; the hierarchical processing of XML documents gives you raw power when you are programming for the machine. But, after all, "programs must be written for people to read, and only incidentally for machines to execute" (from SICP).

With the emergence of the modern stream of scripting languages, XML is definitely taking a backseat in handling issues like software builds and configuration management. People are no longer amused by the XML hell of Maven and have been desperately looking for alternatives. Buildr, a drop-in replacement for Maven, uses Ruby (inspired by Rake); SCons uses Python scripts; Groovy has also been used in the configuration space for the Jini-based ComputeCycles project (via Artima). XML has also started looking like just another option for configuring your beans in the Spring community. We are spoilt with other options for configuration - Spring JavaConfig uses Java, Springy does it with JRuby, and Groovy SpringBuilder does it with Groovy.

Configuration and builds of systems are nontrivial activities and need a mix of both declarative and procedural programming. Using XML as the brute-force approach towards this has resulted in hierarchical structures too complex for human comprehension. A perfect example at hand is Maven, which attempts to do too many things with XML. The result is an extremely complex XML hell, much to the anguish of developers and build managers. A big build system configured using Maven becomes a maintenance nightmare in no time.

People are getting increasingly used to the comforts of DSLs, and the configuration of a large system has never been a trivial artifact. Hence it is only logical that we would like to have something that provides a humane interface. And XML is definitely not it ..

Wednesday, May 02, 2007

Unit Tests - Another Great Tool for Refactoring

In my last post on unit testing, I had talked about some techniques on how to make your classes unit test friendly. Unit testing is the bedrock of writing good software - never compromise on writing unit tests for the next class that you design. Unit testing is about testing classes in isolation - get them out of the natural environment and take them for a ride. That way unit tests give you a feedback about the correctness of the behavior of the *unit* as a whole. The other aspect of unit tests which I find extremely useful while designing classes is the support for refactoring. Unit tests are great vehicles for refactoring object collaborations - thus there is a 2-way feedback loop between class design and its unit test design. And once you have cycled through rounds of iteration in this feedback loop, you can expect to have stabilized your abstractions and collaborations.

Here's an example ..


class Payroll {
  // ..
  // ..

  void calculateSalary(Employee emp) {
    BigDecimal salary = emp.getSalary();
    // calculate total salary including taxes
  }
}



When I look at the possible unit test for the class Payroll, the first thing that strikes me is the argument Employee in calculateSalary(). The first attempt tries to instantiate an Employee and invoke the method calculateSalary() from the test case :


class PayrollTest extends TestCase {
  // ..
  // ..

  public void testSalaryCalculation() {
    Employee emp = new Employee(..);
    // ..
  }
}



A Mockery of sorts ..

Instantiating an Employee object brings into action lots of other machinery like the database layer, the ORM layer etc., which we already know is best kept away from the unit test harness. Think Mocks ..

When you need to mock a concrete class, take a second look; if necessary, take a third look as well. Mock roles, not objects - and roles are best specified through interfaces. In the above example, we are really exploiting the fact that an employee is salaried and draws a basic salary for being employed in a company. In the method calculateSalary(), we are not concerned with the other details of the Employee class, which may be used in other contexts within the software.

Extract Interface

Now I have a new interface Salaried as :


interface Salaried {
  BigDecimal getBasicSalary();
}



and ..


public class Employee implements Salaried {
  // ..
}



The method calculateSalary() now takes an interface Salaried instead of the concrete class Employee - and the mock becomes idiomatic in the test case (I am using EasyMock as the mocking framework) :


class Payroll {
  // ..
  // ..

  public void calculateSalary(Salaried emp) {
    // ..
  }
}

class PayrollTest extends TestCase {
  // ..
  // ..

  public void testSalaryCalculation() {
    Salaried mock = createMock(Salaried.class);
    expect(mock.getBasicSalary()).andReturn(BigDecimal.valueOf(7000));
    replay(mock);
    // ..
  }
}




Thus unit testing helped me use the Extract Interface principle and improve the collaboration between my classes. Notice how the class Payroll now collaborates with a more generic role Salaried instead of a concrete class Employee. Tomorrow if the company decides to employ contractors with different salary computation, we will have a separate abstraction :


public class Contractor implements Salaried {
  // ..
  // ..
  public BigDecimal getBasicSalary() {
    // .. different implementation than Employee
  }
  // ..
}



but the class Payroll remains unaffected, since it now collaborates with Salaried and NOT any concrete implementation.

Monday, April 23, 2007

Executable XML (aka Lisp)

In a project that I have been working on for quite some time, the back office system receives XML messages from the front and middle office systems for processing. It is a securities trading and settlement system for one of the big financial houses of the world - typical messages are trades, settlements, positions etc., which reach the back office after the trade is made. Like any sane architects, we designed the system on the Java EE stack (nowadays you never get fired for choosing Java ..), centered around a message-oriented middleware transporting XML messages with gay abandon. The system has gone live at many installations and has been delivering satisfactory throughput all over.

No complaints whatsoever - on the architecture, on the Java EE backbone, on the multitude of XML machinery that powers the engineering harness for the millions of messages generated every day. If I were to architect the same system today (the existing one was architected 3 years back), I would possibly have gone for a similar stack, just for the sheer stability and robustness of the XML-based technology and the plethora of tools that XML offers today.

Why am I writing this blog then ?

Possibly I have been having extra caffeine of late, which has been taking away most of my sleep at night. It is 1 AM and I am still glued to two of the newest possessions on my bookshelf.

I had read some parts of SICP long back - rereading it now is like a rediscovery of many of the Aha! moments I had the last time, and of course, lots of ruminations and discoveries this time as well. Based on the new-found light of Lispy (and s-expy) knowledge, I hope that some day I will be able to infuse today's dreams and rumblings into a real-life architecture. I am not sure if we will ever reach the stage of human evolution where Lisp and Scheme are considered the bricks and mortar of enterprise architecture. Till then they will exist as the sexy counterparts of Java and C++, and will continue to allure all developers who have once committed the sin of being there and done that.

The XML Message - Today's Brick and Mortar

Here is how a sample trade message (simplified for clarity) looks in our system :


<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE trd01 SYSTEM "trd01.dtd">
<trd01>
  <id>10234</id>
  <trade_type>equity</trade_type>
  <trade_date>2005-02-21T18:57:39</trade_date>
  <instrument>IBM</instrument>
  <value>10000</value>
  <trade_ccy>usd</trade_ccy>
  <settlement_info>
    <settle_date>2005-02-21T18:57:39</settle_date>
    <settle_ccy>usd</settle_ccy>
  </settlement_info>
</trd01>



We use XML parsers to parse this message, and all sorts of XPath expressions, XQuery and XSLT transformations for processing, querying and tearing apart the hierarchical structures that embody an XML message. The above XML message is *data*, and we have tonnes of Java code processing that data, transforming it into business logic and persisting it in the database. So we have the *code* (in Java) strictly separated from the *data* (in XML), using a small toolbox comprising a bundle of XML parsers, XPath expressions and XSLT transformations, all packaged in a couple of dozen third-party jars (aka frameworks). The entire exercise is meant to make these *data* executable.
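
Just to make the point concrete, here is a minimal sketch of that kind of plumbing, using the JDK's javax.xml APIs (the file name is assumed; the real system naturally uses a much heavier parsing and validation setup) :


import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class TradeMessageReader {
  public static void main(String[] args) throws Exception {
    // parse the trade message shown above (assumed saved locally as trd01.xml,
    // with the DTD reference dropped for the sketch)
    Document doc = DocumentBuilderFactory.newInstance()
      .newDocumentBuilder()
      .parse("trd01.xml");

    // tear the hierarchy apart with XPath expressions
    XPath xpath = XPathFactory.newInstance().newXPath();
    String instrument = xpath.evaluate("/trd01/instrument", doc);
    String settleCcy = xpath.evaluate("/trd01/settlement_info/settle_ccy", doc);

    System.out.println(instrument + " settles in " + settleCcy);
  }
}
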

Executable Data - aka Lisp

Why not use a power that allows you to execute your data directly, instead of creating unnecessary abstraction barriers in the name of OOP? Steve Yegge summarises it with elan :
The whole nasty "configuration" problem becomes incredibly more convenient in the Lisp world. No more stanza files, apache-config, .properties files, XML configuration files, Makefiles — all those lame, crappy, half-language creatures that you wish were executable, or at least loaded directly into your program without specialized processing. I know, I know — everyone raves about the power of separating your code and your data. That's because they're using languages that simply can't do a good job of representing data as code. But it's what you really want, or all the creepy half-languages wouldn't all evolve towards being Turing-complete, would they?

In Lisp, I can make the above data much more readable, clearer in intent, easier for the eyes and at the same time make it executable ..


(trd01
  (id 10234)
  (trade_type "equity")
  (trade_date "2005-02-21T18:57:39")
  (instrument "IBM")
  (value 10000)
  (trade_ccy "usd")
  (settlement_info
    (settle_date "2005-02-21T18:57:39")
    (settle_ccy "usd")))



Lisp is a language which was intended to be small with *no* syntax, where you have the power of macros to create your own syntax and roll it out into the language. I made each of the above tags separate Scheme functions (Oh! I was using Scheme btw), each one of which is capable of transforming itself into the desired functionality. As a result, the above data is also my code and directly executes. Another of those Aha! moments.

But my entire system is based on Java! Surely you are not telling me to change all of the guts to Scheme - are you? My job will be at stake and I will never be able to convince my pointy-haired boss that the trading back office system runs on Lisp. In fact, many people who dare to use Lisp in daytime projects are careful to keep it a secret in the industry. And unless you have PG on your company board, or are blessed enough to enjoy the favor of Y, this is a very useful tip.

Enter SISC

SISC is a lightweight, platform-independent Scheme system targeting the Java Virtual Machine. It comes as a lightweight distribution (the core jar is 233 KB) and offers Scheme as a scripting language for Java. In SISC, bridging is accomplished by a Java API for executing Scheme code and evaluating Scheme expressions, and a module that provides Scheme-level access to Java objects and implementation of Java interfaces in Scheme.

I can write Scheme modules and load them from my Java code using the Java API, once I have bootstrapped the SISC runtime. Here is how I can initialize SISC from my Java application so as to enable the application to use the Scheme functions :


// bootstrapping the SISC runtime
SeekableInputStream heap = new MemoryRandomAccessInputStream(
    getClass().getResourceAsStream("/sisc.shp"));
AppContext ctx = new AppContext();
ctx.addHeap(heap);
Interpreter interpreter = Context.enter(ctx);



and then I can use on the fly evaluation of Scheme functions as follows :


interpreter.eval("(load \"trd01.scm\")");
String s = interpreter.eval("(trd01 (id 10234) (trade_type \"equity\") ...)").toString();



There are quite a few variants of eval() that SISC offers, along with multiple modes of executing Scheme code from the Java environment. For details, have a look at their documentation. SISC also supports calling Java code from Scheme, accessing Java classes through SISC's extensible type system.

I do not dream about using SISC in any production code in the near foreseeable future. But I just wanted to share my rants with all of you. In today's world, it is really raining programming languages - scripting languages like Ruby, Python, JRuby, Groovy etc. are making inroads as the preferred glue languages of today's enterprise architecture. But Lisp stands out as a solid, robust language with the exceptionally powerful code-as-data paradigm - I always felt Lisp was way ahead of its time. Possibly this is the time for Lisp to be reincarnated, with the incompatibilities rubbed off its numerous versions and dialects. Lisp code is executable data - it makes perfect sense to replace all the frameworks that execute reams of code to process hierarchical structures as data with a single language.

Monday, April 16, 2007

Competition is healthy! Prototype Spring Bean Creation: Faster than ever before ..

In my last post I had mentioned some performance benchmarks of Guice and Spring. In one of the applications that I ported from Spring to Guice, I had an instance of lookup-method injection, where a singleton service bean contained a prototype bean that needed to be looked up from the context. Here is the sample configuration XML :


<bean id="trade"
  class="org.dg.misc.Trade"
  scope="prototype">
</bean>

<bean id="abstractTradingService"
  class="org.dg.misc.AbstractTradingService"
  lazy-init="true">
  <lookup-method name="getTrade" bean="trade"/>
</bean>



I ran a performance benchmark suite to exercise 10,000 gets of the prototype bean :


BeanFactory factory = new XmlBeanFactory(
    new ClassPathResource("trade_context.xml"));

ITradingService ts = (ITradingService) factory.getBean(
    "abstractTradingService");
StopWatch stopWatch = new StopWatch();
stopWatch.start("lookupDemo");

for (int x = 0; x < 10000; x++) {
  ITrade trade = ts.getTrade();
  trade.calculateValue(null, null);
}
stopWatch.stop();

System.out.println("10000 gets took " +
    stopWatch.getTotalTimeMillis() + " ms");



Spring 2.0.2 reported a timing of 359 milliseconds for the 10,000 gets. I performed the same exercise in Guice with a similar configuration :


public class TradeModule extends AbstractModule {
  @Override
  protected void configure() {
    bind(ITrade.class).to(Trade.class);
    bind(ITradingService.class).to(TradingService.class).in(Scopes.SINGLETON);
  }
}



and the corresponding test harness :


Injector injector = Guice.createInjector(new TradeModule());

long start = System.currentTimeMillis();
for(int i=0; i < 10000; ++i) {
  injector.getInstance(ITradingService.class).doTrade(injector.getInstance(ITrade.class));
}
long stop = System.currentTimeMillis();
System.out.println("10000 gets took " + (stop - start) + " ms");



For this exercise of 10,000 gets, Guice reported a staggering timing of 31 milliseconds.

Then, a couple of days back, Juergen Hoeller posted in the release news for Spring 2.0.4 that repeated creation of prototype bean instances is up to 12 times faster in this release. I decided to run the benchmark once again after a drop-in replacement of the 2.0.2 jars with the 2.0.4 ones. And voila! Indeed there is a significant improvement in the figures - the same test harness now takes 109 milliseconds on the 2.0.4 jars. Looking at the changelog, you will notice several line items that address bean instantiation timings.

This is what competition does even for the best .. Keep it up Spring guys ! Spring rocks !

Updated: Have a look at the comments by Bob and the followups for some more staggering benchmark results.

Monday, April 09, 2007

Guiced! Experience Porting a Spring Application to Guice

Yes, I got one of my Spring-Hibernate-JPA applications ported to use Guice for Dependency Injection. The application is a medium-sized one and did not contain many of the corner features which have been discussed aggressively amongst the blogebrities. But now that I have the satisfaction of porting one complete application to Guice, I must say there are truly *lots of* goodies in this crazybob creation. Some of them I had mentioned in my earlier rants on Guice; many others were hiccups which came up primarily because I was a newbie with Guice - many of my questions and confusions were clarified by the experts, in my blog comments as well as on the Guice developers' mailing list. Guice is definitely an offering to look out for in the space of IoC containers. I liked what I saw in it .. in this post I will share some of my experiences.

Disclaimer : I will only focus on issues that I faced and solved in course of my porting of the application. The application did not have many blocker features for Guice - hence I do not claim that *all* applications can be ported completely using Guice 1.0.

Really Guicy!

Before I go into the details, here are some of the guicy attributes of Guice as a Java framework ..

The Java 5 usage - I always believe that backward compatibility is not the be-all and end-all of a framework's evolution. At some stage you need to educate the users as well, to migrate to newer versions and use the advanced features of your framework, which will make their applications more performant and maintainable. This has been one of my complaints against Java as well. It is really heartening to see the Guice designers base their engine on Java 5 and use advanced features like metadata and generics to the fullest. This has definitely made Guice more concise, precise and DRY.

Type-safety - This is possibly the loudest slogan of Guice as a DI container. It's Java generics all the way, and although you can subvert the type system (more on this later) and hide some of your bindings from Guice, that is more of an exception. All APIs in Guice are strongly typed, hence your application remains blessed with the safety of typed injections that Guice encourages.

Concise, minimal, well-designed API set with extremely verbose and explanatory error messages.

Let's Guice it up ..

Here it is. The application has been running happily in a Spring-Hibernate-JPA architecture. I took up the porting exercise purely out of academic interest and to get a first hand feel of trying to validate Guice against a real life non trivial application. I was somewhat aware of the nuances that I needed to figure out beforehand, and I classified my injection points into the following three groups :

  1. points that I had complete control of and where I knew I would be able to inject my annotations

  2. services with multiple implementations being used in the same application - luckily I did not have many such instances

  3. third party POJOs that I could not invade into


The first ones were pretty cool and I happily added @Inject with appropriate bindings in the module. The injection points became very explicit and the class became more readable as far as external dependencies were concerned.

I did not have many occurrences of multiple implementations of the same service being used in the same application deployment. As far as the application is concerned, we needed different implementations for different deployments, and hence I had different modules in place for them. In one case, I needed to address the problem within the same instance of the application, which I did the usual way, using annotations like the following :


bind(IBarService.class)
  .to(BarService.class)
  .in(Scopes.SINGLETON);

bind(IBarService.class)
  .annotatedWith(Gold.class)
  .to(SpecialBarService.class)
  .in(Scopes.SINGLETON);



Injecting into third-party POJOs is one of the issues that has been debated a lot in the various blogs and forums. Here are some of the cases and how I addressed them in my application :

Case 1: Use Provider<> along with constructor injection : I used this pattern to address POJOs which we were using as part of another component and which used constructor injection. e.g.


public class ThirdPartyBeanProvider implements Provider<ThirdPartyBean> {

  final private IFooService fooService;
  final private IBarService barService;

  @Inject
  public ThirdPartyBeanProvider(final IFooService fooService, final IBarService barService) {
    this.fooService = fooService;
    this.barService = barService;
  }

  public ThirdPartyBean get() {
    return new ThirdPartyBean(fooService, barService);
  }
}



Case 2: These beans were using setter injection, and I had some cases where multiple implementations of a service were being used in the same deployment of the application. Use Provider<> along with annotations to differentiate the multiple implementations of an interface. Luckily I did not have many of these cases; otherwise it would have been a bit troublesome, with annotation explosion. But, at the same time, I think there may not be lots of use cases which need this in typical applications for a single deployment. e.g.


public class AnotherThirdPartyBeanProvider implements Provider<AnotherThirdPartyBean> {

  final private IFooService fooService;
  final private IBarService barService;

  // the binding annotation goes on the constructor parameter
  @Inject
  public AnotherThirdPartyBeanProvider(final IFooService fooService,
    @Gold final IBarService barService) {
    this.fooService = fooService;
    this.barService = barService;
  }

  public AnotherThirdPartyBean get() {
    AnotherThirdPartyBean atb = new AnotherThirdPartyBean();
    atb.setFooService(fooService);
    atb.setBarService(barService);
    return atb;
  }
}



Case 3: Here I had lots of POJOs using setter injection that needed to be handled the same way. I would have had to write lots of providers, but for this dynamic gem from Kevin, which I dug up in a thread on the developers' mailing list. Here the type system is a bit subverted, and Guice does not have full information about all the bindings. But, hey .. for porting applications, AutowiringProvider<> gave me a great way to solve this issue. Here it is, straight out of the class javadoc :
A provider which injects the instances it provides using an "auto-wiring" approach, rather than requiring {@link Inject @Inject} annotations. This provider requires a Class to be specified, which is the concrete type of the objects to be provided. It must be hand-instantiated by your {@link com.google.inject.Module}, or subclassed with an injectable constructor (often simply the default constructor).

And I used it like a charm to set up the bindings of my POJOs.

Finally, here is a snapshot of a representative Module class, with actual class names changed for demonstration purposes :


public class MyModule extends AbstractModule {

  @Override
  protected void configure() {
    bind(IFooService.class)
      .to(FooService.class)
      .in(Scopes.SINGLETON);

    bind(IBarService.class)
      .to(BarService.class)
      .in(Scopes.SINGLETON);

    bind(IBarService.class)
      .annotatedWith(Gold.class)
      .to(SpecialBarService.class)
      .in(Scopes.SINGLETON);

    bind(ThirdPartyBean.class)
      .toProvider(ThirdPartyBeanProvider.class);

    bind(YetAnotherThirdPartyBean.class)
      .toProvider(new AutowiringProvider<YetAnotherThirdPartyBean>(YetAnotherThirdPartyBean.class));

    bind(AnotherThirdPartyBean.class)
      .toProvider(AnotherThirdPartyBeanProvider.class);
  }
}



Injecting the EntityManager

In an implementation of the Repository pattern (a la Domain Driven Design), I was using JPA with Hibernate. I had an implementation of a JpaRepository, where I was injecting an EntityManager through the annotation @PersistenceContext. This was working with normal Java EE application servers where the container injects the appropriate instance of the EntityManager.


public class JpaRepository extends RepositoryImpl {

  @PersistenceContext
  private EntityManager em;
  // ..
  // ..
}



Spring also supports this annotation, both at field and at method level, if a PersistenceAnnotationBeanPostProcessor is enabled. With Guice, I had to write a Provider<> to implement the same functionality in my Java SE application.


@Singleton
public class EntityManagerProvider implements Provider<EntityManager> {

  private static final EntityManagerFactory emf =
    Persistence.createEntityManagerFactory("GuiceJpaGettingStarted");

  public EntityManager get() {
    return emf.createEntityManager();
  }
}
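
The corresponding binding in the module is then a one-liner (a sketch - the module name is made up) :


public class PersistenceModule extends AbstractModule {
  @Override
  protected void configure() {
    // every injected EntityManager is obtained through the provider
    bind(EntityManager.class).toProvider(EntityManagerProvider.class);
  }
}
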



The Guice Way

Guice is opinionated .. yes, it really is, and through this porting exercise I have learnt it. It encourages some practices and adds syntactic vinegar to attempts at subverting its recommendations. e.g., for injection, you either annotate with @Inject or write Providers. Provider<> is a wonderful tiny abstraction, and it's amazing how powerful it can get in real-life applications. Use Providers to implement custom instantiation policies, multiple injections per dependency, and even custom scopes for the injected instances. For porting applications you can use AutowiringProvider<>, but that's not really what Guice encourages a lot.
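
As an example of such a custom instantiation policy - a sketch only, with made-up names, not code from the ported application - a Provider can just as easily hand out one instance per thread :


// a Provider implementing a per-thread instantiation policy
public class ThreadLocalEngineProvider implements Provider<CalculationEngine> {

  private final ThreadLocal<CalculationEngine> perThread =
    new ThreadLocal<CalculationEngine>() {
      @Override
      protected CalculationEngine initialValue() {
        return new CalculationEngine();
      }
    };

  public CalculationEngine get() {
    return perThread.get();
  }
}
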


Guicy Performance

In the Spring based version, I had been using quite a few lookup-method-injections to design singleton services that have prototype beans injected. While porting, I didn't have to do anything special in Guice, apart from specifying the appropriate scopes during binding in modules. And these prototype beans were heavily instantiated within the application. I did some benchmarking and found that Guice proved to be much more performant than Spring for these use cases. In some cases I got 10 times better performance in injection and repeated instantiation of prototype beans within singleton services. I admit that Spring has much richer support for lifecycle methods and I would not venture into the hairy territory of trying to compare Spring with Guice. But if I would have to select an IoC container just for dependency injection, Guice will definitely be up there as a strong contender.

Friday, March 30, 2007

Making Classes Unit-Testable

I had been working on the code review of one of our Java projects, when the following snippet struck me as a definite smell in one of the POJOs :


class TradeValueCalculator {
  // ..
  // ..

  public BigDecimal calculateTradeValue(final Trade trade, ..) {
    // ..
    BigDecimal tax = TradeUtils.calculateTax(trade);
    BigDecimal commission = TradeUtils.calculateCommission(trade);
    // .. other business logic to compute net value
  }
  // ..
  // ..
}



What is the problem with the above two innocuous-looking lines of Java code? The answer is very simple - the unit testability of the POJO class TradeValueCalculator! Yes, this post is about unit testability and some tips that we can follow to design classes that can be easily unit tested. I encountered many of these problems while reviewing the code of a live Java project in recent times.

Avoid Statics

When it comes to testability, statics are definitely not your friends. In the above code snippet, the class TradeValueCalculator depends on the implementation of the static methods like TradeUtils.calculateTax(..) and TradeUtils.calculateCommission(..). Any change in these static methods can lead to failures of unit tests of class TradeValueCalculator. Hence statics introduce unwanted coupling between classes, thereby violating the principle of easy unit-testability of POJOs. Avoid them, if you can, and use standard design idioms like composition-with-dependency-injection instead. And while using composition with service components, make sure they are powered by interfaces. Interfaces provide the right level of abstraction for multiple implementations and are much easier to mock while testing. Let us refactor the above snippet to compose using service components for calculating tax and commission :


class TradeValueCalculator {

  // .. to be dependency injected
  private ITaxCalculator taxCalculator;
  private ICommissionCalculator commissionCalculator;

  // ..
  // ..

  public BigDecimal calculateTradeValue(final Trade trade, ..) {
    // ..
    BigDecimal tax = taxCalculator.calculateTax(trade);
    BigDecimal commission = commissionCalculator.calculateCommission(trade);
    // .. other business logic to compute net value
  }
  // ..
  // ..
}

interface ITaxCalculator {
  BigDecimal calculateTax(..);
}

interface ICommissionCalculator {
  BigDecimal calculateCommission(..);
}



We can then have concrete instances of these service contracts and inject them into the POJO TradeValueCalculator :


class DefaultTaxCalculator implements ITaxCalculator {
  // ..
}

class DefaultCommissionCalculator implements ICommissionCalculator {
  // ..
}



Using standard IoC containers like Guice or Spring, we can inject concrete implementations into our POJO non-invasively through configuration code. In Guice we can define Modules that bind interfaces to concrete implementations, and use Java 5 annotations to inject those bindings in the appropriate places.



// define module to configure bindings
class TradeModule extends AbstractModule {

  @Override
  protected void configure() {
    bind(ITaxCalculator.class)
      .to(DefaultTaxCalculator.class)
      .in(Scopes.SINGLETON);

    bind(ICommissionCalculator.class)
      .to(DefaultCommissionCalculator.class)
      .in(Scopes.SINGLETON);
  }
}



and then inject ..


class TradeValueCalculator {

  // ..
  @Inject private ITaxCalculator taxCalculator;
  @Inject private ICommissionCalculator commissionCalculator;

  // ..
  // ..
}



How does this improve testability of our class TradeValueCalculator ?

Just replace the defined Module by another one for unit testing :


// define module to configure bindings
class TestTradeModule extends AbstractModule {

  @Override
  protected void configure() {
    bind(ITaxCalculator.class)
      .to(MockTaxCalculator.class)
      .in(Scopes.SINGLETON);

    bind(ICommissionCalculator.class)
      .to(MockCommissionCalculator.class)
      .in(Scopes.SINGLETON);
  }
}



What we have done just now is mocked out the service interfaces for tax and commission calculation. And that too without a single line of code being changed in the actual class! TradeValueCalculator can now be unit-tested without having any dependency on other classes.

Extreme Encapsulation

I have come across many abuses of FluentInterfaces, where developers use chained method invocations involving multiple classes. Take this example from the Mock Objects paper, which discusses the same problem :

dog.getBody().getTail().wag();

The problem here is that the caller is indirectly coupled to multiple classes, thereby violating the Law of Demeter and making the code totally unsuitable for unit testing. The situation is typically called "The Train Wreck" and has been discussed extensively in the said paper. The takeaway from this situation is to minimize coupling with neighbouring classes - couple only with the class directly associated with you. Think in terms of abstracting the behavior *only* with respect to the class with which you collaborate directly, and leave implementation of the rest of the behavior to the latter, as the sketch below shows.
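
Here is roughly what that refactoring looks like - a minimal sketch, with the method names invented for illustration :


class Dog {
  private Body body;

  // the caller asks Dog for the behavior ..
  void expressHappiness() {
    body.expressHappiness();   // .. and Dog talks only to its direct collaborator
  }
}

class Body {
  private Tail tail;

  void expressHappiness() {
    tail.wag();   // Body talks only to Tail
  }
}

class Tail {
  void wag() {
    // ..
  }
}
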

Privates also need to be Unit-Tested

There is a school of thought which espouses the policy that *only* public APIs need to be unit tested. This is not true - I firmly believe that all your methods and behaviors need unit testing. Strive to achieve the maximum unit test coverage of your classes. Roy Osherove thinks that we may have to bend some of the rules of pure OOD to make our design implementations more testable, e.g. by exposing or replacing private instances of objects using interfaces, injection patterns, public setters etc. Or by discouraging default sealing of classes, allowing overriding in unit tests. Or by allowing singletons to be replaced in tests to break dependencies. I think I agree with many of these policies.

Fortunately Java provides a useful access specifier that comes in handy here - the package private scope of access. Instead of making your implementation members *private*, make them *package private* and implement the unit test classes in the same package. Doing this, you do not expose the private parts to the public, while still allowing access to all the unit test classes. Crazy Bob has more details on this. Another useful trick may be the usage of AOP. As part of the unit test classes, you can introduce additional getters through AOP to access the implementation artifacts of your class. This can be done through inter-type declarations, and the test classes can then access all private data with gay abandon.
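
The package private idiom in a nutshell - a made-up example, with the test source tree sharing the package of the production class :


// in src/org/dg/payroll/TaxCalculator.java
package org.dg.payroll;

public class TaxCalculator {
  // package private : invisible to clients, visible to tests in the same package
  BigDecimal applySurcharge(BigDecimal baseTax) {
    return baseTax.multiply(BigDecimal.valueOf(1.1));
  }
}

// in test/org/dg/payroll/TaxCalculatorTest.java - note the same package
package org.dg.payroll;

public class TaxCalculatorTest extends TestCase {
  public void testApplySurcharge() {
    BigDecimal result = new TaxCalculator().applySurcharge(BigDecimal.valueOf(100));
    assertEquals(0, result.compareTo(BigDecimal.valueOf(110)));
  }
}
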

Look out for Instantiations

There are many cases where the class that is being unit tested needs to create / instantiate objects of the collaborating class. e.g.


class TradeController {

  // ..
  // ..

  public void doTrade(TradeDTO dto, ..) {
    Trade trade = new Trade(dto);
    // .. logic for trade
  }
  // ..
}



Increase the testability of the class TradeController by separating out all creation into appropriate factory methods. These methods can then be overridden in test cases to inject creation of Mock objects.


class TradeController {
  TradeDTO dto;

  // ..
  // ..

  public void doTrade() {
    Trade trade = createTrade(dto);
    // .. logic for trade
  }
  // ..

  protected Trade createTrade(TradeDTO dto) {
    return new Trade(dto);
  }
}



and create MockTrade in test cases ..


class TradeControllerTest extends TestCase {

  // ..

  public void testTradeController(..) {
    TradeController tc = new TradeController() {
      protected Trade createTrade(TradeDTO dto) {
        return new MockTrade(dto);
      }
    };
    tc.doTrade();
  }
}



The Factory Method pattern proves quite helpful in such circumstances. However, there are some design patterns, like the Abstract Factory, which can potentially introduce unwanted coupling between classes, thereby making them difficult to unit test. Most of the design patterns in GOF are built on composition - try implementing them using interfaces in Java, so that they can be easily mocked out. Another difficult pattern is the Singleton - I usually employ the IoC container to manage and unit-test classes that collaborate with Singletons, as sketched below. Apart from static methods, which I have already mentioned above, static members are also problematic cases for unit testing. In many applications they are used for caching (e.g. in ORMs) - hence an obvious problem child for unit testing.
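
A sketch of the container-managed Singleton idea (the names are made up for illustration) :


// instead of collaborating with a hard-wired static Singleton ..
//   BigDecimal price = PricingEngine.getInstance().price(trade);
// .. let the IoC container own the single instance
class TradeController {
  @Inject private IPricingEngine pricingEngine;   // container-managed singleton
  // ..
}

class PricingModule extends AbstractModule {
  @Override
  protected void configure() {
    bind(IPricingEngine.class)
      .to(DefaultPricingEngine.class)
      .in(Scopes.SINGLETON);   // a test module binds MockPricingEngine instead
  }
}
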

Wednesday, March 21, 2007

Using Guice as the DI Framework - some hiccups

Finally got the time to lay my hands on Guice, the new IoC container from Google. I ported one of my small Spring-based applications to Guice - nothing too complicated, but enough to explore the features of Guice and make some user-level comparisons with Spring. I have been a long-time Spring user (and an admirer too), hence the comparison is just an automatic and involuntary side effect of what I do with any other IoC container on earth.

My first impression of Guice has been somewhat mixed. It is really refreshing to work with the extremely well-designed APIs, packed with the power of generics and annotations from Java 5. Fluent interfaces like binder.bind(Service.class).annotatedWith(Blue.class).to(BlueService.class) make ideal DSLs in Java and give you the feeling that you are programming to Guice rather than to Java. This is similar to the feeling you get when programming to Rails (as opposed to programming in Ruby). However, I came across some stumbling blocks which have been major irritants for the problem that I was trying to solve. Just to point out - I have only fiddled around with Guice for a couple of days, and may have missed lots of details which could offer better solutions to these problems. Any advice or suggestions from the experts will be of great help. Here are some of the rants from my exercise of porting an existing application to Guice :

A Different way to look at Configuration

Many people have blogged about the onerous XML hell of Spring and how Guice gets rid of these annoyances. I think the main difference between Guice and Spring lies in the philosophy of how they both look at dependencies and configuration. Spring preaches the non-invasive approach (my favorite) and takes a completely externalized view towards object dependencies. In Spring, you can either wire up dependencies using XML or Spring JavaConfig or Groovy-Spring DSL or some other option like using Spring-annotations. But irrespective of the techniques you use, dependencies are always externalized :


@Configuration
public class MyConfig {
  @Bean
  public Person rod() {
    return new Person("Rod Johnson");
  }

  @Bean(scope = Scope.PROTOTYPE)
  public Book book() {
    Book book = new Book("Expert One-on-One J2EE Design and Development");
    book.setAuthor(rod()); // rod() method is actually a bean reference !
    return book;
  }
}



The above is an example from Rod Johnson's blog post - the class MyConfig is an externalized rendition of bean configurations. It uses Java 5 annotations to define beans and their scopes, but, at the end of the day, all it does is equivalent to spitting out the following XML :


<bean id="rod" class="Person" scope="singleton">
  <constructor-arg value="Rod Johnson"/>
</bean>

<bean id="book" class="Book" scope="prototype">
  <constructor-arg value="Expert One-on-One J2EE Design and Development"/>
  <property name="author" ref="rod"/>
</bean>



Guice, on the other hand, treats configuration as a first-class citizen of your application model and allows it right into your domain model code. Guice modules indicate what to inject, while annotations indicate where to inject. You annotate the class itself with the injections (through the @Inject annotation). The drawback (if you consider it to be one) is that you have to import com.google.inject.* within your domain model. But it ensures locality of intention and explicit semantics of in-situ injection through metadata programming.


// what to inject : a sample Module
public class TradeModule extends AbstractModule {
  protected void configure() {
    bind(Trade.class).to(TradeImpl.class);
    bind(Balance.class).to(BalanceImpl.class);
    bindConstant().annotatedWith(Bond.class).to("fixed income");
    bindConstant().annotatedWith(I.class).to(5);
  }
}

// where to inject : a sample domain class
public class TradingSystem {
  @Inject Trade trade;
  @Inject Balance balance;

  @Inject @Bond String tradeType;

  int settlementDays;

  @Inject
  void setSettlementDays(@I int settlementDays) {
    this.settlementDays = settlementDays;
  }
}



Personally I would like to have configurations separated from my domain code - Guice looked to be quite intrusive to me in this respect. Using Spring with XML based configuration allows a clean separation of configuration from your application codebase. If you do not like XML based configurations, use Spring JavaConfig, which restricts annotations to configuration classes only. Cool stuff.

Annotations! Annotations!

Guice is based on Java 5 annotations. As I mentioned above, where-to-inject is specified using annotations only. The plus of this approach is that the intention is explicit and locally specified, which leads to good maintainability of code. However, in some cases people may jump into an overdose of annotations. Custom annotations should be kept to a minimum and should be used *only* as the last resort. Guice provides the Provider<T> abstraction to deal with fine-grained instantiation controls. In fact, Provider<T> is an exceptionally simple abstraction, but it can be used very meaningfully to implement lazy variants of many design patterns, like Factory and Strategy. In my application I have used Provider<T> successfully to implement a Strategy which I had initially implemented using custom annotations. Lots of custom annotations is a design smell - try refactoring your design using abstractions like Provider<T> to minimize them.

Problems with Provider<T>

However, I hit upon a roadblock while implementing some complex strategies using Provider<T>. In many cases, my Strategy needed access to contextual information in order to decide upon the exact concrete strategy to be instantiated. In the following example, I need different strategy instances of CalculationStrategy depending on the trade type.


interface CalculationStrategy {
  void calculate();
}

public class TradeValueCalculation {

  private CalculationStrategy strategy;
  private Trade trade;

  // need different instances of strategy depending on trade type
  public TradeValueCalculation(Trade trade, CalculationStrategy strategy) {
    this.trade = trade;
    this.strategy = strategy;
  }

  public void calculate() {
    strategy.calculate();
  }
}



I cannot use a custom annotation on the constructor argument strategy, since I need polymorphic behavior across different instances of the same class. I tried the Provider&lt;T&gt; approach :


public class TradeValueCalculation {

  private Provider<CalculationStrategy> strategy;
  private Trade trade;

  @Inject
  public TradeValueCalculation(Trade trade, Provider<CalculationStrategy> strategy) {
    this.trade = trade;
    this.strategy = strategy;
  }

  public void calculate() {
    strategy.get().calculate();
  }
}



Still the Provider does not have the contextual information .. :-( and my problem is how to pass this information on to the Provider. Any help will be appreciated ..
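
One workaround I can think of is a hand-rolled factory that injects Providers for the concrete strategies and selects among them based on the trade - but note that this is my own sketch, not a Guice feature, and the concrete strategy classes (and Trade.isFixedIncome()) are hypothetical :


public class CalculationStrategyFactory {

  private final Provider<EquityCalculationStrategy> equity;
  private final Provider<FixedIncomeCalculationStrategy> fixedIncome;

  @Inject
  public CalculationStrategyFactory(
      Provider<EquityCalculationStrategy> equity,
      Provider<FixedIncomeCalculationStrategy> fixedIncome) {
    this.equity = equity;
    this.fixedIncome = fixedIncome;
  }

  // the contextual decision lives here, outside the Providers
  public CalculationStrategy strategyFor(Trade trade) {
    return trade.isFixedIncome() ? fixedIncome.get() : equity.get();
  }
}


But this reintroduces factory boilerplate into the code - precisely the kind of plumbing that DI is supposed to remove.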

By contrast, solving this in Spring is rather simple - just declare multiple bean configurations for the same class, as sketched below. And this works like a charm in the XML as well as the Java variant of configuring Spring beans.
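
Something along these lines (class and bean names are illustrative only) :


<bean id="equityCalculationStrategy" class="org.dg.domain.DefaultCalculationStrategy">
  <constructor-arg value="EQUITY"/>
</bean>

<bean id="fixedIncomeCalculationStrategy" class="org.dg.domain.DefaultCalculationStrategy">
  <constructor-arg value="FIXED_INCOME"/>
</bean>


Two bean ids, one class, different constructor arguments - the container hands out differently configured instances of the same strategy class.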

Some other pain points ..


  • The Guice user guide recommends using Provider&lt;T&gt; to inject into third party classes. This looked quite obtuse to me, since it goes against the philosophy of less code that Guice preaches (see the sketch after this list). Spring provides much more elegant solutions to this problem because of its non-invasiveness. Guice, by virtue of being an intrusive framework, has to provide this extra level of indirection to DI into classes for which I do not have the source code.


  • One specific irritant in Guice is literal injection, which forced me to use a custom annotation every time I wanted to inject a String literal.


  • Another feature which would have been very useful for me is the ability to override bindings through Module hierarchies. In one of my big applications, I have multiple components where I thought I could organize my Guice Modules in a similar hierarchy, with specific bindings being overridden in specific modules. This is definitely the DRY approach towards binding implementations to interfaces. Guice did not allow me to do this - later I found a similar topic discussed on the development mailing list, where a patch is available for overriding bindings. I am yet to try it out though ..
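
For the record, here is roughly what the Provider-for-third-party-classes recommendation boils down to - a sketch of mine, with Commons DBCP's BasicDataSource standing in for any class whose source we cannot annotate :


// imports : javax.sql.DataSource, org.apache.commons.dbcp.BasicDataSource
public class DataSourceProvider implements Provider<DataSource> {
  public DataSource get() {
    // we do by hand the construction that Guice cannot do for us
    BasicDataSource ds = new BasicDataSource();
    ds.setDriverClassName("org.hsqldb.jdbcDriver");
    ds.setUrl("jdbc:hsqldb:mem:test");
    return ds;
  }
}

// and in the module :
// bind(DataSource.class).toProvider(DataSourceProvider.class);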



It will be extremely helpful if some of the Guice experts address these issues and suggest workarounds. I like the terseness of the framework - the APIs are indeed very intuitive and so are the error messages. The Javadocs are extremely informative, though we need more exhaustive documentation on Guice best practices. Guice is really lightweight and is reported to be very fast (I am yet to test those benchmarks against Spring though). I hope the Google guys look into some of the pain points that early adopters have been facing with Guice ..

Wednesday, March 07, 2007

Programming Passion and the FizzBuzz Madness

Of late there has been a lot of buzz around FizzBuzz in many of the programming blogs. It all started with Imran on Tech posing the following FizzBuzz question as a way to interview developers who grok coding ..
Write a program that prints the numbers from 1 to 100. But for multiples of three print “Fizz” instead of the number and for the multiples of five print “Buzz”. For numbers which are multiples of both three and five print “FizzBuzz”.

The programming community took it up from there and started a deluge of self-proclamation, trying to establish the most efficient way of FizzBuzzing. Have a look at the comments section of all these blogs - passionate programmers have used all *official* languages to get their version of FizzBuzz going.

And the madness continues .. someone positions FizzBuzz as the programming task of choice for discriminating hackers. And guess what ? One hacker responded with Ruby code interpreting Prolog interpreting Lisp interpreting Lisp solving FizzBuzz.

The intent of the initial post was definitely to bring to light some of the concerns and issues that interviewers face in hiring good programmers. The community responded otherwise, as Giles Bowkett aptly mentions ..
if you were absurdly cynical, you might expect programmers to start coding FizzBuzz in every language from Haskell to SQL, and even to start holding FizzBuzz coding contests.

Are programmers by nature too passionate ? I guess yes, and that is why we find such madness in the community when someone throws in a problem which has a programming smell. But is this passion justified ? Now, this is a cynical question, since you can never justify passion. But I think it would have been much more pragmatic to focus on the main problem that Imran on Tech had raised - the issue of hiring good programmers through a pragmatic process of interviewing. Is asking candidates to solve screening programming assignments the right way to judge a programmer in an interview ? We follow these practices rampantly, we eliminate candidates through FizzBuzz problems, yet many software projects fail because of bad programming and inefficient programmers. Isn't it time we stepped back and introspected ? Have a look at this heartfelt critique of the FizzBuzz screening question .. (reference from RaganWorld)

Monday, February 26, 2007

Rails Applications and Law of Demeter

Jay Fields talks about Law of Demeter violations in Rails programs and suggests a fix using the Ruby module Forwardable, which allows you to implement delegation without littering your codebase. Along the same lines, Jeffrey Allan Hardy introduces the class method delegate in Ruby on Rails to automate delegation tasks. Both of these techniques allow syntax extensibility that keeps your code clean of hand-written delegate methods.

My question is .. does this really fix the violation of the Law of Demeter in Rails applications ?

The basic philosophy behind the concepts of Adaptive Programming and the Law of Demeter is to provide a better separation between behavior and object structure in OO programs. A typical Rails application is tied strongly to the underlying ActiveRecord model. The domain model for a Rails application follows the same structure as the database model - hence navigation of the object graph is a replica of the underlying relational table structure. To put it bluntly, there is no domain object model that abstracts the behavior of the system. Constructs like Forwardable or the class method delegate offer syntactic sugar that may help in code refactoring, but they do not add to the reusability of the model. In fact, the principle of least knowledge that LoD preaches is violated the moment you make your behavioral model knowledgeable about the underlying persistence structure.

I am not an expert in Ruby or Rails - I would like to know what others feel about this ..

Tuesday, February 20, 2007

Domain Driven Design : Use ORM backed Repository for Transparent Data Access

In my last post, I discussed having Repositories as a higher level of abstraction in domain driven design than vanilla DAOs. Many people have questioned how the getOutstationEmployees() method in EmployeeRepositoryImpl could have been more performant using plain old SQL instead of being abstracted behind the layers of DAOs and Repositories. Actually, the main idea behind the post was to establish the fact that Repositories are a more natural way to interface domain Aggregates with the underlying database than DAOs. Let us see why ..

  • The Data Access Object pattern evolved as part of the core J2EE design patterns as a means of handling bean managed persistence, where every business object was mapped to a DAO. And for relational databases, every DAO had a natural mapping to a database table. Hence the DAO pattern enforces a strong coupling with the underlying database structure. This strategy encourages the Transaction Script pattern of modeling a domain, which is definitely not what DDD preaches.


  • Repository provides a more domain centric view of data access, where the client uses the Ubiquitous Language to access the data source. The DAOs, OTOH, provide a more database centric view, which is closer to the implementation than the domain.


  • Repositories provide controlled access to the underlying data in the sense that it exposes only the Aggregate roots of the model, which the clients should be using. When we model an Order domain entity, it makes sense to expose LineItems only in the context of the Order, and not as separate abstractions. Repositories are mapped to the Aggregate root level and ensure that the client gets a coherent view of the Order entity.


Using ORM solutions with DDD

DDD advocates a strong domain model and ORM encourages transparent persistence of domain objects. In the case of an RDBMS backed application, both paradigms aim to decouple the relational data layer from the object oriented domain layer. Hence it is natural that they complement each other when we think of scalable application architectures with a strong domain model. And when we consider the combination of DDD and ORM, the DAO paradigm looks deprecated, because the domain layer is no longer concerned with bean level persistence - it deals with Aggregate level persistence. We talk about persisting an Order as a whole, not individual LineItems. Hence we talk about data access and persistence in terms of Repositories, which is a higher level of abstraction than DAOs. And when we talk about transaction control, synchronization of a unit of work and transparent persistence of domain entities, naturally we think of Hibernate Sessions or JPA Entity Managers. The concept of a Repository fits in like a glove - suddenly you feel like you are programming at the domain level, and relational tables are something that can be managed by the DBA.
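
For instance, aggregate level persistence with Hibernate looks like this - a minimal sketch of mine, assuming an Order/LineItem mapping with cascades configured and an open Session in scope :


Order order = new Order("ORD-1042");
order.addLineItem(new LineItem("IBM", 100));
order.addLineItem(new LineItem("ORCL", 200));

// one call persists the whole aggregate - the line items
// ride along through the cascade, no LineItem DAO in sight
session.save(order);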

Towards a Generic Repository Implementation

How would you like to design a Repository that can support multiple implementations across various ORMs, while exposing domain contracts in the Ubiquitous Language ? Clearly the design needs to be extensible on both sides -

  • On the abstraction side, we need to have extensibility for the domain. All domain repositories will be part of this hierarchy.

  • On the implementation side, we need to have extensibility for multiple implementations, e.g. JPA, Hibernate etc.


Think Bridge design pattern, which allows us to decouple an abstraction from its implementation so that the two can vary independently.

On the abstraction side we have


public interface IRepository<T> {
  List<T> read(String query, Object[] params);
}



and a base class, which delegates to the implementation ..



public class Repository<T> implements IRepository<T> {

  private RepositoryImpl repositoryImpl;

  public List<T> read(String query, Object[] params) {
    return repositoryImpl.read(query, params);
  }

  public void setRepositoryImpl(RepositoryImpl repositoryImpl) {
    this.repositoryImpl = repositoryImpl;
  }
}



On the implementation side of the Bridge, we have the following base class


public abstract class RepositoryImpl {
  public abstract <T> List<T> read(String query, Object[] params);
}



and an implementation based on JPA ..



public class JpaRepository extends RepositoryImpl {

  // to be injected through DI in Spring
  private EntityManagerFactory factory;

  @Override
  public <T> List<T> read(String query, Object[] params) {
    JpaTemplate jpa = new JpaTemplate(factory);

    if (params == null) {
      params = ArrayUtils.EMPTY_OBJECT_ARRAY;
    }

    try {
      @SuppressWarnings("unchecked")
      List<T> res = jpa.executeFind(new GenericJpaCallback(query, params));
      return res;
    } catch (org.springframework.dao.DataAccessException e) {
      throw new DataAccessException(e);
    }
  }

  // setter needed for the Spring property injection configured below
  public void setFactory(EntityManagerFactory factory) {
    this.factory = factory;
  }
}
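
GenericJpaCallback is not shown in the post - a possible shape for it, assuming Spring's JpaCallback contract, could be the following (note that JPA positional parameters are 1-based) :


// imports : javax.persistence.*, org.springframework.orm.jpa.JpaCallback
public class GenericJpaCallback implements JpaCallback {

  private final String query;
  private final Object[] params;

  public GenericJpaCallback(String query, Object[] params) {
    this.query = query;
    this.params = params;
  }

  public Object doInJpa(EntityManager em) throws PersistenceException {
    Query q = em.createQuery(query);
    // bind ?1, ?2, ... from the parameter array
    for (int i = 0; i < params.length; ++i) {
      q.setParameter(i + 1, params[i]);
    }
    return q.getResultList();
  }
}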



Similarly we can have a Hibernate based implementation ..


public class HibernateRepository extends RepositoryImpl {
  @Override
  public <T> List<T> read(String query, Object[] params) {
    // .. hibernate based implementation
  }
}



But the client works against the contract side of the Bridge, with the implementation being injected through Spring.
Here's a sample client repository contract, based on the domain model from Chris Richardson's POJOs in Action ..


public interface IRestaurantRepository {
  List<Restaurant> restaurantsByName(final String name);
  List<Restaurant> restaurantsByStreetName(final String streetName);
  List<Restaurant> restaurantsByEntreeName(final String entreeName);
  List<Restaurant> restaurantsServingVegEntreesOnly();
}



for the aggregate root Restaurant having the following model :


@Entity
public class Restaurant {
  /**
   * The id.
   */
  @Id
  @GeneratedValue(strategy = GenerationType.AUTO)
  private long id;

  /**
   * The name.
   */
  private String name;

  /**
   * The {@link Address}.
   */
  @OneToOne(cascade = CascadeType.ALL)
  private Address address;

  /**
   * Set of {@link Entree}.
   */
  @ManyToMany
  @JoinTable(inverseJoinColumns = @JoinColumn(name = "ENTREE_ID"))
  private Set<Entree> entrees;

  // all getters and setters removed for clarity
}



It uses JPA annotations for the object relational mapping. Note the one-to-one relationship with the Address table and the many-to-many relationship with Entree. Clearly, Restaurant is the Aggregate root here, with Entree and Address being part of the domain entity. Hence the repository is designed around the Aggregate root and exposes collections of Restaurants based on various criteria.

Provide an implementation of IRestaurantRepository using the abstraction side of the Bridge ..


public class RestaurantRepository extends Repository<Restaurant>
  implements IRestaurantRepository {
  public List<Restaurant> restaurantsByEntreeName(String entreeName) {
    Object[] params = new Object[1];
    params[0] = entreeName;
    // join through the entrees collection - JPQL does not allow
    // navigating a collection with a simple path expression
    return read(
      "select r from Restaurant r join r.entrees e where e.name like ?1",
      params);
  }
  // .. other methods implemented
}



finally the Spring beans configuration, that injects the implementation ..


<bean id="repoImpl"
  class="org.dg.inf.persistence.impl.jpa.JpaRepository">
  <property name="factory" ref="entityManagerFactory"/>
</bean>

<bean id="restaurantRepository"
  class="org.dg.repository.impl.jpa.RestaurantRepository"
  lazy-init="true">
  <property name="repositoryImpl" ref="repoImpl"/>
</bean>
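
With the wiring in place, client code stays entirely in the domain language - a hypothetical snippet, with the context file name assumed :


// imports : org.springframework.context.*
ApplicationContext ctx =
    new ClassPathXmlApplicationContext("applicationContext.xml");
IRestaurantRepository repo =
    (IRestaurantRepository) ctx.getBean("restaurantRepository");

// no tables, no JPA, no session handling - just the domain contract
List<Restaurant> matches = repo.restaurantsByEntreeName("lasagne");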



(Class diagram: the complete implementation model of the Repository pattern.)

Note how the above design of RestaurantRepository provides access to the Aggregate root Restaurant as collections without exposing the other entities like Entree and Address. Clearly the Repository pattern, if implemented properly, can actually hide the underlying database structure from the user. Here RestaurantRepository actually deals with 3 tables, but the client is blissfully unaware of any of them. And the underlying ORM makes it even more transparent with all the machinery of automatic synchronization and session management. This would not have been possible with the programming model of DAOs, which map naturally 1:1 with the underlying table. This is what I mean when I say that Repositories allow us to program the domain at a higher level of abstraction.

Monday, February 12, 2007

Domain Driven Design - Inject Repositories, not DAOs in Domain Entities

There have been some discussions in the Spring forum, of late, regarding injection of repositories into domain objects. And in the context of the data access layer, there appears to be some confusion about the difference between DAOs and repositories. A data access object (DAO) is the contract between the relational database and the OO application. A DAO belongs to the data layer of the application, and encapsulates the internals of CRUD operations from the Java application being developed using OO paradigms. In the absence of an ORM framework, the DAO handles the impedance mismatch that a relational database has with OO techniques.

A Look at the Domain Model

In the domain model, say, I have an entity named Organization, which needs to access the database to determine statistics regarding its employees ..


class Organization {

  private String name;
  private Address corporateOffice;
  // .. other members

  public long getEmployeeCount() {
    // .. implementation
  }

  public List<Employee> getOutstationEmployees() {
    // .. access Employee table and Address table in database
    // .. and find out employees who are in a diff city than corporate office
  }
}



and I need access to employee data deep down in my database to compute the statistics mandated by the Organization entity. A typical EmployeeDao will look like the following :


public interface EmployeeDao {
  List<Employee> getAllEmployees();
  Employee getEmployeeById(long id);
  List<Employee> getEmployeesByAddress(Address address);
  List<Employee> getEmployeesByName(String name);
  // .. other methods
}



Using this DAO definition and an appropriate implementation, we can have the method getOutstationEmployees() as follows :


@Configurable("organization")
class Organization {

  // implementation needs to be injected
  private EmployeeDao employeeDao;

  // ..
  // ..

  public List<Employee> getOutstationEmployees() {
    List<Employee> emps = employeeDao.getEmployeesByAddress(corporateOffice);
    List<Employee> allEmps = employeeDao.getAllEmployees();
    return CollectionUtils.minus(allEmps, emps);
  }
  // ..
}



Note the usage of the Spring annotation @Configurable to ensure dependency injection of the employeeDao into the domain object during instantiation. But how clean is this model ?

Distilling the Domain Model

The main problem with the above model is that we have lots of unrelated concerns polluting the domain. In his book on Domain Driven Design, Eric Evans says in the context of managing the sanctity of the domain model :
An object should be distilled until nothing remains that does not relate to its meaning or support its role in interactions

In the above design, the code snippet that prepares the list of outstation employees contains a lot of logic dealing with list manipulation and data fetching from the database - logic which does not belong naturally to the domain abstraction modeling an Organization. This detailed logic should be part of some other abstraction, closer to the data layer.

This is the ideal candidate for being part of the EmployeeRepository, which is a separate abstraction that interacts with the data accessors (here, the DAOs) and provides "business interfaces" to the domain model. Here we will have one repository for the entire Employee aggregate. An Employee class may collaborate with other classes like Address, Office etc., forming the entire Aggregate, as suggested by Eric in DDD. And it is the responsibility of this single Repository to work with all necessary DAOs and provide all data access services to the domain model in the language which the domain understands. So the domain model remains decoupled from the details of preparing collections from the data layer.


public interface EmployeeRepository {
  List<Employee> getOutstationEmployees(Address address);
  // .. other business contracts
}

public class EmployeeRepositoryImpl implements EmployeeRepository {

  private EmployeeDao employeeDao;

  public List<Employee> getOutstationEmployees(Address address) {
    // use the address passed in (the corporate office of the Organization)
    List<Employee> emps = employeeDao.getEmployeesByAddress(address);
    List<Employee> allEmps = employeeDao.getAllEmployees();
    return CollectionUtils.minus(allEmps, emps);
  }
}



The main differences between the Repository and the DAO are that :

  • The DAO is at a lower level of abstraction than the Repository and may contain plumbing code to pull data out of the database. We have one DAO per database table, but one Repository per domain type or Aggregate.

  • The contracts provided by the Repository are purely "domain centric" and speak the same domain language.



Repositories are Domain Artifacts

Repositories speak the Ubiquitous Language of the domain. Hence the contracts which the repositories publish must belong to the domain model. OTOH the implementation of a Repository will contain a lot of plumbing code for accessing DAOs and their table specific methods. Hence it is recommended that the pure domain model depend *only* on the Repository interfaces. Martin Fowler recommends the Separated Interface pattern for this.

Injecting the Repository instead of the DAO results in a much cleaner domain model :


@Configurable("organization")
class Organization {
  private String name;
  private Address corporateOffice;
  // .. other members

  // implementation needs to be injected
  private EmployeeRepository employeeRepo;

  // ..
  // ..

  public List<Employee> getOutstationEmployees() {
    return employeeRepo.getOutstationEmployees(corporateOffice);
  }
  // ..
}



In the next part, I will have a look at Repository implementations that use an ORM framework like Hibernate. We do not need DAOs there, since the object graph gets transparent persistence backed by the ORM engine. But that is the subject for another discussion, another day ..