Pages

13 March 2008

PDD: Properties Driven Development

Quick quizz: 1 + 2 =? and 1 + 2 + 3 =? and 1 + 2 + 3 + 4 =?

In other words, what is the result of the sumN function, which sums all integers from 1 to n?

If we had to use a "direct" (some would say "naive") TDD approach, we could come with the following code:
[Warning!! The rest of the post assumes that you have some basic knowledge of Scala, specs and Scalacheck,... ]

"the sumN function should" {
def sumN(n: Int) = (1 to n) reduceLeft((a:Int, b:Int) => a + b)
"return 1 when summing from 1 to 1" in {
sumN(1) must_== 1
}
"return 2 when summing from 1 to 2" in {
sumN(2) must_== 3
}
"return 6 when summing from 1 to 3" in {
sumN(3) must_== 6
}
}

But if we browse our mental book of "mathematical recipes" we remember that

sumN(n) = n * (n + 1) / 2

This is actually a much more interesting property to test and Scalacheck helps us in checking that:

"the sumN function should" {
def sumN(n: Int) = (1 to n) reduceLeft((a:Int, b:Int) => a + b)
"return n(n+1)/2 when summing from 1 to n" in {
val sumNInvariant = (n: Int) => sumN(n) == n * (n + 1) / 2
property(sumNInvariant) must pass
}
}

Even better, Scalacheck has no reason to assume that n is strictly positive! So it quickly fails on n == -1 and a better implementation is:

"the sumN function should" {
def sumN(n: Int) = {
assume(n >= 0) // will throw an IllegalArgumentException if the constraint is violated
(1 to n) reduceLeft((a:Int, b:Int) => a + b)
}
"return n(n+1)/2 when summing from 1 to n" in {
val sumNInvariant = (n: Int) => n <= 0 || sumN(n) == n * (n + 1) / 2
property(sumNInvariant) must pass
}
}


This will be ok and tested for a large number of values for n. Using properties is indeed quite powerful. Recreation time! You can now have a look at this movie, where Simon Peyton-Jones shows that Quickcheck (the Haskell ancestor of Scalacheck) detects interesting defaults in a bit packing algorithm.

Fine, fine, but honestly, all those examples look very academic: graph algorithms, mathematical formulas, bits packing,... Can we apply this kind of approach to our mundane, day-to-day development? Tax accounting, DVD rentals, social websites?

I am going to take 2 small examples from my daily work and see how PDD could be used [yes, YAA, Yet Another Acronym,... PDD stands for Properties Driven Development (and not that PDD)].
  1. From the lift framework (courtesy of Jamie Webb): a 'camelCase' function which transforms underscored names to CamelCase names
  2. From my company software: some pricer extension code for Swaps where fees values have to be subtracted from the NPV (Net Present Value) under certain conditions
I'll develop the examples first and try to draw conclusions latter on.

Example 1: camelCase

So what are the properties we can establish for that first example? Can we describe it informally first?

The camelCase function should CamelCase a name which is under_scored, removing each underscore and capitalizing the next letter.

Try this at home, this may not be so easy! Here is my proposal:

def previousCharIsUnderscore(name: String, i: Int) = i > 1 && name.charAt(i - 1) == '_'
def underscoresNumber(name: String, i: Int) = {
if (i == 0) 0
else name.substring(0, i).toList.count(_ == '_')
}
def indexInCamelCased(name: String, i: Int) = i - underscoresNumber(name, i)
def charInCamelCased(n: String, i: Int) = camelCase(n).charAt(indexInCamelCased(n, i))

val doesntContainUnderscores = property((name: String) => !camelCase(name).contains('_'))

val isCamelCased = property ((name: String) => {
name.forall(_ == '_') && camelCase(name).isEmpty ||
name.toList.zipWithIndex.forall { case (c, i) =>
c == '_' ||
indexInCamelCased(name, i) == 0 && charInCamelCased(name, i) == c.toUpperCase ||
!previousCharIsUnderscore(name, i) && charInCamelCased(name, i) == c ||
previousCharIsUnderscore(name, i) && charInCamelCased(name, i) == c.toUpperCase
}
})
doesntContainUnderscores && isCamelCased must pass

This property says that:
  • the CamelCased name must not contain underscores anymore
  • if the name contains only underscores, then the CamelCased name must be empty
  • for each letter in the original name, either:
    • it is an underscore
    • it is the first letter after some underscores, then it becomes the first letter of the CamelCased word and should be uppercased
    • the previous character isn't an underscore, so it should be unchanged
    • the previous character is an underscore, so the letter should be uppercased
Before running Scalacheck, we also need to create a string generator with some underscores:

implicit def underscoredString: Arbitrary[String] = new Arbitrary[String] {
def arbitrary = for { length <- choose(0, 5) string <- vectorOf(length, frequency((4, alphaNumChar), (1, elements('_')))) } yield List.toString(string) }


This works and in the process of working on the properties I observed that:
  • the full specification for CamelCasing name is not so easy!

  • it is not trivial to relate the resulting name to its original. I had to play with indices and the number of underscores to be able to relate characters before and after. However, if that code is in place, the testing code is almost only 1 line per property to check

  • the properties above specify unambiguously the function. I could also have specify weaker properties with less code, by not avoiding to specify that some letters should be unchanged or that the CamelCased name contains an uppercased letter without checking its position.

Example 2: Pricer extension

The logic for this extension is:
  1. to have the NPV (NetPresentValue) being calculated by the Parent pricer
  2. to collect all fees labeled "UNDERLYING PREMIUM" for that trade
  3. to subtract the fee value from the NPV if the valuation date for the trade is >= the fee settlement date
  4. to apply step 3 only if a pricing parameter named "INCLUDE_FEES "is set to true, while another pricing parameter "NPV_INCLUDE_CASH" is set to false
This looks certainly like a lot of jargon for most of you but I guess that it is pretty close to a lot of "Business" requirements. What would a property for those requirements look like (in pseudo Scala code)?

(originalNPV, fees, valuation date, pricing parameters) =>

if (!INCLUDE_FEES || NPV_INCLUDE_CASH)
newNPV == originalNPV
else
newNPV == originalNPV - fees.reduceLeft(0) { (fee, result) => result +
if (fee.isUnderlyingPremium && fee.settlementDate <= valuationDate)
fee.getValue
else
0
}

The most remarkable thing about this property is that it looks very close to the actual implementation. On the other hand, Scalacheck will be able to generate a lot of test cases:
  • an empty fee list
  • a list with no underlying premium fee
  • a list with a fee which settle date is superior to the valuation date
  • a list with a fees which settle date is inferior to the valuation date
  • the 4 possible combinations for the values of the pricing parameters
You can also notice that I use as the first parameter the originalNPV, which doesn't directly come from a generator but which would be the result of the original pricer with the other generated parameters (fees, valuation date, pricing parameters).


Conclusion


As a conclusion, and at the light of the 2 previous examples, I would like to enumerate the result of my recent experiments with Properties-Driven-Development:
  • First of all is that PDD is TDD, on steroids. In PDD, we also have data and assertions but data are generated and assertions are more general.
  • I don't believe that this replaces traditional TDD in all situations. There are situations where generating even 4 or 5 cases manually is easier and faster. Especially when we consider that making an exact oracle (the savant word for verification of test expectation) is sometimes tedious as in the camelCase function. In that situation developping the cases manually using the == method would have been much faster
  • PDD on the other hand allow the specify very clearly what is the rule. This is something that you would have to infer reading several examples when using TDD
  • On the other hand having several examples also facilitate the understanding of what's going on. "foo_bar" becomes "FooBar" is more easy to catch than "if a letter is preceded by,..."
  • PDD is very good at generating data you wouldn't think of: empty list, negative numbers, a string with underscores only,...
  • A tip: sometimes, it is useful to include in the generated parameters the result returned by the function you want to test. For example, in the second example, my parameters could be: (originalNPV, newNPV, fees, valuation date, pricing parameters). That way, when Scalacheck reports an error, it also reports the actual value you got when showing a counter-example
  • Sometimes the properties you want to check will almost mimic the implementation (as in example 2). I think that this is may be very often the case with business code if written properly or that this may show that your code is missing a key abstraction
  • It really gets some time to get your head wrap around finding properties. And soon you'll start thinking things like: "I know that property A, B and C characterize my function, but are they sufficient?" and you realize that you coming close to Programming == Theorem Proving
As a summary, PDD is not the long awaited Silver Bullet (sorry ;-) ,...) but it is indeed a wonderful tool to have in your toolbox. It will help you test much more thoroughly your programs while seeing them yet another way.

3 comments:

rickynils said...

Hi Eric,

I think you're article is great, and shows the benefits of using a ScalaCheck-like approach when testing your software.

I would like to point out a couple of things that you might find useful when using ScalaCheck.

* Instead of writing this:

val sumNInvariant = (n: Int) =>
n <= 0 || sumN(n) == n * (n + 1) / 2

... you can write an implication like this:

val sumNInvariant = (n: Int) =>
(n > 0) ==> (sumN(n) == n * (n + 1) / 2)

That way, ScalaCheck will recognize the trivial cases where n <= 0, and not count them as passed tests. Otherwise, ScalaCheck could (theoretically) generate a bunch of all-negative integers and consider the property true without even checking the interesting second part of the OR-expression. Check out the User Guide for more details on the implication operator.


* Instead of defining a new Arbitrary[String] instance for your underscore strings, you could just define a generator like this:

val usStr = for {
length <- choose(0,5)
string <- vectorOf(length,
frequency(
(4, alphaNumChar),
(1, elements('_'))
)
)
} yield List.toString(string)

and then write your property like this:

val isCamelCased = Prop.forAll(usStr)(
(name: String) => {
...
}
)

The Arbitrary instance is really meant to be a 'default' generator for a certain type. I think it makes things clearer to use custom generators and the forAll-method when you need data on a special format like the underscore-srtings. Also, if you need to specify different things about strings you will probably get conflicting implicit definitions.

* A last thing, when defining new Arbitrary instances it is better to use the following way:

implicit val arbString: Arbitrary[String] =
Arbitrary(
for {
length <- choose(1,5)
...
} yield ...
)

That way you will not get into troubles when you want to use the "arbitrary"-method inside your definition. In the next release of ScalaCheck, you will only be able to create Arbitrary instances this way, you will get a compilation error if you try to use "new Arbitrary[String] { ... }". See http://code.google.com/p/scalacheck/issues/detail?id=11 for more info.

Eric said...

Thank you very much Ricky for those tips, I will update my lift to reflect this.

Eric said...

(re-reading my previous comment,..)
my lift what?

my lift tests of course!