Saturday, December 29, 2012

The Best of 2012 - Looking Back

The year 2012 was a bit different for me in terms of readings and enlightenment. While once again Scala proved to be the most dominant language that I used throughout the year, I also had some serious productive time and mind share to learning machine learning and related topics. Online courses at Coursera were of immense help in this regard and proved to be the catalyst of inspiration. Being in India, staying within the comfort zones of my happy home, I could attend the classes of professors like Daphne Koller, Martin Odersky and Geoffrey Hinton. Thanks a lot Coursera for this enlightening experience ..

online courses that I attended ..

Probabilistic Graphical Models (Daphne Koller)
Functional Programming Principles in Scala (Martin Odersky)
Neural Networks in Machine Learning (Geoffrey Hinton)

the best of the papers that I read ..

I usually read lots of papers on my subjects of interest. The total number are too many to list here. But here are a few of them, mostly from the domain of programming languages and type theory. In this regard I must confess that I am not a programming language theory person. But I sincerely believe that you need to know some theory to make sense of its implementation. That's the reason I think programmers (yes. serious programmers using a decent programming language with a static type system) will do better to learn a bit of category theory.

As I mentioned at the beginning, currently my programming language of choice is Scala and I have been exploring how to write better code with Scala's multi-paradigm capabilities. Though I tend to use the functional power of Scala more than its object oriented features, there are quite a few idioms that I use where I find its object oriented capabilities useful. Scala 2.10 will now be released any time and I am even more excited to try out many of the new features that it will offer.

Ok .. here are some of the papers that I enjoyed reading in 2012 ..

An Introduction to Programming and Proving with Dependent Types in Coq - Adam Chlipala
Theorems for free! - Philip Wadler
Totality - Conor McBride
Proofs are Programs: 19th Century Logic and 21st Century Computing - Philip Wadler
Data types à la carte - Wouter Swierstra
Agda-curious?: an exploration of programming with dependent types - Conor McBride
Ideal Hash Trees - Phil Bagwell

a few good books ..

This is also a different list that what I expected at the beginning of the year. It's mostly due to my focus on machine learning and AI related topics that I took up in course of my journey.

Causality: Models, Reasoning and Inference - Judea Pearl
Probabilistic Graphical Models: Principles and Techniques - Daphne Koller
A World Without Time: The Forgotten Legacy of Godel and Einstein - Palle Yourgrau
Neural Networks for Pattern Recognition - Christopher M. Bishop
Turing (A Novel about Computation) - Christos H. Papadimitriou
Practical Foundations for Programming Languages - Robert Harper (Read the freely available ebook)
Haskell: The Craft of Functional Programming (3rd Edition) - Simon Thompson

3 conferences in a year - my best so far ..

This was my second consecutive PhillyETE and I spoke on functional approach to domain modeling. In case you missed it earlier, here's the presentation up on slideshare. I combined my PhillyETE trip with a trip to London to attend ScalaDays 2012. It was a great experience listening to 3 keynotes by Guy Steele, Simon Peyton Jones and Martin Odersky. I also got to meet a number of my friends on Twitter who rock the Scala community .. it was so nice getting to know the faces behind the twitter handles. Later in the year I also spoke at QCon NY on the topic of domain modeling in the functional world.

One of the conferences that I plan to attend in 2013 is The Strange Loop. It has been one of the awesomest conferences so far and I hate to be tracking it only on Twitter every year. Keeping fingers crossed ..

some plans for 2013 (Not Resolutions though!)

In no specific order ..

Do more Haskell - Haskell is surely in for some big renaissance. I just spent a couple of hours of very productive time watching Edward Kmett explain his new lens library.
Continue Scala - obviously .. needs no explanation ..
Explore more of machine learning and work on at least one concrete project. I am now reading 2 solid books on ML theory - as I mentioned above I feel a bit uncomfortable learning implementations without any of the theory that drive them. While both of these books assume a solid background of mathematics and statistics (which I don't have), I am trying to crawl through them, greedily accumulating necessary math background as much as I can. Wish I could take a sabbatical leave for 1 year and finish both of them.

Guess that's all .. See you all on the other side of 2013. Happy holidays and have a gorgeous 2013.

Monday, October 01, 2012

Towards better refactoring support in IDEs for functional programming

A couple of days back I was thinking how we could improve the state of IDEs and let them give rich feedbacks to users focusing on code improvement and refactoring. When you write Java programs, IntelliJ IDEA is possibly the best IDE you can have with an extensive set of refactoring tools. But most of these are targeted towards reducing boilerplates and do refactor based on structural changes only (e.g. replace constructor with builder or use interfaces wherever possible etc.). Can we improve the state of the art when we are using a semantically richer language like Scala or Haskell ? How about suggesting some idioms that relate to the paradigm of functional programming and explore some of the best practices suggested by the masters of the trade ?

I always find many programmers use Option.get in Scala when they should be using higher order structures instead. Here's an example of the anti-pattern ..

if oName.isDefined oName.get.toUpperCase else "Name not present"

which should ideally be refactored to

oName.map(_.toUpperCase).getOrElse("Name not present")

The advantage with the latter is that it's more composable and can be pipelined to other higher order functions down the line without introducing any temporary variables.

Many people also use pattern matching for destructuring an Option ..

oName match {
  case Some(n) => n.toUpperCase
  case _ => "Name not present"
}

This is also not what idioms espouse and is almost equivalent to explicit null checking.

Can an IDE suggest such refactoring ? I tweeted about my thoughts and almost immediately Mirko, who is doing some great stuff on the Scala IDE, added this to his list of ideas. Nice!

Here's another idea. How about suggesting some program transformations for functional programming ? I know Scala is not a pure functional language and we don't have effect tracking. I have seen people use side-effecting functions in map, which, I know is a strict no-no. But again, the compiler doesn't complain. Still we can have the IDE suggest some refactorings assuming the programmer uses purity as per recommendations. After all, these are suggestions only and can enlighten a programmer with some of the realizations of how to model based on the best practices.

Consider this ..

(-1000000 to 1000000).map(_ + 1).filter(_ > 0)

Here the map is executed on all the elements and then filter processes the whole output. One option to optimize will be to defer the computations using view ..

(-1000000 to 1000000).view.map(_ + 1).filter(_ > 0)

But this doesn't reduce the load of computation on map. It just prevents the creation of a temporary list as an output of the map. We can apply a transformation called filter promotion that does the filtering first so that map can operate on a potentially smaller list. Of course this assumes that both functions are *pure* so that operations can be reordered - but that's what we should anyway ensure while doing functional programming .. right ? Here's the transformation ..

(-1000000 to 1000000).filter(((_: Int) > 0) compose ((_: Int) + 1)).map(_ + 1)

which can be further simplified into ..

(-1000000 to 1000000).filter((_: Int) >= 0).map(_ + 1)

.. and this has same computation load on filter but potentially less load on map, since it now has to process a filtered list.

Can we have such transformations available in the IDE ? I am a faithful user of vim, but with such enrichments I may consider going back to an IDE. TYhis really has no limits. Consider Wadler's free theorems. Many times we tend to write functions whose arguments are more specialized in types than what's required. Generalization of such functions using polymorphic arguments can lead to better abstraction and verification through free theorems. An IDE can suggest all these and may also help generate properties for property based testing using tools like QuickCheck and ScalaCheck. But that's discussion for another day ..

P.S. Simon Thompson's Haskell The Craft of Functional Programming 3rd edition discusses lots of similar transformations in various chapters that discuss reasoning or Haskell programs.

Friday, August 10, 2012

Category Theory and Programming - Reinforcing the value of generic abstractions

Ok .. so my last post on the probable usefulness of Category Theory in programming generated an overwhelming response all over the places. It initiated lots of discussions which are always welcome and offers a great conduit towards enlightenment of everyone. From all these discussions, we can summarize that it's not a boolean answer whether category theory has any usefulness in making you a better programmer. Category theory is abstract, probably more abstract than what our normal programmer's mind can comprehend. Here's what one mathematician's bias tells us on HN ..

"This may be a mathematician's bias, but Category Theory is somewhat underwhelming because you can't really do anything with it. It's actually too abstract--you can model basically every branch of mathematics with Categories, but you can't actually prove very much about them from this perspective. All you get is a different vocabulary to talk about results that you already knew. It's also less approachable because its objects of study are other branches of mathematics, whereas most other branches have number, vectors, and functions as standard objects."

Now let me digress a bit towards personal experience. I don't have a strong math background and have only started learning category theory. I don't get most of the mathy stuff that category theorists talk about that I cannot relate to my programming world. But the subset that Pierce talks about and which I can relate to when I write code gives me an alternate perspective to think about functional abstractions. And, as I mentioned in my last post, category theory focuses more on the morphisms than on the objects. When I am programming in a functional language, this has a strong parallel towards emphasizing how abstractions compose. Specifically point free style composition has its inspiration from category theory and the way it focuses on manipulating arrows across objects.

Focusing on Generic Abstractions

Category theory gives us Functors, Applicatives, Monads, Semigroups, Monoids and a host of other abstractions which we use extensively in our everyday programming. Or at least I try to. These have taught me to think in terms of generic abstractions. I will give you a practical example from my personal experience that served as a great eye opener towards appreciating the value of generic abstractions ..

I program mostly in Scala and some times in Haskell. And one thing I need to do frequently (and it's not at all something unique to me) is apply a function f to a collection of elements (say of type Foo) with the caveat that the function may sometimes return no result. In Scala the obvious way to model this is for the function to return an Option type. Just for the record, Option is the equivalent of Maybe in Haskell and can be one of the two concrete types, Some (indicating the presence of a value) and None (indicates no value). So, applying f: Foo => Option[Bar] over the elements of a List[Foo], gives me List[Option[Bar]]. My requirement was to convert the result into an Option[List[Bar]], which would have a value only if all the Foos in the List gave me Some[Bar], and None otherwise. Initially I implemented this function as a specialized utility for Option data type only. But then I realized that this function could very well be applicable for many other data types like Either, List etc. And this aha! moment came courtsey of Scalaz which has a function sequence that does the exact same thing and works for all Traversible data structures (not just Lists) and containing anything that's an instance of an Applicative Functor. The Scala standard library doesn't have generic abstractions like Monad, Monoid or Semigroup and I constantly find reinventing them all the time within real world domain models that I program. No wonder I have Scalaz as a default dependency in almost all of my Scala projects these days.

So much for generic abstractions .. Now the question is do I need to know Category Theory to learn the generic abstractions like Applicative Functors or Monads ? Possibly no, but that's because some of the benevolent souls in the programming community have translated all those concepts and abstractions to a language that's more approachable to a programmer like me. At the same time knowing CT certainly gives you a broader perspective. Like the above quote says, you can't pinpoint and say that I am using this from my category theory knowledge. But you can always get a categorical perspective and a common vocabulary that links all of them to the world of mathematical composition.

Natural Transformation, Polymorphism and Free Theorems

Consider yet another aha! moment that I had some time back when I was trying to grok Natural Transformations, which are really morphisms between Functors. Suppose I have 2 functors F and G. Then a natural transformation η: F → G is a map that takes an object (in Haskell, a type, say, a) and returns a morphism from F(a) to G(a), i.e.

η :: forall a. F a → G a

Additionally, a natural transformation also needs to satisfy the following constraint ..

For any f :: A → B, we must have fmap_G f . η = η . fmap_F f

Consider an example from Haskell, the function reverse on lists, which is defined as

reverse :: [a] -> [a], which we can more specifically state as reverse :: forall a. [a] -> [a] ..

Comparing reverse with our earlier definition of η, we have F ≡ [] and G ≡ []. And since for lists, fmap = map, we have the other criterion for natural transformation as map f . reverse = reverse . map f. And this is the free theorem for reverse!

Not only this, we can get more insight from the application of natural transformation. Note the forall annotation above. It literally means that η (or reverse) works for any type a. That is, the function is not allowed to peek inside the arguments and its behavior does not depend on the exact type of a. We can freely substitute one type for another and yet have the same behavior of the function. Hence a natural transformation is all about polymorphic functions.

Summary

I agree that Category Theory is not a must read to do programming, particularly if you are not using a typed functional language. But often I find a category diagram or a categorical analysis reveals a lot of insight about an abstraction, particularly with respect to how it composes with other similar abstractions in the world.

Monday, July 30, 2012

Does category theory make you a better programmer ?

How much of category theory knowledge should a working programmer have ? I guess this depends on what kind of language the programmer uses in his daily life. Given the proliferation of functional languages today, specifically typed functional languages (Haskell, Scala etc.) that embeds the typed lambda calculus in some form or the other, the question looks relevant to me. And apparently to a few others as well. In one of his courses on Category Theory, Graham Hutton mentioned the following points when talking about the usefulness of the theory :

Building bridges—exploring relationships between various mathematical objects, e.g., Products and Function
Unifying ideas - abstracting from unnecessary details to give general definitions and results, e.g., Functors
High level language - focusing on how things behave rather than what their implementation details are e.g. specification vs implementation
Type safety - using types to ensure that things are combined only in sensible ways e.g. (f: A -> B g: B -> C) => (g o f: A -> C)
Equational proofs—performing proofs in a purely equational style of reasoning

Many of the above points can be related to the experience that we encounter while programming in a functional language today. We use Product and Sum types, we use Functors to abstract our computation, we marry types together to encode domain logic within the structures that we build and many of us use equational reasoning to optimize algorithms and data structures.

But how much do we need to care about how category theory models these structures and how that model maps to the ones that we use in our programming model ?

Let's start with the classical definition of a Category. [Pierce] defines a Category as comprising of:

a collection of objects
a collection of arrows (often called morphisms)
operations assigning to each arrow f an object dom f, its domain, and an object cod f, its codomain (f: A → B, where dom f = A and cod f = B
a composition operator assigning to each pair of arrows f and g with cod f = dom g, a composite arrow g o f: dom f → cod g, satisfying the following associative law:

f: A → B

g: B → C

h: C → D

h o (g o f) = (h o g) o f

for each object A, an identity arrow id_A: A → A satisfying the following identity law:

f: A → B

id_B o f = f

f o id_A = f

Translating to Scala

Ok let's see how this definition can be mapped to your daily programming chores. If we consider Haskell, there's a category of Haskell types called Hask, which makes the collection of objects of the Category. For this post, I will use Scala, and for all practical purposes assume that we use Scala's pure functional capabilities. In our model we consider the Scala types forming the objects of our category.

You define any function in Scala from type A to type B (A => B) and you have an example of a morphism. For every function we have a domain and a co-domain. In our example, val foo: A => B = //.. we have the type A as the domain and the type B as the co-domain.

Of course we can define composition of arrows or functions in Scala, as can be demonstrated with the following REPL session ..

scala> val f: Int => String = _.toString
f: Int => String = <function1>

scala> val g: String => Int = _.length
g: String => Int = <function1>

scala> f compose g
res23: String => String = <function1>

and it's very easy to verify that the composition satisfies the associative law.

And now the identity law, which is, of course, a specialized version of composition. Let's define some functions and play around with the identity in the REPL ..

scala> val foo: Int => String = _.toString
foo: Int => String = <function1>

scala> val idInt: Int => Int = identity(_: Int)
idInt: Int => Int = <function1>

scala> val idString: String => String = identity(_: String)
idString: String => String = <function1>

scala> idString compose foo
res24: Int => String = <function1>

scala> foo compose idInt
res25: Int => String = <function1>

Ok .. so we have the identity law of the Category verified above.

Category theory & programming languages

Now that we understand the most basic correspondence between category theory and programming language theory, it's time to dig a bit deeper into some of the implicit correspondences. We will definitely come back to the more explicit ones very soon when we talk about products, co-products, functors and natural transformations.

Do you really think that understanding category theory helps you understand the programming language theory better ? It all depends how much of the *theory* do you really care about. If you are doing enterprise software development and/or really don't care to learn a language outside your comfort zone, then possibly you come back with a resounding *no* as the answer. Category theory is a subject that provides a uniform model of set theory, algebra, logic and computation. And many of the concepts of category theory map quite nicely to structures in programming (particularly in a language that offers a decent type system and preferably has some underpinnings of the typed lambda calculus).

Categorical reasoning helps you reason about your programs, if they are written using a typed functional language like Haskell or Scala. Some of the basic structures that you encounter in your everyday programming (like Product types or Sum types) have their correspondences in category theory. Analyzing them from CT point of view often illustrates various properties that we tend to overlook (or take for granted) while programming. And this is not coincidental. It has been shown that there's indeed a strong correspondence between typed lambda calculus and cartesian closed categories. And Haskell is essentially an encoding of the typed lambda calculus.

Here's an example of how we can explain the properties of a data type in terms of its categorical model. Consider the category of Products of elements and for simplicity let's take the example of cartesian products from the category of Sets. A cartesian product of 2 sets A and B is defined by:

A X B = {(a, b) | a ∈ A and b ∈ B}

So we have the tuples as the objects in the category. What could be the relevant morphisms ? In case of products, the applicable arrows (or morphisms) are the projection functions π₁: A X B → A and π₂: A X B → B. Now if we draw a category diagram where C is the product type, then we have 2 functions f: C → A and g: C→ B as the projection functions and the product function is represented by : C → A X B and is defined as <F, G>(x) = (f(x), g(x)). Here's the diagram corresponding to the above category ..

and according to the category theory definition of a Product, the above diagram commutes. Note, by commuting we mean that for every pair of vertices X and Y, all paths in the diagram from X to Y are equal in the sense that each path forms an arrow and these arrows are equal in the category. So here commutativity of the diagram gives
π₁ o <F, G> = f and
π₂ o <F, G> = g.

Let's now define each of the functions above in Scala and see how the results of commutativity of the above diagram maps to the programming domain. As a programmer we use the projection functions (_1 and _2 in Scala's Tuple2 or fst and snd in Haskell Pair) on a regular basis. The above category diagram, as we will see gives some additional insights into the abstraction and helps understand some of the mathematical properties of how a cartesian product of Sets translates to the composition of functions in the programming model.

scala> val ip = (10, "debasish")
ip: (Int, java.lang.String) = (10,debasish)

scala> val pi1: ((Int, String)) => Int = (p => p._1)
pi1: ((Int, String)) => Int = <function1>

scala> val pi2: ((Int, String)) => String = (p => p._2)
pi2: ((Int, String)) => String = <function1>

scala> val f: Int => Int = (_ * 2)
f: Int => Int = <function1>

scala> val g: Int => String = _.toString
g: Int => String = <function1>

scala> val `<f, g>`: Int => (Int, String) = (x => (f(x), g(x)))
<f, g>: Int => (Int, String) = <function1>

scala> pi1 compose `<f, g>`
res26: Int => Int = <function1>

scala> pi2 compose `<f, g>`
res27: Int => String = <function1>

So, as we claim from the commutativity of the diagram, we see that pi1 compose `<f, g>` is typewise equal to f and pi2 compose `<f, g>` is typewise equal to g. Now the definition of a Product in Category Theory says that the morphism between C and A X B is unique and that A X B is defined upto isomorphism. And the uniqueness is indicated by the symbol ! in the diagram. I am going to skip the proof, since it's quite trivial and follows from the definition of what a Product of 2 objects mean. This makes sense intuitively in the programming model as well, we can have one unique type consisting of the Pair of A and B.

Now for some differences in semantics between the categorical model and the programming model. If you consider an eager (or eager-by-default) language like Scala, the Product type fails miserably in presence of the Bottom data type (_|_) represented by Nothing. For Haskell, the non-strict language, it also fails when we consider the fact that a Product type needs to satisfy the equations (fst(p), snd(p)) == p and we apply the Bottom (_|_) for p. So, the programming model remains true only when we eliminate the Bottom type from the equation. Have a look at this comment from Dan Doel in James Iry's blog post on sum and product types.

This is an instance where a programmer can benefit from knwoledge of category theory. It's actually a bidirectional win-win when knowledge of category theory helps more in understanding of data types in real life programming.

Interface driven modeling

One other aspect where category theory maps very closely with the programming model is its focus on the arrows rather than the objects. This corresponds to the notion of an interface in programming. Category theory typically "abstracts away from elements, treating objects as black boxes with unexamined internal structure and focusing attention on the properties of arrows between objects" [Pierce]. In programming also we encourage interface driven modeling, where the implementation is typically abstracted away from the client. When we talk about objects upto isomorphism, we focus solely on the arrows rather than what the objects are made of. Learning programming and category theory in an iterative manner serves to enrich your knowledge on both. If you know what a Functor means in category theory, then when you are designing something that looks like a Functor, you can immediately make it generic enough so that it composes seamlessly with all other functors out there in the world.

Thinking generically

Category theory talks about objects and morphisms and how arrows compose. A special kind of morphism is Identity morphism, which maps to the Identity function in programming. This is 0 when we talk about addition, 1 when we talk about multiplication, and so on. Category theory generalizes this concept by using the same vocabulary (morphism) to denote both stuff that do some operations and those that don't. And it sets this up nicely by saying that for every object X, there exists a morphism id_X : X → X called the identity morphism on X, such that for every morphism f: A → B we have id_B o f = f = f o id_A. This (the concept of a generic zero) has been a great lesson at least for me when I identify structures like monoids in my programming today.

Duality

In the programming model, many dualities are not explicit. Category theory has an explicit way of teaching you the dualities in the form of category diagrams. Consider the example of Sum type (also known as Coproduct) and Product type. We have abundance of these in languages like Scala and Haskell, but programmers, particularly people coming from the imperative programming world, are not often aware of this duality. But have a look at the category diagram of the sum type A + B for objects A and B ..

It's the same diagram as the Product only with the arrows reversed. Indeed a Sum type A + B is the categorical dual of Product type A X B. In Scala we model it as the union type like Either where the value of the sum type comes either from the left or the right. Studying the category diagram and deriving the properties that come out of its commutativity helps understand a lot of theory behind the design of the data type.

In the next part of this discussion I will explore some other structures like Functors and Natural Transformation and how they map to important concepts in programming which we use on a daily basis. So far, my feeling has been that if you use a typed functional language, a basic knowledge of category theory helps a lot in designing generic abstractions and make them compose with related ones out there in the world.

Monday, July 23, 2012

Property based testing for domain models

One of the challenges that we face building a non trivial domain model is to write proper tests that verify the domain rules that the model implements. The domain rules can be quite complex, may have a number of edge cases which the developer himself may fail to take care of. When you use an implementation language that supports a decent type system, many of the rules and invariants can be encoded statically within the type system itself. This makes it impossible for the programmer to write any code that violates those constraints. But you can only encode some of the domain rules as part of your static type based constraints - you need to complement them with a testing procedure that validates the semantic behavior of the model.

If you are doing testing manually, you are doing it wrong. And if you use xUnit based testing procedures, there's enough scope for you to grow up towards better frameworks that offer true automation not only with respect to executing your tests, but generating the data as well.

Frameworks like QuickCheck in Haskell or ScalaCheck in Scala allows you to write property specifications that the model should satisfy and then generate data to guide evaluation of test executions for correctness as well as completeness. Here's what Bryan O'Sullivan et al mentions when discussing property based testing in their book Real World Haskell ..

Property-based testing encourages a high level approach to testing in the form of abstract invariants functions should satisfy universally, with the actual test data generated for the programmer by the testing library. In this way code can be hammered with thousands of tests that would be infeasible to write by hand, often uncovering subtle corner cases that wouldn't be found otherwise.

I have been doing lots of domain modeling on securities trading system. The model is a quite complex one and like the most of us, I started with xUnit based testing to verify some of the invariants and constraints that the system needs to honor. But very quickly I found that properties offer a more succinct way to abstract the constraints and invariants of my model. Hence using a tool like ScalaCheck makes it much simpler once I give it a proper data generator as input. Consider the simple property in the following example ..

A trade needs to be enriched in its lifecycle with tax/fee values and other attributes. We can then compute its net value which should be a positive numeric quantity.

property("Enrichment of a trade should result in netvalue > 0") {
  forAll((a: Trade) =>
    enrichTrade(a).netAmount.get should be > (BigDecimal(0)))
}

Here I don't need to construct concrete data by hand (in fact this is what makes xUnit based frameworks a sophisticated manual testing system - it only automates the execution part of your testing). Instead what I supply is a trade generator which gives ScalaCheck enough information based on which it can generate loads of trades. A typical simplified version of the trade generator is the following ..

implicit lazy val arbTrade: Arbitrary[Trade] =
  Arbitrary {
    for {
      a <- Gen.oneOf("acc-01", "acc-02", "acc-03", "acc-04")
      i <- Gen.oneOf("ins-01", "ins-02", "ins-03", "ins-04")
      r <- Gen.oneOf("r-001", "r-002", "r-003")
      m <- arbitrary[Market]
      u <- Gen.oneOf(BigDecimal(1.5), BigDecimal(2), BigDecimal(10))
      q <- Gen.oneOf(BigDecimal(100), BigDecimal(200), BigDecimal(300))
    } yield Trade(a, i, r, m, u, q)
  }

I can make more sophisticated properties and ask the system to check whether the model satisfies the property or not ..

property("Enrichment should mean netValue equals principal + taxes") {
  forAll((a: Trade) => {
    val et = enrichTrade(a)
    et.netAmount should equal (et.taxFees.map(_.foldLeft(principal(et))((a, b) => a + b._2)))
  })
}

This is a direct encoding of the business rule, which the domain model needs to satisfy. And using properties we can encode the verification in a more declarative fashion, and that too without bothering about how to generate concrete data for all the test cases.

Crosscutting Property Verification

When we talk about properties of the model, we can go all the way and write specifications for properties at all levels of granularity. It can be a local property limited to one module or it can be a system wide property, an invariant that holds good across multiple components. What I want to say is that property based testing can be very effective in system testing (or integration testing) as well.

In our system, we have Client Orders coming in that get executed in the stock exchanges resulting in broker side executions, which then get allocated to client accounts resulting in client trades. A succinct implementation of this entire flow can be the following ..

def tradeGeneration(market: Market, broker: Account, clientAccounts: List[Account]) =
  // client orders           executed at market by broker        & allocated to client accounts
  kleisli(clientOrders) >=> kleisli(execute(market)(broker)) >=> kleisli(allocate(clientAccounts))

We can use property based specification to test this entire lifecycle using ScalaCheck. Let's check for one of the invariants in this entire process -

Invariant: "The total quantity of order will be equal to the total quantity traded"

and here's the specification of the property in ScalaCheck ..

property("Client trade allocation in the trade pipeline should maintain quantity invariant") {
  forAll { (clientOrders: List[ClientOrder], args: (Market, Account, List[Account])) =>
    whenever (clientOrders.size > 0 && args._2.size > 0 && args._3.size > 0) {
      val trades = tradeGeneration(args._1, args._2, args._3)(clientOrders)
      trades.size should be > 0
      (trades.sequence[({type λ[a]=Validation[NonEmptyList[String],a]})#λ, Trade]) match {
        case Success(l) => {
          val tradeQuantity = l.map(_.quantity).sum
          val orderQuantity = fromClientOrders(clientOrders).map(_.items).flatten.map(_.qty).sum
          tradeQuantity should equal(orderQuantity)
        }
        case _ => fail("should get a list of size > 0")
      }
    }
  }
}

The properties are the domain rules and these can be prepared by the domain experts. I am a firm believer in collaborative involvement of domain experts in building the model. I have advocated the use of domain specific languages as a thin linguistic abstraction on top of a domain model that should be built iteratively with the domain experts. Property based testing is the third piece that completes the solution framework of developing complete domain models. Property based testing strengthens this view of increased involvement of the domain people in building the software. At the end of the day, leave everything to the respective experts - the domain people specifies properties and invariants, the developer implements the specification and the underlying test framework does the heavy lifting of generating enough data and automate the execution process.

Monday, February 20, 2012

It's the familiarity model!

James Iry recently blogged on code density and tries to find out the meaning of "dense code". As an example he took upon the topic of regular expressions, which despite being dense are not frowned upon in coding and hardly get replaced for making the code fragment more readable.

So the question is what makes code dense so that it's not acceptable to programmers and they complain about it's incomprehensibility ?

Later in the blog post James himself identifies unfamiliarity as one of the culprits. People don't complain about regexes since they are familiar with them, but will surely complain of something else which they are not familiar with.

Almost during the same time Ola Bini blogged about expressiveness in programming language syntax. He mentions that a well designed syntax should help programmers *read* the code easily. But he also questions about the target programmers ..

who is this person reading ? It makes a huge difference if we’re trying to design something that should be easy to read for a novice or we’re trying to design a syntax that makes it easier for an expert to understand what’s going on.

Once again we get into this territory of familiarity and mental model. An expressive piece of code becomes readable only to a person who is familiar with the underlying model. In my programming career I have come across this dichotomy a number of times where programmers complain of something being too dense the moment it crosses the threshold of his familiarity level. I have seen developers taking every pain to understand the nuances of a Spring XML configuration. Or who have spent zillions of hours mastering the whole bunch of performance tuning Hibernate with stuffs like query cache configuration. Believe me it's not simple with tonnes of corner cases to take care of and even today I am not sure if it can be achieved in a deterministic way for all kinds of data models. But these same developers complain when they are faced with maintaining code that needs a basic understanding of functional programming, set theory or algebraic data types. I think it's purely because these form outside the limits of their familiarity model.

For a programmer who is not familiar with higher order functions, combinators like map, fold or filter will look too dense. So when you say map (+1) [1..5], the code fragment looks much less comprehensible to him than his familar variant of using an imperative mutated-indexed for-loop. To the unfamiliar the functional variant appears dense, to the expert it becomes succinct.

One of the challenges that I face today is to make programmers believe that learning new stuff will only help them think better. It's not mandatory that they will need all of these tools as part of their day job. But broadening your mental model can only help your thought process to leverage a wider playground. Maybe in our part of the world big companies give no incentive to transform yourself from a billable offshore resource to a thinking programmer. But you really need to transcend the limits of your familiarity model in order to appreciate code which experts certify as succinct.

Monday, February 06, 2012

Applicatives and a story of composability with sjsonapp

sjson has just gone applicative. I have changed the typeclasses for reading and writing jsons so that the typeclass protocols now return applicatives instead of raw types. Of course this makes the protocols composable with any other applicative based API in the world. This is the advantage of programming with generic abstractions like functors, applicatives and monads - you can compose them readily with any API that the world has written using the same ones.

Previously the serialization typeclasses for sjson looked like ..

// takes a type and produces a Json abstraction
trait Writes[T] {
  def writes(o: T): JsValue
}

// reads a json abstraction and produces T
trait Reads[T] {
  def reads(json: JsValue): T
}

You can compose writes and reads together only through function composition since they have symmetric type signatures. But you cannot get the benefits of threading additional effectful computations through them. e.g. I cannot accumulate errors generically in reads that result from mismatches in the field names between the json structure and the Scala type. Or I cannot plugin additional validations on Json structures during de-serialization into Scala type and have the validation messages passed on to the client. I need to have specialized handling in my code base by resorting to side-effects like throwing exceptions, which don't compose well.

In sjsonapp, the typeclasses are changed to ..

trait Writes[T] {
  def writes(o: T): ValidationNEL[String, JsValue]
}

trait Reads[T] {
  def reads(json: JsValue): ValidationNEL[String, T]
}

scalaz defines an applicative functor named Validation that allows you to compose validating abstractions. I had discussed in detail how you can compose domain models using applicative functors like Validation in an earlier post. You can use Validation to accumulate errors that occur when you de-serialize a Json structure into a Scala object.

Applicative Composition FTW

Here's the immediate impact of making your APIs return an applicative - your json processing becomes a composable pipeline of abstractions. Here's an example of an identity operation where you serialize Scala objects into json and de-serialize them back into the same abstractions using applicative composition ..

describe("Serialize and compose applicatives") {
  it("should compose and form a bigger ADT") {
    case class Address(no: String, street: String, zip: String)
    implicit val AddressFormat: Format[Address] =
      asProduct3("no", "street", "zip")(Address)(Address.unapply(_).get)

    case class Name(firstName: String, lastName: String)
    implicit val NameFormat: Format[Name] =
      asProduct2("firstName", "lastName")(Name)(Name.unapply(_).get)

    case class Me(name: Name, age: Int, address: Address)
    implicit val MeFormat: Format[Me] =
      asProduct3("name", "age", "address")(Me)(Me.unapply(_).get)

    val name = Name("debasish", "ghosh")
    val address = Address("1050/2", "Survey Park", "700075")
    val me = Me(name, 40, address)

    fromjson[Me](tojson(me).toOption.get) should equal(me.success)

    (tojson(name) |@| tojson(address) |@| tojson(40)) {(nm, add, age) =>
      (fromjson[Name](nm) |@| fromjson[Address](add) |@| fromjson[Int](age)) {(n, ad, ag) => Me(n, ag, ad)}
    } should equal(Success(Success(me)))
  }
}

Accumulating Validation Errors

Here's a test snippet that demonstrates how you can get back validation errors in a List when de-serializing a Json structure that's supposed to make a Scala object ..

case class Person(firstName: String, lastName: String, gender: String, age: Int)
implicit val PersonFormat: Format[Person] =
  asProduct4("firstName", "lastName", "gender", "age")(Person)(Person.unapply(_).get)

val pjson = """{"FirstName" : "Debasish", "LastName" : "Ghosh", "gender": "M", "age": 40}"""
fromjson[Person](Js(pjson)).fail.toOption.get.list 
  should equal(List("field firstName not found", "field lastName not found"))

The power of composition with applicatives .. and note we don't use any side-effecting operations like throwing exceptions which eschews purity of your functions. The accumulation is powered by a Semigroup abstraction that constrains the error part of the Validation. Have a look at how the applicative for Validation constrains X to be a Semigroup in scalaz and accumulates the failures in the last clause of the pattern match ..

implicit def ValidationApply[X: Semigroup]: Apply[({type λ[α]=Validation[X, α]})#λ] = 
  new Apply[({type λ[α]=Validation[X, α]})#λ] {
    def apply[A, B](f: Validation[X, A => B], a: Validation[X, A]) = (f, a) match {
      case (Success(f), Success(a)) => success(f(a))
      case (Success(_), Failure(e)) => failure(e)
      case (Failure(e), Success(_)) => failure(e)
      case (Failure(e1), Failure(e2)) => failure(e1 |+| e2)
    }
}

Besides mismatches in field names, you can also plug in custom validation functions that will be invoked during de-serialization of your json structures and report similar errors when they fail .. Here's an example ..

case class Person(firstName: String, lastName: String, gender: String, age: Int)

val validGender: String => ValidationNEL[String, String] = {g =>
  if (g == "M" || g == "F") g.success else "gender must be M or F".fail.liftFailNel
}

val validAge: Int => ValidationNEL[String, Int] = {a =>
  if (a < 0 || a > 100) "age must be positive and < 100".fail.liftFailNel else a.success
}

// the typeclass implementation for Person
implicit val PersonFormat: Format[Person] = new Format[Person] {

  // the de-serializing function
  def reads(json: JsValue): ValidationNEL[String, Person] = json match {
    case m@JsObject(_) =>
      (field[String]("firstName", m)            |@|
      field[String]("lastName", m)              |@|
      field[String]("gender", m, validGender)   |@| // validation plugin
      field[Int]("age", m, validAge)) { Person }
  
    case _ => "JsObject expected".fail.liftFailNel
  }
  
  // the serializing function 
  //..
}

Note how we plug in the validations for gender and age into the typeclass instance for Person. Also note that these functions also return an instance of scalaz Validation which can be nicely composed with the return type of the reads function after constructing the instance of the Person class. Here's an example ..

val p = Person("ghosh", "debasish", "M", 27)
fromjson[Person](tojson(p).toOption.get) should equal(p.success)

val r = Person("ghosh", "debasish", "G", 270)
fromjson[Person](tojson(r).toOption.get).fail.toOption.get.list should 
  equal(List("gender must be M or F", "age must be positive and < 100"))

Applicatives open up a world of possibilities

Once you have your APIs based on applicatives you can use all other abstractions that applicative functors offer - e.g. you can compose validations using Kleislis ..

describe("Serialize and chain validate using Kleisli") {
  case class Me(firstName: String, lastName: String, age: Int, no: String, street: String, zip: String)
  implicit val MeFormat: Format[Me] =
    asProduct6("firstName", "lastName", "age", "no", "street", "zip")(Me)(Me.unapply(_).get)

  val positive: Int => ValidationNEL[String, Int] =
    (i: Int) => if (i > 0) i.success else "must be +ve".fail.liftFailNel

  val min: Int => ValidationNEL[String, Int] =
    (i: Int) => if (i > 10) i.success else "must be > 10".fail.liftFailNel

  val max: Int => ValidationNEL[String, Int] =
    (i: Int) => if (i < 100) i.success else "must be < 100".fail.liftFailNel

  it("should serialize and validate") {
    val me = Me("debasish", "ghosh", 30, "1050/2", "survey park", "700075")
    val json = tojson(me)

    import Validation.Monad._
    type VA[A] = ValidationNEL[String, A]

    field[Int]("age", json.toOption.get,
      kleisli[VA, Int, Int](positive) >=> 
           kleisli[VA, Int, Int](min) >=> kleisli[VA, Int, Int](max)) should equal(30.success)

    val me1 = me.copy(age = 300)
    val json1 = tojson(me1)

    field[Int]("age", json1.toOption.get,
      kleisli[VA, Int, Int](positive) >=> 
           kleisli[VA, Int, Int](min) >=> kleisli[VA, Int, Int](max)).fail.toOption.get.list 
           should equal(List("must be < 100"))
  }
}

The new version of sjson with all the applicative based APIs is available in a separate repository sjsonapp on my github. Another interesting development that will soon be available is pluggable backend for sjson. The current version (branch master) uses dispatch as the json processing backend. I am working towards making sjsonapp compatible with RosettaJson, so that you can use pluggable json processing backends like LiftJson, Dispatch or BlueEyes. The current RosettaJson compatible version is available in the branch rosetta - feel free to checkout and play with it.

Tuesday, January 24, 2012

List Algebras and the fixpoint combinator Mu

In my last post on recursive types and fixed point combinator, we saw how the type equations of the form a = F(a), where F is the type constructor have solutions of the form Mu a . F where Mu is the fixed point combinator. Substituting the solution in the original equation, we get ..

Mu a . F = F {Mu a . F / a}

where the rhs indicates substitution of all free a's in F by Mu a . F.

Using this we also got the type equation for ListInt as ..

ListInt = Mu a . Unit + Int x a

In this post we view the same problem from a category theory point of view. This post assumes understanding of quite a bit of category theory concepts. If you are unfamiliar with any of them you can refer to some basic text on the subject.

We start with the definition of ListInt as in the earlier post ..

// nil takes no arguments and returns a List data type

nil : 1 -> ListInt



// cons takes 2 arguments and returns a List data type

cons : (Int x ListInt) -> ListInt

Combining the two functions above, we get a single function as ..

in = [nil, cons] : 1 + (Int x ListInt) -> ListInt

We can say that this forms an algebra of the functor F(X) = 1 + (Int x X). Let's represent this algebra by (Mu F, in) or (Mu F, [nil, cons]), where Mu F is ListInt in the above combined function.

As a next step we show that the algebra (Mu F, [nil, cons]) forms an initial algebra representing the data type of Lists over a given set of integers. Here we are dealing with lists of integers though the same result can be shown for lists of any type A.

In order to show (Mu F, [nil cons]) form an initial F-algebra we consider an arbitrary F-algebra (C, phi), where phi is an arrow out of the sum type given by :

C : 1 -> C

h : (Int x C) -> C

and the join given by [c, h] : 1 + (Int x C) -> C

By definition, if (Mu F, [nil, cons]) has to form an initial F-algebra, then for any arbitrary F-algebra (C, phi) in that category, we need to find a function f: Mu F -> C which is a homomorphism and it should be unique. So for the algebra [c, h] the following diagram must commute ..

which means we must have a unique solution to the following 2 equations ..

f o nil = c

f o cons = h o (id x f)

From the universal property of initial F-algebras it's easy to see that this system of equations has a unique solution which is fold(c, h). It's the catamorphism represented by ..

f: {[c, h]}: ListInt -> C

This proves that (Mu F, [nil, cons]) is an initial F-algebra over the endofunctor F(X) = 1 + (Int x X). And it can be shown that an initial algebra in: F (Mu F) -> Mu F is an isomorphism and the carrier of the initial algebra is (upto isomorphism) a fixed point of the functor. Well, that may sound a bit of a mouthful. But we can discuss this in more details in one of my subsequent posts. There's a well established lemma due to Lambek that proves this. I can't do it in this blog post, since it needs some more prerequisites to be established beforehand which would make this post a bit bloated. But it's really a fascinating proof and I promise to take this up in one of my upcoming posts. Also we will see many properties of initial algebras and how they can be combined to define many of the properties of recursive data types in a purely algebraic way.

As I promised in my last post, here we have seen the other side of Mu - we started with the list definition, showed that it forms an initial algebra over the endofunctor F(X) = 1 + (Int x X) and arrived at the same conclusion that Mu F is a fixed point. Or Mu is the fixed point combinator.

Sunday, January 15, 2012

Event Sourcing, Akka FSMs and functional domain models

I blogged on Event Sourcing and functional domain models earlier. In this post I would like to share more of my thoughts on the same subject and how with a higher level of abstraction you can make your domain aggregate boundary more resilient and decoupled from external references.

When we talk about a domain model, the Aggregate takes the centerstage. An aggregate is a core abstraction that represents the time invariant part of the domain. It's an embodiment of all states that the aggregate can be in throughout its lifecycle in the system. So, it's extremely important that we take every pain to distil the domain model and protect the aggregate from all unwanted external references. Maybe an example will make it clearer.

Keeping the Aggregate pure

Consider a Trade model as the aggregate. By Trade, I mean a security trade that takes place in the stock exchange where counterparties exchange securities and currencies for settlement. If you're a regular reader of my blog, you must be aware of this, since this is almost exclusively the domain that I talk of in my blog posts.

A trade can be in various states like newly entered, value date added, enriched with tax and fee information, net trade value computed etc. In a trading application, as a trade passes through the processing pipeline, it moves from one state to another. The final state represents the complete Trade object which is ready to be settled between the counterparties.

In the traditional model of processing we have the final snapshot of the aggregate - what we don't have is the audit log of the actual state transitions that happened in response to the events. With event sourcing we record the state transitions as a pipeline of events which can be replayed any time to rollback or roll-forward to any state of our choice. Event sourcing is coming up as one of the potent ways to model a system and there are lots of blog posts being written to discuss about the various architectural strategies to implement an event sourced application.

That's ok. But whose responsibility is it to manage these state transitions and record the timeline of changes ? It's definitely not the responsibility of the aggregate. The aggregate is supposed to be a pure abstraction. We must design it as an immutable object that can respond to events and transform itself into the new state. In fact the aggregate implementation should not be aware of whether it's serving an event sourced architecture or not.

There are various ways you can model the states of an aggregate. One option that's frequently used involves algebraic data types. Model the various states as a sum type of products. In Scala we do this as case classes ..

sealed abstract class Trade {
  def account: Account
  def instrument: Instrument
  //..
}

case class NewTrade(..) extends Trade {
  //..
}

case class EnrichedTrade(..) extends Trade {
  //..
}

Another option may be to have one data type to model the Trade and model states as immutable enumerations with changes being effected on the aggregate as functional updates. No in place mutation, but use functional data structures like zippers or type lenses to create the transformed object in the new state. Here's an example where we create an enriched trade out of a newly created one ..

// closure that enriches a trade
val enrichTrade: Trade => Trade = {trade =>
  val taxes = for {
    taxFeeIds      <- forTrade // get the tax/fee ids for a trade
    taxFeeValues   <- taxFeeCalculate // calculate tax fee values
  }
  yield(taxFeeIds ° taxFeeValues)
  val t = taxFeeLens.set(trade, taxes(trade))
  netAmountLens.set(t, t.taxFees.map(_.foldl(principal(t))((a, b) => a + b._2)))
}

But then we come back to the same question - if the aggregate is distilled to model the core domain, who handles the events ? Someone needs to model the event changes, effect the state transitions and take the aggregate from one state to the next.

Enter Finite State Machines

In one of my projects I used the domain service layer to do this. The domain logic for effecting the changes lies with the aggregate, but they are invoked from the domain service in response to events when the aggregate reaches specific states. In other words I model the domain service as a finite state machine that manages the lifecycle of the aggregate.

In our example a Trading Service can be modeled as an FSM that controls the lifecycle of a Trade. As the following ..

import TradeModel._

class TradeLifecycle(trade: Trade, timeout: Duration, log: Option[EventLog]) 
  extends Actor with FSM[TradeState, Trade] {
  import FSM._

  startWith(Created, trade)

  when(Created) {
    case Event(e@AddValueDate, data) =>
      log.map(_.appendAsync(data.refNo, Created, Some(data), e))
      val trd = addValueDate(data)
      notifyListeners(trd) 
      goto(ValueDateAdded) using trd forMax(timeout)
  }

  when(ValueDateAdded) {
    case Event(StateTimeout, _) =>
      stay

    case Event(e@EnrichTrade, data) =>
      log.map(_.appendAsync(data.refNo, ValueDateAdded, None,  e))
      val trd = enrichTrade(data)
      notifyListeners(trd)
      goto(Enriched) using trd forMax(timeout)
  }

  when(Enriched) {
    case Event(StateTimeout, _) =>
      stay

    case Event(e@SendOutContractNote, data) =>
      log.map(_.appendAsync(data.refNo, Enriched, None,  e))
      sender ! data
      stop
  }

  initialize
}

The snippet above contains a lot of other details which I did not have time to prune. It's actually part of the implementation of an event sourced trading application that uses asynchronous messaging (actors) as the backbone for event logging and reaching out to multiple consumers based on the CQRS paradigm.

Note that the FSM model above makes it very explicit about the states that the Trade model can reach and the events that it handles while in each of these states. Also we can use this FSM technique to log events (for event sourcing), notify listeners about the events (CQRS) in a very much declarative manner as implemented above.

Let me know in the comments what are your views on this FSM approach towards handling state transitions in domain models. I think it helps keep aggregates pure and helps design domain services that focus on serving specific aggregate roots.

I will be talking about similar stuff, Akka actor based event sourcing implementations and functional domain models in PhillyETE 2012. Please drop by if this interests you.

Sunday, January 08, 2012

Learning the type level fixpoint combinator Mu

I blogged on Mu, type level fixpoint combinator some time back. I discussed how Mu can be implemented in Scala and how you can use it to derive a generic model for catamorphism and some cool type level data structures. Recently I have been reading TAPL by Benjamin Pierce that gives a very thorough treatment of the theories and implementation semantics of types in a programming language.

And Mu we meet again. Pierce does a very nice job of explaining how Mu does for types what Y does for values. In this post, I will discuss my understanding of Mu from a type theory point of view much of what TAPL explains.

As we know, the collection of types in a programming language forms a category and any equation recursive in types can be converted to obtain an endofunctor on the same category. In an upcoming post I will discuss how the fixed point that we get from Mu translates to an isomoprhism in the diagram of categories.

Let's have a look at the Mu constructor - the fixed point for type constructor. What does it mean ?

Here's the ordinary fixed point combinator for functions (from values to values) ..

Y f = f (Y f)

and here's Mu

Mu f = f (Mu f)

Quite similar in structure to Y, the difference being that Mu operates on type constructors. Here f is a type constructor (one that takes a type as input and generates another type). List is the most commonly used type constructor. You give it a type Int and you get a concrete type ListInt.

So, Mu takes a type constructor f and gives you a type T. This T is the fixed point of f, i.e. f T = T.

Consider the following recursive definition of a List ..

// nil takes no arguments and returns a List data type

nil : 1 -> ListInt



// cons takes 2 arguments and returns a List data type

cons : (Int x ListInt) -> ListInt

Taken together we would like to solve the following equation :

a = Unit + Int x a     // ..... (1)

Now this is recursive and can be unfolded infinitely as

a = Unit + Int x (Unit + Int x a)

  = Unit + Int x (Unit + Int x (Unit + Int x a))

  = ...

TAPL shows that this equation can be represented in the form of an infinite labeled tree and calls this infinite type regular. So, generally speaking, we have an equation of the form a = τ where

1. if a does not occur in τ, then we have a finite solution which, in fact is τ
2. if a occurs in τ, then we have an infinite solution represented by an infinite regular tree

So the above equation is of the form a = ... a ... or we can say a = F(a) where F is the type constructor. This highlights the recursion of types (not of values). Hence any solution to this equation will give us an object which will be the fixed point of the equation. We call this solution Mu a . F.

Since Mu a . F is a solution to a = F(a), we have the following:

Mu a . F = F {Mu a . F / a}, where the rhs indicates substitution of all free a's in F by Mu a . F.

Here Mu is the fixed point combinator which takes the type constructor F and gives us a type, which is the fixed point of F. Using this idea, the above equation (1) has the solution ListInt, which is the fixed point type ..

ListInt = Mu a . Unit + Int x a

In summary, we express recursive types using the fix point type constructor Mu and show that Mu generates the fixed point for the type constructor just like Y generates the same for functions on values.

Sunday, January 01, 2012

2011 - The year that was

The very first thing that strikes me as I start writing a personal account of 2011 as it was is how it has successfully infused some of the transformations in my regular chores of programming world. It has been different and I am starting to enjoy some of the renewed vigor in areas like Type Systems, Machine Learning, Algebra etc. Throughout the year I used mostly one single language - Scala for programming with some occasional stints in Haskell and Octave for the Stanford Machine Learning course. But I have no regrets in not being more polyglotic, because I could find more time to dig deep into some of the more fundamental areas like algebra, category theory and type systems.

Favorite books read / started reading

Types and Programming Languages by Benjamin Pierce : definitely a Knuth statured book in the theory of type systems in programming languages. It's written with a very pragmatic outlook and contains all necessary implementation details to complement the accompanying theory. I have not yet finished reading the book. I am into Chapter 20 and 21 doing recursive types, that look to be one of the most exhaustive treatments of the subject I have ever seen. If and when I manage to finish reading this book, my next plan for theory of programming languages is Design Concepts in Programming Languages.
Conceptual Mathematics by F. William Lawvere and Stephen H. Schanuel - I started reading this book from recommendation by Paul Snively as a precursor to Benjamin Pierce's Category Theory for Computer Scientists. This is an excellent introduction to Category Theory and contains a detailed treatment of the unifying ideas of mathemetics, set theory and category theory.
Learn You a Haskell for Great Good by Miran Lipovaca - possibly the most recommended and updated Haskell reading in print form. The chapters on Applicative Functors, Monads and Zippers are real treats.
Language Proof and Logic by Jon Barwise and John Etchemendy - Starts with a great review of logic and goes on to discuss proofs of soundness and
A Tribute to a Mathemagician by Cipra, Demaine, Demaine and Rodgers - Another book in a series written by those illustrious mathematicians and puzzlers who were inspired by Martin Gardner. It's a fascinating collection of essays on mathematical puzzles - get it if you have that bent of mind.

Exploring new ideas

Category Theory - Often debated on its usefulness in the practical world, category theory gives you the basic understanding of programming language design, semantics and domain theory. I did lots of readings on Category Theory this year and this has led to a more concrete understanding of type systems as well. Hope to continue more in 2012.
Algebra - What's the algebra behind the term Algebraic Data Types ? I took some notes as I started understanding the algebra of recursive data types. Have a look at my notes on github.
Machine Learning - I took the Stanford online course on machine learning. It's been a revelation for me to find the pervasiveness of the subject in today's application context. Also the course encouraged me to look more into mathematics that govern all the theories that ML implements.

Some great papers read

Theorems for Free! by Philip Wadler
Monad Transformers and Modular Interpreters by Sheng Liang, Paul Hudak and Mark Jones
Lazy Functional State Threads by John Launchbury and Simon Peyton Jones
A Domain-Specific Language for manipulation of binary data in Dylan by Hannes Mehnert and Andreas Bogk
RRB-Trees: Efficient Immutable Vectors by Phil Bagwell and Tiark Rompf
Categorical Programming with Inductive and Coinductive Types by Varmo Vene
Functional Programming with Overloading and Higher-Order Polymorphism by Mark P Jones

Programming and Open Source

Once again a year passed by where I did 95% of programming in Scala. Scala has somehow hit the sweet spot of my liking - OO, FP, JVM, succinctness, I get them all in Scala. However, having said that I have every honest intention to renew all my friendships with Haskell and Clojure in 2012. I did quite a bit of Haskell in 2010 and still reaping the ebnefits of being a better Scala programmer piggybacking on my Haskell thoughts. I know Haskell is purer, a piece of Haskell code can be poetry. But the pragmatics of being on the JVM makes Scala more appealing to my professional life.

Two of my open source projects sjson and scala-redis are still quite active. I get pull requests on a regular basis and of course quite a few feature requests and bugs reported on Github. I plan to make some major upgrades to sjson particularly when reflection becomes more accessible in Scala 2.10. Also in line are some enhancements planned towards functor based JSON composition in sjson, which I plan to take up pretty soon. I tried to upgrade scala-redis to keep it in sync with the various releases of redis. Thanks to all of you for trying out sjson and scala-redis. Open source programming is fun and I consider myself blessed to have the opportunities to give something back to the community, which has given me so much over the years.

Any mention of my programming activities in 2011 would be incomplete without mentioning scalaz. I now use it in almost every project. It's really a great creation by Tony, Runar, Jason, Paul and the other members of the team. Using scalaz, I have learnt a lot about functional programming and functional thinking.

Another library that I have been using regularly since its inception is Akka. Asynchronous messaging is the gateway towards writing scalable applications and Akka provides the right set of batteries towards that. You get messaging, data flows, agents, STMs and all through a nice set of APIs both in Java and Scala. I think Akka is nicely poised to be the killer application to push Scala into the mainstream.

Some Publications

In 2011 I got the following two papers published, one of them as part of the esteemed team of Justin, Kresten and Steve. Thanks guys ..

Debasish Ghosh, Justin Sheehy, Kresten Krab Thorup and Steve Vinoski, "Programming Language Impact on the Development of Distributed Systems," FOME'11: Future of Middleware at Middleware'2011.
Debasish Ghosh, "DSL for the Uninitiated," Communications of the ACM, vol. 54, no. 7, pp. 44-50, July 2011

Some nice experiences

I attended 2 international conferences in 2011 - QCon London and PhillyETE. I also talked at PhillyETE on Domain Specific Languages. Both the conferences were amazing and I got to know in person many of the faces that I see and talk to regularly on Twitter and Google+. Incidentally I will also be talking at PhillyETE 2012 slated to be held in April.

My book DSLs In Action came out in late Dec 2010. 2011 was the year where I got the first royalty check from Manning. The writing of the book has been an amazing experience and to get to hear good words from people using the book gives another level of satisfaction. Thank you Manning for giving me the opportunity.

Looking forward to 2012

I am not one for resolutions, but here's a wish list towards more geekery in 2012 ..

Program more in Haskell and Clojure
Blog more (It was pathetic in 2011)
Do more math
Attend more online classes (currently registered for Natural Language Processing, Algorithms and Probabilistic Graphical Modeling at Stanford)
Try to do more conferences (currently registered for PhillyETE and Scala Days)
Learn more algebra, type theory and category theory
Get started with TAOCP Vol 4A
Learn Factor

Wish you a very happy new year. See you all in 2012!