Friday, August 27, 2010

Random thoughts on Clojure Protocols

Great languages are those that offer orthogonality in design. Stated simply, it means that the language core offers a minimal set of non-overlapping ways to compose abstractions. In an earlier article, A Case for Orthogonality in Design, I discussed some features from languages like Haskell, C++ and Scala that help you compose higher order abstractions from smaller ones using techniques offered by those languages.

In this post I discuss the new feature in Clojure that just made its way into the recently released 1.2. I am not going into what Protocols are - there are quite a few nice articles that introduce Clojure Protocols and the associated defrecord and deftype forms. This post is some random rants about how protocols encourage non-intrusive extension of abstractions without muddling inheritance into polymorphism. I also discuss some of my realizations about what protocols aren't, which I felt was just as important as understanding what they are.

Let's start with the familiar Show type class of Haskell ..

> :t show
show :: (Show a) => a -> String

Takes a type and renders a string for it. You get show for your type if you have made it an instance of the Show type class. The Show type class extends your abstraction transparently through an additional behavior set. We can do the same thing using protocols in Clojure ..

(defprotocol SHOW 
  (show [val]))

The protocol definition just declares the contract without any concrete implementation in it. Under the covers it generates a Java interface which you can use in your Java code as well. But a protocol is not an interface.

Adding behaviors non-invasively ..

I can extend an existing type with the behaviors of this protocol. And for this I need not have the source code for the type. This is one of the benefits that ad hoc polymorphism of type classes offers - type classes (and Clojure protocols) are open. Note how this is in contrast to the compile time coupling of Java interface and inheritance.

Extending java.lang.Integer with SHOW ..

(extend-type Integer
  SHOW
  (show [i] (.toString i)))

We can also extend an interface, and get access to the added behavior from *any* of its implementations. Here's extending clojure.lang.IPersistentVector ..

(extend-type clojure.lang.IPersistentVector
  SHOW
  (show [v] (.toString v)))

(show [12 1 4 15 2 4 67])
> "[12 1 4 15 2 4 67]"

And of course I can extend my own abstractions with the new behavior ..

(defrecord Name [last first])

(defn name-desc [name]
  (str (:last name) " " (:first name)))

(name-desc (Name. "ghosh" "debasish")) ;; "ghosh debasish"

(extend-type Name
  SHOW
  (show [n]
    (name-desc n)))

(show (Name. "ghosh" "debasish")) ;; "ghosh debasish"

No Inheritance

Protocols help you wire abstractions that are in no way related to each other. And they do this non-invasively. An object conforms to a protocol only if it implements the contract. As I mentioned before, there's no notion of hierarchy or inheritance related to this form of polymorphism.

No object bloat, no monkey patching

And there's no object bloat going on here. You can invoke show on any abstraction for which you implement the protocol, but show is never added as a method on that object. As an example try the following after implementing SHOW for Integer ..

(filter #(= "show" (.getName %)) (.getMethods Integer))

will return an empty list. Hence there is no scope for *accidentally* overriding someone else's monkey patch on some shared class.

Not really a type class

Clojure protocols dispatch on the first argument of their methods. This keeps them from the full power that Haskell / Scala type classes offer. Consider the counterpart of Show in Haskell, the Read type class ..

> :t read  
read :: (Read a) => String -> a

If your abstraction implements Read, then the exact instance of the method invoked will depend on the expected return type, e.g.

> [1,2,3] ++ read "[4,5,6]"
=> [1,2,3,4,5,6]

The specific instance of read that returns a list of integers is automatically invoked here. Haskell maintains the dispatch match as part of its global dictionary.

We cannot do this with Clojure protocols, since they are unable to dispatch based on the return type. Protocols dispatch only on the first argument of the function.
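To make the contrast concrete, here is a hedged Scala sketch of Read-style dispatch through a type class, where the implicit instance is selected by the expected result type - precisely the axis of dispatch that protocols lack. All names here are illustrative, not from any library.

```scala
// A Read-style type class: the instance is chosen by the result type
trait Read[A] {
  def read(s: String): A
}

object Read {
  implicit object IntRead extends Read[Int] {
    def read(s: String): Int = s.trim.toInt
  }
  implicit object IntListRead extends Read[List[Int]] {
    // assumes a "[4,5,6]" style rendering, as in the Haskell example
    def read(s: String): List[Int] =
      s.stripPrefix("[").stripSuffix("]").split(",").toList.map(_.trim.toInt)
  }
}

def read[A](s: String)(implicit r: Read[A]): A = r.read(s)

// asking for a List[Int] picks IntListRead
List(1, 2, 3) ++ read[List[Int]]("[4,5,6]")  // List(1, 2, 3, 4, 5, 6)
```

Unlike Haskell, Scala needs the expected type spelled out (or inferable from context), but the instance selection still happens on the result type, which a protocol's first-argument dispatch cannot express.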


Tuesday, August 10, 2010

Using generalized type constraints - How to remove code with Scala 2.8

I love removing code. The more I remove, the smaller the surface area for bugs to bite. Just now I removed a bunch of classes made unnecessary by the Scala 2.8.0 type system.

Consider this set of abstractions, elided for demonstration purposes ..

trait Instrument

// equity
case class Equity(name: String) extends Instrument

// fixed income
abstract class FI(name: String) extends Instrument
case class DiscountBond(name: String, discount: Int) extends FI(name)
case class CouponBond(name: String, coupon: Int) extends FI(name)


Well, it's the instrument hierarchy (simplified) that gets traded in a securities exchange every day. Now we model a security trade that exchanges instruments and currencies ..

class Trade[I <: Instrument](id: Int, account: String, instrument: I) {
  //..
  def calculateNetValue(..) = //..
  def calculateValueDate(..) = //..
  //..
}


In real life a trade will have lots and lots of attributes. But here we don't need them, since our only purpose here is to demonstrate how we can throw away some piece of code :)

Trade can have lots of methods which model the domain logic of the trading process, calculating the net amount of the trade, the value date of the trade etc. Note all of these are valid processes for every type of instrument.

Consider one use case that calculates the accrued interest of a trade. The difference with other methods is that accrued interest is only applicable for Coupon Bonds, which, according to the above hierarchy, is a subtype of FI. How do we express this constraint in the above Trade abstraction? What we need is to constrain the instrument in the method.

My initial implementation was to make the AccruedInterestCalculator a separate class parameterized with the Trade of the appropriate type of instrument ..

class AccruedInterestCalculator[T <: Trade[CouponBond]](trade: T) {
  def accruedInterest(convention: String) = //.. impl
}


and use it as follows ..

val cb = CouponBond("IBM", 10)
val trd = new Trade(1, "account-1", cb)
new AccruedInterestCalculator(trd).accruedInterest("30U/360")


Enter Scala 2.8 and the generalized type constraints ..

Before Scala 2.8, we could not specialize the Instrument type I for any specific method within Trade beyond what was specified as the constraint in defining the Trade class. Since calculation of accrued interest is only valid for coupon bonds, we could only achieve the desired effect by having a separate abstraction as above. Or we could take recourse to runtime checks.

Scala 2.8 introduces generalized type constraints which allow you to do exactly this. We have 3 variants as:
      
  • A =:= B, which mandates that A and B exactly match
  • A <:< B, which mandates that A must conform to B
  • A <%< B, which means that A must be viewable as B

Predef.scala contains these definitions. Note that unlike <: or >:, the generalized type constraints are not operators. They are classes, instances of which are implicitly provided by the compiler itself to enforce conformance to the type constraints. Here's an example for our use case ..

class Trade[I <: Instrument](id: Int, account: String, instrument: I) {
  //..
  def accruedInterest(convention: String)(implicit ev: I =:= CouponBond): Int = {
    //..
  }
}



ev is the evidence, an instance of the =:= class that the compiler itself provides, which ensures that we invoke accruedInterest only for CouponBond trades. You can now do ..


val cb = CouponBond("IBM", 10)
val trd = new Trade(1, "account-1", cb)
trd.accruedInterest("30U/360")


while the compiler will complain with an equity trade ..

val eq = Equity("GOOG")
val trd = new Trade(2, "account-1", eq)
trd.accruedInterest("30U/360")



Now I can throw away my AccruedInterestCalculator class and all associated machinery. A simple type constraint tells us a lot and models the domain rule, all checked at compile time. Yum!


You can also use the other variants to great effect when modeling your domain logic. Suppose you have a method that can be invoked only for all FI instruments, you can express the constraint succinctly using <:< ..

class Trade[I <: Instrument](id: Int, account: String, instrument: I) {
  //..
  def validateInstrumentNotMatured(implicit ev: I <:< FI): Boolean = {
    //..
  }
}
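A condensed, self-contained version of the hierarchy above shows both evidence parameters in action. The method bodies are made up just to have something runnable; note also that =:= and <:< instances act as conversion functions, so ev(instrument) hands us a CouponBond.

```scala
trait Instrument
case class Equity(name: String) extends Instrument
abstract class FI(name: String) extends Instrument
case class CouponBond(name: String, coupon: Int) extends FI(name)

class Trade[I <: Instrument](val id: Int, val instrument: I) {
  // compiles only when I is exactly CouponBond
  def accruedInterest(implicit ev: I =:= CouponBond): Int =
    ev(instrument).coupon  // ev converts the I to a CouponBond

  // compiles for any fixed income instrument
  def validateInstrumentNotMatured(implicit ev: I <:< FI): Boolean = true
}

val cb = new Trade(1, CouponBond("IBM", 10))
cb.accruedInterest               // ok: evidence for I =:= CouponBond found
cb.validateInstrumentNotMatured  // ok: CouponBond conforms to FI

// new Trade(2, Equity("GOOG")).accruedInterest
// does not compile: cannot prove that Equity =:= CouponBond
```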


This post is not about discussing all capabilities of generalized type constraints in Scala. Have a look at these two threads on StackOverflow and this informative gist by Jason Zaugg (@retronym on Twitter) for all the details. I just showed you how I removed some of my code to model my real world domain logic in a more succinct way that also fails fast during compile time.




Update: In response to the comments regarding Strategy implementation ..

Strategy makes a great use case when you want to have multiple implementations of an algorithm. In my case there was no variation. Initially I kept it as a separate abstraction because I was not able to constrain the instrument type in the accruedInterest method while staying within the Trade class. Calculating accruedInterest is a normal domain operation for a CouponBond trade - hence trade.accruedInterest(..) looks to be a natural API for the context.

Now let us consider the case when the calculation strategy can vary. We can very well extract the variable part from the core implementation and model it as a separate strategy abstraction. In our case, say the calculation of accrued interest depends on the principal of the trade and the trade date (again, elided for simplicity of demonstration) .. hence we can have the following contract and one sample implementation:

trait CalculationStrategy {
  def calculate(principal: Int, tradeDate: java.util.Date): Int
}

case class DefaultImplementation(name: String) extends CalculationStrategy {
  def calculate(principal: Int, tradeDate: java.util.Date) = {
    //.. impl
  }
}

But how do we use it within the core API that the Trade class publishes? Type Classes to the rescue (once again!) ..

class Trade[I <: Instrument](id: Int, account: String, instrument: I) {
  //..
  def accruedInterest(convention: String)(implicit ev: I =:= CouponBond, strategy: CalculationStrategy): Int = {
    //..
  }
}

and we can now use the type classes using our own specific implementation ..

implicit val strategy = DefaultImplementation("default")
  
val cb = CouponBond("IBM", 10)
val trd = new Trade(1, "account-1", cb)
trd.accruedInterest("30U/360")  // uses the default type class for the strategy

Now we have the best of both worlds. We implement the domain constraint on instrument using the generalized type constraints and use type classes to make the calculation strategy flexible.
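Putting the two pieces together, here is a minimal runnable sketch of the combination. The principal field and the interest formula are invented purely for illustration; the shape of the implicits is what matters.

```scala
trait Instrument
abstract class FI(name: String) extends Instrument
case class CouponBond(name: String, coupon: Int) extends FI(name)

trait CalculationStrategy {
  def calculate(principal: Int, coupon: Int): Int
}

class Trade[I <: Instrument](val principal: Int, val instrument: I) {
  // both the domain constraint (ev) and the strategy come in as implicits
  def accruedInterest(convention: String)
      (implicit ev: I =:= CouponBond, strategy: CalculationStrategy): Int =
    strategy.calculate(principal, ev(instrument).coupon)
}

implicit val simple: CalculationStrategy = new CalculationStrategy {
  // made-up formula, just for the sketch
  def calculate(principal: Int, coupon: Int) = principal * coupon / 100
}

val trd = new Trade(10000, CouponBond("IBM", 10))
trd.accruedInterest("30U/360")  // 1000, using the `simple` strategy
```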

Monday, August 09, 2010

Updates on DSLs In Action - Into Copy Editing


I have completed writing DSLs In Action. As we speak, the book has moved from the development editor to the copy editor. I will be starting the process of copy editing along with the team of helpful copy editors of Manning.

The Table of Contents has been finalized. Have a look at the details and send me your feedback regarding the contents of the book.

DSLs In Action is a book for the practitioner. It contains real world experience of writing DSLs in a multitude of JVM languages. As the table of contents shows, I have used Java, Groovy, Ruby, Scala and Clojure to demonstrate their power in DSL design and implementation. I have also focused on the integration aspects between these languages, which is fashionably known today by the name of polyglot programming.

All examples in the book are from the real world domain of securities trading and brokerage systems. I have intentionally chosen a specific domain to demonstrate the progression of DSL implementation from small trivial examples to serious complex and non-trivial ones. This also goes to bust a common myth that DSLs are applicable only for toy examples.

Another recurring theme throughout the book has been a strong focus on abstraction design. Designing good DSLs is an exercise in making well-designed abstractions. A DSL is really a thin linguistic abstraction on top of the semantic model of the domain. If the underlying model is expressive enough and publishes well behaved abstractions, then designing a user friendly syntax on top of it becomes easy. The book discusses lots of tools and techniques that will help you think in terms of designing expressive DSLs.

The book is replete with code written in multiple languages. You can get it all by cloning my github repo, which contains Maven based instructions to try most of them yourself.

And finally, thanks to all the reviewers for the great feedback received so far. They have contributed a lot towards improvement of the book, all remaining mistakes are mine.

Monday, July 19, 2010

sjson: Now offers Type Class based JSON Serialization in Scala

sjson's serialization APIs have so far been based on reflection. The advantage was that the API was remarkably easy to use, while the heavy lifting was done underneath by the reflection based implementation.

However we need to remember that there's a big difference between the richness of type information that a JSON structure has and that which a Scala object can have. Unless you preserve the type information as part of your serialization protocol when going from Scala to JSON, it becomes very tricky, and in some cases extremely difficult, to do a lossless transformation. And with the JVM, type erasure makes it almost impossible to reconstruct some of the serialized JSON structures into the corresponding original Scala objects.

From version 0.7, sjson offers, in addition to the original one, a JSON serialization protocol that does not use reflection. This is useful because the user gets to define his own protocol for serializing custom objects to JSON. Whatever you did with annotations in the reflection based JSON serialization, you can now implement through a custom protocol.

sjson's type class based serialization is inspired by the excellent sbinary by David MacIver (currently maintained by Mark Harrah) and uses the same protocol and even steals many of the implementation artifacts.

For an introduction to the basics of the concepts of type class, its implementation in Scala and how type class based serialization protocols can be designed in Scala, refer to the following blog posts which I wrote a few weeks back:


JSON Serialization of built-in types

Here’s a sample session at the REPL that uses the default serialization protocol of sjson ..

scala> import sjson.json._
import sjson.json._

scala> import DefaultProtocol._
import DefaultProtocol._

scala> val str = "debasish"
str: java.lang.String = debasish

scala> import JsonSerialization._
import JsonSerialization._

scala> tojson(str)
res0: dispatch.json.JsValue = "debasish"

scala> fromjson[String](res0)
res1: String = debasish


Now consider a generic data type List in Scala. Here’s how the protocol works ..

scala> val list = List(10, 12, 14, 18)
list: List[Int] = List(10, 12, 14, 18)

scala> tojson(list)
res2: dispatch.json.JsValue = [10, 12, 14, 18]

scala> fromjson[List[Int]](res2)
res3: List[Int] = List(10, 12, 14, 18)

Define your Class and Custom Protocol

In the last section we saw how default protocols based on type classes are being used for serialization of standard data types. If you have your own class, you can define your custom protocol for JSON serialization.

Consider a case class in Scala that defines a Person abstraction .. But before we look into how this serializes into JSON and back, here's the generic serialization protocol in sjson :-

trait Writes[T] {
  def writes(o: T): JsValue
}

trait Reads[T] {
  def reads(json: JsValue): T
}

trait Format[T] extends Writes[T] with Reads[T]

Format[] is the type class that specifies the contract for serialization. For your own abstraction you need to provide an implementation of the Format[] type class. Let’s do the same for Person within a specific Scala module. In case you don't remember the role that modules play in type class based design in Scala, they allow selection of the appropriate instance based on the static type checking that the language offers. This is something that you don't get in Haskell.

object Protocols {
  // person abstraction
  case class Person(lastName: String, firstName: String, age: Int)

  // protocol definition for person serialization
  object PersonProtocol extends DefaultProtocol {
    import dispatch.json._
    import JsonSerialization._

    implicit object PersonFormat extends Format[Person] {
      def reads(json: JsValue): Person = json match {
        case JsObject(m) =>
          Person(fromjson[String](m(JsString("lastName"))), 
            fromjson[String](m(JsString("firstName"))), fromjson[Int](m(JsString("age"))))
        case _ => throw new RuntimeException("JsObject expected")
      }

      def writes(p: Person): JsValue =
        JsObject(List(
          (tojson("lastName").asInstanceOf[JsString], tojson(p.lastName)), 
          (tojson("firstName").asInstanceOf[JsString], tojson(p.firstName)), 
          (tojson("age").asInstanceOf[JsString], tojson(p.age)) ))
    }
  }
}

Note that the implementation of the protocol uses the dispatch-json library from Nathan Hamblen. Basically the methods writes and reads define how the JSON serialization will be done for my Person object. Now we can fire up a scala REPL and see it in action :-

scala> import sjson.json._
import sjson.json._

scala> import Protocols._
import Protocols._

scala> import PersonProtocol._
import PersonProtocol._

scala> val p = Person("ghosh", "debasish", 20)
p: sjson.json.Protocols.Person = Person(ghosh,debasish,20)

scala> import JsonSerialization._
import JsonSerialization._

scala> tojson[Person](p)         
res1: dispatch.json.JsValue = {"lastName" : "ghosh", "firstName" : "debasish", "age" : 20}

scala> fromjson[Person](res1)
res2: sjson.json.Protocols.Person = Person(ghosh,debasish,20)

We get serialization of the object into JSON structure and then back to the object itself. The methods tojson and fromjson are part of the Scala module that uses the type class Format as implicits. Here’s how we define it ..

object JsonSerialization {
  def tojson[T](o: T)(implicit tjs: Writes[T]): JsValue = {
    tjs.writes(o)
  }

  def fromjson[T](json: JsValue)(implicit fjs: Reads[T]): T = {
    fjs.reads(json)
  }
}
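For a feel of the machinery, the whole pattern can be reproduced in a few dependency-free lines. Here is a sketch with a toy JsValue standing in for dispatch-json - illustrative only, not sjson's actual code.

```scala
// toy stand-in for dispatch-json's JsValue hierarchy
sealed trait JsValue
case class JsString(s: String) extends JsValue
case class JsNumber(n: Int) extends JsValue

trait Writes[T] { def writes(o: T): JsValue }
trait Reads[T]  { def reads(json: JsValue): T }
trait Format[T] extends Writes[T] with Reads[T]

object Protocol {
  implicit object StringFormat extends Format[String] {
    def writes(o: String) = JsString(o)
    def reads(json: JsValue) = json match {
      case JsString(s) => s
      case _ => throw new RuntimeException("JsString expected")
    }
  }
}

object JsonSerialization {
  def tojson[T](o: T)(implicit tjs: Writes[T]): JsValue = tjs.writes(o)
  def fromjson[T](json: JsValue)(implicit fjs: Reads[T]): T = fjs.reads(json)
}

import Protocol._
import JsonSerialization._
fromjson[String](tojson("debasish"))  // round-trips to "debasish"
```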

Verbose?

Sure .. you have to do a lot of stuff to define the protocol for your class. But if you have a case class, sjson has some out of the box magic for you where you can do away with all the verbosity. Once again Scala's type system to the rescue.

Let’s see how the protocol can be extended for your custom classes using a much less verbose API which applies only for case classes. Here’s a session at the REPL ..

scala> case class Shop(store: String, item: String, price: Int)
defined class Shop

scala> object ShopProtocol extends DefaultProtocol {
     |   implicit val ShopFormat: Format[Shop] = 
     |       asProduct3("store", "item", "price")(Shop)(Shop.unapply(_).get)
     |   }
defined module ShopProtocol

scala> import ShopProtocol._
import ShopProtocol._

scala> val shop = Shop("Shoppers Stop", "dress material", 1000)
shop: Shop = Shop(Shoppers Stop,dress material,1000)

scala> import JsonSerialization._
import JsonSerialization._

scala> tojson(shop)
res4: dispatch.json.JsValue = {"store" : "Shoppers Stop", "item" : "dress material", "price" : 1000}

scala> fromjson[Shop](res4)
res5: Shop = Shop(Shoppers Stop,dress material,1000)

If you are curious about what goes on behind the asProduct3 method, feel free to peek into the source code.

Tuesday, July 06, 2010

Refactoring into Scala Type Classes

A couple of weeks back I wrote about type class implementation in Scala using implicits. Type classes allow you to model orthogonal concerns of an abstraction without hardwiring it within the abstraction itself. This takes the bloat away from the core abstraction implementation into separate independent class structures. Very recently I refactored Akka actor serialization and gained some real insights into the benefits of using type classes. This post is a field report of the same.

Inheritance and traits looked good ..

.. but only initially. Jonas Bonér and I had some cool discussions on serializable actors where the design we came up with looked as follows ..

trait SerializableActor extends Actor 
trait StatelessSerializableActor extends SerializableActor

trait StatefulSerializerSerializableActor extends SerializableActor {
  val serializer: Serializer
  //..
}

trait StatefulWrappedSerializableActor extends SerializableActor {
  def toBinary: Array[Byte]
  def fromBinary(bytes: Array[Byte])
}

// .. and so on 

All these traits couple the concern of serializability just too tightly with the core Actor implementation. And with various forms of serializable actors, clearly we were running out of class names. One of the wisdoms that the GoF Patterns book taught us was that when you struggle naming your classes using inheritance, you're definitely doing it wrong! Look out for other ways that separate the concerns more meaningfully.

With Type Classes ..

We took the serialization stuff out of the core Actor abstraction into a separate type class.

/**
 * Type class definition for Actor Serialization
 */
trait FromBinary[T <: Actor] {
  def fromBinary(bytes: Array[Byte], act: T): T
}

trait ToBinary[T <: Actor] {
  def toBinary(t: T): Array[Byte]
}

// client needs to implement Format[] for the respective actor
trait Format[T <: Actor] extends FromBinary[T] with ToBinary[T]

We define 2 type classes FromBinary[T <: Actor] and ToBinary[T <: Actor] that the client needs to implement in order to make actors serializable. And we package them together as yet another trait Format[T <: Actor] that combines both of them.

Next we define a separate module that publishes APIs to serialize actors that use these type class implementations ..

/**
 * Module for actor serialization
 */
object ActorSerialization {

  def fromBinary[T <: Actor](bytes: Array[Byte])
    (implicit format: Format[T]): ActorRef = //..

  def toBinary[T <: Actor](a: ActorRef)
    (implicit format: Format[T]): Array[Byte] = //..

  //.. implementation
}

Note that these type classes are passed as implicit arguments that the Scala compiler will pick up from the surrounding lexical scope. Here's a sample test case which implements the above strategy ..

A sample actor with encapsulated state. Note that we no longer have the incidental complexity of the actor having to inherit from any specialized Actor class ..

class MyActor extends Actor {
  var count = 0

  def receive = {
    case "hello" =>
      count = count + 1
      self.reply("world " + count)
  }
}

and the client implements the type class for protocol buffer based serialization and package it as a Scala module ..

object BinaryFormatMyActor {
  implicit object MyActorFormat extends Format[MyActor] {
    def fromBinary(bytes: Array[Byte], act: MyActor) = {
      val p = Serializer.Protobuf
                        .fromBinary(bytes, Some(classOf[ProtobufProtocol.Counter]))
                        .asInstanceOf[ProtobufProtocol.Counter]
      act.count = p.getCount
      act
    }
    def toBinary(ac: MyActor) =
      ProtobufProtocol.Counter.newBuilder.setCount(ac.count).build.toByteArray
  }
}

We have a test snippet that uses the above type class implementation ..

import ActorSerialization._
import BinaryFormatMyActor._

val actor1 = actorOf[MyActor].start
(actor1 !! "hello").getOrElse("_") should equal("world 1")
(actor1 !! "hello").getOrElse("_") should equal("world 2")

val bytes = toBinary(actor1)
val actor2 = fromBinary(bytes)
actor2.start
(actor2 !! "hello").getOrElse("_") should equal("world 3")

Note that the state is correctly serialized by toBinary and then subsequently de-serialized to get the updated value of the Actor state.
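The same shape works outside Akka as well. Here is a toy, self-contained sketch where a plain mutable class gets its serialization entirely from a type class instance - all names invented for illustration.

```scala
class Counter { var count = 0 }

trait FromBinary[T] { def fromBinary(bytes: Array[Byte], t: T): T }
trait ToBinary[T]   { def toBinary(t: T): Array[Byte] }
trait Format[T] extends FromBinary[T] with ToBinary[T]

object CounterFormat {
  // the serialization concern lives here, not in Counter itself
  implicit object Instance extends Format[Counter] {
    def toBinary(c: Counter) = c.count.toString.getBytes("UTF-8")
    def fromBinary(bytes: Array[Byte], c: Counter) = {
      c.count = new String(bytes, "UTF-8").toInt
      c
    }
  }
}

def roundTrip[T](t: T, fresh: T)(implicit f: Format[T]): T =
  f.fromBinary(f.toBinary(t), fresh)

import CounterFormat._
val c = new Counter
c.count = 42
roundTrip(c, new Counter).count  // 42: state survives the round trip
```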

This refactoring has made the core actor implementation much cleaner moving away the concerns of serialization to a separate abstraction. The client code also becomes cleaner in the sense that the client actor definition does not include details of how the actor state is being serialized. Scala's power of implicit arguments and executable modules made this type class based implementation possible.