Sunday, August 16, 2009

5 Reasons why you should learn a new language NOW!

There have been quite a few murmurs in the web sphere today regarding the ways Java programming paradigms have changed since the language's inception in the mid 90s. A clear mandate towards immutable abstractions, DSL-like interfaces and actor based concurrency models indicates a positive movement towards a trend that aligns nicely with the language research that has been going on in the community for quite some time. Language platforms are also improving by the day, with ongoing efforts to make them better hosts for multi-paradigm languages. Now is the time to learn a new language - here are some of my thoughts on why you should invest in learning a new language of your choice .. NOW!

#1


Language barriers are going down - polyglot programming is on the way up. Two of the big enablers of this movement are:

  • Middleware inter-operability using document formats like JSON. You can implement persistent actors in Scala or Java that use MongoDB or CouchDB as the store for JSON documents, and they interoperate nicely with your payment gateway system hosted on MochiWeb, developed on an Erlang stack (a rough sketch of this kind of document-level interop follows this list).

  • Easier language inter-operability using DSLs. While you are on a specific platform like the Java Virtual Machine, you can design better APIs in an alternative language that interoperates with the core language of your application. Here's how I got hooked on Scala in an attempt to make my Java objects smarter and publish better APIs to my clients. Even Google, known for its selective set of languages for production applications, has been using s-expressions as an intermediate language expressed as a set of Scheme macros for its Android platform.
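Picking up the first point about document-format interoperability, here is a rough sketch (in Scala, using the early MongoDB Java driver; the collection and fields are made up for illustration) of how a JVM-side component might drop a JSON-style document into MongoDB for another stack, say an Erlang/MochiWeb service, to pick up:

import com.mongodb.{Mongo, BasicDBObject}

// hypothetical payment document shared across language stacks as BSON/JSON
val coll = new Mongo("localhost").getDB("payments").getCollection("gateway_orders")

val order = new BasicDBObject()
  .append("orderId", "o-101")          // made-up fields
  .append("amount", 2500)
  .append("currency", "USD")

coll.insert(order)   // the Erlang side reads the very same document as JSON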



#2


Learning a different language helps you look at a problem in a different way. Maybe the new way models your domain more expressively and succinctly, and you will need to write and maintain less code in the new language. Once you're familiar with the paradigms of the new language, idiomatic code will look more expressive to you, and you will no longer complain about such snippets in defence of the average programmer. What you flaunt today as design patterns will come out as natural idiomatic expressions in your new language - you will be programming at a higher level of abstraction.

#3


Play to the strengths that the new language offers. A while back I blogged on Erlang becoming mainstream as a middleware language. You do not have to use Erlang for the chores of application development that you do in your day job. Nor will you have to be an Erlang expert to use Erlang based solutions like RabbitMQ or CouchDB. But look at the spurt of development that has been going on around the strengths of Erlang's concurrency, distribution and fault tolerance capabilities. As of today, Erlang is unmatched in this regard, and it has the momentum both as a language and as a platform that delivers robust middleware. Learning Erlang will give you more insight into the platform's capabilities and the edge to make a rational decision when your client asks you to select Webmachine as the REST based platform for your next Web application talking to the Riak datastore.

#4


The Java Virtual Machine is now the cynosure of performance optimization and language research. Initially touted as the platform for hosting statically typed languages, the JVM is now adding capabilities to make itself a better host for dynamically typed languages as well. Anything that runs on the JVM is now a candidate for being integrated into your enterprise application architecture tomorrow. Learning a new JVM language will give you a head start, and it will safeguard your long acquired Java expertise too. JRuby is a classic example. From a really humble beginning, JRuby today offers you the best of dynamic language capabilities by virtue of being a 100% compatible Ruby interpreter and a solid player on the JVM. JRuby looks to be the future of Ruby in the enterprise application space. Groovy has acquired the mindshare of lots of Java professionals by virtue of its solid integration with the Java platform. Clojure is bringing about the revival of Lisp on the JVM. And the list continues .. Amongst the statically typed ones, Scala is fast emerging as the next mainstream language for the JVM (after Java) and can already match the performance of Java. And the best part is that your existing investment in Java will only continue to grow - you will be able to interoperate freely between any of these languages and your Java application.

#5


This is my favorite. Learn a language for the fun of it. Learn something which is radically different from what you do in your day job. Maybe Factor, maybe some other concatenative language like Forth or Joy. Or Lua, which is coming up fast as a scripting language for extending your database or application. A couple of days ago I discovered JKat, a dynamically typed, stack-based (concatenative) language similar to Forth but implemented as an interpreter on top of the JVM. You can write neat DSLs and embed the JKat interpreter in your application very much like Lua. Indulge in the sinful pleasure that programming in such languages offers - you will never regret it.

Monday, August 10, 2009

Static Typing gives you a head start, Tests help you finish

In one of my earlier posts (almost a year back) I indicated how type driven modeling leads to succinct domain structures that inherit the following goodness:

  • Less code to write, since the static types encapsulate lots of business constraints

  • Fewer tests to write, since the compiler implicitly writes them for you


In a recent thread on Twitter, I mentioned a comment that Manuel Chakravarty made on one of Michael Feathers' blog posts ..

"Of course, strong type checking cannot replace a rigorous testing discipline, but it makes you more confident to take bigger steps."

The statement resonated with my own feelings on static typing, which I have been practising for quite some time now using Scala. As the Twitter thread grew louder, Patrick Logan made an interesting comment on my blog on this very subject ..

This is interesting... it is a long way toward the kind of explanation I have been looking for re: "type-driven programming" with rich type systems as opposed to "test-driven programming" with dynamic languages.

I am still a big fan of the latter and do not fully comprehend the former.

I'd be interested in your "type development" process - without "tests" of some kind, the type system may validate the "type soundness" of your types, but how do you know they are the types you actually *want* to have proven sound?


and the conversation grew somewhat longer as both of us tried to look into the practices and subtleties that domain modeling with type constraints imposes on the programmer. One of the points that Patrick raised was regarding the kind of tests that you would typically provide for code like this.

Let me try to look at some of the real life code on which I have been using this practice. When I have a code snippet like this ..

/**
 * A trade needs to have a Trading Account
 */
trait Trade {
  type T
  val account: T
  def valueOf: Unit
}

/**
 * An equity trade needs to have a Stock as the instrument
 */
trait EquityTrade extends Trade {
  override def valueOf {
    //.. calculate value
  }
}

/**
 * A fixed income trade needs to have a FixedIncome type of instrument
 */
trait FixedIncomeTrade extends Trade {
  override def valueOf {
    //.. calculate value
  }
}
//..
//..

/**
 * Accrued Interest is computed only for fixed income trades
 */
trait AccruedInterestCalculatorComponent {
  type T

  val acc: AccruedInterestCalculator
  trait AccruedInterestCalculator {
    def calculate(trade: T)
  }
}


I need to do validations and write up unit and functional tests to check ..

  • EquityTrade needs to work only on equity class of instruments

  • FixedIncomeTrade needs to work on fixed incomes only and not on any other instruments

  • For every method in the domain model that takes an instrument or trade, I need to check that the passed in instrument or trade is of the proper type, and also write unit tests that check the same. AccruedInterestCalculator takes a trade as an argument, which needs to be of type FixedIncomeTrade, since accrued interest is meaningful only for bond trades. The method AccruedInterestCalculator#calculate() needs to do an explicit check for the trade type, which makes me write unit tests for both valid and invalid use cases.
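Without type constraints, that check ends up looking roughly like the following sketch (hypothetical names; the helper computeAccruedInterest is made up), and every such runtime check drags a pair of unit tests along with it - one for the valid path and one for the invalid path:

trait AccruedInterestCalculator {
  // runtime type check instead of a compile time constraint
  def calculate(trade: Trade): BigDecimal = trade match {
    case fi: FixedIncomeTrade => computeAccruedInterest(fi)   // hypothetical helper
    case _ => throw new IllegalArgumentException(
      "accrued interest is meaningful only for fixed income trades")
  }
}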


Now let us introduce the type constraints that a statically typed language with a powerful type system offers.

trait Trade {
  type T <: Trading
  val account: T

  //..as above
}

trait EquityTrade extends Trade {
  type S <: Stock
  val equity: S

  //.. as above
}

trait FixedIncomeTrade extends Trade {
  type FI <: FixedIncome
  val fi: FI

  //.. as above
}
//..


The moment we add these type constraints, our domain model becomes more expressive and implicitly constrained by a lot of business rules .. for example ..

  1. A Trade takes place on a Trading account only

  2. An EquityTrade only deals with Stocks, while a FixedIncomeTrade deals exclusively with FixedIncome type of instruments


Consider this more expressive example that slaps the domain constraints right in front of you without them being buried within procedural code logic in the form of runtime checks. Note that in the following example, all the types and vals that were left abstract earlier are being instantiated while defining the concrete component. And you can instantiate it only by honoring the domain rules that you defined earlier. How useful is that as a succinct way to write concise domain logic without having to write any unit tests?

object FixedIncomeTradeComponentRegistry extends TradingServiceComponentImpl
  with AccruedInterestCalculatorComponentImpl
  with TaxRuleComponentImpl {

  type T = FixedIncomeTrade
  val tax = new TaxRuleServiceImpl
  val trd = new TradingServiceImpl
  val acc = new AccruedInterestCalculatorImpl
}


Every wiring that you do above is statically checked for consistency - hence the FixedIncome component that you build will honor all the domain rules that you have stitched into it through explicit type constraints.

The good part is that these business rules will be enforced by the compiler itself, without me having to write any additional explicit check in the code base. And the compiler is also the testing tool - you will not be able to instantiate a FixedIncomeTrade with an instrument that is not a subtype of FixedIncome.

Then how do we test such type constrained domain abstractions?

Rule #1: Type constraints are tested by the compiler. You cannot instantiate an inconsistent component that violates the constraints that you have incorporated in your domain abstractions.
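As a rough illustration (a hypothetical mis-wiring, assuming the calculator component constrains its abstract type as type T <: FixedIncomeTrade), the following simply refuses to compile - and that refusal is the test:

// hypothetical: binding T to EquityTrade violates the upper bound on T,
// so the compiler rejects the wiring - no unit test needed for this rule
/*
object EquityAccruedInterestRegistry extends TradingServiceComponentImpl
  with AccruedInterestCalculatorComponentImpl {

  type T = EquityTrade          // does not satisfy the FixedIncomeTrade bound
  val trd = new TradingServiceImpl
  val acc = new AccruedInterestCalculatorImpl
}
*/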

Rule #2: You need to write tests only for the business logic that forms the procedural part of your abstractions. Obviously! Types cannot be of much help there. But if you are using a statically typed language, get the maximum out of the abstractions that the type system offers. There are situations where you will discover repetitive procedural business logic with minor variations sprinkled across the code base. If you are working with a statically typed language, model them into a type family. Your tests for that logic will be localized *only* within the type itself. This is true for dynamically typed languages as well; where static typing gets the advantage is that all usages will be statically checked by the compiler. In a statically typed language, you think and model in "types". In a dynamically typed language, you think in terms of the messages that the abstraction needs to handle.
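A minimal sketch of what that localization might look like (the family and the rates below are made up): pull the variation into one type family, and the tests for that piece of logic live with the family alone, while the compiler checks every usage.

// hypothetical type family that localizes a small varying computation
sealed trait DiscountPolicy {
  def discount(principal: BigDecimal): BigDecimal
}

case object EquityDiscount extends DiscountPolicy {
  def discount(principal: BigDecimal) = principal * BigDecimal("0.02")   // made-up rate
}

case object FixedIncomeDiscount extends DiscountPolicy {
  def discount(principal: BigDecimal) = principal * BigDecimal("0.01")   // made-up rate
}

// tests target the DiscountPolicy implementations only; every call site is type checked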

Rule #3: But you need to create instances of your abstractions within the tests. How do you do that? Very soon you will notice that the bulk of your tests are being polluted by complicated instantiations using concrete val or type injection. What I usually do is use the generators that ScalaCheck offers. ScalaCheck offers a special generator, org.scalacheck.Arbitrary.arbitrary, which generates arbitrary values of any supported type. And once you have the generators in place, you can use them to write properties that do the necessary testing of the rest of your domain logic.
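Here is a rough sketch of the ScalaCheck part (the trade generator and the factory newFixedIncomeTrade are hypothetical placeholders for whatever your component registry exposes):

import org.scalacheck.{Gen, Prop}

// hypothetical generator that builds consistent FixedIncomeTrade instances
val fixedIncomeTrades: Gen[FixedIncomeTrade] =
  for {
    principal  <- Gen.choose(1000, 1000000)
    couponRate <- Gen.choose(1, 12)
  } yield newFixedIncomeTrade(principal, couponRate)   // made-up factory

// a property over the procedural part of the domain logic only -
// the type level constraints have already been checked by the compiler
val accruedInterestIsNonNegative =
  Prop.forAll(fixedIncomeTrades) { trade =>
    calculator.calculate(trade) >= BigDecimal(0)       // hypothetical calculator instance
  }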

Sunday, August 02, 2009

MongoDB for Akka Persistence

Actors and message passing have been demonstrated to be great allies in implementing some of the specific use cases of concurrent applications. Message passing concurrency promotes loosely coupled application components, and hence has the natural side-effect of almost infinite scalability. But as Jonas Boner discusses in his JavaOne 2009 presentation, there are many examples in the real world today that have to deal with shared state, transactions and atomicity of operations. Software Transactional Memory provides a viable option for these use cases, as has been implemented in Clojure and Haskell.

Akka, designed by Jonas Boner, offers Transactors, which combine the benefits of actors and STM, along with a pluggable storage model. It provides a unified set of data structures managed by the STM and backed by a variety of storage engines. It currently supports Cassandra as the storage model out of the box.

Over the weekend I was trying out MongoDB as yet another out of the box persistence option for Akka transactors. MongoDB is a high performance, schema free, document oriented database that stores documents in the form of BSON, an enhanced version of JSON. The main storage abstraction is a Collection, which can loosely be equated to a table in a relational database. Besides support for replication, fault tolerance and sharding, the aspect that makes MongoDB much easier to use is its rich querying facilities. It supports lots of built-in query capabilities with conditional operators, regular expressions and powerful variants of SQL where clauses on the document model .. Here are some examples of query filters ..


db.myCollection.find( { $where: "this.a > 3" });
db.myCollection.find( { "field" : { $gt: value1, $lt: value2 } } );  // value1 < field < value2



and useful convenience functions ..


db.students.find().limit(10).forEach( ... )  // limit the fetch count
db.students.find().skip(..) // skip some records



In Akka we can have a collection in MongoDB that can be used to store all transacted data keyed on a transaction id. The set of data can be stored in a HashMap as key-value pairs. Have a look at the following diagram for the scheme of data storage using MongoDB Collections ..
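In place of the diagram, here is a rough sketch of the intended layout - one document per transaction, with a key field holding the transaction id and a value field holding the map of transacted key/value pairs (the field names anticipate the MongoStorage code shown later):

{ "key" : "<transaction-id>",
  "val" : { "k1" : <serialized v1>, "k2" : <serialized v2>, .. } }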



Akka TransactionalState offers APIs to publish the appropriate storage engines depending on the configuration ..


class TransactionalState {
  def newPersistentMap(
    config: PersistentStorageConfig): TransactionalMap[String, AnyRef] = 
    config match {
    case CassandraStorageConfig() => new CassandraPersistentTransactionalMap
    case MongoStorageConfig() => new MongoPersistentTransactionalMap
  }

  def newPersistentVector(
    config: PersistentStorageConfig): TransactionalVector[AnyRef] = 
    config match {
    //..
  }

  def newPersistentRef(
    config: PersistentStorageConfig): TransactionalRef[AnyRef] = 
    config match {
    //..
  }
  //..
}



and each transactional data structure defines the transaction semantics for the underlying structure that it encapsulates. For example, for a PersistentTransactionalMap we have the following APIs ..


abstract class PersistentTransactionalMap[K, V] extends TransactionalMap[K, V] {

  protected[kernel] val changeSet = new HashMap[K, V]

  def getRange(start: Int, count: Int)

  // ---- For Transactional ----
  override def begin = {}
  override def rollback = changeSet.clear

  //.. additional map semantics .. get, put etc.
}



A concrete implementation defines the rest of the semantics used to handle transactional data. The concrete implementation is parameterized with the actual storage engine that can be plugged in for specific implementations.


trait ConcretePersistentTransactionalMap extends PersistentTransactionalMap[String, AnyRef] {
  val storage: Storage
  
  override def getRange(start: Int, count: Int) = {
    verifyTransaction
    try {
      storage.getMapStorageRangeFor(uuid, start, count)
    } catch {
      case e: Exception => Nil
    }
  }

  // ---- For Transactional ----
  override def commit = {
    storage.insertMapStorageEntriesFor(uuid, changeSet.toList)
    changeSet.clear
  }

  override def contains(key: String): Boolean = {
    try {
      verifyTransaction
      storage.getMapStorageEntryFor(uuid, key).isDefined
    } catch {
      case e: Exception => false
    }
  }

  //.. others 
}



Note the use of the abstract val in the above implementation, which will be concretized when we define the Mongo map ..


class MongoPersistentTransactionalMap 
  extends ConcretePersistentTransactionalMap {
  val storage = MongoStorage
}



For the Storage part, we have another trait which abstracts the storage specific APIs ..


trait Storage extends Logging {
  def insertMapStorageEntriesFor(name: String, entries: List[Tuple2[String, AnyRef]])
  def removeMapStorageFor(name: String)
  def getMapStorageEntryFor(name: String, key: String): Option[AnyRef]
  def getMapStorageSizeFor(name: String): Int
  def getMapStorageFor(name: String): List[Tuple2[String, AnyRef]]
  def getMapStorageRangeFor(name: String, start: Int, 
    count: Int): List[Tuple2[String, AnyRef]]
}



I am in the process of writing a concrete implementation of the storage using MongoDB, which will look like the following ..


object MongoStorage extends Storage {
  val KEY = "key"
  val VALUE = "val"
  val db = new Mongo(..);  // needs to come from configuration
  val COLLECTION = "akka_coll"
  val coll = db.getCollection(COLLECTION)
  
  private[this] val serializer: Serializer = ScalaJSON
  
  override def insertMapStorageEntriesFor(name: String, entries: List[Tuple2[String, AnyRef]]) {
    import java.util.{Map, HashMap}
    val m: Map[String, AnyRef] = new HashMap
    for ((k, v) <- entries) {
      m.put(k, serializer.out(v))
    }
    coll.insert(new BasicDBObject().append(KEY, name).append(VALUE, m))
  }
  
  override def removeMapStorageFor(name: String) = {
    val q = new BasicDBObject
    q.put(KEY, name)
    coll.remove(q)
  }
  //.. others
}



As the diagram above illustrates, every transaction will have its own DBObject in the Mongo Collection, which will store a HashMap that contains the transacted data set. Using MongoDB's powerful query APIs we can always get to a specific key/value pair for a particular transaction as ..


// form the query object with the transaction id
val q = new BasicDBObject
q.put(KEY, name)

// 1. use the query object to get the DBObject (findOne)
// 2. extract the VALUE which has the HashMap of transacted data set
// 3. query on the HashMap on the passed in key to get the value
// 4. use the scala-json serializer to get back the Scala object
serializer.in(
  coll.findOne(q)
      .get(VALUE).asInstanceOf[JMap[String, AnyRef]]
      .get(key).asInstanceOf[Array[Byte]], None)



MongoDB looks like a cool storage engine and has already been used in production as a performant key/value store. It looks promising as the backing storage engine for persistent transactional actors as well. Akka transactors look poised to evolve into a platform that can deliver the goods for stateful STM based as well as stateless message passing based concurrent applications. I plan to complete the implementation in the near future and, if Jonas agrees, will be more than willing to contribute it to the Akka master.

Open source is as much about contributing, as it is about using ..

Monday, July 20, 2009

Macros, Preprocessors and DSL development

Along with the recent trend of DSLs becoming more and more popular, we are also seeing a growing trend of programming languages adding preprocessing and macro based features to their machinery. Is this a mere coincidence, or are we becoming more attuned to Guy Steele's words of wisdom that "a main goal in designing a language should be to plan for growth"?

Compile time meta-programming has long been dominated by the two extremes of C pre-processors and Lisp macros. In the context of DSL implementation, I have been doing some reading on syntax extension features and meta-programming in various languages. I even came across this thread in the core-ruby discussion group, where people have been talking about implementing Converge style macros in Ruby. Lisp and Dylan implement macros mainly on top of a syntactically minimal language. But nowadays we are looking at syntax rich languages like Haskell (through Template Haskell) and MetaOCaml that implement macros as part of the language.

Converge is, of course, a very interesting experiment, where Tratt has implemented Template Haskell like macro capabilities on top of a Python like dynamically typed language. Converge macros differ from Lisp's in the sense that macro calls use a special syntax, while macro definitions are regular functions. When the compiler encounters the special syntax of a macro call, it does the relevant processing for the quasi-quotations and splice annotations and builds up the resulting AST, which it then merges with the main AST. Thus the AST structure is abstracted from the user, unlike in Ruby and Groovy, which allow explicit manipulation of the abstract syntax tree by the user. For details of Converge's compile time meta-programming, have a look at the Converge site.

Some languages like Nemerle and MetaLua allow dynamic extension of the language grammar through macros. As in Lisp, macros in both of them are not first class citizens, but they help implement syntactic extensions in their own unique ways.

For a long time Haskell has been doing lots of DSL development based on pure embedding, using powerful features like monadic interpreters, lazy evaluation and higher order function composition. But macros add yet another level of expressivity to language syntax, not possible through embedding alone. Are we seeing a new and invigorated effort towards implementing syntactic extensions to programming languages? And does this have any relation to the recent interest and advancements in DSL based development?

Sunday, July 12, 2009

DSL Composition techniques in Scala

One of the benefits of being on Twitter is the real time access to the collective thought streams of many great minds of our industry. Some time back, Paul Snively pointed to this paper on Polymorphic Embedding of DSLs in Scala. It discusses many advanced Scala idioms that you can implement while designing embedded DSLs. I picked up a couple of cool techniques on DSL composition using the power of the Scala type system, which I could use in one of my implementations.

A big challenge with DSLs is composability. DSLs are mostly used in silos these days to solve specific problems in one particular domain. But within a single domain there are situations when you need to compose multiple DSLs to design modular systems. Languages like Scala and Haskell offer powerful type systems for the modular construction of abstractions. Using this power, you can embed domain specific types within the rich type systems offered by these languages. This post describes a cool example of DSL composition using Scala's type system. The example is a very stripped down version of a real life scenario that computes the payroll of employees. It's not the richness of DSL construction that's the focus of this post. If you want to get a feel for the power of Scala to design internal and external DSLs, have a look at my earlier blog posts on the subject. Here the main focus is composition and reusability - how features like dependent method types and abstract types help compose your language implementations in Scala.

Consider this simple language interface for salary processing of employees ..

trait SalaryProcessing {
  // abstract type
  type Salary

  // declared type synonym
  type Tax = (Int, Int)

  // abstract domain operations
  def basic: BigDecimal
  def allowances: BigDecimal
  def tax: Tax
  def net(s: String): Salary
}

Salary is an abstract type, while Tax is defined as a synonym for a Tuple2 holding the tax components applicable to an employee. In real life, the APIs will be more detailed and will possibly take employee ids or employee objects to get the actual data out of the repository. But, once again, let's not fuss over the DSL itself right now.

Here's a sample implementation of the above interface ..

trait SalaryComputation extends SalaryProcessing {
  type Salary = BigDecimal

  def basic = //..
  def allowances = //..
  def tax = //..

  private def factor(s: String) = {
    //.. some implementation logic
    //.. depending upon the employee id
  }

  def net(s: String) = {
    val (t1, t2) = tax

    // some logic to compute the net pay for employee
    basic + allowances - (t1 + t2 * factor(s))
  }
}

object salary extends SalaryComputation

Here's an implementation from the point of view of computing the salary of an employee. The abstract type Salary has been concretized to BigDecimal, which indicates the absolute amount that an employee makes as net pay. Cool .. we can have multiple such implementations for the various types of employees and contractors in the organization.
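For instance, a hypothetical variant for contractors could concretize Salary in exactly the same way but plug in different component logic (all figures below are made-up placeholders):

trait ContractorSalaryComputation extends SalaryProcessing {
  type Salary = BigDecimal

  def basic = BigDecimal(100000)       // made-up flat retainer
  def allowances = BigDecimal(0)       // no allowances for contractors here
  def tax = (10000, 2000)              // made-up withholding components

  def net(s: String) = {
    val (t1, t2) = tax
    basic + allowances - (t1 + t2)
  }
}

object contractorSalary extends ContractorSalaryComputation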

Irrespective of the number of implementations we may have, the accounting process needs to record all of them in its books, and it would like to get the separate components of the salary individually from one single API. For this, we need to define a separate implementation for the accounting department with a different concrete type definition for Salary, one that separates the net pay and the tax part. Scala's abstract types allow this kind of type overriding much like values. But the trick is to design the Accounting abstraction in such a way that it can be composed with all the definitions of Salary that individual implementations of SalaryProcessing define. This means that any reference to Salary in the implementation of Accounting needs to refer to the same definition that the composed language uses.

Here's the definition of the Accounting trait that embeds the semantics of the other language that it composes with ..

trait Accounting extends SalaryProcessing {
  // abstract value
  val semantics: SalaryProcessing

  // define type to use the same semantics as the composed DSL
  type Salary = (semantics.Salary, semantics.Tax)

  def basic = semantics.basic
  def allowances = semantics.allowances
  def tax = semantics.tax

  // the accounting department needs both net and tax info
  def net(s: String) = {
    (semantics.net(s), tax)
  }
}

and here's how Accounting composes with SalaryComputation ..

object accounting extends Accounting {
  val semantics = salary
}

Now let's define the main program that processes the payroll for all the employees ..

def pay(semantics: SalaryProcessing,
  employees: List[String]): List[semantics.Salary] = {
  import semantics._
  employees map(net _)
}

The pay method accepts the semantics to be used for processing and returns a dependent type, one that depends on the semantics passed in. Dependent method types are an experimental feature in Scala and need to be enabled with the -Xexperimental flag of the compiler. This is an example where we publish just the right amount of constraint required for the return type. Also note how the import statement is being used here. Firstly, it is scoped within the method body. Secondly, it imports only the members of an object, which enables us to use DSLish syntax for the methods on semantics, without explicit qualification.

Here's how we use the composed DSLs with the pay method ..

val employees = List(...)

// only SalaryComputation
println(pay(salary, employees))

// SalaryComputation composed with Accounting
println(pay(accounting, employees))

Sunday, July 05, 2009

Patterns in Internal DSL implementations

I have been thinking recently that classifying DSLs as Internal and External is too broad-based, considering the multitude of architectural patterns that we come across in various implementations. I guess the more interesting implementations are within the internal DSL genre, starting from plain old fluent interfaces, mostly popularized by Martin Fowler, down to the very sophisticated polymorphic embedding that has recently been demonstrated in Scala.

I like to use the term embedded more than internal, since it makes explicit the fact that the DSL piggybacks on the infrastructure of an existing language (aka the host language of the DSL). This is the commonality part of all embedded DSLs. But DSLs are nothing more than well-designed abstractions expressive enough for the specific domain of use. On top of this commonality, internal DSL implementations also exhibit systematic variations in form, feature and architecture. The purpose of this post is to identify some of the explicit and interesting patterns that we find amongst the embedded DSL implementations of today.

Plain Old Smart APIs, Fluent Interfaces

Enough has been documented on this dominant idiom mostly used in the Java and C# community. Here's one of my recent favorites ..

ConcurrentMap<Key, Graph> graphs = new MapMaker()
  .concurrencyLevel(32)
  .softKeys()
  .weakValues()
  .expiration(30, TimeUnit.MINUTES)
  .makeComputingMap(
     new Function<Key, Graph>() {
       public Graph apply(Key key) {
         return createExpensiveGraph(key);
       }
     });


My good friend Sergio Bossa has recently implemented a cute DSL based on smart builders for messaging in Actorom ..

on(topology).send(EXPECTED_MESSAGE)
  .withTimeout(1, TimeUnit.SECONDS)
  .to(address);


Actorom is a full Java based actor implementation. Looks very promising - go check it out ..

Carefully implemented fluent interfaces using the builder pattern can be semantically sound and order preserving as well. You cannot invoke the chain elements out of sequence and come up with an inconsistent construction for your object.
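A minimal sketch of that idea, using hypothetical types in Scala: each step of the chain returns a narrower type that exposes only the next legal step, so an out-of-sequence call simply does not compile.

// hypothetical order-preserving builder: on(..) must come before send(..),
// which must come before to(..)
case class Topology(name: String)
case class Address(host: String)

class MessageTarget(topology: Topology, msg: String) {
  def to(address: Address): Unit =
    println("sending [" + msg + "] on " + topology.name + " to " + address.host)
}

class MessageSource(topology: Topology) {
  def send(msg: String): MessageTarget = new MessageTarget(topology, msg)
}

object Messaging {
  def on(topology: Topology): MessageSource = new MessageSource(topology)
}

// reads like the DSL, but the steps are statically ordered
Messaging.on(Topology("ring")).send("PING").to(Address("localhost"))
// Messaging.on(Topology("ring")).to(Address("localhost"))   // does not compile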

Code generation using runtime meta-programming

We are seeing a great surge in mindshare in runtime meta-programming with the increased popularity of languages like Groovy and Ruby. Both these languages implement meta-object protocols that allow developers to manipulate meta-objects at runtime through techniques of method synthesis, method interception and runtime evals of code strings.

Code generation using compile time meta-programming

I am not going to talk about C pre-processor macros here. They are considered abominations compared to what Lisp macros have been offering since the 1960s. C++ offers techniques like Expression Templates that have been used successfully to generate code during the compilation phase. Libraries like Blitz++ have been developed using these techniques, creating parse trees of array expressions that are used to generate customized kernels for numerical computations.

But Lisp is the real granddaddy of compile time meta-programming. Uniform representation of code and data, expressions yielding values, and syntactic macros with quasiquoting have made extension of the Lisp language possible through user defined meta objects. Unlike C, C++ and Java, what Lisp does is make the parser of the language available to the macros. So when you write macros in Common Lisp or Clojure, you have the full power of the extensible language at your disposal. And since Lisp programs are nothing but list structures, the parser is also simple enough.

The bottom line is that you can have a small surface syntax for your DSL and rely on the language infrastructure for generating the appropriate code during the pre-compilation phase. That way the runtime does not contain any of the meta-objects to be manipulated, which gives you an edge in performance compared to the Ruby / Groovy option.

Explicit AST manipulation using the Interpreter Pattern

This is yet another option that we find being used for DSL implementation. The design follows the Interpreter pattern of GOF and uses the host language infrastructure for creating and manipulating the abstract syntax tree (AST). Groovy and Ruby have now developed this infrastructure and support code generation through AST manipulation. Come to think of it, this is really the Greenspunning of Lisp, where you can program in the AST itself and use the host language parser to manipulate it. In other languages, the AST is far away from the CST (concrete syntax tree), and you need the heavy lifting of scanners and parsers to get the AST out of the CST.

Purely Embedded typed DSLs

Unlike pre-processor based code generation, pure embeddings of DSLs are implemented in the form of libraries. Paul Hudak demonstrated this with Haskell way back in 1998, when he used the techniques of monadic interpreters, partial evaluation and staged programming to implement purely embedded DSLs that can be evolved incrementally over time. Of course, when we talk about typed abstractions, the flexibility depends on how advanced a type system you have. Haskell has one, and offers functional abstractions based on it as the basis of implementation. Amongst today's languages, Scala offers an advanced type system and, unlike Haskell, has the goodness of a solid OO implementation to go along with its functional power. This has helped in implementing Polymorphically Embeddable DSLs, a significant improvement over the capabilities that Hudak demonstrated with Haskell. Using features like Scala traits, virtual types, higher order generics and family polymorphism, it is possible to have multiple implementations of a DSL on top of a single surface syntax. This looks very promising and can open up ideas for letting domain specific optimizations and interesting variations coexist on the same syntax of the DSL.

Are there any interesting patterns of internal DSL implementations that are being used today?