
Monday, July 22, 2013

The Realm of Racket is an enjoyable read



There are many ways to write a programming language book. You can introduce the syntax and semantics of the language in a naturally comprehensible sequence of increasing complexity and usage. Or you can introduce the various features of the language with real-world examples using the standard library that the language offers. IIRC, Accelerated C++ by Andrew Koenig and Barbara Moo takes this route. I really loved this approach and enjoyed reading the book.

Of course Matthias Felleisen is known for a third way of teaching a language - the fun way. The Little Schemer and The Seasoned Schemer introduced a novel way of learning a language. The Realm of Racket follows a similar style of teaching the latest descendant of Lisp, one game at a time. The implementation of every game introduces idioms and language features in increasing order of complexity. There's a nice progression which helps the reader understand the complex features of the language by building upon the knowledge of the simpler ones acquired in earlier chapters.

The book begins with a history of the Racket programming language - how it evolved as a descendant of Scheme, how it makes programming fun, and how it can be used successfully as an introductory language for students aspiring to learn programming. Then it gets the impatient programmer started with DrRacket, the IDE that will serve as your companion for the entire duration of your playing around with the book.

Every chapter introduces some new language features and either develops a new game or improves upon a game developed earlier. This not only demonstrates real-world usage of the syntax and semantics of the language, but also makes the programmer aware of how the various features interact as a whole to build complex abstractions out of simpler ones. The book also takes great pains to defer the complexity of features to the right point, so that the reader is not burdened upfront. For example, lambdas are introduced only after the authors have covered the basics of programming with functions and recursion. Mutation is introduced only after teaching the virtues of immutability. For loops and comprehensions appear only when the book has introduced all the list-processing functions like folds, maps and filters. And then the book goes into great depth explaining why the language has so many variants of the for loop, like for/list, for/fold, for*, for/first, for/last etc. In this entire discussion of list processing and for loops, I would have loved to see a more detailed discussion of sequences in the book. A sequence abstracts a large number of data types and, much like Clojure, introduces a new way of API design - a single sequence to rule them all. API designers would surely like to have more of this sauce as part of their repertoire. Racket's uniform handling of sequences is definitely a potent model of abstraction compared to Scheme or other versions of Lisp.
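
To give a flavor of the variants being discussed - a minimal sketch of my own, not from the book; the => comments show what a Racket REPL would print:


(for/list ([x '(1 2 3 4)]) (* x x))                 ; => '(1 4 9 16)
(for/fold ([sum 0]) ([x (in-range 1 5)]) (+ sum x)) ; => 10, folding over a range
(for/first ([x '(1 2 3 4)] #:when (even? x)) x)     ; => 2
(for* ([x '(1 2)] [y '(a b)])                       ; nested (cartesian) iteration
  (displayln (list x y)))


Note that the same for/list iterates over a list here but works unchanged over a vector, a string or an in-range - this uniformity is exactly the sequence abstraction referred to above.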

The games progress in complexity, and we can see the power of the language put to great use when the authors introduce lazy evaluation and memoized computations and use them to improve the Dice of Doom. Then the authors introduce distributed game development, which is the final frontier that the book covers. It's truly an enjoyable ride through the entire series.

The concluding chapter talks about some of the advanced features like classes, objects and meta-programming. No Lisp book would be complete without a discussion of macros and language development, but I think the authors have done well to defer these features till the end. Given that this is a book for beginners learning the language, that sounds like a sane measure.

However, as a programmer experienced in other languages and wanting to take a look at Racket, I would have loved to see some coverage of testing. Introducing a bit of testing practice, maybe a unit-testing library, would have made the book more complete.

The writing style of this book has an underlying tone of humor and simplicity, which makes it a thoroughly enjoyable read. The illustrations and comics take away the monotony of learning the mechanics of a language. And the fact that Racket is a simple enough language makes this combination with pictures very refreshing.

On the whole, as a book introducing a language, The Realm of Racket is a fun read. I enjoyed reading it a lot and recommend it without reservations for your bookshelf.



Monday, July 20, 2009

Macros, Preprocessors and DSL development

Along with the recent trend of DSLs becoming more and more popular, we are also seeing a growing trend of programming languages adding preprocessing and macro-based features as part of their machinery. Is this a mere coincidence, or are we becoming more aware of Guy Steele's words of wisdom that "a main goal in designing a language should be to plan for growth"?

Compile-time meta-programming has long been dominated by the two extremes of C preprocessors and Lisp macros. In the context of DSL implementation, I have been doing some reading on syntax extension features and meta-programming in various languages. I also came across this thread in the core-ruby discussion group, where people have been talking about implementing Converge-style macros in Ruby. Lisp and Dylan implement macros mainly on top of a language that's syntactically minimal. But nowadays, we are looking at syntax-rich languages that implement macros as part of the language, like Haskell (Template Haskell) and OCaml (MetaOCaml).

Converge is, of course, a very interesting experiment, where Tratt has implemented Template Haskell-like macro capabilities on top of a Python-like dynamically typed language. Converge macros differ from Lisp's in that macro calls have a special syntax, while macro definitions are regular functions. When the compiler encounters the special syntax in a macro call, it does the relevant processing for the quasi-quotations and splice annotations and builds up the resultant AST, which it then merges with the main AST. Thus the AST structure is abstracted from the user, unlike Ruby and Groovy, which allow explicit manipulation of the abstract syntax tree by the user. For details of Converge's compile-time meta-programming, have a look at the Converge site.
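
For readers unfamiliar with that machinery, here is what quasi-quotation and splicing look like in Scheme - a tiny sketch; Converge's notation differs, but the idea of building code fragments from templates with holes is the same:


(define args '(2 3))
`(+ 1 ,@args)   ; => (+ 1 2 3) - ,@ splices a list into the template
`(f ,(* 2 5))   ; => (f 10)    - , fills a single hole with an evaluated value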

Some languages like Nemerle and MetaLua allow dynamic extension of the language grammar through macros. As in Lisp, macros in both of them are not first-class citizens, but they help implement syntactic extensions in their own unique ways.

For a long time Haskell has been doing lots of DSL development based on pure embedding, using powerful features like monadic interpreters, lazy evaluation and higher-order function composition. But macros add yet another level of expressivity in language syntax, not possible through embedding alone. Are we seeing a new and invigorated effort towards implementing syntactic extensions to programming languages? And does this have any relation to the recent interest and advancements in DSL-based development?

Friday, October 03, 2008

Erlang VM: now hosting multiple languages

In an earlier post, I had wondered why the Erlang virtual machine does not host a more diversified set of languages ..

"BEAM provides an ecosystem that offers phenomenal scalability with concurrent processes and distributed programming, which is really difficult (if not impossible) to replicate in any other virtual machine being worked upon today. After all, it is much easier to dress up Erlang with a nicer syntax than implementing the guarantees of reliability and extreme concurrency in any of your favorite languages' runtime."

Then I had blogged about Reia, the Python/Ruby like scripting language on BEAM.

A few days back, Robert Virding released a stable version of LFE - Lisp Flavored Erlang, a concurrent Lisp based on the features and limitations of the Erlang VM. Unlike Lisp, it doesn't have global data or mutating operations. Instead it has the goodness of Lisp macros, sexprs and code-as-data, together with the Erlang power of pattern matching and binary comprehensions. And the best part is that LFE interoperates seamlessly with vanilla Erlang/OTP.

Along with Erlang being used to develop middleware applications, we are seeing increased use of the Erlang VM to host more and more language variants. This is a clear indication that the Erlang ecosystem is growing. As Ted Leung has rightly observed in his post on VMs for everybody, we are going to see not only flourishing new virtual machines, but also lots of languages atop existing virtual machines.

Real good time to be a hacker .. a pity though only a few lucky ones get paid for hacking ..

Friday, October 19, 2007

Clojure is here

I came across this posting, Lisp on the JVM, on reddit and thought .. what the heck? What's so great about it when we already have ABCL, Kawa and SISC for the JVM? In fact the title on reddit is a bit misleading - Clojure is very much a Lisp. It is targeted at the JVM, but more than anything else, its design embodies a lot of thought about immutability, functional data structures, concurrency, STM etc. Here is a comment from the author himself on reddit:
Clojure has some tangible, non-superficial differences from Common Lisp and Scheme. They yield something that is different, and might or might not be more suitable depending on your programming style and application domain.

  • Most of the core data structures are immutable. This is part of an overall design philosophy to make Clojure a good language for concurrent/multi-core programming.

  • Most of the data structures are extensible abstractions. This is different from Common Lisp where you can't extend the sequence functions to your own data structures, for instance. Even invocability is an abstraction - allowing maps to be functions of their keys and vice-versa.

  • Clojure extends code-as-data to maps and vectors in a deep way. They have literal reader syntax, the compiler understands them, backquote works with them, they support metadata etc. Because they are efficiently immutable and persistent, they support very Lisp-y recursive usage, shared structure etc, in ways Common Lisp's hash tables and vectors cannot.

  • Clojure embraces its host platform in ways that the standard Lisps ported to the JVM can't. For instance, Common Lisp's strings could never be Java Strings since the former are mutable and the latter are not. Clojure strings are Java Strings. The Clojure sequence library functions work over Clojure and Java data structures transparently. Etc.

  • Clojure has metadata as a core concept, not something one could retrofit onto the built-in Common Lisp types.

  • Clojure is designed for concurrency. Vars (similar to CL special variables) have explicit threading semantics. Clojure has a software transactional memory system. Etc.


In short, Clojure is (non-gratuitously) different. If you don't want different, you don't want Clojure. If you like Lisp and need to write multi-threaded programs for the JVM, you might find something interesting in Clojure.


I had blogged about SISC some time back and discussed how we could use Scheme as a more consumer-friendly XML in your Java application. I think Clojure is going to be the most interesting dynamic language on the JVM very soon. There has never been a better time to learn Lisp!

Monday, September 17, 2007

Code-as-Data, Encapsulation and the Lisp Dogma

Raganwald talks about code/data separation and encapsulation. Here he quotes Steve Yegge from one of his drunken rants, where the latter points out the virtues of using Lisp s-expressions as executable XML.



I just could not resist ruminating on Douglas Hofstadter's treatment of the same subject in Metamagical Themas, in his essay Lisp: Recursion and Generality. He talks about Lisp as the medium that unifies *inert* data (which he calls declarative knowledge) with *active* code (or procedural knowledge). He mentions ..
The main idea is that in Lisp, one has the ability to "elevate" an inert, information-containing data structure to the level of "animate agent", where it becomes a manipulator of inert structures itself. This program-data cycle, or loop, can continue on and on, with structures reaching out, twisting back, and indirectly modifying themselves or related structures.


Talking about this data-code duality he goes on ..
Moreover, Lisp's loop of program and data should remind biologists of the way that genes dictate the form of enzymes, and enzymes manipulate genes (among other things). Thus Lisp's procedural-declarative program-data loop provides a primitive, but very useful and tangible example of one of the most fundamental patterns at the base of life: the ability of passive structures to control their own destiny, by creating and regulating active structures whose form they dictate.


Steve Yegge has also expressed it succinctly in the same post (mentioned above) ..
But Lisp is directly executable, so you could simply make the tag names functions that automatically transform themselves. It'd be a lot easier than using XSLT, and less than a tenth the size.


When you have code-as-data and data-as-code, you have encapsulated the data structures in a form where they can transform themselves. While Lisp allows you to do this, is it too unnatural for a programming language that forces you to program in its native abstract syntax tree format?

Monday, September 10, 2007

Got Closures? Have OO

In the classical object-oriented model, an object encapsulates local state (instance variables) and contains a pointer to shared procedures. These procedures are the methods, which operate on the encapsulated state that forms the environment of the object. Each method can declare local variables as well as look up additional state information from the shared environment based on lexical scoping rules. So we have the object as the combination of the environment and the set of methods that operate on that environment. Class-based languages like Java and C++ provide another abstraction - the class, which instantiates objects by initializing the environment and setting up the appropriate pointers to the shared procedures.

But what if my programming language is classless? One of the best examples is JavaScript - a prototypal language, without the class structure, that supports higher-order functions and lexical closures. How do I implement object-orientation in a language with no built-in support for classes? This post attempts to look at OO through a different looking glass - a quite unnatural source, a quite different language and a very different programming paradigm. In the absence of first-class support for the classical OO paradigm, it looks at alternative means of implementing encapsulation and object-orientation while designing real-world abstractions. The language used is Scheme, popularly referred to as a functional language, and the implementation addresses the issues faced by classless languages like JavaScript.

The basic theme of the following discussion is from SICP, a true classic of computer science in the domain of programming languages. In case you have any doubt regarding the credibility of SICP, please go through the first two customer reviews of the book on Amazon.

Object Orientation - the Scheme way!

Let us look at the following abstraction of a bank account in Scheme ..


(define make-account
  (lambda (account-no account-name balance)

    ;; accessors
    (define (get-account-no) account-no)
    (define (get-name) account-name)
    (define (get-balance) balance)

    ;; mutators
    (define (set-account-name! new-account-name)
      (set! account-name new-account-name)
      account-name)

    ;; deposit money into account
    (define (deposit! amount)
      (set! balance (+ balance amount))
      balance)

    ;; withdraw money
    (define (withdraw! amount)
      (if (>= balance amount)
          (begin (set! balance (- balance amount))
                 balance)
          "Insufficient funds"))

    ;; the dispatcher
    (define (self message)
      (case message
        ((no)         get-account-no)
        ((nm)         get-name)
        ((balance)    get-balance)
        ((set-nm!)    set-account-name!)
        ((withdraw!)  withdraw!)
        ((deposit!)   deposit!)
        (else (error "unknown selector" message))))
    self))



Every time we call make-account, we get back a dispatch function; every instance shares the same code, but operates on a different set of data supplied to it, which forms the execution environment.


;; creates an account a1
(define a1 (make-account 100 "abc" 0))

;; creates an account a2
(define a2 (make-account 200 "xyz" 0))

;; fetches the account-no of a1 -> 100
((a1 'no))

;; sets the account-name of a1 to "pqr"
((a1 'set-nm!) "pqr")



That is, we have two separate objects for the account abstraction and a bunch of shared methods to operate on each of them. A clean separation of code and data, a nice encapsulation of state. The arguments passed to the make-account invocation - account-no, account-name and balance - are completely encapsulated within the abstraction and can only be accessed through the shared procedures.

The Secret Sauce

Lexical closures! In the above code snippet, all free variables within the methods are looked up in the environment of execution, based on the lexical scoping principles of Scheme. This is exactly analogous to the classical OO model that we discussed in the beginning. In the model with first-class objects, we have

  • the environment formed by the instance variables, encapsulating local state per object and

  • the methods, which are shared across objects.


These two orchestrate the mechanism through which behaviors are implemented in abstractions. OTOH, in languages like Scheme and JavaScript, the corresponding roles are played by

  • the execution context, where the procedures look up for free variables and

  • the procedures themselves.


Hence closures play the same role in the Scheme-based implementation that objects play in a classical one with Java or C++. Here is what Wikipedia has to say:

a closure is a function that is evaluated in an environment containing one or more bound variables. When called, the function can access these variables. The explicit use of closures is associated with functional programming and with languages such as ML and Lisp. Constructs such as objects in other languages can also be modeled with closures.


What about Inheritance?

We can implement inheritance in the above abstraction by using the delegation model. This is an object-based implementation, similar to what we do in classless languages like JavaScript. Simply incorporate the specialized behaviors in a new abstraction and delegate the common behaviors to the base abstraction. The following example implements a minimum-balance checking account, which has an additional restriction on the withdraw method in the form of a minimum-balance check. It delegates all common behavior to the earlier abstraction, make-account, while itself implementing only the specialized functionality within the withdraw method.


(define make-min-balance-account
  (lambda (account-no account-name balance)
    (let ((account (make-account account-no account-name balance)))

      ;; implement only the specialized behavior
      ;; delegate others to the base abstraction
      (define (withdraw! amount)
        (let ((bal ((account 'balance))))
          (if (>= (- bal amount) 1000)
              (begin ((account 'withdraw!) amount)
                     ((account 'balance)))
              "Min balance check failed")))

      (define (self message)
        (case message
          ((withdraw!)    withdraw!)
          (else (account message))))
      self)))



But Scheme is a Functional Language

Programming without assignments is functional programming, where procedures can be viewed as computing mathematical functions, without any change in local state. Scheme is a multi-paradigm language, and the above implementation uses assignments to mutate the local state of the abstraction - the set! operation helps us model the local state of objects. However, it is absolutely possible to provide a completely functional OO system implementing polymorphism in Scheme, without a single assignment or mutator operation. See here for a sample implementation.
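
As a flavor of that style, here is a minimal sketch of my own (not the linked implementation): each "mutator" returns a brand new dispatcher closing over the updated balance, so no set! is needed.


(define (make-account-fp account-no account-name balance)
  (lambda (message)
    (case message
      ((balance)  balance)
      ((deposit)  (lambda (amount)
                    (make-account-fp account-no account-name (+ balance amount))))
      ((withdraw) (lambda (amount)
                    (if (>= balance amount)
                        (make-account-fp account-no account-name (- balance amount))
                        "Insufficient funds")))
      (else (error "unknown selector" message)))))

(define a1 (make-account-fp 100 "abc" 0))
(define a2 ((a1 'deposit) 50))  ; a2 is a new account with balance 50
(a1 'balance)                   ; => 0, a1 is untouched
(a2 'balance)                   ; => 50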

Are Closures equivalent to Objects?

This is a very interesting topic and has been extensively discussed in various forums on the theory of programming languages. While objects are often referred to as "a poor man's closures", the converse has also been shown to be true. Instead of trying to mess around with this tension between opposites, let me point you to a fascinating note on the topic in one of those forums.

Monday, August 13, 2007

Collector Idiom in Scheme - Is this a Design Pattern?

As a newbie in the world of functional programming (one of my new year resolutions for 2007), I started my venture to learn the skills of constructing recursive processes and manipulating recursive data structures with Friedman and Felleisen's The Little Schemer. I am now into Chapter 8, Lambda The Ultimate, sailing through The Ninth Commandment (Abstract common patterns with a new function) and into page 137 of the text. And here I meet multirember&co(), destined to be my thought-process companion for the past week. It took me quite some cycles of stepping through the DrScheme debugger to figure out the actual computation process going on inside the stack of lambdas that the function builds.

Here is a copy of the function for those uninitiated to the amazing text ..

(define multirember&co
  (lambda (a lat col)
    (cond
      ((null? lat)
       (col (quote ()) (quote ())))
      ((eq? (car lat) a)
       (multirember&co a
                       (cdr lat)
                       (lambda (newlat seen)
                         (col newlat
                              (cons (car lat) seen)))))
      (else
       (multirember&co a
                       (cdr lat)
                       (lambda (newlat seen)
                         (col (cons (car lat) newlat)
                              seen)))))))



What does this function call (multirember&co a lat f) do?

In the words of the authors ..
It looks at every atom of the lat to see whether it is eq? to a. Those atoms that are not are collected in one list ls1; the others for which the answer is true are collected in a second list ls2. Finally it determines the value of (f ls1 ls2).
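
To make the description concrete, here is the book's example with list itself as the collector - the final comment shows the evaluated result:


(multirember&co 'tuna
                '(strawberries tuna and swordfish)
                (lambda (newlat seen)
                  (list newlat seen)))
;; => ((strawberries and swordfish) (tuna))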

The statement of work is quite straightforward. I tried implementing the same in another functional language, Erlang, and it did not take me too much time to come up with a supposedly meaningful Erlangy implementation ..

filter_atom(A, L, Col) -> filter_atom(A, L, [], [], Col).

filter_atom(A, [H|T], Has, NotHas, Col) ->
  case H =:= A of
    true    -> filter_atom(A, T, [H|Has], NotHas, Col);
    false   -> filter_atom(A, T, Has, [H|NotHas], Col)
  end;

%% reverse the accumulators so the collected lists preserve the input order
filter_atom(_A, [], Has, NotHas, Col) ->
  Col(lists:reverse(Has), lists:reverse(NotHas)).


The Erlang implementation looks simple enough and does not have the accidental complexity that we find in the Scheme implementation.

Honest confession: I am also a newbie in Erlang - any suggestions for improvement towards a more Erlangy modeling are welcome!

Why does the Scheme implementation look so complex?

Coming back to the Scheme implementation, I am not sure if this is the most idiomatic solution. But it definitely is an ample demonstration of the power of closures (lambdas). The inner lambdas build the new collectors by capturing values from the enclosing scope, and construct values that are handed over to the next collector in the stack. This is precisely what the Tenth Commandment preaches - Build functions to collect more than one value at a time.

Is this a Design Pattern?

The idiom of using the Collector for building up values is used in subsequent function implementations as well - refer to multiinsertLR&co() and evens-only*&co() in the same chapter of the book. Given a list and some predicate, the Collector idiom uses continuation passing style (CPS) to create closures that successively collect multiple values from the list. Is there a design pattern lurking around?

I am not qualified enough to comment on the virtues of the above Scheme implementation. One great thing about it is that it is tail recursive - the pending work lives in the growing stack of collector closures rather than on the call stack. I could come up with a direct (non-CPS) implementation, which would not be tail recursive, but possibly carries less accidental complexity.
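
For comparison, here is a sketch of that direct version (my own) - plain structural recursion returning multiple values, with the pending conses on the call stack instead of inside collector closures:


(define (split a lat)
  (if (null? lat)
      (values '() '())
      (call-with-values
        (lambda () (split a (cdr lat)))
        (lambda (not-eq eq)
          (if (eq? (car lat) a)
              (values not-eq (cons (car lat) eq))
              (values (cons (car lat) not-eq) eq))))))

;; (call-with-values (lambda () (split 'tuna '(strawberries tuna and swordfish))) list)
;; => ((strawberries and swordfish) (tuna))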

What do the experts have to say about the collector-based idiom?

Monday, April 23, 2007

Executable XML (aka Lisp)

In a project that I have been working on for quite some time, the back-office system receives XML messages from the front- and middle-office systems for processing. It is a securities trading and settlement system for one of the big financial houses of the world - typical messages are trades, settlements, positions etc. which reach the back office after the trade is made. Like any sane architects, we designed the system on the Java EE stack (nowadays you never get fired for choosing Java ..), centered around a message-oriented middleware transporting XML messages with gay abandon. The system has gone live for many implementations and has been delivering satisfactory throughput all along.

No complaints whatsoever - on the architecture, on the Java EE backbone, or on the multitude of XML machinery that goes behind the engineering harness of the millions of messages generated every day. If I were to architect the same system today (the existing one was architected 3 years back), I possibly would have gone for a similar stack, just for the sheer stability and robustness of XML-based technology and the plethora of tools that XML offers today.

Why am I writing this blog then?

Possibly I have been having extra caffeine of late, which has been taking away most of my sleep at night. It is 1 AM and I am still glued to two of my newest possessions on my bookshelf.





I had read some parts of SICP long back - rereading it now is like a rediscovery of many of the Aha! moments I had last time, and of course lots of ruminations and discoveries this time as well. Based on the new-found light of Lispy (and s-expy) knowledge, I hope that some day I will be able to infuse today's dreams and rumblings into a real-life architecture. I am not sure if we will ever reach the stage of human evolution where Lisp and Scheme will be considered the bricks and mortar of enterprise architecture. Till then they will exist as the sexy counterparts of Java and C++, and will continue to allure all developers who have once committed the sin of being there and done that.

The XML Message - Today's Brick and Mortar

Here is how a sample trade message (simplified for clarity) looks in our system :


<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE trd01 SYSTEM "trd01.dtd">
<trd01>
  <id>10234</id>
  <trade_type>equity</trade_type>
  <trade_date>2005-02-21T18:57:39</trade_date>
  <instrument>IBM</instrument>
  <value>10000</value>
  <trade_ccy>usd</trade_ccy>
  <settlement_info>
    <settle_date>2005-02-21T18:57:39</settle_date>
    <settle_ccy>usd</settle_ccy>
  </settlement_info>
</trd01>



We use XML parsers to parse this message, and all sorts of XPath expressions, XQuery and XSLT transformations to process, query and tear apart the hierarchical structures that embody an XML message. The above XML message is *data*, and we have tonnes of Java code processing the XML data, transforming it into business logic and persisting it in the database. So we have the *code* (in Java) strictly separated from the *data* (in XML), using a small toolbox comprising a bundle of XML parsers, XPath expressions and XSLT transformations, all packaged in a couple of dozen third-party jars (aka frameworks). The entire exercise is meant to make this *data* executable.

Executable Data - aka Lisp

Why not use a power that allows you to execute your data directly, instead of creating unnecessary abstraction barriers in the name of OOP? Steve Yegge summarises it with elan:
The whole nasty "configuration" problem becomes incredibly more convenient in the Lisp world. No more stanza files, apache-config, .properties files, XML configuration files, Makefiles — all those lame, crappy, half-language creatures that you wish were executable, or at least loaded directly into your program without specialized processing. I know, I know — everyone raves about the power of separating your code and your data. That's because they're using languages that simply can't do a good job of representing data as code. But it's what you really want, or all the creepy half-languages wouldn't all evolve towards being Turing-complete, would they?

In Lisp, I can make the above data much more readable, clearer in intent, easier for the eyes and at the same time make it executable ..


(trd01
  (id 10234)
  (trade_type "equity")
  (trade_date "2005-02-21T18:57:39")
  (instrument "IBM")
  (value 10000)
  (trade_ccy "usd")
  (settle_info
    (settle_date "2005-02-24T18:57:39")
    (settle_ccy "usd")))



Lisp is a language that was intended to be small, with *no* syntax, where you have the power of macros to create your own syntax and roll it into the language. I made each of the above tags a separate Scheme function (Oh! I was using Scheme btw), each one of which is capable of transforming itself into the desired functionality. As a result, the above data is also my code and executes directly. Another of those Aha! moments.
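
The post stops short of showing those definitions, so here is a hypothetical minimal sketch of what they could look like - each "tag" is just a function that turns its arguments into a tagged node, with trd01 as the entry point that would hand the assembled trade over for processing:


(define (id v)          (list 'id v))
(define (trade_type v)  (list 'trade_type v))
(define (trade_date v)  (list 'trade_date v))
(define (instrument v)  (list 'instrument v))
(define (value v)       (list 'value v))
(define (trade_ccy v)   (list 'trade_ccy v))
(define (settle_date v) (list 'settle_date v))
(define (settle_ccy v)  (list 'settle_ccy v))
(define (settle_info . fields) (cons 'settle_info fields))

(define (trd01 . fields)
  ;; just builds the trade record here; a real system would validate,
  ;; apply business rules and persist instead
  (cons 'trd01 fields))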

But my entire system is based on Java! Surely you are not telling me to change all of the guts to Scheme - are you? My job will be at stake, and I will never be able to convince my pointy-haired boss that the trading back-office system is running on Lisp. In fact, many people who dare to use Lisp in daytime projects are often careful to keep it a secret in the industry. And unless you have PG on your company board, or are blessed enough to have the favor of Y, this is a very useful tip.

Enter SISC

SISC is a lightweight, platform-independent Scheme system targeting the Java Virtual Machine. It comes as a lightweight distribution (the core jar is 233 KB) and offers Scheme as a scripting language for Java. In SISC, bridging is accomplished by a Java API for executing Scheme code and evaluating Scheme expressions, and a module that provides Scheme-level access to Java objects and implementation of Java interfaces in Scheme.

I can write Scheme modules and load them from my Java code using the Java API, once I have bootstrapped the SISC runtime. Here is how I can initialize SISC from my Java application to enable it to use the Scheme functions:


// bootstrapping the SISC runtime
SeekableInputStream heap = new MemoryRandomAccessInputStream(
    getClass().getResourceAsStream("/sisc.shp"));
AppContext ctx = new AppContext();
ctx.addHeap(heap);
Interpreter interpreter = Context.enter(ctx);



and then I can evaluate Scheme functions on the fly as follows:


interpreter.eval("(load \"trd01.scm\")");
String s = interpreter.eval("(trd01 (id 10234) (trade_type \"equity\") ...)").toString();



There are quite a few variants of eval() that SISC offers, along with multiple modes of executing Scheme code from the Java environment. For details, have a look at their documentation. SISC also supports calling Java code from Scheme, accessing Java classes through SISC's extensible type system.

I do not dream about using SISC in any production code in the foreseeable future. But I just wanted to share my rants with all of you. In today's world, it is really raining programming languages - scripting languages like Ruby, Python, JRuby and Groovy are making inroads as the preferred glue languages of today's enterprise architecture. But Lisp stands out as a solid, robust language with the exceptionally powerful code-as-data paradigm - I always felt Lisp was way ahead of its time. Possibly this is the time when Lisp needs to be reincarnated, with the incompatibilities rubbed off its numerous versions and dialects. Lisp code is executable data - it makes perfect sense to replace all frameworks that execute reams of code to process hierarchical structures as data with a single language.

Thursday, January 18, 2007

Syntax Extensibility, Ruby Metaprogramming and Lisp Macros

Over the last few days I have been feeling a bit Lispy. It's not that I have been immersed in Lisp programming - I still do Java for my day job and enjoy the process of staring at reams of standard object-oriented API calls and the big gigantic frameworks that provide the glue code for enterprise software. Java is still my favorite programming language; I still enjoy writing Java and have recently been working on bigger commitments to write more Java, with more Spring and more Hibernate.

The only difference is that I have started reading Paul Graham's On Lisp again!

I am convinced that I will not be programming production-level business applications in Lisp in the foreseeable future. But reading Lisp makes me think differently. The moment I start writing event listeners in Java Swing, I start missing true lexical closures and looking forward to higher-order functions in the language. Boilerplate irritates me much more and makes me imagine how I could have modeled it better using Scheme macros. True, I have been using the best IDEs and leaving it to their code generation engines to generate all the boilerplate; I have also put forth a bit of an MDA within my development environment that generates much of the code from the model. I am a big fan of AOP and have been using aspects for quite some time to modularize my designs and generate write-behind logic through the magic of bytecode weaving.

The difference, once again, is that I have been exposed to the best code generator of all time - the one with a simple uniform syntax, having access to the whole language parser, that gets the source code in a single uniform data structure and knows how to munch out the desired transformation in a fail-safe manner day in and day out - the Lisp macro.

Abstractions - Object Orientation versus Syntax Construction

For someone obsessed with the OO paradigm, thriving on the backbone of objects, virtual functions and polymorphism, I have learnt to model abstractions in terms of objects and classes (the kingdom of nouns). I define classes on top of the Java language infrastructure, add data members as attributes, add behavior to the abstractions through methods defined within the classes that operate on the attributes, and, whenever the need arises, invoke the methods on an instantiated class object. This is the way I have, so far, learnt to add abstraction to an application layer. Abstraction, as they say, is an artifact of the solution domain, which should ultimately bring you closer to the problem domain. We have:

Machine Language -> High Level language -> Abstractions in the Solution Domain -> Problem Domain

In the case of object-oriented languages like Java, the size of the language is monstrous; add to that at least a couple of gigantic frameworks, and abstractions are clearly guests on top of the language layer. Lisp, in its original incarnation, was conceived as a language with very little syntax. It was designed as a programmable programming language, and developing abstractions in Lisp not only enriches the third block above, but a significant part of the second block as well. I now get what Paul Graham has been talking about - programming bottom-up, the extensible language, building the language up toward your program.

Take this example :

I want to implement dolist(), which performs an operation on each member of a list. With a Lisp implementation, we have a natural extension of the language through a macro


(dolist (x '(1 2 3)) (print x) (if (evenp x) (return)))


and the moment we define the macro, it blends into the language syntax like a charm. This is abstraction through syntax construction.
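
For flavor, here is how such a construct could be rolled into Scheme with define-syntax - a minimal sketch of my own that omits the early-exit (return) of the Common Lisp original:


(define-syntax do-list
  (syntax-rules ()
    ((_ (var lst) body ...)
     (for-each (lambda (var) body ...) lst))))

(do-list (x '(1 2 3)) (display x) (newline))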

And, the Java counterpart will be something like :


// ..
Collection<..> list = ... ;
// forAllDo is Commons Collections' counterpart of dolist
CollectionUtils.forAllDo(list,
    new Closure() {
      public void execute(Object element) {
        // ..
      }
    });
// ..



which provides an object oriented abstraction of the same functionality. This solution provides the necessary abstraction, but is definitely not as seamless an extension of the language as its Lisp counterpart.

Extending Extensibility with Metaprogramming

Metaprogramming is the art of writing programs that write programs. Languages that offer syntax extensibility provide the natural path to metaprogramming. And Java is a complete zero in this regard. C offers more trouble to programmers through its whacky macros, while C++'s template metaprogramming facilities are no less hazardous than pure black magic.

Ruby offers excellent metaprogramming facilities through its eval() family of methods, here-docs, open classes, blocks and procedures. Ruby is a language with very clean syntax, having the natural elegance of Lisp and extremely powerful metaprogramming facilities. Ruby's metaprogramming capabilities have given a new dimension to API design in applications. Have a look at this example from a sample Rails application:


class Product < ActiveRecord::Base
  validates_presence_of :title, :description, :image_url
  validates_format_of :image_url,
    :with => %r{^http:.+\.(gif|jpg|png)$}i,
    :message => "must be a URL for a GIF, JPG, or PNG image"
end

class LineItem < ActiveRecord::Base
  belongs_to :product
end



It's a really cool DSL, made possible through the syntax extension capabilities offered by Ruby. It's not so much OO that Rails exploits to offer great APIs; instead it's the ability of Ruby to define new syntactic constructs through first-class symbols that adds to the joy of programming.

How would the above LineItem definition look in a Lisp database binding? Let's take this hypothetical model:


(defmodel <line_item> ()
  (belongs_to <product>))



The difference from the above Rails definition is the use of macros in the Lisp version, as opposed to class functions in Rails. In the Rails definition, belongs_to is a class function which, when called, defines a bunch of member functions in the class LineItem. Note that this is a commonly used idiom in Ruby metaprogramming, where we can define methods in the derived class right from the base class. But the main point here is that in the Lisp version, the macros are expanded in the macro expansion phase, before the program runs, and hence provide an obvious performance improvement over the Rails counterpart.
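
To see why, here is a hedged sketch of what such a macro might look like in Scheme (defmodel and belongs_to are the hypothetical names from the model above, not a real library):


(define-syntax defmodel
  (syntax-rules (belongs_to)
    ((_ model () (belongs_to parent))
     ;; the association accessor is generated here, at macro expansion
     ;; time - nothing runs when the program starts up
     (define (model record)
       (list 'lookup-parent 'parent record)))))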

Another great Lispy plus ..

Have a look at the following metaprogramming snippet in Ruby, incarnated using class_eval for generating the accessors in a sample bean class :


def self.property(*properties)
  properties.each do |prop|
    class_eval <<-EOS
      def #{prop} ()
        @#{prop}
      end
      def #{prop}= (val)
        @#{prop} = val
      end
    EOS
  end
end



Here, the code that the metaprogram generates is embedded within Ruby here-docs as a string - and eval-ing a string is not the recommended best practice in the Ruby world. These stringy code snippets are not treated as first-class citizens, in the sense that IDEs do not respect them as code and neither do debuggers. This has been described in his usual style and detail by Steve Yegge in a phenomenal blog post. Using define_method makes the code IDE-friendlier, but at the expense of readability and speed. The whacky class_eval runs much faster than the define_method version - a rough benchmark indicated that on Ruby 1.8.5 the class_eval version ran twice as fast as the one using define_method.


def self.property(*properties)
  properties.each do |prop|
    define_method(prop) {
      instance_variable_get("@#{prop}")
    }

    define_method("#{prop}=") do |value|
      instance_variable_set("@#{prop}", value)
    end
  end
end



Anyway, all these are examples of dynamic metaprogramming in Ruby, since everything gets done at runtime. This is a big difference from Lisp, where the code templates are not typeless strings - they are valid Lisp data structures, which the macro processor can process like normal Lisp code, since macros in Lisp operate on the parse tree of the program. Thus code templates in Lisp are IDE-friendly, debugger-friendly and real first-class code snippets. Many people have expressed their wish to have Lisp macros in Ruby - Ola Bini has some proposals on that as well. From whatever little I have seen of Lisp, Lisp macros are really cool and a definite step towards providing succinct extensibility of the language through user-defined syntactic control structures.

OO Abstractions or Syntax Extensions?

Coming from an OO-soaked background, I can only think in terms of OO abstractions. Ruby is possibly the first language that has pointed me to situations where syntax extensions scale better than OO abstractions - Rails is a live killer example of this paradigm. And finally, when I tried to explore the roots, Lisp macros really floored me with their succinctness and power. I do not have the courage to say that the functional abstractions of Lisp and Ruby are more powerful than OO abstractions. Steve Yegge has put the natural inhibition of OO programmers towards extended syntactic constructs so subtly:

Lots of programmers, maybe even most of them, are so irrationally afraid of new syntax that they'd rather leaf through hundreds of pages of similar-looking object-oriented calls than accept one new syntactic construct.


My personal take is to exploit all the features the language has to offer. With a language like Ruby, Scala or Lisp, syntax extensibility is the natural model, while Java offers powerful OO abstractions - look at the natural difference of paradigms in modeling a Ruby on Rails application versus a Spring-Hibernate application. This is one of the great eye-openers that the new dynamic languages have brought to OO programmers - beautiful abstractions are no longer a monopoly of OO languages. Lisp tried to force this realization long back, but possibly the world was not ready for it.

Monday, January 08, 2007

Why I should learn Lisp

At the beginning of 2006, I promised myself that I would learn Ruby and the tricks of the trade of functional programming. I do Java for a day job and get paid for consulting on enterprise Java architectures. I like Java, I love the Java community, and I am a big fan of some of the great, cool Java frameworks out there. I used to do C++ as well, five years back, and took great pride in designing allocators and smart pointers. All these were part of the application codebase, and despite using productive libraries like Boost, infrastructure code management (aka memory management and memory leaks) took away most of my night's sleep, at the expense of the geek feeling that I was doing C++. Java was the language that took away from me the pride of writing destructors and allocators. But in the course of this sense of loss, I realized that I was now programming at a higher level of abstraction, with the entire memory management left to the runtime. I got deep into encapsulation and better object orientation and embraced each successive release of Java with great enthusiasm.

One day, after reading a few pages of the pickaxe book and doing some hunting on the internet for Ruby evangelists, I came up with the following piece of Ruby code as the implementation of the Strategy Design Pattern :


class Filter
  def filter(values)
    new_list = []
    values.each { |v| filter_strategy(v, new_list) }
    new_list
  end
end

class EvenFilter < Filter
  def even?(i)
    i%2 == 0
  end

  def filter_strategy(value, list)
    if even?(value)
      list << value
    end
  end
end

of = EvenFilter.new
array = [1,2,3,4,5]
puts of.filter(array)



On further introspection, more reading of the pickaxe book and more rummaging through the musings of Java bashers on LtU, the light of lambda dawned on me. It looked like I was going through the enlightenment of the functional programming paradigm - the enhanced expressivity and abstraction that higher-order procedures add to programs. I could appreciate the value of lexical closures, bottom-up programming and functional abstractions. The new class for the Strategy implementation is adapted from Nathan Murray's excellent presentation on Higher Order Procedures in Ruby:


class FilterNew
  def filter(strategy)
    lambda do |list|
      new_list = []
      list.each do |element|
        new_list << element if strategy.call(element)
      end
      new_list
    end
  end
end

of = FilterNew.new
filter_odds = of.filter( lambda{|i| i % 2 != 0} )
array = [1,2,3,4,5]
puts filter_odds.call(array)



The Disappearing Strategy Classes

In the new implementation, where is the strategy class that is supposed to be hooked polymorphically into the context to provide the flexible OO implementation?

It has disappeared into the powerful abstraction of the language. The method filter() in the second example does not return the newly created list, unlike the first one - it returns a procedure, which can act on other sets of data. The second example is an implementation at a much higher level of abstraction, which adds to the expressivity of the intended functionality.

In fact, with functional programming paradigms, many of the design patterns that the GoF carefully cataloged in the celebrated book on Design Patterns simply go away in a language that allows the user to program at a higher level of abstraction. Have a look at this excellent presentation by Norvig.

As Paul rightly mentions in his post, the paradigms of functional programming hide a lot of accidental complexity, mainly because of the following traits that the language offers:

  1. A higher level of abstraction, which leads to fewer lines of code, and hence fewer bugs

  2. Side-effect free pure functional code, which liberates the programmer from managing state and sequence of execution

  3. Improved concurrency and scalability because of the stateless and side-effect-free programming model



Ruby or Lisp?

People look upon Lisp as the language of the Gods; someone has described Ruby as an acceptable Lisp, and many others consider Ruby to lie midway between Java and Lisp. Ruby is an object-oriented language with functional programming capabilities, while Lisp came into being in 1958 with John McCarthy's landmark 'eval' function. As Paul Graham says:
With macros, closures, and run-time typing, Lisp transcends object-oriented programming.

Lisp and Smalltalk were the main inspirations for Matz in designing the Ruby language. Maybe Ruby is more pragmatic than Lisp, but the roots of Ruby are definitely ingrained in the pure macros, lexical closures and extensibility mechanisms that Lisp provides. Lisp is the true embodiment of the "code-as-data" paradigm. Lispers claim that Lisp (or any of its dialects) is definitely more expressive than Ruby, and that Lisp macros can extend the language more seamlessly than Ruby blocks. I am not qualified enough to comment on this. But my one observation is that behind the nice Lispy DSL that Rails provides, its implementation looks really clumsy and would possibly have been much cleaner in Lisp.

Not only in Ruby - functional programming constructs are beginning to make their appearance in modern-day OO languages as well. C# and Visual Basic already offer lambdas and comprehensions, and Java will have closures in the next release - the Lisp style is promising to come back.

Still, I do not think Lisp is going to go mainstream; yet I need to learn Lisp to be a better fit in today's world of changing programming paradigms.

Monday, October 23, 2006

Why OOP Alone in Java is Not Enough

Object-oriented languages have taught us to think in terms of objects (or nouns), and Java is yet another incarnation of the noun land. When was the last time you saw an elegant piece of Swing code? Steve Yegge is merciless when he rants about it .. and rightly so ..
Building UIs in Swing is this huge, festering gob of object instantiations and method calls. It's OOP at its absolute worst.

There are ways of making OOP smart - we have fluent interfaces, OO design patterns, AOP and higher levels of abstraction similar to those of DSLs. But the real word is *productivity*, and the language needs to make its user elegantly productive. Unfortunately in Java, we often find people generating reams of boilerplate (aka getters and setters) that look like pure copy-paste stuff. Java abstractions thrive on the evil loop of the 3 C's - create-construct-call - along with liberal litterings of getters and setters. You create a class, declare 5 read-write attributes, and you have a pageful of code before you throw in a single piece of actual functionality. Object orientation rightly frowns upon public attributes and restricts the visibility of implementation details, but that should not prevent the language from providing elegant constructs to handle the boilerplate. Ruby does this, and does it with elan.


Java is not Orthogonal

Paul Graham in On Lisp defines orthogonality of a language as follows :
An orthogonal language is one in which you can express a lot by combining a small number of operators in a lot of different ways.

He goes on to explain how the complement function in Lisp got rid of half of the *-if-not* functions from pairs like [remove-if, remove-if-not], [subst-if, subst-if-not] etc. Similarly, in Ruby we have the following orthogonal usage of the "*" operator across data types:


"Seconds/day: #{24*60*60}" will give Seconds/day: 86400
"#{'Ho! '*3}Merry Christmas!" will give Ho! Ho! Ho! Merry Christmas!


C++ supports operator overloading, which is also a minimalistic way of extending your operator usage.
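
Coming back to Graham's example, the complement trick itself is a one-liner in Scheme (a sketch - Common Lisp ships complement built-in): given any predicate, derive its negation, so no paired -if-not variant is ever needed.


(define (complement f)
  (lambda args (not (apply f args))))

(filter even? '(1 2 3 4))              ; => (2 4)
(filter (complement even?) '(1 2 3 4)) ; => (1 3), no remove-if-not required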

In order to bring some amount of orthogonality to Java, we have lots of frameworks and libraries. This is yet another problem of dealing with an impoverished language - you get a proliferation of libraries and frameworks that add unnecessary layers to your codebase and tend to collapse under their own weight.

Consider the following code in Java to find a matching sub-collection based on a predicate :


class Song {
  private String name;
  ...
  ...
}

// ...
// ...
Collection<Song> songs = new ArrayList<Song>();
// ...
// populate songs
// ...
String title = ...;
Collection<Song> sub = new ArrayList<Song>();
for(Song song : songs) {
  if (song.getName().equals(title)) {
    sub.add(song);
  }
}



The Jakarta Commons Collections framework adds orthogonality by defining abstractions like Predicate, Closure and Transformer, along with lots of helper methods like find(), forAllDo() and select() that operate on them, which help the user do away with boilerplate iterators and for-loops. For the above example, the equivalent will be:


Collection sub = CollectionUtils.select(songs,
    PredicateUtils.transformedPredicate(
        TransformerUtils.invokerTransformer("getName"),
        PredicateUtils.equalPredicate(title)));



Yuck!! We have got rid of the for-loop, but at the expense of ugly syntax, loads of statics, and the loss of the type-safety we take pride in with Java. Of course, in Ruby we can do this with much more elegance and less code:


@songs.select { |song| title == song.name }


and this same syntax and structure will work for all sorts of collections and arrays which can be iterated. This is orthogonality.

Another classic example of non-orthogonality in Java is the treatment of arrays compared to other collections. You can initialize an array as:


String[] animals = new String[] {"elephant", "tiger", "cat", "dog"};


while for Collections you have to fall back to the ugliness of explicit method calls :


Collection<String> animals = new ArrayList<String>();
animals.add("elephant");
animals.add("tiger");
animals.add("cat");
animals.add("dog");



Besides, arrays have always been second-class citizens in the Java OO land - they support covariant subtyping (which is unsafe, hence runtime checks are needed), cannot be subclassed, and are not extensible, unlike the other collection classes. A classic example of non-orthogonality.

The ugliness of initialization syntax and the lack of support for literals have been among the major failings of Java - Steve Yegge has documented them right down to the last bit.

Java and Extensibility

Being an OO language, Java supports extension of classes through inheritance. But once you define a class, there is no scope for extensibility at runtime - you cannot define additional methods or properties. AOP has been in style of late, and has proved quite effective as an extension tool for Java abstractions. But, once again, it is NOT part of the language and hence does not enrich the Java language semantics. There is no meta-programming support that can make Java friendlier for DSL adoption. Look at this excellent example from a recent blogpost:

Creating some test data for building a tree, the Java way :


Tree a = new Tree("a");

Tree b = new Tree("b");
Tree c = new Tree("c");
a.addChild(b);
a.addChild(c);

Tree d = new Tree("d");
Tree e = new Tree("e");
b.addChild(d);
b.addChild(e);

Tree f = new Tree("f");
Tree g = new Tree("g");
Tree h = new Tree("h");
c.addChild(f);
c.addChild(g);
c.addChild(h);



and the Ruby way :


tree = a {
      b { d e }
      c { f g h }
    }



It is really this simple - of course you have the meta-programming engine backing you for creating this DSL. What this implies is that, with Ruby you can extend the language to define your own DSL and make it usable for your specific problem at hand.

Java Needs More Syntactic Sugar

Any Turing-complete programming language allows programmers to implement similar functionality. Java is Turing complete, but it still does not boost programmer productivity enough. Brevity of the language is an important feature, and modern languages like Ruby and Scala offer a lot in that respect. Syntactic sugar is just as important in keeping implementations concise. Over the last year or so, we have seen lots of syntactic sugar added to C# in the form of anonymous methods, lambdas, expression trees and extension methods. I think Java is lagging behind a lot in this respect. The enhanced for-loop is a step in the right direction, but Sun would do the Java community a world of good by offering other syntactic sugar like automatic accessors, closures and lambdas.

Proliferation of Libraries

In order to combat Java's shortcomings at complexity management, over the last five years or so we have seen the proliferation of a large number of libraries and frameworks that claim to improve programmer productivity. I gave an example above which proves that there is no substitute for language elegance. These so-called productivity-enhancing tools are layers added on top of the language core, mostly delivered as generic solutions to generic problems. There you are .. a definite case of frameworkitis. Boy, I need to solve this particular problem - why should I incur the overhead of all the generic implementations? Think DSL: my language should allow me to carve out a domain-specific solution using a domain-specific language. This is where Paul Graham positions Lisp as a programmable programming language. I am not saying all Java libraries are crap - believe me, some of them really rock; java.util.concurrent is one of the most significant value additions to Java ever, and AOP is the closest approximation to meta-programming in Java. Still, I feel many of them would not have been there had Java been more extensible.

Is it Really Static Typing?

I have been thinking really hard about this issue of the lack of programmer productivity with Java - is static typing the main issue? Or is it the lack of meta-programming features and the ability, which languages like Ruby and Lisp offer, to treat code and data interchangeably? I think it is a combination of both - besides, Java does not support first-class functions, does not have closures as yet, and lacks some of the other productivity tools like parallel assignment, multiple return values, user-defined operators and continuations that make a programmer happy. Look at Scala today - it definitely has all of them, and supports static typing as well.


In one of the enterprise Java projects that we are executing, the Maven repository has reams of third-party jars (mostly open source) that claim to do a better job of complexity management. I know Ruby is not enterprise-ready, Lisp never claimed to deliver performance in a typical enterprise business application, and Java does the best under the current circumstances. And the strongest point of Java is the JVM, possibly the best one under the Sun. Initiatives like Rhino integration, JRuby and Jython are definitely in the right direction - we would all love to see the JVM evolve into a friendly nest for dynamic languages. The other day, I was listening to Gilad Bracha's session on "Dynamically Typed Languages on the Java Platform", delivered at Lang .NET 2006. He discussed the invokedynamic bytecode and hotswapping to be implemented on the JVM in the near future. Possibly this is the future of the Java computing platform - it's the JVM that holds more promise for the future than the core Java programming language.