Ruminations of a Programmer: xml

I have never been a big fan of using XML as a language for human interaction. I always felt that XML is too noisy for human comprehension and never a pleasing sight for your eyes. In an earlier post I had pitched in with the idea of using Lisp s-expressions as executable XML instead of the plethora of angle brackets polluting your comprehension power. XML is meant for machine interpretation, the hierarchical processing of XML documents gives you the raw power when you are programming for the machine. But, after all, Programs must be written for people to read, and only incidentally for machines to execute (from SICP).

With the emergence of the modern stream of scripting languages, XML is definitely taking a backseat in handling issues like software builds and configuration management. People are no longer amused with the XML hell of Maven and have been desperately looking for alternatives. Buildr, a drop-in replacement for Maven uses Ruby (inspired by Rake), SCons uses Python scripts, Groovy has also been used in the configuration space for Jini-based ComputeCycles project (via Artima). XML has also started looking like just yet another option for configuring your beans in the Spring community. We are just spoilt with other options of configuration - Spring JavaConfig uses Java configuration, Springy does it with JRuby, Groovy SpringBuilder does it with Groovy.

Configuration and builds of systems are nontrivial activities and need a mix of both declarative and procedural programming. Using XML as the brute force approach towards this has resulted in complex hierarchical structures, too complex for human comprehension. A perfect example in hand is Maven which attempts to do too many things with XML. The result is extremely complex XML hell much to the anguish of developers and build managers. A big build system configured using Maven becomes a maintenance nightmare in no time.

People are getting increasingly used to the comforts of DSLs and configurations of large systems have never been too trivial an artifact. Hence it is logical that we would like to have something which provides a cool humane interface. And XML is definitely not one of them ..

In a project that I have been working on for quite some time, the back office system receives XML messages from the front and middle office systems for processing. It is a securities trading and settlement system for one of the big financial houses of the world - typical messages are trades, settlements, position etc. which reach the back-office after the trade is made. Like any sane architect we have designed the system based on the Java EE stack (nowadays you never get fired for choosing Java ..) centered around a message oriented middleware transporting XML messages with gay abandon. The system has gone live for many implementations and has been delivering satisfactory throughput all over.

No complaints whatsoever, on the architecture, on the Java EE backbone, on the multitudes of XML machinery that goes behind the engineering harness of the millions of messages generated every day. If I were to architect the same system today (the existing one had been architected 3 years back), I, possibly would have gone for a similar stack, just for the sheer stability and robustness of the XML based technology and the plethora of toolset that XML offers today.

Why am I writing this blog then ?

Possibly I have been having extra caffeine of late, which has been taking away most of my sleep at night. It is 1 AM in the morning and I am still glued to two of my newest possessions in my bookshelf.

I had read some parts of SICP long back - rereading it now is like a rediscovery of many of the Aha! moments that I had last time and of course, lots of ruminations and discoveries this time as well. Based on the new found lights of Lispy (and s-expy) knowledge, I hope that some day I will be able to infuse my today's dreams and rumblings into a real life architecture. I am not sure if we will ever reach the stage of human evolution when Lisp and Scheme will be considered the bricks and mortar of enterprise architecture. Till then they will exist as the sexy counterparts of Java and C++, and will continue to allure all developers who have once committed the sin of being there and done that.

The XML Message - Today's Brick and Mortar

Here is how a sample trade message (simplified for clarity) looks in our system :

<?xml version="1.0" encoding="utf-8" standalone="no"?>
<!DOCTYPE trd01 SYSTEM "trd01.dtd">
<trd01>
  <id>10234</id>
  <trade_type>equity</trade_type>
  <trade_date>2005-02-21T18:57:39</trade_date>
  <instrument>IBM</instrument>
  <value>10000</value>
  <trade_ccy>usd</trade_ccy>
  <settlement_info>
    <settle_date>2005-02-21T18:57:39</settle_date>
    <settle_ccy>usd</settle_ccy>
  </settlement_info>
</trd01>

We use XML parsers to parse this message, use all sorts of XPath expressions, XQuery and XSLT transformations to do all processing, querying and tearing apart the hierarchical structures that embody an XML message. The above XML message is *data* and we have tonnes of Java code processing the XML data, transforming them into business logic and persisting them in the database. So, we have the *code* (in Java) strictly separated from the *data* (in XML) using a small toolbox comprising of a bundle of XML parsers, XPath expressions, XSLT transformations, all packaged in a couple of dozens of third party jars (aka frameworks). The entire exercise is meant to make these *data* executable.

Executable Data - aka Lisp

Why not use a power that allows you to execute your data directly instead of creating unnecessary abstraction barriers in the name of OOP ? Steve Yeggey summarises it with elan :

The whole nasty "configuration" problem becomes incredibly more convenient in the Lisp world. No more stanza files, apache-config, .properties files, XML configuration files, Makefiles — all those lame, crappy, half-language creatures that you wish were executable, or at least loaded directly into your program without specialized processing. I know, I know — everyone raves about the power of separating your code and your data. That's because they're using languages that simply can't do a good job of representing data as code. But it's what you really want, or all the creepy half-languages wouldn't all evolve towards being Turing-complete, would they?

In Lisp, I can make the above data much more readable, clearer in intent, easier for the eyes and at the same time make it executable ..

(trd01
  (id 10234)
  (trade_type "equity")
  (trade_date "2005-02-21T18:57:39")
  (instrument "IBM")
  (value 10000)
  (trade_ccy "usd")
  (settle_info
    (settle_date "2005-02-24T18:57:39")
    (settle_ccy "usd"))))

Lisp is a language which was intended to be small with *no* syntax, where you have the power of macros to create your own syntax and roll it out into the language. I made each of the above tags separate Scheme functions (Oh! I was using Scheme btw), each one of which is capable of transforming itself into the desired functionality. As a result, the above data is also my code and directly executes. Another of those Aha! moments.

But my entire system is based on Java! Surely you are not telling me to change all of the guts to Scheme - are you ? My job will be at stake and I will never be able to convince my pointy haired boss that the trading back office system is running on Lisp. In fact, many people who dare to use Lisp in daytime projects are often careful to keep this a secret in the industry. And unless you have PG on your company board or blessed enough to get the favor of Y, this is a very useful tip.

Enter SISC

SISC is a lightweight, platform independent Scheme system targetting the Java Virtual Machine. It comes as a lightweight distribution (the core jar is 233 KB) and offers Scheme as a scripting language for Java. In SISC bridging is accomplished by a Java API for executing Scheme code and evaluating Scheme expressions, and a module that provides Scheme-level access to Java objects and implementation of Java interfaces in Scheme.

I can write Scheme modules and load it using Java api from my Java code, once I have bootstrapped the SISC runtime. Here is how I can initialize SISC from my Java application so as to enable my application use the Scheme functions :

// bootstrapping the SISC runtime
SeekableInputStream heap = new MemoryRandomAccessInputStream(
    getClass().getResourceAsStream("/sisc.shp"));
AppContext ctx = new AppContext();
ctx.addHeap(heap);
Interpreter interpreter = Context.enter(ctx);

and then I can use on the fly evaluation of Scheme functions as follows :

interpreter.eval("(load \"trd01.scm\")");
String s = interpreter.eval("(trd01 (id 10234) (trade_type "equity") ...)").toString();

There are quite a few variants of eval() that SISC offers, along with multiple modes of execution of Scheme code from the Java environment. For details have a look at their documentation. SISC also offers calling Java code from Scheme accessing Java classes through the extensible type system of SISC.

I do not dream about using SISC in any production code in the near foreseeable future. But just wanted to share my rants with all of you. In today's world, it is really raining programming languages - all scripting languages like Ruby, Python, JRuby, Groovy etc. are making inroads as the preferred glue language of today's enterprise architecture. But Lisp stands out as a solid robust language, with the exceptionally powerful code-as-data paradigm - I always felt Lisp was way ahead of its time. Possibly this is the time when Lisp needs to be reincarnated, all incompatibilities should be rubbed off the numerous versions and dialects of Lisp. Lisp code is executable data - it makes perfect sense to replace all frameworks that execute reams of code to process hierarchical structures as data by a single language.

Ruminations of a Programmer

Thursday, May 10, 2007

XML - Not for Human Consumption

Monday, April 23, 2007

Executable XML (aka Lisp)