Sunday, June 21, 2009

scouchdb Views now interoperable with Scala Objects

In one of the mail exchanges that I had with Dick Wall before the scouchdb demonstration at JavaOne ScriptBowl, Dick asked me the following ..

"Can I return an actual car object instead of a string description? It would be killer if I can actually show some real car sale item objects coming back from the database instead of the string description."

Yes, Dick, you can, now. scouchdb now offers APIs for returning Scala objects directly from couchdb views. Here's an example with Dick's CarSaleItem object model ..

// CarSaleItem class
@BeanInfo
case class CarSaleItem(make : String, model : String, 
  price : BigDecimal, condition : String, color : String) {

  def this(make : String, model : String, 
    price : Int, condition : String, color : String) =
    this(make, model, BigDecimal.int2bigDecimal(price), condition, color)

  private [db] def this() = this(null, null, 0, null, null)

  override def toString = "A " + condition + " " + color + " " + 
    make + " " + model + " for $" + price
}


The following map function returns the car make as the key and the car price as the value ..

// map function
val redCarsPrice =
  """(doc: dispatch.json.JsValue) => {
        val (id, rev, car) = couch.json.JsBean.toBean(doc, 
          classOf[couch.db.CarSaleItem]);
        if (car.color.contains("Red")) List(List(car.make, car.price)) else Nil
  }"""


This is exciting. The following map function returns the car make as the key and the car object as the value ..

// map function
val redCars =
  """(doc: dispatch.json.JsValue) => {
        val (id, rev, car) = couch.json.JsBean.toBean(doc, 
          classOf[couch.db.CarSaleItem]);
        if (car.color.contains("Red")) List(List(car.make, car)) else Nil
  }"""


And now some regular view setup code that registers the views in the CouchDB design document.

// view definitions
val redCarsView = new View(redCars, null)
val redCarsPriceView = new View(redCarsPrice, null)

// handling design document stuff
val cv = DesignDocument("car_views", null, Map[String, View]())
cv.language = "scala"

val rcv = 
  DesignDocument(cv._id, null, 
    Map("red_cars" -> redCarsView, "red_cars_price" -> redCarsPriceView))
rcv.language = "scala"
couch(Doc(carDb, rcv._id) add rcv)


The following query returns JSON corresponding to the car objects being returned from the view ..

val ls1 = couch(carDb view(
  Views builder("car_views/red_cars") build))


On the client side, we can do a simple map over the collection that converts the returned collection into a collection of the specific class objects .. Here we have a collection of CarSaleItem objects ..

import dispatch.json.Js._;
val objs =
  ls1.map { car =>
    val x = Symbol("value") ? obj
    val x(x_) = car
    JsBean.toBean(x_, classOf[CarSaleItem])._3
  }
objs.size should equal(3)
objs.map(_.make).sort((e1, e2) => (e1 compareTo e2) < 0) 
  should equal(List("BMW", "Geo", "Honda"))


But it gets better than this .. we can now have direct Scala objects being fetched from the view query directly through scouchdb API ..

// ls1 is now a list of CarSaleItem objects
val ls1 = couch(carDb view(
  Views builder("car_views/red_cars") build, classOf[CarSaleItem]))
ls1.map(_.make).sort((e1, e2) => (e1 compareTo e2) < 0) 
  should equal(List("BMW", "Geo", "Honda"))


Note the class being passed as an additional parameter in the view API. Views that have reduce functions are supported in a similar way. This makes scouchdb a more seamless bridge between the JSON storage layer and the object based application layer.
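To make the row-to-object conversion concrete outside of Scala, here is a minimal sketch in JavaScript. It is not the scouchdb implementation. The sample rows, the model names and prices, and the helper rowsToObjects are all made up for illustration; only the general shape of a CouchDB view response (rows carrying id, key and value) is taken from CouchDB itself.

```javascript
// Shape of a CouchDB view response: rows of { id, key, value }.
// The sample data below is invented for illustration.
const viewResult = {
  rows: [
    { id: "1", key: "Honda", value: { make: "Honda", model: "Accord", price: 22000, condition: "New", color: "Red" } },
    { id: "2", key: "Geo", value: { make: "Geo", model: "Prizm", price: 3000, condition: "Used", color: "Red" } },
    { id: "3", key: "BMW", value: { make: "BMW", model: "M3", price: 45000, condition: "New", color: "Red" } }
  ]
};

// A plain class playing the role of CarSaleItem.
class CarSaleItem {
  constructor({ make, model, price, condition, color }) {
    this.make = make;
    this.model = model;
    this.price = price;
    this.condition = condition;
    this.color = color;
  }
  toString() {
    return `A ${this.condition} ${this.color} ${this.make} ${this.model} for $${this.price}`;
  }
}

// Hypothetical helper: turn every row value into an instance of the given class.
function rowsToObjects(result, Ctor) {
  return result.rows.map(row => new Ctor(row.value));
}

const cars = rowsToObjects(viewResult, CarSaleItem);
const makes = cars.map(car => car.make).sort();
console.log(makes);
```

Passing the class as an argument mirrors the idea of the extra parameter in the view API: the caller states the target type once and gets typed objects back instead of raw JSON.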

Have a look at the project home page and the associated test case for details ..

Thursday, June 18, 2009

Scala/Lift article available as a podcast

The article that Steve Vinoski and I wrote for IEEE Internet Computing (May/June issue), titled "Scala and Lift - Functional Recipes for the Web", is now available as a podcast. Here it goes ..

Thanks Steve, for the effort ..

Sunday, June 14, 2009

Code Reading for fun and profit

I still remember those days when APIs were not so well documented, and we didn't have the goodness that Javadocs bring us today. I was struggling to understand the APIs of the C++ Standard Library by going through the source code. Before that my only exposure to code reading was a big struggle to plough through reams of Cobol code that we were trying to migrate to RDBMS based platforms. Code reading was not so enjoyable (at least to me) in those days. Still I found it a more worthwhile exercise than trying to navigate through inconsistent pieces of crappy paperwork and half-assed diagrams that project managers passed on in the name of design documentation.

Exploratory Code Reading ..

The C++ Standard Library and Boost changed it all. C++ was considered to be macho enough in those days, particularly if you could boast of your understanding of the template meta-programming that Andrei Alexandrescu first brought to the mainstream through his columns in C++ Report and his seemingly innocuously titled Modern C++ Design. Code reading became a pleasure to me, and code understanding was more satisfying, particularly if you could reuse some of those code snippets in your own creations. It was the first taste of how dense C++ code could be. It was as if every sentence had some hidden idioms that you were trying to unravel. That was exploratory code reading - as if I was trying to explore the horizons of the language and its idioms as the experts documented with great care. I subscribed to the view that Code is the Design.

Collaborating with xUnit ..

Then came unit testing and the emergence of xUnit frameworks that proved to be the most complete determinants of the virtues of code reading. Code reading changed from being a passive learning vehicle to an active reification of thoughts. Just fire up your editor, load the unit testing framework and validate your understanding through testXXX() methods. It was then that I realized the wonders of code reading through collaboration with unit testing frameworks. It was as if you are doing pair programming with xUnit - together you and your xUnit framework are trying to understand the library that you're exploring. TDD was destined to be the next step, the only change being that instead of code understanding you're now into real world code writing.

Code Reading on the GO ..

Sometimes I enjoy reading code when I'm traveling or in a long commute. It's not painstaking, you do not have any specific agenda or you're not working against a strict timeline for the project. I found this habit very productive and in fact learnt quite a few tricks of the trade in some of these sessions. I still remember how I discovered the first instance of how to implement the Strategy pattern through Java enums browsing through Guice code in one of the flights to Portland.

Code Reading towards Polyglotism ..

When you're learning a new language, it helps a lot to look at existing programs in languages that you've been programming in for long, and to think how you could model them in the new language that you're learning. It's not a transliteration. Often it results in knowing new idioms and lots of aha! moments as you explore through your learning process. This is one of the most invaluable side-effects of code reading - reading programs in language X makes you a better programmer in language Y. Stuart Halloway in his book on Clojure programming gives a couple of excellent examples of how thinking functionally while reading Java code makes you learn lots of idioms of the new paradigm.

Reading bad code ..

This is important too, since it makes you aware of the anti-patterns of a language. It's a common misconception that using recursion in functional programs makes them more idiomatic. Recursion has its own problems, and explicit recursion is best hidden within the combinators and libraries that the language offers. Whenever you see explicit recursion in non-trivial code snippets that can potentially get a large data set, think twice. You may be better off refactoring it some other way, particularly when you have an underlying runtime that does not support tail call optimization. Code that does not read well is not communicative to users. Code reading makes you aware of the importance of expressiveness. You realize that you'd not write code that you cannot read well.
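A small sketch of the recursion point, written in JavaScript only because Node.js is a convenient example of a runtime that does not eliminate tail calls. The function names and the one-million-element array are my own choices for illustration.

```javascript
// Explicit recursion: one stack frame per element.
function sumRecursive(xs, i = 0) {
  return i === xs.length ? 0 : xs[i] + sumRecursive(xs, i + 1);
}

// The same computation expressed through a library combinator (a fold).
function sumWithFold(xs) {
  return xs.reduce((acc, x) => acc + x, 0);
}

const small = [1, 2, 3, 4];
const large = new Array(1000000).fill(1);

console.log(sumRecursive(small), sumWithFold(small));

// On a large input the fold still works, while the explicit recursion
// exhausts the call stack on a runtime without tail call elimination.
let recursionFailed = false;
try {
  sumRecursive(large);
} catch (e) {
  recursionFailed = e instanceof RangeError;
}
console.log(sumWithFold(large), recursionFailed);
```

Both versions read fine on four elements. The difference only shows up with the large data set, which is exactly when it hurts.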

Well, that was a drunken rant .. that I wrote as a side-effect in the midst of reading the Scala source for 2.8 Collections ..

Sunday, June 07, 2009

scouchdb Scala View Server gets "reduce"

scouchdb View Server gets reduce. After a fairly long hiatus, I finally got some time to do some hacking on scouchdb over the weekend. And this is what came out of a brief stint on Saturday evening ..

map was already supported in version 0.3. You could define map functions in Scala as ..

val mapfn = """(doc: dispatch.json.JsValue) => {
  val it = couch.json.JsBean.toBean(doc, classOf[couch.json.TestBeans.Item_1])._3;
  for (st <- it.prices)
    yield(List(it.item, st._2))
}"""


Now you can do reduce too ..

val redfn = """(key: List[(String, String)], values: List[dispatch.json.JsNumber], rereduce: Boolean) => {
  // sum the numeric values, unwrapping each JsNumber into its BigDecimal
  values.foldLeft(BigDecimal(0.00))((s, f) =>
    s + (f match { case dispatch.json.JsNumber(n) => n }))
}"""


attach the map and reduce functions to a view ..

val view = new View(mapfn, redfn)


and finally fetch using the view query ..

val ls1 =
  couch(test view(
    Views.builder("big/big_lunch")
         .build))
ls1.size should equal(1)


By default, reduce returns only one row, computed over the whole result set returned by map. The above query does not use grouping and returns 1 row as the result. You can also group the view results and get back one row per key ..

val ls1 =
  couch(test view(
    Views.builder("big/big_lunch")
         .options(optionBuilder group(true) build) // with grouping
         .build))
ls1.size should equal(3)
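To see why the ungrouped query yields 1 row and the grouped one yields 3, here is a minimal in-memory imitation of map, reduce and grouping in JavaScript. It assumes the three fruit-price documents used in the view server specs and a reduce that sums values. The helper runView is hypothetical and is not how CouchDB computes views internally.

```javascript
// Documents mirroring the fruit-price records used in the scouchdb specs.
const docs = [
  { item: "banana", prices: { "Fresh Mart": 1.99, "Price Max": 0.79, "Banana Montana": 4.22 } },
  { item: "apple", prices: { "Fresh Mart": 1.59, "Price Max": 5.99, "Apples Express": 0.79 } },
  { item: "orange", prices: { "Fresh Mart": 1.99, "Price Max": 3.19, "Citrus Circus": 1.09 } }
];

// map: one [key, value] pair per store price, keyed by item name.
function mapFn(doc) {
  return Object.keys(doc.prices).map(store => [doc.item, doc.prices[store]]);
}

// reduce: sum the values handed in for a key (or for the whole result set).
function reduceFn(keys, values) {
  return values.reduce((sum, v) => sum + v, 0);
}

// Hypothetical helper imitating the two query modes.
function runView(allDocs, group) {
  const pairs = allDocs.flatMap(mapFn);
  if (!group) {
    // No grouping: everything is reduced into a single row.
    return [{ key: null, value: reduceFn(null, pairs.map(p => p[1])) }];
  }
  // Grouping: collect values per key, then reduce each group separately.
  const byKey = new Map();
  for (const [k, v] of pairs) {
    if (!byKey.has(k)) byKey.set(k, []);
    byKey.get(k).push(v);
  }
  return [...byKey.keys()].sort().map(k => ({ key: k, value: reduceFn([k], byKey.get(k)) }));
}

const ungrouped = runView(docs, false);
const grouped = runView(docs, true);
console.log(ungrouped.length, grouped.length);
```

The map output has nine pairs in both cases. Grouping only changes how many buckets the reduce function is applied to.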


For a more detailed discussion and examples have a look at the project home page documentation or browse through the test script ScalaViewServerSpec.

The current trunk is 0.3.1. The previous version has been tagged as 0.3 and is available in the tags folder.

Next up ..

  • JPA like collections of objects directly from scouchdb views

  • more capable reduce options (rereduce, collations etc.)

  • replication

  • advanced exception management with new dbDispatch


.. and lots of other features ..

Stay tuned!

Wednesday, June 03, 2009

scouchdb @ JavaOne

The JavaOne Script Bowl was organized as a panel session to show off different scripting languages on the JVM. The languages considered were Jython, Groovy, Scala, JRuby and Clojure. As part of the Scala show, Dick Wall demonstrated scouchdb, the Scala driver for CouchDB. Cool .. and thanks Dick for choosing scouchdb ..

Alex Miller has more details here ..

Monday, June 01, 2009

Prototypal Inheritance in Javascript - Template Method meets Strategy

I have been reading some of the papers on Self, a programming environment that models computation exclusively in terms of objects. However unlike the classical object-oriented approach, Self is a classless language, where everything is an object. An object has slots - each slot has a name and a value. The slot name is always a String, while the value can be any other Self object. The slot can point to methods as well, consisting of code. A special designated slot points to the parent object in the hierarchy. Hence each object is consistently designed for extensibility through inheritance. But since we don't have class structures, everything is dynamic and runtime. Objects interact through messages - when an object receives a message, it looks up into its slot for a match. If the matching message is not found, the search continues up the chain through successive parent pointers, till the root is reached.

Prototype based languages offer a different way of implementing objects, and hence require a different thinking for structuring your programs. They make you think more in terms of messages that your objects will receive, and how the messages get propagated up the inheritance chain.
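The slot lookup described above can be imitated in a few lines. This is a toy model in JavaScript, not Self syntax and not how any Self implementation works. The names makeObject and send, and the sample objects, are invented for illustration.

```javascript
// Each "object" is a record of named slots plus a designated parent slot.
function makeObject(parent, slots) {
  return { parent, slots };
}

// send walks the parent chain until a slot with the message name is found.
function send(receiver, message, ...args) {
  for (let obj = receiver; obj !== null; obj = obj.parent) {
    if (Object.prototype.hasOwnProperty.call(obj.slots, message)) {
      const slot = obj.slots[message];
      // Method slots run against the original receiver; data slots return their value.
      return typeof slot === "function" ? slot(receiver, ...args) : slot;
    }
  }
  throw new Error("message not understood: " + message);
}

const root = makeObject(null, {
  describe: self => "I am " + send(self, "name")
});
const point = makeObject(root, { name: "point", x: 1 });
const child = makeObject(point, { name: "child point" });

console.log(send(child, "describe")); // found in root, evaluated against child
console.log(send(child, "x"));        // found one level up, in point
```

Note that describe is found at the root but still sees the name slot of the object that originally received the message. That late binding to the receiver is what makes the parent chain behave like inheritance.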

Javascript follows an almost identical architecture, where the hierarchies of objects are constructed through prototypes. This post is not about Self or, for that matter, about the Javascript language. Some time back I had blogged about how the Template Method design pattern gets subsumed into higher order functions and closures when implemented using functional programming languages.

In a class based language, template method pattern is implemented in terms of inheritance, which makes the structure of the pattern static and makes the derivatives of the hierarchy statically coupled to the base abstraction. Closures liberate the pattern structure from this compile time coupling and make it dynamic. But once we take off the class inheritance part and use higher order functions to plug in the variable parts of the algorithm, what we end up with closely matches the Strategy pattern. Have a look at James Iry's insightful comments in my earlier post.
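Here is a minimal sketch of the closure-based version, with no inheritance at all. The helper makeProcessor and the step names init, step and end are hypothetical, chosen only to show the fixed flow taking its variable parts as functions.

```javascript
// The invariant flow lives in one place; the variable steps arrive as functions.
function makeProcessor({ init, step, end }) {
  return function process(input) {
    const state = init(input);
    const result = step(state);
    return end(result);
  };
}

// Two "specializations" built by plugging in different functions.
const countWords = makeProcessor({
  init: text => text.trim().split(/\s+/),
  step: words => words.length,
  end: n => n + " words"
});

const shout = makeProcessor({
  init: text => text.trim(),
  step: text => text.toUpperCase(),
  end: text => text + "!"
});

console.log(countWords("the quick brown fox"));
console.log(shout("  hello "));
```

Seen as a skeleton with hooks, this is Template Method. Seen as an algorithm parameterized by interchangeable behaviours, it is Strategy. The code is the same either way.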

James also hinted at another level of subsumption which is more interesting - the case of the two patterns implemented in a prototype based language like Javascript. Here is how it looks ..

// the template function at the base object
// defines the generic flow
// uses hooks to be plugged in by derived objects

var processor = {
  process: function() {
    this.doInit();
    this.doProcess();
    this.doEnd();
    return true;
  }
};


We construct another object that inherits from the base object. The function beget is the one that Douglas Crockford defines as a helper to create a new object using another object as the prototype.

if (typeof Object.beget !== 'function') {
  Object.beget = function(o) {
    var F = function() {};
    F.prototype = o;
    return new F();
  };
}

var my_processor = Object.beget(processor);


The new object now implements the variable parts of the algorithm.

my_processor.doInit = function() {
  //..
};
my_processor.doProcess = function() {
  //..
};
my_processor.doEnd = function() {
  //..
};


and we invoke the function from the base object ..

my_processor.process();

If we need to define another specialization of the algorithm that only has to override a single variable part, we do it likewise by supplying the object my_processor as the prototype ..

var your_processor = Object.beget(my_processor);
your_processor.doEnd = function() {
  //.. another specialization
};

your_processor.process();


So what we get is a dynamic version of the Template Method pattern with no static coupling - thanks to prototypal inheritance of Javascript. Is this a Template Method pattern or a Strategy pattern ? Both get subsumed into the prototypal nature of the language.
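For a version that can be run end to end, here is a sketch with the hooks filled in. It uses the standard Object.create in place of the beget helper, and the hook bodies and return values are made up so that the flow becomes observable.

```javascript
// Base object: defines the generic flow and relies on hooks found via the prototype chain.
const processor = {
  process() {
    return [this.doInit(), this.doProcess(), this.doEnd()];
  }
};

// First specialization supplies all three hooks.
const myProcessor = Object.create(processor);
myProcessor.doInit = () => "init";
myProcessor.doProcess = () => "process";
myProcessor.doEnd = () => "end";

// Second specialization inherits from the first and overrides a single hook.
const yourProcessor = Object.create(myProcessor);
yourProcessor.doEnd = () => "custom end";

console.log(myProcessor.process());
console.log(yourProcessor.process());

// The hook can even be swapped at runtime, which is what Strategy usually promises.
yourProcessor.doEnd = () => "swapped end";
console.log(yourProcessor.process());
```

Only doEnd lives on yourProcessor itself. Everything else is resolved by walking up to myProcessor and then to processor at call time.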

Sunday, May 17, 2009

scouchdb gets View Server in Scala

CouchDB views are the real wings of the datastore: they go into every document and pull out exactly the data that you have asked for through your queries. The queries are different from the ones you do in an RDBMS using SQL - here you have all the state-of-the-art map/reduce being exercised through each of the cores that your server may have. One very good part of views in CouchDB is that the view server is a separate abstraction from the data store. Computation of views is delegated to an external server process that communicates with the main process over standard input/output using a simple line-based protocol. You can find more details about this protocol in the CouchDB wiki.
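To give a feel for the line-based protocol, here is a stripped-down sketch of a view server loop body in JavaScript. It handles only the reset, add_fun and map_doc commands, leaves out the actual reading of stdin, and passes emit to the compiled function as an explicit parameter, which is a simplification of mine. It is not the code of any real query server, and createViewServer is a hypothetical name.

```javascript
// Each request is a JSON array on one line; each response is a JSON value on one line.
function createViewServer() {
  let funs = [];
  return function handleLine(line) {
    const [command, ...args] = JSON.parse(line);
    switch (command) {
      case "reset":
        // Forget all previously registered map functions.
        funs = [];
        return "true";
      case "add_fun": {
        // Compile the function source; emit is made visible through the wrapper's parameter.
        const compiled = new Function("doc", "emit", "return (" + args[0] + ")(doc);");
        funs.push(compiled);
        return "true";
      }
      case "map_doc": {
        // Run every registered function on the document and collect what each one emits.
        const results = funs.map(fn => {
          const emitted = [];
          fn(args[0], (key, value) => emitted.push([key, value]));
          return emitted;
        });
        return JSON.stringify(results);
      }
      default:
        return JSON.stringify({ error: "unknown_command", reason: String(command) });
    }
  };
}

const handle = createViewServer();
const r1 = handle(JSON.stringify(["reset"]));
const r2 = handle(JSON.stringify(["add_fun", "function(doc) { emit(doc.item, doc.qty); }"]));
const r3 = handle(JSON.stringify(["map_doc", { item: "banana", qty: 3 }]));
console.log(r1, r2, r3);
```

The map_doc response carries one list of key/value pairs per registered function, which is why a design document with several views still needs just one pass over each document.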

The default implementation of the query server in CouchDB uses Javascript running via Mozilla SpiderMonkey. However, language aficionados always find a way to push their own favorite into any accessible option. People have developed query servers for Ruby, Php, Python and Common Lisp.

scouchdb now provides one for Scala. You can write map functions for CouchDB views in Scala, and they actually do work in the repository. The reduce part is not yet ready. Here is a typical session using ScalaTest ..


// create some records in the store
couch(test doc Js("""{"item":"banana","prices":{"Fresh Mart":1.99,"Price Max":0.79,"Banana Montana":4.22}}"""))
couch(test doc Js("""{"item":"apple","prices":{"Fresh Mart":1.59,"Price Max":5.99,"Apples Express":0.79}}"""))
couch(test doc Js("""{"item":"orange","prices":{"Fresh Mart":1.99,"Price Max":3.19,"Citrus Circus":1.09}}"""))

// create a design document
val d = DesignDocument("power", null, Map[String, View]())
d.language = "scala"

// a sample map function in Scala
val mapfn1 = 
  """(doc: dispatch.json.JsValue) => {
    val it = couch.json.JsBean.toBean(doc, classOf[couch.json.TestBeans.Item_1])._3; 
    for (st <- it.prices)
      yield(List(it.item, st._2))
  }"""
    
// another map function
val mapfn2 = """(doc: dispatch.json.JsValue) => {
    import dispatch.json.Js._; 
    val x = Symbol("item") ? dispatch.json.Js.str;
    val x(x_) = doc; 
    val i = Symbol("_id") ? dispatch.json.Js.str;
    val i(i_) = doc;
    List(List(i_, x_)) ;
  }"""




Now the way the protocol works is that when the view functions are stored in the view server, CouchDB starts sending the documents one by one and every function gets invoked on every document. So once we create a design document and attach the view with the above map functions, the view server starts processing the documents based on the line based protocol with the main server. And if we invoke the views using scouchdb API as ..

couch(test view(
  Views builder("power/power_lunch") build))


and

couch(test view(
  Views builder("power/mega_lunch") build))


we get back the results based on the queries defined in the map functions. Have a look at the project home page for a complete description of the sample session that works with Scala view functions.

Setting up the View Server

The view server is an external program which will communicate with the CouchDB server. In order to set up our scouchdb query server, here are the steps :

The common place to do custom settings for CouchDB is local.ini. This can usually be found under the /usr/local/etc/couchdb folder. There have been some changes in the configuration files since CouchDB 0.9 - check out the wiki for them. On my system, I set the view server path as follows in local.ini ..

[query_servers]
scala=$SCALA_HOME/bin/scala -classpath couch.db.VS "/tmp/vs.txt"

  • scala is the language of query server that needs to be registered with CouchDB. Once you start futon after registering scala as the language, you should be able to see "scala" registered as a view query language for writing map functions.

  • The classpath points to the jar where you deploy scouchdb.

  • couch.db.VS is the main program that interacts with the CouchDB server. Currently it takes as argument one file name where it sends all statements that it exchanges with the CouchDB server. If it is not supplied, all interactions are routed to the stderr.

  • another change that I needed to make was setting of the os_process_timeout value. The default is set to 5000 (5 seconds). I made the following changes in local.ini ..


[couchdb]
os_process_timeout=20000

Another thing that needs to be set up is an environment variable named CDB_VIEW_CLASSPATH. This should point to the classpath which needs to be passed to the Scala interpreter for executing the map/reduce functions.

You've been warned!

All the above stuff is very much development in progress and has been tested only to the limits of some unit test suites that are also recorded in the codebase. Use at your own risk, and please, please send feedback, patches, bug reports etc. to the project tracker.

Happy hacking!

P.S. Over the weekend I got a patch from Martin Kleppmann that adds the ability to store the type name of an object in the JSON blob when it is serialized (either as fully-qualified class name or as base name without the package component), and to automatically create a bean of the right type when that JSON blob is loaded from the database (without advance knowledge of what that type is going to be). Thanks Martin - I will have a look and integrate it in the trunk.
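The idea of storing a type name alongside the data can be sketched as follows. This is not Martin's patch and says nothing about how scouchdb will lay out the JSON. The field names type and fields, the registry, and the sample classes are assumptions made purely for illustration.

```javascript
// Sample classes standing in for application beans.
class Shop {
  constructor(store, item, price) {
    this.store = store;
    this.item = item;
    this.price = price;
  }
}

class Address {
  constructor(street, city, zip) {
    this.street = street;
    this.city = city;
    this.zip = zip;
  }
}

// Hypothetical registry from stored type name to constructor.
const registry = { Shop, Address };

// Store the base type name next to the object's own fields.
function serialize(obj) {
  return JSON.stringify({ type: obj.constructor.name, fields: { ...obj } });
}

// Rebuild an object of the right type without the caller naming the type.
function deserialize(json) {
  const { type, fields } = JSON.parse(json);
  const ctor = registry[type];
  if (!ctor) throw new Error("unknown type: " + type);
  return Object.assign(Object.create(ctor.prototype), fields);
}

const blob = serialize(new Shop("Sears", "refrigerator", 12500));
const restored = deserialize(blob);
console.log(restored instanceof Shop, restored.item);
```

The caller of deserialize never mentions Shop. The type travels with the JSON blob, which is the point of the feature.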

I have undertaken this as a side project and only get to work on it over the weekends. It is great to have patches contributed by the community, which only go on to enrich the framework. I need to work on the reduce part of the query server and then will launch into a major refactoring to incorporate the 0.3 release of Nathan's dbDispatch. Nathan has made some fruitful changes on exceptions and response-code handling. I am itching to incorporate the goodness in scouchdb.

Monday, May 11, 2009

CouchDB and Scala - Updates on scouchdb

A couple of posts back, I introduced scouchdb, the Scala driver for CouchDB persistence. The primary goal of the framework is to offer non-intrusiveness in persistence, in the sense that the Scala objects can be absolutely oblivious to the underlying CouchDB existence. The last post discussed how Scala objects can be added, updated or deleted from CouchDB with the underlying JSON representation carefully veneered away from client APIs. Here is an example of the fetch API in scouchdb ..

val sh = couch(test by_id(s_id, classOf[Shop]))

The document is fetched as an instance of the Scala class Shop, which can then be manipulated using usual Scala machinery. The return type is a Tuple3, where the first two components are the id and revision that may be useful for doing future updates of the document, while sh._3 is the object retrieved from the data store. Returning tuples from a method is a typical Scala idiom that can give rise to some nice pattern matching code capsules ..

couch(test by_id(s_id, classOf[Shop])) match {
  case (id, rev, obj) =>
    //..
  //..
}


The last post also discussed the View APIs and the little builder syntax for View queries.

Over the weekend, scouchdb got some more features, hence a brief post introducing the new additions ..

Temporary Views

No frills here - temporary views share the same builder interface as ordinary views, with the addition of specifying the map and reduce functions. Here is the necessary spec for querying temporary views ..


describe("fetch from temporary views") {
  it("should fetch 3 rows with group option and 1 row without group option") {
    val mf = 
      """function(doc) {
           var store, price;
           if (doc.item && doc.prices) {
             for (store in doc.prices) {
               price = doc.prices[store];
               emit(doc.item, price);
             }
           }
         }"""
      
    val rf = 
      """function(key, values, rereduce) {
           return(sum(values))
         }"""
      
    // with grouping
    val aq = 
      Views.adhocBuilder(View(mf, rf))
           .options(optionBuilder group(true) build)
           .build
    val s = couch(
      test adhocView(aq))
    s.size should equal(3)
      
    // without grouping
    val aq_1 = 
      Views.adhocBuilder(View(mf, rf))
           .build
    val s_1 = couch(
      test adhocView(aq_1))
    s_1.size should equal(1)
  }
}
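Under the covers, a temporary view is a POST of the map and reduce sources to the database. The sketch below only builds such a request and sends nothing. It assumes the _temp_view endpoint of the CouchDB releases of this period, with query options passed as JSON-encoded URL parameters. The helper tempViewRequest is hypothetical and is not part of scouchdb.

```javascript
// Build (but do not send) the HTTP request for a temporary view query.
function tempViewRequest(db, mapSrc, reduceSrc, options = {}) {
  const body = { map: mapSrc };
  if (reduceSrc) body.reduce = reduceSrc;

  // Query options such as group=true travel in the URL, JSON-encoded.
  const query = Object.entries(options)
    .map(([k, v]) => encodeURIComponent(k) + "=" + encodeURIComponent(JSON.stringify(v)))
    .join("&");

  return {
    method: "POST",
    path: "/" + encodeURIComponent(db) + "/_temp_view" + (query ? "?" + query : ""),
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body)
  };
}

const mapSrc = "function(doc) { if (doc.item && doc.prices) { for (var store in doc.prices) { emit(doc.item, doc.prices[store]); } } }";
const reduceSrc = "function(key, values, rereduce) { return sum(values); }";

const withGroup = tempViewRequest("test", mapSrc, reduceSrc, { group: true });
const withoutGroup = tempViewRequest("test", mapSrc, reduceSrc);
console.log(withGroup.path, withoutGroup.path);
```

The two requests differ only in the query string, in the same way that the two builder calls in the spec differ only in the options line.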




Attachment Handling

With each document, CouchDB allows attachments, much like emails. Along with creating a document, I can have a separate attachment associated with the document. However, when the document is retrieved, the attachment is not fetched by default. It has to be fetched using a special URI. All of this is now encapsulated in Scala APIs in scouchdb. Have a look at the following spec ..


describe("create a document and make an attachment") {
  val att = "The quick brown fox jumps over the lazy dog."
    
  val s = Shop("Sears", "refrigerator", 12500)
  val d = Doc(test, "sears")
  var ir:(String, String) = null
  var ii:(String, String) = null
    
  it("document creation should be successful") {
    couch(d add s)
    ir = couch(d >%(Id._id, Id._rev))
    ir._1 should equal("sears")
  }
  it("query by id should fetch a row") {
    ii = couch(test by_id ir._1)
    ii._1 should equal("sears")
  }
  it("sticking an attachment should be successful") {
    couch(d attach("foo", "text/plain", att.getBytes, Some(ii._2)))
  }
  it("retrieving the attachment should equal to att") {
    val air = couch(d >%(Id._id, Id._rev))
    air._1 should equal("sears")
    couch(d.getAttachment("foo") as_str) should equal(att)
  }
}




CouchDB also allows adding attachments to documents that do not exist yet. Adding the attachment will create the document as well. scouchdb supports this too. Have a look at the BDD specs in the test folder for details of the usage.
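The special URI is simply the document URI with the attachment name appended. A tiny sketch, assuming CouchDB's standalone attachment addressing with the revision passed as a rev query parameter when the document already exists. The helper attachmentPath is hypothetical, and the revision string is made up.

```javascript
// Build the path used to read or write a single attachment of a document.
function attachmentPath(db, docId, name, rev) {
  const base = [db, docId, name].map(encodeURIComponent).join("/");
  // A revision is needed when attaching to an existing document; omit it for a new one.
  return "/" + base + (rev ? "?rev=" + encodeURIComponent(rev) : "");
}

const readPath = attachmentPath("test", "sears", "foo");
const writePath = attachmentPath("test", "sears", "foo", "1-abc");
console.log(readPath, writePath);
```

This mirrors the spec: the attach call passes Some(ii._2) because the document already has a revision, while reading the attachment back needs only the names.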

Bulk Documents

CouchDB has separate REST interfaces for handling editing of multiple documents at the same time. I can have multiple documents, some of which need to be added as new, some to be updated with specific revision information and some to be deleted from the existing database. And all these can be done using a single POST. scouchdb uses a small DSL for handling such requests. Here is how ..


describe("bulk updates of documents") {
  it("should create 3 documents with 1 post") {
    val cnt = couch(test all_docs).filter(_.startsWith("_design") == false).size 
      
    val s1 = Shop("cc", "refrigerator", 12500)
    val s2 = Shop("best buy", "macpro", 1500)
    val a1 = Address("Survey Park", "Kolkata", "700075")
    val a2 = Address("Salt Lake", "Kolkata", "700091")
      
    couch(test docs(List(s1, s2, a1, a2), false)).size should equal(4)
    couch(test all_docs).filter(_.startsWith("_design") == false).size should equal(cnt + 4)
  }
  it("should insert 2 new documents, update 1 existing document and delete 1 - all in 1 post") {
    val sz = couch(test all_docs).filter(_.startsWith("_design") == false).size
    val s = Shop("Shoppers Stop", "refrigerator", 12500)
    val d = Doc(test, "ss")
      
    val t = Address("Monroe Street", "Denver, CO", "987651")
    val ad = Doc(test, "add1")
      
    var ir:(String, String) = null
    var ir1:(String, String) = null
    
    couch(d add s)
    ir = couch(d >%(Id._id, Id._rev))
    ir._1 should equal("ss")
      
    couch(ad add t)
    ir1 = couch(ad >%(Id._id, Id._rev))
    ir1._1 should equal("add1")
      
    val s1 = Shop("cc", "refrigerator", 12500)
    val s2 = Shop("best buy", "macpro", 1500)
    val a1 = Address("Survey Park", "Kolkata", "700075")
      
    val d1 = bulkBuilder(Some(s1)).id("a").build 
    val d2 = bulkBuilder(Some(s2)).id("b").build
    val d3 = bulkBuilder(Some(s)).id("ss").rev(ir._2).build
    val d4 = bulkBuilder(None).id("add1").rev(ir1._2).deleted(true).build

    couch(test bulkDocs(List(d1, d2, d3, d4), false)).size should equal(4)
    couch(test all_docs).filter(_.startsWith("_design") == false).size should equal(sz + 3)
  }
}




As can be seen from the above, there are 2 levels of APIs for bulk updates. scouchdb already has an API for creating a document from a Scala object with auto id generation :

def doc[T <: AnyRef](obj: T) = { //..

As an extension, I introduce the following which lets users add multiple new documents through a single API. Note here all of the documents will be added new ..

def docs(objs: List[_ <: AnyRef], allOrNothing: Boolean) = { //..

and the objects can be of any type, not necessarily the same. This is illustrated in the first of the 2 specs above.

But in case you need to use the full feature set of bulk uploads and editing of multiple documents, I offer a builder based interface, which is illustrated in the second spec above. Here 2 new documents are added, 1 is updated and 1 is deleted, all through one single API call.
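For reference, this is roughly what such a mixed request looks like on the wire. The sketch builds the JSON body for CouchDB's _bulk_docs interface: new documents carry just an _id, an update carries _id and _rev, and a delete carries _id, _rev and _deleted. The helpers bulkEntry and bulkDocsBody, the field names of the sample shops, and the revision strings are all made up for illustration. This is not the scouchdb builder.

```javascript
// Turn one builder-style description into a bulk document entry.
function bulkEntry({ doc = {}, id, rev, deleted = false }) {
  const entry = { ...doc };
  if (id !== undefined) entry._id = id;
  if (rev !== undefined) entry._rev = rev;
  if (deleted) entry._deleted = true;
  return entry;
}

// Assemble the body of the single POST.
function bulkDocsBody(entries, allOrNothing = false) {
  const body = { docs: entries.map(bulkEntry) };
  if (allOrNothing) body.all_or_nothing = true;
  return body;
}

const body = bulkDocsBody([
  { doc: { store: "cc", item: "refrigerator", price: 12500 }, id: "a" },                  // new
  { doc: { store: "best buy", item: "macpro", price: 1500 }, id: "b" },                   // new
  { doc: { store: "Shoppers Stop", item: "refrigerator", price: 12500 }, id: "ss", rev: "1-aaa" }, // update
  { id: "add1", rev: "1-bbb", deleted: true }                                             // delete
]);
console.log(JSON.stringify(body, null, 2));
```

Four entries go out in one POST, but the database grows by only three documents: two are new, one replaces an existing revision, and one is a deletion that still leaves its tombstone listed. That matches the sz + 3 in the spec.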

In case you are doing CouchDB and Scala stuff, give scouchdb a spin and post your feedback in the comments. I am yet to write a meaningful application using scouchdb - any feedback will be immensely helpful.