Saturday, November 17, 2007

Learning Functional Programming

comp.lang.functional recently carried an interesting thread on learning functional programming. It has, of course, the usual recommendations for (and against) learning Common Lisp, Scheme and Haskell as a starter. However, with Microsoft productizing F#, a variant of OCaml on .NET, the ML family has also become a strong contender these days. From the thread, talking of OCaml ..
It's generally regarded as the C++ of the functional world: fast, practical, somewhat ugly. A lot less bloated, of course. It also boasts the OO paradigm (hence, the name) and direct imperative constructs where you see fit (rather than Haskell's insistence on purity via monads)...

and
OCaml is, above all a practical and pragmatic language. Companies like Jane St Capital (see the article in issue 7 of "The Monad Reader") and XenSource among others are using Ocaml for real world problems.

OCaml encourages you to write pure static single assignment functional code. However, when that is not enough, you can drop back to imperative coding and use references (i.e. mutable variables). It even offers OO for situations where OO is needed ..

F# has certainly been regarded as one of the better modern functional programming languages. Jon Harrop, of OCaml fame, holds it in high regard, and he has been working on "porting" his OCaml book to F# ..
As a functional programming language, F# is among the best. It provides modern features like an expressive static type system, pattern matching and OOP as well as being practical. The F# language is also being pushed into the mainstream by Microsoft, who recently announced its productization.

I think the biggest plus in favor of F# is the commitment of Microsoft to push it as a mainstream functional language on the .NET platform. Integrating F# as a fully supported language in Visual Studio will definitely add to the momentum behind its use in mainstream enterprise applications.

Is the functional programming paradigm looking stronger than ever in its push towards the mainstream development horizon?

Friday, November 09, 2007

Infinite Streams using Java Closures

Neal Gafter's prototype of the closures implementation in Java has given us enough of a playground to fool around with. Of late, I have been dabbling with a couple of functional programming idioms, trying to implement them in Java. Many of them have already been tried using functors, anonymous inner classes and the like. Many of them work too, but at the cost of high accidental complexity. The following is an attempt at a clean implementation of infinite streams in Java.

Infinite streams give you the illusion that they can contain an infinite number of objects. The real trick behind infinite streams is lazy evaluation. SICP introduces the term delayed evaluation, which enables us to represent very large (even infinite) sequences as streams. Functional languages like Haskell and Miranda employ laziness as the default evaluation strategy, while languages like Scheme implement the same concept as library functions (delay and force).
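
To make the delay/force idea concrete before getting to streams, here is a minimal sketch in plain pre-closures Java - the Thunk interface and the delay helper are made up for illustration; delay wraps a computation without running it, and force evaluates it once and caches the result:


class LazyDemo {
  // a made-up thunk abstraction: force() yields the value of a delayed computation
  interface Thunk<T> { T force(); }

  // delay: wrap a computation; force: evaluate it at most once and cache the result
  static <T> Thunk<T> delay(final Thunk<T> expr) {
    return new Thunk<T>() {
      private T value;
      private boolean evaluated = false;
      public T force() {
        if (!evaluated) {
          value = expr.force();
          evaluated = true;
        }
        return value;
      }
    };
  }

  public static void main(String[] args) {
    Thunk<Integer> answer = delay(new Thunk<Integer>() {
      public Integer force() {
        System.out.println("computing ...");  // runs only on the first force()
        return 42;
      }
    });
    System.out.println(answer.force());  // prints "computing ..." and then 42
    System.out.println(answer.force());  // prints 42 only - the value is cached
  }
}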

Dominik Gruntz implements infinite streams in Java using the functor paradigm and inner classes. The obvious problem is the verbosity resulting from the accidental complexity that the inner classes lend to the implementation. In this post, I attempt the same using Neal's closures prototype. So, without further ado ..

The Stream Interface

Here's the contract for lazy evaluation ..


class StreamTest {
  interface Stream<E> {
    E car();                                        // head of the stream
    Stream<E> cdr();                                // rest of the stream, computed lazily
    E get(int index);                               // element at a given position
    <R> Stream<R> map(Unary<? super E, R> f);       // transform every element
    Stream<E> filter(Unary<? super E, Boolean> f);  // keep elements satisfying the predicate
  }
  //..
}



and a lazy implementation using Java closures ..


class StreamTest {
  interface Stream<E> {
    //.. as above
  }

  static class LazyStream<E> implements Stream<E> {
    private E car;
    private {=>Stream<E>} cdr;

    // constructor
    public LazyStream(E car, {=>Stream<E>} cdr) {
      this.car = car;
      this.cdr = cdr;
    }

    // accessors
    public E car() { return car; }
    public Stream<E> cdr() { return cdr.invoke(); }

    // access at position
    public E get(int index) {
      Stream<E> stream = this;
      while (index-- > 0) {
        stream = stream.cdr();
      }
      return stream.car();
    }

    // map over the stream
    public <R> Stream<R> map(Unary<? super E, R> f) {
      return cons(f.invoke(car), {=>cdr().map(f)});
    }

    // filter the stream
    public Stream<E> filter(Unary<? super E, Boolean> f) {
      if (car() != null) {
        if (f.invoke(car())) {
          return cons(car(), {=>cdr().filter(f)});
        } else {
          return cdr().filter(f);
        }
      }
      return null;
    }

    // factory method cons
    public static <E> LazyStream<E> cons(E val, {=>Stream<E>} c) {
      return new LazyStream<E>(val, c);
    }
  }
}



A couple of points that bugged me ..

I had to make up the Unary class since the closure type did not allow me to specify ? super E on the left hand side. Ricky has clarified with Neal that this is because things on the left hand side of a closure type automatically have ? super in their types. Hence a little noise ..


static class Unary<T,R> {
  private {T=>R} u;

  public Unary({T=>R} u) {
    this.u = u;
  }

  public R invoke(T arg) {
    return u.invoke(arg);
  }
}



and now some tests ..


class StreamTest {
  //.. all above stuff
  //.. and the tests

  // helper function generating sequence of natural numbers
  static LazyStream<Integer> integersFrom(final int start) {
    return LazyStream.cons(start, {=>integersFrom(start+1)});
  }

  // helper function for generating fibonacci sequence
  static LazyStream<Integer> fibonacci(final int a, final int b) {
    return LazyStream.cons(a, {=>fibonacci(b, a+b)});
  }

  public static void main(String[] args) {

    // natural numbers
    Stream<Integer> integers = integersFrom(0);
    Stream<Integer> s = integers;
    for(int i=0; i<20; i++) {
      System.out.print(s.car() + " ");
      s = s.cdr();
    }
    System.out.println("...");

    // a map example over the stream
    Stream<Integer> t = integers;
    Stream<Integer> u = t.map(new Unary<Integer, Integer>({Integer i=>i*i}));
    for(int i=0; i<20; i++) {
      System.out.print(u.car() + " ");
      u = u.cdr();
    }
    System.out.println("...");

    // a filter over stream
    Stream<Integer> x = integers;
    Stream<Integer> y = x.filter(new Unary<Integer, Boolean>({Integer i=>i%2==0}));
    for(int i=0; i<20; i++) {
      System.out.print(y.car() + " ");
      y = y.cdr();
    }
    System.out.println("...");
  }
}



Closures in Java will surely bring a new paradigm of programming to developers. The amount of excitement that the prototype has already generated is phenomenal. It'll be too bad if they do not appear in Java 7.

Update: Ricky Clarkson points out in the Comments that {? super E=>? extends R} is the same as {E=>R}. The covariance / contravariance stuff just slipped my mind when I put the post together. Hence the Unary class is not required. Just remove the class and substitute the closure type directly in map() and filter(), e.g.

public <R> Stream<R> map({E=>R} f) {
  return cons(f.invoke(car), {=>cdr().map(f)});
}

Monday, November 05, 2007

Towards more Powerful Abstractions in the Javaland

Way back in 2000, Scott Meyers suggested improving encapsulation through increased use of non-member, non-friend functions in C++. He had this punchline as the starting point of his column:
If you're writing a function that can be implemented as either a member or as a non-friend non-member, you should prefer to implement it as a non-member function. That decision increases class encapsulation. When you think encapsulation, you should think non-member functions.

While discussing the degrees of encapsulation, he explains that the amount of encapsulation in a class is inversely proportional to the number of functions that may break if the implementation of the class has to change. And that being the case, it becomes clear that a class with n member functions is more encapsulated than a class with n+1 member functions.

I was then an avid C++ programmer, and this article hit me like a bolt. It was possibly the first exposition on OO (that I came across) which makes you think away from the kingdom of nouns. Keep the number of verbs hanging off the nouns to a minimum, and use the C++ namespace as the unit of encapsulation, so as to keep the class interfaces complete and minimal.

Shortly afterwards, Herb Sutter, in one of his Guru-of-the-Week columns, Monoliths "Unstrung", dissected std::basic_string to identify the handful of its functions that genuinely need to be members, as opposed to the 103 member functions it currently has.

Very recently, Reg Braithwaite and Buko Obele discussed the inappropriateness of applying the noun/verb metaphor to object oriented programming. It is not correct to model all verbs hanging off the nouns - Obele goes on to say ..
The question of who verbs 'belong to' is always the wrong question; instead it is always a matter of how a new concept can be meaningfully introduced into our existing system.

In fact, long ago, the design of the C++ Standard Library was based on the concept of making the algorithms (the verbs) first-class citizens, implemented generically over the containers that they operate on.

It's not objects carrying all the functions that makes a model object oriented - the language has to support powerful abstractions that help programmers write extensible code.

The Expression Problem

This has been one of the classical examples used to demonstrate how mainstream OO languages lack the abstractions for modeling an extensible solution. The problem is described simply in this paper by Zenger and Odersky:
Suppose we have a datatype which is defined by a set of cases and we have processors which operate on this datatype. There are primarily two directions along which we can extend such a system:
• The extension of the datatype with new data variants,
• The addition of new processors.

The classical OO approach to this problem involves designing a polymorphic hierarchy of datatypes, with each concrete subclass modeling a data variant and implementing the full set of processors. With this approach, extending the datatype is easy - just add a variant as another subclass. But extending the processors is invasive - it violates the Open/Closed principle and forces you to dig into every existing data variant, as the sketch below illustrates. A typical fallout of embracing the noun/verb paradigm of modeling.
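
As a rough sketch (using a hypothetical Shape datatype, not an example from the paper), the classical encoding looks like this - each variant is a subclass, and every processor has to show up as a method in each of them:


abstract class Shape {
  abstract double area();        // processor 1
  abstract String describe();    // processor 2 - adding a third means editing every variant
}

class Circle extends Shape {     // adding a new variant is easy: just another subclass
  private final double radius;
  Circle(double radius) { this.radius = radius; }
  double area() { return Math.PI * radius * radius; }
  String describe() { return "circle of radius " + radius; }
}

class Square extends Shape {
  private final double side;
  Square(double side) { this.side = side; }
  double area() { return side * side; }
  String describe() { return "square of side " + side; }
}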

The alternative, not-strictly-OO approach that programmers adopt to solve this problem is to implement some sort of double dispatch using the Visitor design pattern. This is an example where the solution gets overly complex and forces you to write code that simulates the runtime type dispatch mechanism of the underlying programming language. The Visitor design pattern is yet another workaround for the noun/verb metaphor in mainstream OO languages. The Visitor abstractions allow a physical separation of the processors from the datatype, but the individual case methods still match one-to-one with each individual data variant. In the paper Matching Objects with Patterns, discussing the Scala language capabilities, Burak Emir et al. state this problem with Visitors more succinctly:
The visitor design pattern causes a relatively high notational overhead for framework construction, because a visitor class has to be defined and matchWith methods have to be provided in all data variants. The pattern matching itself is disciplined but very verbose, especially for deep patterns. Visitors in their standard setting do not maintain representation independence, because case methods correspond one-to-one to data alternatives.
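
For comparison, here is a hedged sketch of the Visitor encoding of the same hypothetical Shape datatype - the processors move out into visitor classes, but every visitor still needs one case method per variant, and a matchWith hook has to be added to every variant:


interface ShapeVisitor<R> {
  R visitCircle(Circle c);
  R visitSquare(Square s);       // a new variant means a new method in every visitor
}

abstract class Shape {
  abstract <R> R matchWith(ShapeVisitor<R> visitor);   // the double dispatch hook
}

class Circle extends Shape {
  final double radius;
  Circle(double radius) { this.radius = radius; }
  <R> R matchWith(ShapeVisitor<R> visitor) { return visitor.visitCircle(this); }
}

class Square extends Shape {
  final double side;
  Square(double side) { this.side = side; }
  <R> R matchWith(ShapeVisitor<R> visitor) { return visitor.visitSquare(this); }
}

// a processor now lives outside the datatype, but still enumerates every variant
class AreaVisitor implements ShapeVisitor<Double> {
  public Double visitCircle(Circle c) { return Math.PI * c.radius * c.radius; }
  public Double visitSquare(Square s) { return s.side * s.side; }
}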

How can we get around this accidental complexity and make our OO code more succinct? The answer is simple - more powerful abstractions.

Scala Pattern Matching

Scala offers pattern matching as a solution to the expression problem. Pattern matching has been one of the key features of many functional languages like Erlang - Scala brings this feature to the OO paradigm. Martin Odersky has been quite a strong advocate of pattern matching, a feature which many purists consider orthogonal to the encapsulation of abstractions. But pattern matching in Scala definitely provides a way out of the typical noun/verb syndrome of today's OO modeling paradigms. Have a look here for a complete example of pattern matching in Scala solving a problem which would have been much more verbose and complex using visitors.

And Java?

Java has long been the kingpin of the noun kingdom, and there is a ton of Java code that models verbs hanging off the nouns. However, with more and more functional paradigms crying out on the sidelines for entry into the noun kingdom, things look to be changing. I have been dabbling a bit with the prototype of closures released by Neal Gafter, and things have already started looking interesting. Closures in Java are definitely a very welcome feature, and the community has already started battling for their introduction in Java 7. Ricky Clarkson has implemented pattern matching in Java using Neal's prototype. Though at a very early stage, it is indeed heartening to see more powerful forms of abstraction making their way into the Java programming language. Java, as a platform, has already been enriched by languages like Scala and JRuby, which contribute many powerful abstractions from functional programming.

Reg has demonstrated an equivalence relationship using the visitor pattern, in a manner conforming to the Java tradition of verbosity. With pattern matching and closures, things improve towards more succinctness. Here is a version based on Ricky's pattern matching engine:


class Main
{
  public static void main(String[] args) {
    Collection coll1 = new ArrayList<Integer>() {{ add(12); add(15); add(10); }};
    Collection coll2 = new TreeSet<Integer>() {{ add(12); add(15); add(10); }};

    // equivalence of List-List and List-Set
    System.out.println(equivalent(coll1, coll2)
        .add(List.class, List.class, {List l1, List l2 => CollectionUtils.isEqualCollection(l1, l2)})
        .add(List.class, Set.class, {List l1, Set s1 => CollectionUtils.isEqualCollection(l1, s1)})
        .done());

    Map<Integer, Integer> coll3 = new HashMap<Integer, Integer>() {{
      put(1, 12);
      put(2, 15);
      put(3, 10);
    }};

    // equivalence of List-List, List-Set, List-Map
    System.out.println(equivalent(coll1, coll3)
        .add(List.class, List.class, {List l1, List l2 => CollectionUtils.isEqualCollection(l1, l2)})
        .add(List.class, Set.class, {List l1, Set s1 => CollectionUtils.isEqualCollection(l1, s1)})
        .add(List.class, Map.class, {List l1, Map m1 => CollectionUtils.isEqualCollection(l1, m1.values())})
        .done());
  }

  public static <T,R> Matcher<T,R> equivalent(T t1, T t2) {
    return new Matcher<T,R>(t1, t2);
  }

  public static <T,R> Matcher<T,R> match(T t1, T t2) {
    return new Matcher<T,R>(t1, t2);
  }

  public static class Matcher<T,R> {
    public final T t1;
    public final T t2;
    public R r;

    public Matcher(T t1, T t2) {
      this.t1 = t1;
      this.t2 = t2;
    }

    public <U1 extends T, U2 extends T> Matcher<T,R> add(Class<U1> aCase, Class<U2> bCase, {U1,U2=>R} f) {
      if (aCase.isInstance(t1) && bCase.isInstance(t2))
        r=f.invoke(aCase.cast(t1), bCase.cast(t2));

      return this;
    }

    public R done() {
      return r;
    }
  }
}



Definitely a more succinct way to model equivalence than what we are used to in the Javaland. And this is only a start. There are quite a few rough edges to smooth out, particularly with respect to handling generics and type erasure. But I am sure we will see more and more powerful abstractions coming into Java, with people like Neal Gafter and Doug Lea pitching in. A ton of thanks to these thought leaders for keeping the Java language an interesting platform to play around with.

Wednesday, October 31, 2007

A Computer Literacy interview with Donald Knuth

A couple of days back, googling some time away while waiting for a flight at the airport, I came across an interview with Donald Knuth. It was conducted around 1993 by Computer Literacy, after the publication of his books on CWEB and The Stanford GraphBase. Besides talking about the obvious topics that you would expect from a Knuth session, he also ruminates on his many other passions like music, the piano and writing. We all know Knuth's obsession with TeX and METAFONT. But did you know that the Father of the Analysis of Algorithms is an expert in piano music? He says:
I would like to have friends come to the house and play four-hands piano music. If I could do it every week, I would. I hope to live long enough so that after I've finished my life's work on The Art of Computer Programming, I might compose some music. Just a dream... it might be lousy music, of course.

Go read the whole piece .. it's a wonderful window into the thoughts of the legend.

Friday, October 19, 2007

Clojure is here

I came across this posting, Lisp on the JVM, on reddit and thought .. what the heck? What's so great about it when we already have ABCL, Kawa and SISC for the JVM? In fact the title on reddit is a bit misleading - Clojure is very much a Lisp, and it targets the JVM, but more than anything else, its design embodies a lot of thought about immutability, functional data structures, concurrency, STM etc. Here is a comment from the author himself on reddit:
Clojure has some tangible, non-superficial differences from Common Lisp and Scheme. They yield something that is different, and might or might not be more suitable depending on your programming style and application domain.

  • Most of the core data structures are immutable. This is part of an overall design philosophy to make Clojure a good language for concurrent/multi-core programming.

  • Most of the data structures are extensible abstractions. This is different from Common Lisp where you can't extend the sequence functions to your own data structures, for instance. Even invocability is an abstraction - allowing maps to be functions of their keys and vice-versa.

  • Clojure extends code-as-data to maps and vectors in a deep way. They have literal reader syntax, the compiler understands them, backquote works with them, they support metadata etc. Because they are efficiently immutable and persistent, they support very Lisp-y recursive usage, shared structure etc, in ways Common Lisp's hash tables and vectors cannot.

  • Clojure embraces its host platform in ways that the standard Lisps ported to the JVM can't. For instance, Common Lisp's strings could never be Java Strings since the former are mutable and the latter are not. Clojure strings are Java Strings. The Clojure sequence library functions work over Clojure and Java data structures transparently. Etc.

  • Clojure has metadata as a core concept, not something one could retrofit onto the built-in Common Lisp types.

  • Clojure is designed for concurrency. Vars (similar to CL special variables) have explicit threading semantics. Clojure has a software transactional memory system. Etc.


In short, Clojure is (non-gratuitously) different. If you don't want different, you don't want Clojure. If you like Lisp and need to write multi-threaded programs for the JVM, you might find something interesting in Clojure.


I had blogged about SISC some time back, discussing how we could use Scheme as a more consumer friendly alternative to XML in Java applications. I think Clojure is going to be the most interesting dynamic language on the JVM very soon. There has never been a better time to learn Lisp!

Monday, October 15, 2007

Domain Modeling with JPA - The Gotchas - Part 4 - JPA is a nice way to abstract Repository implementation

When we model a domain, we love to work at a higher level of abstraction through Intention Revealing Interfaces, where the focus is always on the domain artifacts. Whenever, at the domain layer, we start coding in terms of SQL and pulling result sets out of the database through a JDBC connection, we lose the domain focus. When we start dealing with the persistent hierarchy directly instead of the domain hierarchy, we make the domain model more vulnerable, exposing it to the lower layers of abstraction. By decoupling the domain model from the underlying persistent relational model, JPA provides us with an ideal framework for building higher levels of abstraction for data access. The domain model can now access data in terms of collections of domain entities, and not in terms of the table structures into which these entities are deconstructed. And the artifact that provides us a unified view of the underlying collection of entities is the Repository.

Why Repository?

In the days of EJB 2.0, we had the DAO pattern. Data Access Objects also provided an abstraction of the underlying data model, by defining queries and updates for every table of the model. However, the differences between DAOs and repositories are more semantic than technical. Repositories provide a higher level of abstraction and are a more natural inhabitant of a rich domain model. Repositories offer controlled access to the domain model, only through Aggregate roots, implementing a set of APIs that follow the Ubiquitous Language of the domain. Programming at the JDBC level with the DAO pattern, we used to think in terms of individual tables and operations on them. However, in reality, the domain model is much more than that - it contains complex associations between entities and strong business rules that govern the integrity of those associations. Instead of having the domain model deal directly with individual DAOs, we had always felt the need for a higher level of abstraction that would shield the domain layer from the granularities of joins and projections. ORM frameworks like Hibernate gave us this ability, and specifications like JPA standardized it. Build your repository to get this higher level of abstraction on top of the DAOs.
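
To make the contrast concrete, here is roughly what the table-level DAO view looks like - the record types and DAO interfaces below are made up for illustration; one DAO per table, with the domain layer left to stitch the pieces together:


// hypothetical row-level value objects, one per table
class OrderRecord { String orderNo; Date orderDate; }
class LineItemRecord { String orderNo; long productId; long quantity; }

// hypothetical table-level DAOs - queries and updates per table, no notion of an aggregate
interface OrderDao {
  OrderRecord findByOrderNo(String orderNo);
  void insert(OrderRecord order);
}

interface LineItemDao {
  List<LineItemRecord> findByOrderNo(String orderNo);
  void insert(LineItemRecord lineItem);
}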

You build a repository at the level of the Aggregate Root, and provide access to all entities underneath the root through the unified interface. Here is an example ..


@Entity
class Order {
  private String orderNo;
  private Date orderDate;
  private Collection<LineItem> lineItems;
  //..

  //.. methods
}

@Entity
class LineItem {
  private Product product;
  private long quantity;
  //..

  //.. methods
}



The above snippet shows an example of an Aggregate, with Order as the root. For an Order Management System, all that the user needs is to manipulate Orders through intention revealing interfaces. He should not be given access to manipulate individual line items of an Order - doing so may lead to inconsistency in the domain model, if an operation on a LineItem invalidates an invariant of the Order aggregate. While the Order entity encapsulates all domain logic related to manipulating an Order aggregate, the OrderRepository is responsible for giving the user a single point of interface with the underlying persistent store.
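
Before looking at the repository contract, here is a hedged sketch of the first point - an intention revealing method on the Order root itself (addLineItem and the LineItem constructor are assumptions, not part of the snippet above) that keeps the aggregate consistent:


@Entity
class Order {
  //.. as above

  // hypothetical intention revealing method - the aggregate root guards its own invariants
  public void addLineItem(Product product, long quantity) {
    if (quantity <= 0) {
      throw new IllegalArgumentException("quantity must be positive");
    }
    lineItems.add(new LineItem(product, quantity));   // assumes a LineItem(Product, long) constructor
    // re-establish order level invariants (totals, discounts etc.) here
  }
}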


interface OrderRepository {
  //.. queries
  List<Order> getAll();
  List<Order> getByOrderNo(String orderNo);
  List<Order> getByOrderDate(Date orderDate);
  List<Order> getByProduct(Product product);

  //..
  void write(final Order order);
  //..
}



Now the domain services can use this repository to access orders from the underlying database. This is what Eric Evans calls reconstitution, as opposed to construction (which is typically the responsibility of the factory).
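
To make the distinction explicit, a hypothetical OrderFactory (made up here, not part of the model above) would own construction, while the OrderRepository owns reconstitution:


// construction: assemble a brand new, consistent Order aggregate
interface OrderFactory {
  Order createOrder(String orderNo, Date orderDate, Collection<LineItem> lineItems);
}

// reconstitution: bring an already persisted aggregate back from the store, e.g.
// List<Order> orders = orderRepository.getByOrderNo("ORD-1234");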

JPA to implement Repository

The nice thing about being able to program to a specification is the abstraction that you can enforce on your model. Repositories can be implemented using JPA and can be nicely abstracted away from the actual domain services. A Repository acts like a collection and gives the user the illusion of working with memory based data structures, without exposing the internals of the interactions with the persistent store. Let us see a sample implementation of a method of the above repository ..


class OrderRepositoryImpl implements OrderRepository {
  // the EntityManager is assumed to be injected, e.g. through @PersistenceContext
  private EntityManager em;
  //..

  public List<Order> getByProduct(Product product) {
    String query = "select o from Order o, IN (o.lineItems) li where li.product.id = ?1";
    Query qry = em.createQuery(query);
    qry.setParameter(1, product.getId());

    List<Order> res = qry.getResultList();
    return res;
  }
  //..
  //..
}



The good part is that we have used JPA to implement the Repository, but the actual domain services will not contain a single line of JPA code. All of the JPA dependencies can be localized within the Repository implementations. Have a look at the following OrderManagementService API ..


class OrderManagementService {
  //..
  // to be dependency injected
  private OrderRepository orderRepository;

  // apply a discount to all orders for a product
  public List<Order> markDown(final Product product, float discount) {
    List<Order> orders = orderRepository.getByProduct(product);
    for(Order order : orders) {
      order.setPrice(order.getPrice() * discount);
    }
    return orders;
  }
  //..
  //..
}



Note that the repository is injected through a DI container like Spring or Guice, so that the domain service remains completely independent of the repository implementation.
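
For instance, wiring the repository with Guice could look something like the following sketch (assuming OrderManagementService receives its OrderRepository through an @Inject constructor or setter, which the snippet above leaves out):


// a minimal Guice module - binds the repository abstraction to its JPA based implementation
// (AbstractModule, Guice and Injector come from com.google.inject)
class RepositoryModule extends AbstractModule {
  @Override
  protected void configure() {
    bind(OrderRepository.class).to(OrderRepositoryImpl.class);
  }
}

// bootstrap code, somewhere outside the domain layer ..
// Injector injector = Guice.createInjector(new RepositoryModule());
// OrderManagementService service = injector.getInstance(OrderManagementService.class);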

But OrderRepository is also a domain artifact!

Right .. and with proper encapsulation we can abstract the JPA dependencies away from OrderRepositoryImpl as well. I had blogged before on how to implement a generic repository service and make all the domain repositories independent of the implementation.
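
The gist of that approach is a generic base contract along the following lines (the names here are illustrative, not the actual interface from that post), which concrete repositories like OrderRepository can then extend with their own intention revealing finders:


// a sketch of a generic repository contract, to be implemented once over JPA
interface Repository<T, ID extends Serializable> {
  T getById(ID id);
  List<T> getAll();
  void write(T entity);
  void remove(T entity);
}

// e.g. the OrderRepository could then be declared as ..
// interface OrderRepository extends Repository<Order, Long> { .. }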