
Monday, January 14, 2013

A language and its interpretation - Learning free monads

I have been playing around with free monads of late and finding them more and more useful in implementing separation of concerns between pure data and its interpretation. Monads generally don't compose. But if you restrict monads to a particular form, then you can define a sum type that composes. In the paper Data Types a la carte, Wouter Swierstra describes this form as
data Term f a =
    Pure a
    | Impure (f (Term f a)) 
These monads consist of either pure values or an impure effect constructed using f. When f is a functor, Term f is a monad. And in this case, Term f is the free monad on the functor f - free in the sense that the construction is the left adjoint to the forgetful functor from monads to functors.
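To make the claim concrete, here's the standard sketch of the Monad instance for this form (together with the Functor and Applicative boilerplate that modern GHC requires). Bind just substitutes into the Pure leaves, leaving the layers of effects untouched ..

instance Functor f => Functor (Term f) where
  fmap g (Pure a)   = Pure (g a)
  fmap g (Impure t) = Impure (fmap (fmap g) t)

instance Functor f => Applicative (Term f) where
  pure = Pure
  Pure g   <*> x = fmap g x
  Impure t <*> x = Impure (fmap (<*> x) t)

instance Functor f => Monad (Term f) where
  return = pure
  -- substitute into the leaves; the effect layers stay as they are
  Pure a   >>= g = g a
  Impure t >>= g = Impure (fmap (>>= g) t)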

I am not going into the details of what makes a monad free. I once asked this question on Google+ and Edward Kmett came up with a beautiful explanation. So instead of trying to come up with a half-assed version of the same, have a look at Ed's response here.

In short, we can say that a free monad is the freest object possible that's still a monad.

Composition

Free monads compose and help you build larger abstractions which are pure data and yet manage to retain all properties of a monad. Hmm .. this sounds interesting because now we can not only build abstractions but also make them extensible through composition by clients, using the fact that it's still a monad.

A free monad is pure data, not yet interpreted, as we will see shortly. You can pass it to a separate interpreter (possibly multiple interpreters) which can do whatever you feel like with the structure. Your free monad remains pure while all impurities can be put inside your interpreters.

And so we interpret ..

In this post I will describe how I implemented an interpreter for a Joy-like concatenative language that uses the stack for its computation. Of course it's just a prototype and addresses an extremely simplified subset, but you get the idea. The basic purpose is to explore the power of free monads and come up with something that can potentially be extended to a full blown implementation of the language.

When designing an interpreter there's always the risk of conflating the language with the concerns of interpreting it. Here we will keep the two completely decoupled by designing the core language constructs as free monads. Gabriel Gonzalez has written a series of blog posts [1,2,3] on the use of free monads which contain details of their virtues and usage patterns. My post is just an account of my learning experience. In fact after I wrote the post, I discovered a similar exercise done for embedding Forth in Haskell - so I guess I'm not the first one to learn free monads using a language interpreter.

Let's start with some code, which is basically a snippet of Joy like code that will run within our Haskell interpreter ..
p :: Joy ()
p = do push 5
       push 6
       add
       incr
       add2
       square
       cube
       end

This is our wish list and at the end of the post we will see if we can interpret this correctly and reason about some aspects of this code. Let's not bother much about the details of the above snippet. The important point is that if we fire it up in ghci, we will see that it's pure data!
*Joy> p
Free (Push 5 (Free (Push 6 (Free (Add (Free (Push 1 (Free (Add (Free (Push 1 (Free (Add (Free (Push 1 (Free (Add (Free (Dup (Free (Mult (Free (Dup (Free (Dup (Free (Mult (Free (Mult (Free End))))))))))))))))))))))))))))))
*Joy> 

We haven't yet executed anything of the above code. It's completely free to be interpreted and possibly in multiple ways. So, we have achieved this isolation - that the data is freed from the interpreter. You want to develop a pretty printer for this data - go for it. You want to apply semantics and give an execution model based on Joy - do it.

Building the pure data (aka Builder)

Let's first define the core operators of the language ..
{-# LANGUAGE DeriveFunctor #-}   -- needed for the derived Functor below

data JoyOperator cont = Push Int cont
                      | Add      cont
                      | Mult     cont
                      | Dup      cont
                      | End
                      deriving (Show, Functor)

The interesting piece is the derivation of the Functor, which is exactly what the free monad construction needs. Keeping the technical mumbo jumbo aside, free monads are just a general way of turning functors into monads. So if we have a core operator f as a functor, we can get a free monad Free f out of it. And knowing something is a free monad helps you transform an operation over the monad (a monad homomorphism) into an operation over the functor (a functor homomorphism). We will see how this helps later ..

The other point to note is that all operators take a continuation argument that points to the next operation in the chain. End is the terminal symbol and we plan to ignore anything that the user enters after an End.

Push takes an Int and pushes into the stack, Add pops the top 2 elements of the stack and pushes the sum, Mult does the same for multiplication and Dup duplicates the top element of the stack. End signifies the end of program.

Next we define the free monad over JoyOperator by using the Free data constructor, defined as part of Control.Monad.Free ..

data Free f a = Pure a | Free (f (Free f a))

-- | The free monad over JoyOperator
type Joy = Free JoyOperator

And then follow it up with some of the definitions of Joy operators as operations over free monads. Note that liftF lifts an operator (which is a Functor) into the context of a free monad. liftF has the following type ..

liftF :: Functor f => f a -> Free f a

As noted above, the free monad construction is left adjoint to the forgetful functor. The unlifting from the free monad back to the underlying monad is given by the retract function ..

retract :: Monad f => Free f a -> f a

and needless to say

retract . liftF = id

-- | Push an integer to the stack
push :: Int -> Joy ()
push n = liftF $ Push n ()

-- | Add the top two numbers of the stack and push the sum
add :: Joy ()
add = liftF $ Add ()

.. and this can be done for all operators that we wish to support.
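The post doesn't show the remaining primitives, but they follow the exact same pattern (these definitions are my reconstruction) ..

-- | Multiply the top two numbers of the stack and push the product
mult :: Joy ()
mult = liftF $ Mult ()

-- | Duplicate the top element of the stack
dup :: Joy ()
dup = liftF $ Dup ()

-- | End of program
end :: Joy ()
end = liftF End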

Not only this, we can also combine the above operators and build newer ones. Remember we are working with monads and hence the *do* notation based sequencing comes for free ..

-- | This combinator adds 1 to a number. 
incr :: Joy ()
incr = do {push 1; add}

-- | This combinator increments twice
add2 :: Joy ()
add2 = do {incr; incr}

-- | This combinator squares a number
square :: Joy ()
square = do {dup; mult}
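
Our example program p also uses cube, which the post never defines; from the trace we saw earlier (dup, dup, mult, mult) it is presumably ..

-- | This combinator cubes a number
cube :: Joy ()
cube = do {dup; dup; mult; mult}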

Now we can have a composite program which sequences through the core operators as well as the ones we derive from them. And that's what we posted as our first example snippet of a target program.

An Interpreter (aka Visitor)

Once we have the pure data part done, let's try and build an interpreter that does the actual execution based on the semantics that we defined on the operators.
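runProgram below reports failures through a JoyError type that the post doesn't show; a minimal definition consistent with the constructors used in the interpreter would be ..

-- | Possible interpreter failures (reconstructed, not from the original post)
data JoyError = NotEnoughParamsOnStack
              | NotEmptyOnEnd
              | NoEnd
              deriving (Show, Eq)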

-- | Run a joy program. Result is either an Int or an error
runProgram :: Joy n -> Either JoyError Int
runProgram program = joy [] program
  where joy stack (Free (Push v cont))         = joy (v : stack) cont
        joy (a : b : stack) (Free (Add cont))  = joy (a + b : stack) cont
        joy (a : b : stack) (Free (Mult cont)) = joy (a * b : stack) cont
        joy (a : stack) (Free (Dup cont))      = joy (a : a : stack) cont
        joy _ (Free Add {})                    = Left NotEnoughParamsOnStack
        joy _ (Free Dup {})                    = Left NotEnoughParamsOnStack
        joy [] (Free End)                      = Left NotEnoughParamsOnStack
        joy [result] (Free End)                = Right result
        joy _ (Free End)                       = Left NotEmptyOnEnd
        joy _ Pure {}                          = Left NoEnd

runProgram is the interpreter that takes a free monad as its input. Its implementation is quite trivial - it just matches on the recursive structure of the data and pushes the appropriate results on the stack. Now if we run our program p using the above interpreter we get the correct result ..

*Joy> runProgram p
Right 7529536
*Joy> 
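
And to drive home the point about multiple interpreters, here's a second one over the same pure data - a pretty printer instead of an evaluator. This is my sketch, not from the original post, and showJoy is a hypothetical name ..

showJoy :: Joy a -> String
showJoy (Free (Push v cont)) = "push " ++ show v ++ "\n" ++ showJoy cont
showJoy (Free (Add cont))    = "add\n"  ++ showJoy cont
showJoy (Free (Mult cont))   = "mult\n" ++ showJoy cont
showJoy (Free (Dup cont))    = "dup\n"  ++ showJoy cont
showJoy (Free End)           = "end"
showJoy (Pure _)             = ""       -- no End marker; nothing more to show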

Equational Reasoning

Being Haskell and being pure, we obviously can prove some bits and pieces of our program as mathematical equations. At the beginning I said that End is the end of the program and anything after End needs to be ignored. What happens if we do the following ..

runProgram $ do {push 5; incr; incr; end; incr; incr}

If you guessed Right 7, you guessed correctly - all operations after end have been ignored. But can we prove that our program indeed does this ? The following is a proof that end indeed ends our program. Consider some operation m that follows end ..

end >> m

-- definition of end: end = liftF End
= liftF End >> m

-- m >> m' = m >>= \_ -> m'
= liftF End >>= \_ -> m

-- liftF f = Free (fmap Pure f), and fmap g End = End, so liftF End = Free End
= Free End >>= \_ -> m

-- Free m >>= f = Free (fmap (>>= f) m)
= Free (fmap (>>= \_ -> m) End)

-- fmap f End = End
= Free End

-- Free End = liftF End (again because fmap Pure End = End)
= liftF End

-- definition of end
= end

So this shows that any operation we do after end is never executed.

Improving the bind

Free monads offer improved modularity by allowing us to separate the building of pure data from the (possibly) impure concerns of interpreting it with the external world. But a naive implementation leads to a penalty in runtime efficiency, as Janis Voigtlander discusses in his paper Asymptotic Improvement of Computations over Free Monads. And Haskell's implementation of free monads had this inefficiency, where the asymptotic complexity of substitution was quadratic because of left-associated binds. Edward Kmett implemented the Janis trick and engineered a solution that gets over this inefficiency. I will discuss this in a future post. And if you would like to play around with the interpreter, the source is on my GitHub.

Thursday, January 03, 2013

strict : recursion :: non-strict : co-recursion

Consider the very popular algorithm that uses a tail recursive call to implement a map over a List. Here's the implementation in F#

let map converter l =
  let rec loop acc = function
    | [] -> acc
    | hd :: tl -> loop (converter hd :: acc) tl
  List.rev (loop [] l)

Scala is also a statically typed functional programming language, though its standard library uses a completely different trick (and mutability) to implement map over a sequence. Let's ignore that for the time being and imagine we are bona fide functional programmers.

The above code uses a very common idiom of accumulate-and-reverse to implement a tail recursive algorithm. Though Scala stdlib does not use this technique and uses a mutable construct for this implementation, we could have done the same thing in Scala as well ..

Both Scala and F# are languages with strict evaluation semantics as the default. What would a similar tail recursive Haskell implementation look like ?

map' f xs = reverse $ go [] xs
    where
    go accum [] = accum
    go accum (x:xs) = go (f x : accum) xs

Let's now have a look at the actual implementation of map in Haskell prelude ..

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

Whoa! It's not tail recursive - instead it's a body recursive implementation. Why is that ? We have been taught that tail recursion is the holy grail since it takes constant stack space and all ..

In a strict language the above implementation of map will be a bad idea since it uses linear stack space. On the other hand the initial implementation is tail recursive. But Haskell has non-strict semantics and hence the last version of map consumes only one element of the input and yields one element of the output. The earlier versions consume the whole input before yielding any output.

In a lazy language what you need is to make your algorithm co-recursive. And this needs co-data. Understanding co-data or co-recursion needs a brief background of co-induction and how it differs from induction.

When we define a list in Haskell as

data [a] = [] | a : [a]

it means that the set "List of a" is the smallest set such that [] is in List of a, and if xs is in List of a and x is in a, then x : xs is in List of a. This is the inductive definition of a List and we can use recursion to implement various properties of a List. Also, once we have a recursive definition, we can use structural induction to prove various properties of the data structure.

If an inductive definition on data gives us the smallest set, a co-inductive definition on co-data gives us the largest set. Let's define a Stream in Haskell ..

data Stream a = Cons a (Stream a)

First to note - unlike the definition of a List, there's no special case for empty Stream. Hence no base case unlike the inductive definition above. And a "Stream of a" is the largest set consisting of a pair of an "a" and a "Stream of a". Which means that Stream is an infinite data structure i.e. the range (or co-domain) of a Stream is infinite. And we implement properties on co-inductive data structures using co-recursive programs. A co-recursive program is defined as one whose range is a type defined recursively as the greatest solution of some equation.
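As an example, here's a co-recursive program over Stream - a minimal sketch. It produces a piece of output at every step without ever needing to consume a finite input first ..

-- map over an infinite stream: the co-recursive call sits inside
-- the Cons constructor, so each element is produced on demand
mapS :: (a -> b) -> Stream a -> Stream b
mapS f (Cons x xs) = Cons (f x) (mapS f xs)

-- the stream of all naturals, defined by unfolding
nats :: Stream Integer
nats = go 0 where go n = Cons n (go (n + 1))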

In terms of a little bit of mathematical concepts, if we model types as sets, then an infinite list of integers is the greatest set X for which there is a bijection X ≅ ℤ × X, and a program that generates such a list is a co-recursive program. As its dual, the type of a finite list is the least set X for which X ≅ 1 + (ℤ × X), where 1 is a singleton set and + is the disjoint union of sets. Any program which consumes such an input is a recursive program. (Note the dualities: greatest vs. least, generates vs. consumes.)

Strictly speaking, the co-domain does not necessarily have to be infinite. If you have read my earlier post on initial algebra, co-recursion recursively defines functions whose co-domain is a final data type, dual to the way that ordinary recursion recursively defines functions whose domain is an initial data type.

In the case of primitive recursion (with the List data type above), you always frame the recursive step to operate on data which gets reduced in size in subsequent steps of the recursive calls. It's not so with co-recursion, since you may be working with co-data and infinite data structures. And since in Haskell a List can also be infinite, using the above co-recursive definition of map we can have

head $ map (1+) [1..]

which invokes map on an infinite list of integers. And here the co-recursive steps of map operate successively on sets of data which are not smaller than the earlier set. So it's not tail recursion that makes an implementation efficient in Haskell - you need to make the co-recursive call within the application of a constructor (what's often called guarded recursion).
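
A quick check in GHCi makes the contrast concrete (map' is the accumulate-and-reverse version from above) ..

ghci> head $ map (1+) [1..]
2
ghci> head $ map' (1+) [1..]
^CInterrupted.

The second call never returns - reverse forces the helper to consume the entire infinite list before yielding anything.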

As the first post of the new year, that's all friends. Consider this as the notes from someone learning the principles of co-inductive definitions. Who knew co-recursion is so interesting ? It will be more interesting once we get down into the proof techniques for co-recursive programs. But that's for a future post ..

Friday, August 10, 2012

Category Theory and Programming - Reinforcing the value of generic abstractions

Ok .. so my last post on the probable usefulness of Category Theory in programming generated an overwhelming response all over the place. It initiated lots of discussions, which are always welcome and offer a great conduit towards everyone's enlightenment. From all these discussions we can summarize that there's no boolean answer to whether category theory has any usefulness in making you a better programmer. Category theory is abstract, probably more abstract than what our normal programmer's mind can comprehend. Here's what one mathematician's bias tells us on HN ..

"This may be a mathematician's bias, but Category Theory is somewhat underwhelming because you can't really do anything with it. It's actually too abstract--you can model basically every branch of mathematics with Categories, but you can't actually prove very much about them from this perspective. All you get is a different vocabulary to talk about results that you already knew. It's also less approachable because its objects of study are other branches of mathematics, whereas most other branches have number, vectors, and functions as standard objects."

Now let me digress a bit towards personal experience. I don't have a strong math background and have only started learning category theory. I don't get most of the mathy stuff that category theorists talk about unless I can relate it to my programming world. But the subset that Pierce talks about, and which I can relate to when I write code, gives me an alternate perspective for thinking about functional abstractions. And, as I mentioned in my last post, category theory focuses more on the morphisms than on the objects. When I am programming in a functional language, this has a strong parallel towards emphasizing how abstractions compose. Specifically, point free style composition has its inspiration from category theory and the way it focuses on manipulating arrows across objects.

Focusing on Generic Abstractions

Category theory gives us Functors, Applicatives, Monads, Semigroups, Monoids and a host of other abstractions which we use extensively in our everyday programming. Or at least I try to. These have taught me to think in terms of generic abstractions. I will give you a practical example from my personal experience that served as a great eye opener towards appreciating the value of generic abstractions ..

I program mostly in Scala and sometimes in Haskell. And one thing I need to do frequently (and it's not at all something unique to me) is apply a function f to a collection of elements (say of type Foo) with the caveat that the function may sometimes return no result. In Scala the obvious way to model this is for the function to return an Option type. Just for the record, Option is the equivalent of Maybe in Haskell and can be one of the two concrete types, Some (indicating the presence of a value) and None (indicating no value). So, applying f: Foo => Option[Bar] over the elements of a List[Foo] gives me List[Option[Bar]]. My requirement was to convert the result into an Option[List[Bar]], which would have a value only if all the Foos in the List gave me Some[Bar], and None otherwise. Initially I implemented this function as a specialized utility for the Option data type only. But then I realized that this function could very well be applicable to many other data types like Either, List etc. And this aha! moment came courtesy of Scalaz, which has a function sequence that does the exact same thing and works for all Traversable data structures (not just Lists) containing anything that's an instance of an Applicative Functor. The Scala standard library doesn't have generic abstractions like Monad, Monoid or Semigroup, and I constantly find myself reinventing them within the real world domain models that I program. No wonder I have Scalaz as a default dependency in almost all of my Scala projects these days.
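
For the record, this is also exactly what Haskell's sequence does, with Maybe playing the role of Option ..

ghci> sequence [Just 1, Just 2, Just 3]
Just [1,2,3]
ghci> sequence [Just 1, Nothing, Just 3]
Nothing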

So much for generic abstractions .. Now the question is do I need to know Category Theory to learn the generic abstractions like Applicative Functors or Monads ? Possibly no, but that's because some of the benevolent souls in the programming community have translated all those concepts and abstractions to a language that's more approachable to a programmer like me. At the same time knowing CT certainly gives you a broader perspective. Like the above quote says, you can't pinpoint and say that I am using this from my category theory knowledge. But you can always get a categorical perspective and a common vocabulary that links all of them to the world of mathematical composition.

Natural Transformation, Polymorphism and Free Theorems

Consider yet another aha! moment that I had some time back when I was trying to grok Natural Transformations, which are really morphisms between Functors. Suppose I have 2 functors F and G. Then a natural transformation η: F → G is a map that takes an object (in Haskell, a type, say, a) and returns a morphism from F(a) to G(a), i.e.

η :: forall a. F a → G a

Additionally, a natural transformation also needs to satisfy the following constraint ..

For any f :: A → B, we must have fmapG f . η = η . fmapF f

Consider an example from Haskell, the function reverse on lists, which is defined as

reverse :: [a] -> [a], which we can more specifically state as reverse :: forall a. [a] -> [a] ..

Comparing reverse with our earlier definition of η, we have F ≡ [] and G ≡ []. And since for lists, fmap = map, we have the other criterion for natural transformation as map f . reverse = reverse . map f. And this is the free theorem for reverse!
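
We can convince ourselves with a quick check (not a proof, of course - the theorem comes for free from the type) ..

ghci> (map (+1) . reverse) [1, 2, 3]
[4,3,2]
ghci> (reverse . map (+1)) [1, 2, 3]
[4,3,2]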

Not only this, we can get more insight from the application of natural transformation. Note the forall annotation above. It literally means that η (or reverse) works for any type a. That is, the function is not allowed to peek inside the arguments and its behavior does not depend on the exact type of a. We can freely substitute one type for another and yet have the same behavior of the function. Hence a natural transformation is all about polymorphic functions.

Summary

I agree that Category Theory is not a must read to do programming, particularly if you are not using a typed functional language. But often I find a category diagram or a categorical analysis reveals a lot of insight about an abstraction, particularly with respect to how it composes with other similar abstractions in the world.

Monday, July 30, 2012

Does category theory make you a better programmer ?

How much category theory knowledge should a working programmer have ? I guess this depends on what kind of language the programmer uses in his daily life. Given the proliferation of functional languages today, specifically typed functional languages (Haskell, Scala etc.) that embed the typed lambda calculus in some form or the other, the question looks relevant to me. And apparently to a few others as well. In one of his courses on Category Theory, Graham Hutton mentioned the following points when talking about the usefulness of the theory :

  • Building bridges - exploring relationships between various mathematical objects, e.g. Products and Functions
  • Unifying ideas - abstracting from unnecessary details to give general definitions and results, e.g. Functors
  • High level language - focusing on how things behave rather than what their implementation details are, e.g. specification vs implementation
  • Type safety - using types to ensure that things are combined only in sensible ways, e.g. (f: A -> B, g: B -> C) => (g o f: A -> C)
  • Equational proofs - performing proofs in a purely equational style of reasoning
Many of the above points can be related to the experience that we encounter while programming in a functional language today. We use Product and Sum types, we use Functors to abstract our computation, we marry types together to encode domain logic within the structures that we build and many of us use equational reasoning to optimize algorithms and data structures.

But how much do we need to care about how category theory models these structures and how that model maps to the ones that we use in our programming model ?

Let's start with the classical definition of a Category. [Pierce] defines a Category as comprising:

  1. a collection of objects
  2. a collection of arrows (often called morphisms)
  3. operations assigning to each arrow f an object dom f, its domain, and an object cod f, its codomain (f: A → B, where dom f = A and cod f = B)
  4. a composition operator assigning to each pair of arrows f and g with cod f = dom g, a composite arrow g o f: dom f → cod g, satisfying the following associative law:
  5. for any arrows f: A → B, g: B → C, and h: C → D, h o (g o f) = (h o g) o f
  6. for each object A, an identity arrow idA: A → A satisfying the following identity law:
  7. for any arrow f: A → B, idB o f = f and f o idA = f

Translating to Scala


Ok let's see how this definition can be mapped to your daily programming chores. If we consider Haskell, there's a category of Haskell types called Hask, whose collection of objects consists of the Haskell types. For this post I will use Scala, and for all practical purposes assume that we use Scala's pure functional capabilities. In our model we consider the Scala types as forming the objects of our category.

You define any function in Scala from type A to type B (A => B) and you have an example of a morphism. For every function we have a domain and a co-domain. In our example, val foo: A => B = //.. we have the type A as the domain and the type B as the co-domain.

Of course we can define composition of arrows or functions in Scala, as can be demonstrated with the following REPL session ..

scala> val f: Int => String = _.toString
f: Int => String = <function1>

scala> val g: String => Int = _.length
g: String => Int = <function1>

scala> f compose g
res23: String => String = <function1>

and it's very easy to verify that the composition satisfies the associative law.
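
For comparison, here's the same kind of check in Haskell. Function equality isn't decidable, so we just compare the two composites at a sample point ..

ghci> let f = show :: Int -> String
ghci> let g = length :: String -> Int
ghci> let h = (* 2) :: Int -> Int
ghci> ((h . g) . f) 12345 == (h . (g . f)) 12345
True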

And now the identity law, which is, of course, a specialized version of composition. Let's define some functions and play around with the identity in the REPL ..

scala> val foo: Int => String = _.toString
foo: Int => String = <function1>

scala> val idInt: Int => Int = identity(_: Int)
idInt: Int => Int = <function1>

scala> val idString: String => String = identity(_: String)
idString: String => String = <function1>

scala> idString compose foo
res24: Int => String = <function1>

scala> foo compose idInt
res25: Int => String = <function1>

Ok .. so we have the identity law of the Category verified above.

Category theory & programming languages


Now that we understand the most basic correspondence between category theory and programming language theory, it's time to dig a bit deeper into some of the implicit correspondences. We will definitely come back to the more explicit ones very soon when we talk about products, co-products, functors and natural transformations.

Do you really think that understanding category theory helps you understand programming language theory better ? It all depends on how much of the *theory* you really care about. If you are doing enterprise software development and/or really don't care to learn a language outside your comfort zone, then possibly you come back with a resounding *no* as the answer. Category theory is a subject that provides a uniform model of set theory, algebra, logic and computation. And many of the concepts of category theory map quite nicely to structures in programming (particularly in a language that offers a decent type system and preferably has some underpinnings of the typed lambda calculus).

Categorical reasoning helps you reason about your programs, if they are written using a typed functional language like Haskell or Scala. Some of the basic structures that you encounter in your everyday programming (like Product types or Sum types) have their correspondences in category theory. Analyzing them from CT point of view often illustrates various properties that we tend to overlook (or take for granted) while programming. And this is not coincidental. It has been shown that there's indeed a strong correspondence between typed lambda calculus and cartesian closed categories. And Haskell is essentially an encoding of the typed lambda calculus.

Here's an example of how we can explain the properties of a data type in terms of its categorical model. Consider the category of Products of elements and for simplicity let's take the example of cartesian products from the category of Sets. A cartesian product of 2 sets A and B is defined by:

A X B = {(a, b) | a ∈ A and b ∈ B}

So we have the tuples as the objects in the category. What could be the relevant morphisms ? In the case of products, the applicable arrows (or morphisms) are the projection functions π1: A X B → A and π2: A X B → B. Now if we draw a category diagram where C is any other object with arrows into A and B, then we have 2 functions f: C → A and g: C → B, and the unique mediating arrow <f, g>: C → A X B is defined as <f, g>(x) = (f(x), g(x)). Here's the diagram corresponding to the above category ..


and according to the category theory definition of a Product, the above diagram commutes. Note, by commuting we mean that for every pair of vertices X and Y, all paths in the diagram from X to Y are equal in the sense that each path forms an arrow and these arrows are equal in the category. So here commutativity of the diagram gives
π1 o <f, g> = f and
π2 o <f, g> = g.

Let's now define each of the functions above in Scala and see how the results of commutativity of the above diagram map to the programming domain. As programmers we use the projection functions (_1 and _2 in Scala's Tuple2, or fst and snd in Haskell's pairs) on a regular basis. The above category diagram, as we will see, gives some additional insights into the abstraction and helps understand some of the mathematical properties of how a cartesian product of Sets translates to the composition of functions in the programming model.

scala> val ip = (10, "debasish")
ip: (Int, java.lang.String) = (10,debasish)

scala> val pi1: ((Int, String)) => Int = (p => p._1)
pi1: ((Int, String)) => Int = <function1>

scala> val pi2: ((Int, String)) => String = (p => p._2)
pi2: ((Int, String)) => String = <function1>

scala> val f: Int => Int = (_ * 2)
f: Int => Int = <function1>

scala> val g: Int => String = _.toString
g: Int => String = <function1>

scala> val `<f, g>`: Int => (Int, String) = (x => (f(x), g(x)))
<f, g>: Int => (Int, String) = <function1>

scala> pi1 compose `<f, g>`
res26: Int => Int = <function1>

scala> pi2 compose `<f, g>`
res27: Int => String = <function1>

So, as we claim from the commutativity of the diagram, we see that pi1 compose `<f, g>` is typewise equal to f and pi2 compose `<f, g>` is typewise equal to g. Now the definition of a Product in Category Theory says that the morphism between C and A X B is unique and that A X B is defined up to isomorphism. And the uniqueness is indicated by the symbol ! in the diagram. I am going to skip the proof, since it's quite trivial and follows from the definition of what a Product of 2 objects means. This makes sense intuitively in the programming model as well - we can have one unique type consisting of the Pair of A and B.
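
Haskell renders the same diagram with (&&&) from Control.Arrow, which for plain functions is exactly our <f, g>; the commutativity conditions then read fst . (f &&& g) = f and snd . (f &&& g) = g. A small sketch ..

import Control.Arrow ((&&&))

f :: Int -> Int
f = (* 2)

g :: Int -> String
g = show

-- the mediating arrow into the product
fg :: Int -> (Int, String)
fg = f &&& g

-- spot checks of the two commutativity equations
check1, check2 :: Bool
check1 = (fst . fg) 7 == f 7   -- True
check2 = (snd . fg) 7 == g 7   -- True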

Now for some differences in semantics between the categorical model and the programming model. If you consider an eager (or eager-by-default) language like Scala, the Product type fails miserably in the presence of the Bottom data type (_|_) represented by Nothing. For Haskell, the non-strict language, it also fails when we consider the fact that a Product type needs to satisfy the equation (fst(p), snd(p)) == p and we apply the Bottom (_|_) for p. So, the programming model remains true only when we eliminate the Bottom type from the equation. Have a look at this comment from Dan Doel in James Iry's blog post on sum and product types.

This is an instance where a programmer can benefit from knowledge of category theory. It's actually a bidirectional win-win, where knowledge of category theory helps in the understanding of data types in real life programming, and vice versa.

Interface driven modeling


One other aspect where category theory maps very closely with the programming model is its focus on the arrows rather than the objects. This corresponds to the notion of an interface in programming. Category theory typically "abstracts away from elements, treating objects as black boxes with unexamined internal structure and focusing attention on the properties of arrows between objects" [Pierce]. In programming also we encourage interface driven modeling, where the implementation is typically abstracted away from the client. When we talk about objects up to isomorphism, we focus solely on the arrows rather than what the objects are made of. Learning programming and category theory in an iterative manner serves to enrich your knowledge of both. If you know what a Functor means in category theory, then when you are designing something that looks like a Functor, you can immediately make it generic enough so that it composes seamlessly with all other functors out there in the world.

Thinking generically


Category theory talks about objects and morphisms and how arrows compose. A special kind of morphism is Identity morphism, which maps to the Identity function in programming. This is 0 when we talk about addition, 1 when we talk about multiplication, and so on. Category theory generalizes this concept by using the same vocabulary (morphism) to denote both stuff that do some operations and those that don't. And it sets this up nicely by saying that for every object X, there exists a morphism idX : X → X called the identity morphism on X, such that for every morphism f: A → B we have idB o f = f = f o idA. This (the concept of a generic zero) has been a great lesson at least for me when I identify structures like monoids in my programming today.
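
In Haskell this "generic zero" is literally mempty, specialized per monoid - a small illustration ..

import Data.Monoid (Sum(..), Product(..))

-- the identity element, specialized to two different monoids
addZero :: Int
addZero = getSum (mempty :: Sum Int)          -- 0

mulOne :: Int
mulOne = getProduct (mempty :: Product Int)   -- 1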

Duality


In the programming model, many dualities are not explicit. Category theory has an explicit way of teaching you the dualities in the form of category diagrams. Consider the example of the Sum type (also known as Coproduct) and the Product type. We have an abundance of these in languages like Scala and Haskell, but programmers, particularly people coming from the imperative programming world, are often not aware of this duality. But have a look at the category diagram of the sum type A + B for objects A and B ..


It's the same diagram as the Product only with the arrows reversed. Indeed a Sum type A + B is the categorical dual of Product type A X B. In Scala we model it as the union type like Either where the value of the sum type comes either from the left or the right. Studying the category diagram and deriving the properties that come out of its commutativity helps understand a lot of theory behind the design of the data type.
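
The duality shows up directly in Haskell's Prelude: the mediating arrow out of a sum type is the either function, satisfying either f g . Left = f and either f g . Right = g - the product equations with the arrows reversed. A minimal sketch ..

f :: Int -> String
f = show

g :: Bool -> String
g b = if b then "yes" else "no"

-- the mediating arrow out of the sum: 'either f g'
fromSum :: Either Int Bool -> String
fromSum = either f g

-- spot checks of the dual commutativity equations
check1, check2 :: Bool
check1 = fromSum (Left 42)   == f 42     -- True
check2 = fromSum (Right True) == g True  -- True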

In the next part of this discussion I will explore some other structures like Functors and Natural Transformation and how they map to important concepts in programming which we use on a daily basis. So far, my feeling has been that if you use a typed functional language, a basic knowledge of category theory helps a lot in designing generic abstractions and make them compose with related ones out there in the world.

Monday, July 11, 2011

Datatype generic programming in Scala - Fixing on Cata

The paper Functional Programming with Overloading and Higher-Order Polymorphism by Mark P Jones discusses recursion and fixpoints in a section Recursion schemes: Functional programming with bananas and lenses with all examples in Haskell. The paper is an excellent read for anyone interested in an overall landscape of functional programming concepts. In case you haven’t read it, stop reading this post and have a look at it.

The section on recursion in the original paper discusses type level fixpoints and how it can be used to define generic abstractions like catamorphism and anamorphism. These are used extensively in datatype generic programming, where you can define generic combinators parameterized by the shape of the data. A very common example of a combinator parameterized by the shape or type constructor is fold, which works with many recursive structures like List, Tree etc.

I have been trying to use Scala to do the same examples that the paper models in the section on recursion. I was curious to find out if Scala's type system is expressive enough to implement higher order abstractions like the type level fixpoint combinator that the paper implements using Haskell. Oliveira and Gibbons also made similar studies on the suitability of Scala for datatype generic programming and have translated Gibbons' ORIGAMI for a subset of the GoF patterns in Scala.

In order to learn type level fixpoints, I started with value level fixpoints using untyped lambda calculus. Once you study the properties of a fixed point for functions, you can find out the mappings with type level fixpoints. The former takes functions and maps types, while the latter takes type constructors and maps kinds.

I start with an introduction and some proofs for value level fixpoints. Most of it is pretty basic stuff - still I wanted to say these things as it may prove useful to someone learning my way. In case you are familiar with this, feel free to skip to the next section.

Value level fixpoint

Lambda calculus helps us split a recursive function definition into 2 parts - a non-recursive computable definition that abstracts the recursive call away to an additional parameter and a fixpoint operator that encodes the recursion part. This way we take the recursion off the main function that we model. Here’s the usual factorial that goes through this process and lands up as a fixed point computation ..

FACT = λn. IF (= n 0) 1 (* n (FACT (- n 1)))                 // λ calculus
FACT = (λfac. λn. IF (= n 0) 1 (* n (fac (- n 1)))) FACT     // beta abstraction


Note here we have isolated the main factorial function as an abstraction that does not have any recursion. Call the above equation ..

FACT = H FACT


and this encodes the recursion part. H is a function which, when applied to FACT, gives back FACT. We call FACT a fixpoint of H.

What then is a fixpoint combinator ?

A fixpoint combinator is a function Y which takes another function as argument and generates its fixpoint. e.g. in the case above Y when applied on H will give us the fixpoint FACT.

Y H = FACT        // from definition of Y    #1
FACT = H FACT     // from above              #2
Y H = H FACT      // applying #2 in #1       #3
Y H = H (Y H)     // applying #1 in #3       #4


Note we started with FACT as the model of our factorial. It’s an interesting exercise to see that FACT 4 indeed results in 24 following the above formula.

So Y is the magic that helps us express any recursive function as a non-recursive computation. In fact Y can be expressed as a lambda expression without using recursion ..

Y = (λh. (λx. h(x x)) (λx. h(x x)))


Let’s see what Y H evaluates to ..

Y H
= (λh. (λx. h(x x)) (λx. h(x x))) H    // definition of Y
-> (λx. H(x x)) (λx. H(x x))                 // beta reduction
-> H ((λx. H(x x)) (λx. H(x x)))             // beta reduction
-> H (Y H)


So that’s cool .. we have successfully derived the Y combinator as the lambda expression.
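
In Haskell the value level fixpoint is available as fix (in Data.Function), and the factorial above transcribes directly (factStep is my name for H) ..

import Data.Function (fix)

-- the non-recursive step H, with the recursive call abstracted away
factStep :: (Integer -> Integer) -> Integer -> Integer
factStep fac n = if n == 0 then 1 else n * fac (n - 1)

-- FACT as the fixpoint of H
fact :: Integer -> Integer
fact = fix factStep

-- fact 4 == 24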

In the above we did the derivation of the fixpoint combinator for any recursive function, which is a value. The subject of today’s post is the fixpoint combinator for types. We will see how the above maps to the same model when applied at the type level.

Y for types

In this post I will model the type level fixpoint combinator in Scala showing that Scala’s type system has the power to express all the data type generic abstractions that Mark does using Haskell.

Here’s what we saw as fixed point for values (functions) ..

Y H = H (Y H)


The fixed point combinator for types is usually called Mu .. In Haskell we will say ..

data Mu f = In (f (Mu f))

Note the correspondence .. while Y takes functions and maps across types, Mu takes type constructors and maps across kinds ..

GHCi> :t Y
Y :: (a -> a) -> a
GHCi> :kind Mu
Mu :: (* -> *) -> *


Now in Scala we model Mu as a case class that takes a type constructor as parameter. Like the paper we only consider unary type constructors - it’s not that difficult to generalize it to higher arities ..

case class Mu[F[_]](out: F[Mu[F]])


Just as with the value level fixpoint Y we could isolate the recursive function into a non-recursive computation, we can do the same thing on types with Mu. Consider the following data type declaration for modeling Natural Numbers ..

trait Nat
case object Zero extends Nat
case class Succ(n: Nat) extends Nat


.. a recursive type .. let’s break the recursion by introducing an additional type parameter ..

trait NatF[+S]
case class Succ[+S](s: S) extends NatF[S]
case object Zero extends NatF[Nothing]

type Nat = Mu[NatF]


.. and modeling the actual type Nat as a fixpoint of NatF.

Mu is actually a functor fixpoint, which means that it works on functors (more specifically covariant functors). In our case we need to define a functor for NatF. No problem .. scalaz is there .. and here’s the functor instance for NatF ..

implicit object functorNatF extends Functor[NatF] {
  def fmap[A, B](r: NatF[A], f: A => B) = r match {
    case Zero => Zero
    case Succ(s) => Succ(f(s))
  }
}


And we define a couple of convenience functions for the zero natural number and the successor function ..

def zero: Nat = Mu[NatF](Zero)
def succ: Nat => Nat = (s: Nat) => Mu[NatF](Succ(s))


What we did with Mu

So far we defined a datatype as a fixpoint of a functor. Instead of making the datatype definition recursive we abstracted the recursion within the fixpoint combinator. And we did all these as a general strategy that can be applied to many other recursive datatypes.

Let’s apply the same pattern to a List data type .. ok we use an IntList, a list of integers, following the paper.

trait IntListF[+S]
case object Nil extends IntListF[Nothing]
case class Cons[+S](x: Int, xs: S) extends IntListF[S]

// the integer list as a fixpoint of IntListF
type IntList = Mu[IntListF]

// convenience functions for the constructors
def nil = Mu[IntListF](Nil)
def cons = (x: Int) => (xs: IntList) => Mu[IntListF](Cons(x, xs))


.. and the functor instance ..

implicit object functorIntListF extends Functor[IntListF] {
  def fmap[A, B](r: IntListF[A], f: A => B) = r match {
    case Nil => Nil
    case Cons(n, x) => Cons(n, f(x))
  }
}


Note the similarity in structure for both the datatypes Nat and IntList and how the type constructors in both the cases IntListF and NatF determine the shape of the computation for the respective datatypes. This means that the datatype definition is parameterized by a type constructor that determines the shape of the data that will be modeled by the datatype.

And now for the cata

Now with the above abstraction of Mu in place, we can define a combinator that can be shown to be more generalized than a fold .. it’s catamorphism, which we define as ..

def cata[A, F[_]](f: F[A] => A)(t: Mu[F])(implicit fc: Functor[F]): A = {
  f(fc.fmap(t.out, cata[A, F](f)))
}


For recursive datatypes, folds have different types - the signature of fold varies with the datatype on which it's applied. cata is similar to a fold, just more generic: the above definition models functions over any type constructor for which there's a matching functor instance, and it is capable of defining all of the operations on the datatype.

In this blog post Tony Morris discusses a catamorphism on an Option data type in Scala. He defines a cata which is specific for that data type. The above definition is at a higher level of abstraction and is parameterized on the shape of the data that it takes. The following snippets use the same cata to define functions for Nat as well as IntList. Datatype generic abstractions FTW.

Have a look at the following functions on Nat, all defined in terms of the cata combinator ..

def fromNat = cata[Int, NatF] {
  case Zero => 0
  case Succ(n) => 1 + n
} _ 

scala> fromNat(succ(succ(zero)))
res14: Int = 2

def addNat(m: Nat, n: Nat) = cata[Nat, NatF] {
  case Zero => m
  case Succ(x) => succ(x)
} (n)

scala> fromNat(addNat(succ(zero), succ(zero)))
res15: Int = 2
scala> fromNat(addNat(succ(zero), succ(succ(zero))))
res16: Int = 3

def mulNat(m: Nat, n: Nat) = cata[Nat, NatF] {
  case Zero => zero
  case Succ(x) => addNat(m, x)
} (n)

scala> fromNat(mulNat(succ(succ(succ(zero))), succ(succ(zero))))
res1: Int = 6
scala> fromNat(mulNat(succ(succ(succ(zero))), zero))
res2: Int = 0

def expNat(m: Nat, n: Nat) = cata[Nat, NatF] {
  case Zero => succ(zero)
  case Succ(x) => mulNat(m, x)
} (n)

scala> fromNat(expNat(succ(succ(succ(zero))), zero))
res0: Int = 1
scala> fromNat(expNat(succ(succ(succ(zero))), succ(succ(zero))))
res1: Int = 9


.. and some more on IntList using the same cata ..

def sumList = cata[Int, IntListF] {
  case Nil => 0
  case Cons(x, n) => x + n
} _

scala> sumList(cons(1)(cons(2)(cons(3)(nil))))
res1: Int = 6

def len = cata[Nat, IntListF] {
  case Nil => zero
  case Cons(x, xs) => succ(xs)
} _

scala> fromNat(len(cons(1)(cons(2)(cons(3)(nil)))))
res1: Int = 3

Monday, May 23, 2011

Combinators as the sublanguage of DSL syntax

In my presentation on DSLs at PhillyETE I talked about organizing DSL syntax around a sublanguage of combinators. The underlying implementation model needs to be segregated from the engine that renders the syntax of your DSL. Doing this, you can decouple your semantic model and reuse it in the context of other DSLs, or even in an entirely different context. I called this the model-view architecture of DSLs. In this post I would like to extend my thoughts along similar lines and claim that publishing combinators that wire up your model elements is a very effective way towards compositionality of your DSL syntax.

Functions compose naturally - hence if your DSL publishes combinators then your client can compose them to form larger structures. But for this composition to be effective, you need to have the proper types for each of them - yes, we are talking about type-safe composition.

In Scala I tend to use the curried syntax a lot when I design combinators. Let’s have a look at an example .. we assume that we have a domain model designed using some paradigm - it can be OO, it can be functional or maybe a mix of the two. When we implement a business rule, we use these model elements, wire them up and present a proper syntax to the user. As an example here's a domain rule for enrichment of a securities trade as it goes through its processing pipeline ..

  1. get the tax/fee ids for a trade
  2. calculate tax/fee values
  3. enrich trade with net cash amount
Here are a few snippets of combinators that help users implement the rule ..

// get the list of tax/fees applicable for this trade
// depends on the market
val forTrade: Trade => Option[List[TaxFeeId]] = {trade =>
  taxFeeForMarket.get(trade.market) <+> taxFeeForMarket.get(Other)
}

// all tax/fees for a specific trade
val taxFeeCalculate: Trade => List[TaxFeeId] => List[(TaxFeeId, BigDecimal)] = {t => tids =>
  tids zip (tids map valueAs(t))
}

val enrichTradeWith: Trade => List[(TaxFeeId, BigDecimal)] => BigDecimal = {trade => taxes =>
  taxes.foldLeft(principal(trade))((a, b) => a + b._2)
}


Note each of these combinators returns a function instead of a concrete business abstraction. Of course a combinator is supposed to return a function. But my point is that with such a strategy it becomes easier to build larger abstractions out of these smaller ones. And these combinators are domain facing - each of them implements a snippet of a business rule. Hence take special care to name them appropriately, so that you keep the ubiquitous language going within your DSL syntax. Here’s how we combine the above to form a larger business rule that implements the “enrichment of trade” functionality ..

// enrichment of trade
// implementation follows problem domain model
val enrich = for {
  // get the tax/fee ids for a trade
  taxFeeIds      <- forTrade 

  // calculate tax fee values
  taxFeeValues   <- taxFeeCalculate 

  // enrich trade with net amount
  netAmount      <- enrichTradeWith 
}
yield((taxFeeIds map taxFeeValues) map netAmount)

// enriching a trade
trade map enrich should equal()


.. very concise, very succinct, and brings out the domain semantics very explicitly. This composition was possible only because we used the curried syntax in designing our combinators. Haskell has currying by default - no wonder it revels in the design of embedded DSLs.
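
Just to illustrate the point, here's a hypothetical Haskell transcription of enrichTradeWith, with stand-in domain types replacing the real model. Every function is curried by default, so the combinator shape needs no special syntax ..

-- stand-in domain types, just for illustration
data Trade = Trade { principal :: Rational }
type TaxFeeId = String

-- same shape as the Scala combinator: Trade -> taxes -> net amount
enrichTradeWith :: Trade -> [(TaxFeeId, Rational)] -> Rational
enrichTradeWith trade = foldl (\acc (_, v) -> acc + v) (principal trade)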

So, are combinators as the building blocks of DSL syntax a good idea to follow ?

I have been using this strategy for some time now and it has been working out quite nicely for me. It’s not essential that your underlying domain model be purely functional. I have been using libraries like Scalaz in Scala, which offers a good hierarchy of typeclasses that help a lot in designing generic combinators working across an unrelated family of abstractions. In my PhillyETE talk I discussed how functors allow you to map over entities across diverse abstractions like List, Option and Functions. And once you make your combinators generic, they tend to be adopted within the DSL as part of the ubiquitous language.