Don't have a COW, man? What Haskell teaches us about writing Enterprise-scale software
Berlin Brown asked:
I read programming.reddit religiously but I rarely see something that could be used in a non-startup environment. Am I wrong, or are you considering deploying a haskell enterprise web application? And if the stuff discussed isn’t relevant for the next 5 years (i.e. an erlang based webapp) will it ever be relevant?
Yes, I use what I read on programming.reddit.com in my day job. That’s one of the reasons I have this day job: it’s part of what I do to sift through all of the cool stuff and find the things that are practical today.
Since you mentioned Haskell:
Consider a multi-threaded application with shared memory, like a really big web server that has some big shared collection of things in memory. From time to time you add things to the big collection, from time to time you remove them.
Pessimism and coarse-grained locking

One way to arbitrate multiple threads is to have one copy of the collection with strict locking protocols that apply to its coarse-grained operations like add, remove, and fetch.
In a language like Java, you can synchronize those methods and you’re done. You have just implemented mutex locking: only one thread can use the collection at a time. If one thread is fetching something from the collection, all other threads must wait, even if all they want to do is fetch things as well.
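A minimal sketch of this coarse-grained style in Python (the class name and method names here are made up for illustration):

```python
import threading

class CoarseLockedMap:
    """One big mutex: every operation, reads included, takes the same lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data = {}

    def add(self, key, value):
        with self._lock:                # writers block everyone...
            self._data[key] = value

    def remove(self, key):
        with self._lock:
            del self._data[key]

    def fetch(self, key):
        with self._lock:                # ...and so do readers, even though
            return self._data[key]      # fetching changes nothing
```

Correct, but a thread that only wants to fetch still queues up behind every other thread.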
That sucks, especially if you do lots of reading: why should thread 546 have to wait to fetch something just because thread 532 is currently fetching something? [1]

Read and write locks

The next improvement is to have two kinds of locks: read locks and write locks. Two or more threads can lock the collection for reading, but if any thread locks the collection for writing, all of the other threads have to wait until it is done.
This works for about 17 clock ticks, and then you discover that a steady stream of readers can starve writers forever. You fix the bug by adding a new rule: if a thread wants a write lock while one or more threads hold read locks, it has to wait, and any other threads that want new read locks can’t have them. Even though the only threads with locks have read locks, they still have to wait.
The thread waiting to write gets a kind of pending write lock that blocks all other threads from taking out new locks. And then you fix the next bug by saying that all threads waiting wait in a priority queue so that the read threads aren’t starved by write threads and the write threads aren’t starved by read threads.
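A sketch of a read-write lock with that writer-preference rule, built on Python’s threading.Condition. Python’s standard library has no read-write lock, so this is a hand-rolled illustration, not production code, and it omits the priority-queue refinement:

```python
import threading

class ReadWriteLock:
    """Many readers or one writer; a waiting writer blocks new readers."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False
        self._writers_waiting = 0

    def acquire_read(self):
        with self._cond:
            # a pending write lock blocks new readers so writers aren't starved
            while self._writer or self._writers_waiting:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            self._writers_waiting += 1      # this is the "pending write lock"
            while self._writer or self._readers:
                self._cond.wait()
            self._writers_waiting -= 1
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```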
Purely Functional Data Structures takes you step by step through the design and implementation of copy-on-write collections. These collections can be used in purely functional languages, but they are just as useful in multi-paradigm languages like Java, Ruby, or Python for handling multiple threads and optimizing performance. The author’s thesis is available online for free.
You now have a system that is pretty fast as long as you don’t write things very often. For example, you could build a fairly nice cache using read-write locking, provided it is tuned so that you get lots of hits and only rarely have to drop things from the cache or add things to it. But if you’re doing something like maintaining a big index in memory where you have to make lots of updates, the writes will slow everything down.
These kinds of locking protocols are called pessimistic protocols: you assume bad things will happen and prevent them from happening up front by blocking threads from executing until it is safe to proceed.
Multi-version concurrency control

Another way to arbitrate multiple threads is to make copies of the collection whenever you perform an update. [2] You maintain multiple versions of the collection. When a thread needs the collection, it grabs the latest copy. When it wants to remove or add elements, it writes a new copy without disturbing any existing copy.
This works really well in that threads that only want to read are never blocked. They always run at full speed, even if another thread is in the middle of an update. Hand-waving over how you figure out the whole “latest copy” thing, this scheme doesn’t work so well for threads that write.
The problem is one of serializability: if some set of operations takes place on the collection, the result must be the same as if the operations were conducted one at a time on the collection. There is no guarantee of the order of the operations, just that the result is the same as if they had been carried out in some order.
Let’s use an example. Say our collection is a Map. It starts empty: { }. Operation A adds an element: {...a: "A"...}. As does operation B: {...b: "B"...}. And operation C: {...c: "C"...}.

If we start with an empty map and perform all three operations, the result should be { a: "A", b: "B", c: "C" }, exactly the same result as if each operation were performed serially, one after the other. But what happens if each operation is performed by a thread that grabs the initial copy, {}, and writes its result back to the collection? Something called a race condition: the result will be { a: "A" }, { b: "B" }, or { c: "C" }, with the “winner” being the last one to write its result.
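The race is easy to demonstrate deterministically, without real threads: simulate three writers that each read the original empty map and then blindly publish their whole result (the function name is illustrative):

```python
def run_race(operations):
    """Each 'thread' reads the ORIGINAL empty map, not the latest version."""
    shared = {}
    initial = dict(shared)            # everyone grabbed version 0
    results = []
    for key, value in operations:
        mine = dict(initial)          # work from the stale copy...
        mine[key] = value
        results.append(mine)
    for result in results:
        shared = result               # ...and blindly write it back
    return shared                     # only the last writer survives

print(run_race([("a", "A"), ("b", "B"), ("c", "C")]))  # {'c': 'C'}
```

A and B’s updates are silently lost: the result is not serializable.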
We can fix this problem in a couple of ways. One way is to keep the versions so that reading threads work at full speed, but use mutexes for write locks so that only one thread can write at a time. That’s simple, and if you can figure out a cheap way to make copies, it works pretty well.
That’s your pessimistic protocol again (threads that write have to wait on other threads that write), but it’s a huge win for threads that read.
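A minimal sketch of that arrangement in Python, assuming plain dicts as the collection (the names are illustrative, and a real implementation would share structure instead of copying the whole dict):

```python
import threading

class VersionedMap:
    """Readers grab an immutable snapshot; writers publish a new copy."""

    def __init__(self):
        self._lock = threading.Lock()   # guards writers only
        self._current = {}              # treated as immutable once published

    def snapshot(self):
        return self._current            # readers never block or wait

    def add(self, key, value):
        with self._lock:                # one writer at a time
            copy = dict(self._current)  # copy on write
            copy[key] = value
            self._current = copy        # atomic publication of the new version

    def remove(self, key):
        with self._lock:
            copy = dict(self._current)
            del copy[key]
            self._current = copy
```

A reader holding an old snapshot keeps seeing a consistent map even while a writer publishes a new version.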
Making this work is tricky. Copying the entire thing is expensive, so you need to do clever tricks where you only copy the things that change and share the things that don’t change. And you can get a big, big win if you can avoid write conflicts by arbitrating conflict at a finer grain. For example, a HashMap uses a set of linked lists. If two different threads write to different “buckets,” you can merge their results rather than rolling one back and starting again.
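The finer-grained idea can be sketched as a merge function: given the base version and two updated copies, combine them when their changes touch disjoint keys (this helper is hypothetical and ignores deletions):

```python
def try_merge(base, left, right):
    """Merge two updated copies of `base` if their changes don't overlap."""
    left_changes = {k: v for k, v in left.items() if base.get(k) != v}
    right_changes = {k: v for k, v in right.items() if base.get(k) != v}
    if left_changes.keys() & right_changes.keys():
        return None                   # conflict: one writer must retry
    merged = dict(base)               # disjoint writes: keep both
    merged.update(left_changes)
    merged.update(right_changes)
    return merged

base = {"x": 1}
print(try_merge(base, {"x": 1, "a": "A"}, {"x": 1, "b": "B"}))
# {'x': 1, 'a': 'A', 'b': 'B'}
print(try_merge(base, {"x": 2}, {"x": 3}))  # None
```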
One of the best books ever written on the subject of implementing fault-tolerant concurrency (either on a single system or a distributed network) is Concurrency Control and Recovery in Database Systems. Don’t be fooled by the word “database”: the techniques described are just as useful for implementing distributed algorithms like MapReduce, concurrent data structures like high-performance collections, or multi-threaded communication systems based on queues. Like all classics, it’s also available online for free.
There is a lot of extra overhead for this if a thread wants to write while another thread is reading a version, so it is only a big win if writes are fairly rare. Remember, one of the big wins is that reads
never wait on writes because they work with immutable versions of the collection.
Depending on how many threads you have, what kinds of operations are most likely, and other factors, this kind of system can be orders of magnitude faster than coarse-grained pessimistic locking.
Sometimes you want a slightly different protocol. The scheme above is often called single writer, many readers. It requires threads that plan on writing to know in advance that they need to write. But for something like a cache, you don’t know you need to write until you miss the cache, and by then it’s too late to get a write lock.
Optimism and many writes, many reads

The easiest way to avoid having to pre-declare whether a thread is a reader or a writer is by letting all of the threads do as they please. They all get the latest version and chug happily along.
When they are finished, if they never executed an add or a remove, we let go of the copy of the collection and we’re done. If a thread wants to write, it makes a copy as above and writes to its copy. But it doesn’t have to grab a lock while it is writing, so writes don’t wait on other writes.
Now, if a thread has executed a write (an add or a remove), when it is done we check the result to see if it violates serializability.
For example, we can number our versions. Let’s say that {} is version 0. The first thread to finish, let’s say it’s the thread performing operation B, creates its result: { b: "B" }. Now it checks the collection to see if anyone has updated it since B read the collection. The collection is still at version 0, so B can write its result. B writes { b: "B" } to the collection and calls it version 1.
Next A finishes and notices that the collection is at version 1. This is a problem, because A is working with an outdated copy of version 0, so it has to throw out its work, fetch version 1, and try again. This is exactly the same thing as using a source code control system like Subversion to resolve conflicts.
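The commit-time version check and retry loop might be sketched like this in Python (all names are illustrative; real systems validate and swap with finer-grained machinery):

```python
import threading

class OptimisticMap:
    """Writers work without locks, then validate their version at commit."""

    def __init__(self):
        self._lock = threading.Lock()       # protects only the commit check
        self._version = 0
        self._data = {}

    def read(self):
        return self._version, self._data    # never blocks

    def commit(self, expected_version, new_data):
        """Succeed only if nobody has committed since we read."""
        with self._lock:
            if self._version != expected_version:
                return False                # stale: the caller must retry
            self._data = new_data
            self._version += 1
            return True

def add_with_retry(shared, key, value):
    while True:
        version, data = shared.read()       # grab the latest version
        mine = dict(data)                   # copy on write, no lock held
        mine[key] = value
        if shared.commit(version, mine):    # validate, else throw out and retry
            return
```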
This many reads, many writes strategy is called an optimistic protocol because you do work in the hope that nothing will cause you to throw it out and try again. It’s a big win when writes do happen at the same time but rarely conflict with each other. For example, if you have a good strategy for merging writes, this is huge.
So what?

Well, it would be great if you didn’t have to reinvent the wheel and work out all of the implications every time you want to make a really fast shared collection in a multi-threaded environment. What you want is someone to share a wealth of experience about how to make really fast copies of things by only changing the little bits that you change instead of the whole thing, and so forth.
I like languages which say, “No, you don't want to write it the way you’re thinking. There’s a vastly better way to solve this whole class of problems.” Me: brain explodes
Where do you go for that kind of information? How about to people who spend all day thinking about collections that cannot change because their programming languages are purely functional?
Yes, what I’ve just described is exactly how languages like Haskell implement “mutable” collections like dictionaries and lists. In purely functional languages, collections never change: adding something to a collection really means creating a new collection with an extra element. This is the exact same implementation that we need for building optimistically locked collections in a multi-threaded environment!
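The essence in one tiny Python imitation: “adding” returns a new collection, and the old version survives untouched. (Haskell’s Data.Map gets the same effect efficiently through structural sharing rather than full copies; the function name here is made up.)

```python
def added(collection, key, value):
    """Return a NEW map with the extra element; the original is untouched."""
    new_collection = dict(collection)   # a full copy here; real persistent
    new_collection[key] = value         # structures share the unchanged parts
    return new_collection

v0 = {}
v1 = added(v0, "a", "A")
v2 = added(v1, "b", "B")
print(v0, v1, v2)   # {} {'a': 'A'} {'a': 'A', 'b': 'B'}
```

Every version remains valid forever, which is exactly what lock-free readers need.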
Haskell teaches us extremely useful techniques for writing Enterprise-scale software.
And more techniques: Hard-core concurrency considerations
[1] Now, you might be saying, “what a waste, this is like locking a table in a database when we should be locking rows.” But if you look at the database closely, it does lock the table when you perform certain actions like deleting a row. Or it does something more complicated, and now that you’ve read the entire post, you know what it really does.
[2] Making copies on writes is called copy-on-write semantics, or COW for short. Chew on that for a while.
Labels: java, lispy, popular