Scala has a nice functional Queue implementation as part of scala.collection.immutable. It implements the standard technique of using a pair of lists [L, R], where L is the front portion of the queue and R is the rear portion in reverse order. As a result, the head of L is the first element of the queue and the head of R is the last element of the queue. Hence both can be fetched in O(1) time. Items are enqueued at the rear end, an O(1) insert into R, while they are dequeued from the front, again a (supposedly) O(1) removal from L. A nice trick indeed .. Here is the snippet from the Scala library ..
class Queue[+A](elem: A*) extends Seq[A] {
  protected val in: List[A] = Nil
  protected val out: List[A] = elem.elements.toList

  def enqueue[B >: A](elem: B) = mkQueue(elem :: in, out)

  def dequeue: (A, Queue[A]) = {
    val (newOut, newIn) =
      if (out.isEmpty) (in.reverse, Nil)
      else (out, in)
    if (newOut.isEmpty) throw new NoSuchElementException("queue empty")
    else (newOut.head, mkQueue(newIn, newOut.tail))
  }
}
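For reference, the library queue can be exercised like this (written against current Scala; `mkQueue` above is the library's internal factory and is not part of the public API):

```scala
import scala.collection.immutable.Queue

// Persistent: enqueue/dequeue return new queues; old values stay valid.
val q = Queue(1, 2, 3).enqueue(4)  // rear insert, O(1)
val (head, rest) = q.dequeue       // front removal, amortized O(1)
```

Note that `q` itself is unchanged after the `dequeue` - every operation yields a new version of the queue.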
Now have a close look at dequeue. When we try to do a dequeue with out being empty, in needs to be reversed - a definite O(n) operation. But, for a list of length n, this occurs only after n insertions since the last reversal. Hence dequeue still remains an amortized O(1) operation. The point is whether you can afford an occasional O(n) ..
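To make the amortization concrete, here is a minimal, self-contained sketch of the two-list technique in current Scala (the name PairQueue is mine, for illustration - this is not the library code):

```scala
// Two-list queue sketch: rear insert is O(1); dequeue reverses the
// rear only when the front list is exhausted, so the occasional O(n)
// reverse is paid for by the n enqueues that preceded it.
case class PairQueue[A](front: List[A], rear: List[A]) {
  def enqueue(a: A): PairQueue[A] = PairQueue(front, a :: rear)

  def dequeue: (A, PairQueue[A]) = front match {
    case h :: t => (h, PairQueue(t, rear))
    case Nil => rear.reverse match {          // the occasional O(n) step
      case h :: t => (h, PairQueue(t, Nil))
      case Nil    => throw new NoSuchElementException("queue empty")
    }
  }
}

val q = (1 to 4).foldLeft(PairQueue[Int](Nil, Nil))(_ enqueue _)
val (first, q1) = q.dequeue   // front is empty here, so this triggers the reverse
```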
In his paper "Simple and Efficient Purely Functional Queues and Deques", Okasaki discusses a technique that uses lazy lists and incremental computation to improve the above worst case complexity of dequeue to O(log n). In fact, later in the paper, he also improves on this version, using full memoization in a lazy language, to get a worst case O(1) dequeue. Okasaki has written a beautiful thesis on functional data structures and their analysis techniques, which was later published as the book Purely Functional Data Structures. A highly recommended read for the interested ones ..
Being Lazy

The main trick of Okasaki's O(log n) dequeue algorithm is to make L a lazy list. Instead of doing the above reverse in a single step as a strict operation, we can have [L, R] incrementally changed to [L ++ reverse(R), []], so that when the actual need for the reverse comes, we already have R reversed and appended to L. The meat lies in the reverse operation being done incrementally and the ++ operation being done lazily. Okasaki's paper describes the algorithm and its analysis in great detail.
Using Scala Streams

Scala offers lazy lists in the form of Stream[A], where Stream.cons is defined as ..

object Stream {
  object cons {
    def apply[A](hd: A, tl: => Stream[A]) = new Stream[A] {
      // ... (implementation elided)
    }
  }
}

and offers a lazy concatenation ..
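In current Scala, Stream has been renamed LazyList, and the laziness of its concatenation can be observed directly. The tracked helper below is mine, for illustration: it records which elements have actually been materialized.

```scala
// Record which elements have actually been computed.
val log = scala.collection.mutable.ListBuffer[Int]()
def tracked(ns: List[Int]): LazyList[Int] =
  ns.to(LazyList).map { n => log += n; n }

val appended = tracked(List(1, 2)) #::: tracked(List(3, 4))
// Building the concatenation forces nothing ...
val forcedByAppend = log.toList
// ... elements are computed only when demanded.
val first = appended.head
val forcedByHead = log.toList
```

Here forcedByAppend is empty and forcedByHead contains only the single demanded element - the rest of both lists stays suspended, which is exactly the property the rotate below relies on.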
How about using Stream[A] for the list L and trying out Okasaki's algorithm to get a faster functional persistent Queue in Scala? Name it OQueue[+A] ..

class OQueue[+A] {
  protected val front: Stream[A] = Stream.empty
  protected val sizef: Int = 0
  protected val sizer: Int = 0
  protected val rear: List[A] = Nil
}
Okasaki calls the incremental adjustment of [L, R] a rotate. Here is the implementation of rotate in OQueue, which gets called during construction of the OQueue. Note that OQueue (and Scala's Queue), being functional, is also persistent - hence every mutating operation returns a new version of the data structure. This is unlike imperative data structures that implement destructive updates and maintain only the latest version. Hence rotate gets called as part of OQueue construction. However, rotate is not called on every call of the constructor - it is called only when |R| = |L| + 1 and we need to start the process of incremental reversal. The trick is to start reversing R at exactly the moment R gets longer than L, and to interleave the append to L with the reverse of R. The details are too gory for a blog post and can be found with all their complexities in the original paper.
class OQueue[+A] {
  protected def rotate[A](xs: Stream[A], ys: List[A], rys: Stream[A]): Stream[A] = ys match {
    case y :: ys1 =>
      if (xs.isEmpty) Stream.cons(y, rys)
      else
        Stream.cons(xs.head, rotate(xs.tail, ys1, Stream.cons(y, rys)))
    case Nil =>
      rys // unreachable: rotate is only called when |ys| = |xs| + 1
  }
}
and now the constructor / factory method for constructing an OQueue[A] ..
class OQueue[+A] {
  protected def makeQ[A](f: Stream[A], sf: Int, r: List[A], sr: Int): OQueue[A] = {
    if (sr <= sf) {
      new OQueue[A]() {
        override protected val front = f
        override protected val sizef = sf
        override protected val rear = r
        override protected val sizer = sr
      }
    } else {
      new OQueue[A]() {
        override protected val front = rotate(f, r, Stream.empty)
        override protected val sizef = sf + sr
        override protected val rear = Nil
        override protected val sizer = 0
      }
    }
  }
}
The other methods are fairly trivial and nicely fall in place ..

class OQueue[+A] {
  def isEmpty: Boolean = sizef == 0

  def enqueue[B >: A](elem: B) =
    makeQ(front, sizef, elem :: rear, sizer + 1)

  def dequeue: (A, OQueue[A]) = {
    if (isEmpty) throw new NoSuchElementException("queue empty")
    (front.head, makeQ(front.tail, sizef - 1, rear, sizer))
  }

  def elements: Iterator[A] = (front.force ::: rear.reverse).elements

  def length = (sizef + sizer)
}
Fast?

Analyzing lazy, persistent functional data structures is hard, and so is benchmarking them. I did not do an elaborate benchmarking of OQueue against scala.collection.immutable.Queue. But from whatever I did, I could not get any speedup over the Scala library implementation. Scala, unlike Haskell, is not a lazy language. And lazy languages offer memoization of function arguments, i.e. arguments are evaluated only when they are needed and, once evaluated, are cached for future reuse. In the above implementation of OQueue, we need to evaluate front.tail a number of times - an effectively lazy language can make this operation O(1) through memoization. Also, Scala, being implemented on top of the JVM, is not the best platform for heavy recursion, and the above implementation of rotate is not tail recursive, so Scala cannot optimize it away. Okasaki implemented the algorithm in Standard ML (a strict language, extended with explicit lazy streams for the paper) and possibly got effective speedups over the traditional paired list implementation of queues.

I am not a Scala expert; any suggestions or critiques regarding the above implementation will be most welcome. Of course, I plan to do some more micro-benchmarking. But I think there may be dominant constants that offset the asymptotic complexity figures of the above analysis. Besides, the above implementation does not take any advantage of memoization, which is yet another secret sauce behind speedups of functional implementations. Anyway, it was, I feel, a worthwhile exercise trying my hand at implementing a functional data structure.
Postscript

I just discovered that Rich Hickey addresses this problem in a different way in Clojure. He has used the traditional paired implementation, but without storing the rear end in reversed order. Instead he uses a PersistentVector as the rear - and his PersistentVector is a trie-based implementation that gives near constant time access to elements.