Non-strict memoization
I’ve written a few posts about functional memoization. In one of them, Luke Palmer commented that the memoization methods are correct only for strict functions, which I had not noticed before. In this note, I correct this flaw, extending correct memoization to non-strict functions as well. The semantic notion of least upper bound (which can be built of unambiguous choice) plays a crucial role.
Edits:
- 2010-07-13: Fixed the non-strict memoization example to use an argument of undefined (⊥) as intended.
- 2010-07-23: Changed spelling from “nonstrict” to the much more popular “non-strict”.
- 2011-02-16: Fixed minor typo. (“constraint on result” → “constraint on the result type”)
What is memoization?
In purely functional programming, applying a function to equal arguments gives equal results. However, the second application is as costly as the first one. The idea of memoization, invented by Donald Michie in the 1960s, is to cache the results of applications and reuse those results in subsequent applications. Memoization is a handy technique to know, as it can dramatically reduce expense while making little impact on an algorithm’s simplicity.
Early implementations of memoization were imperative. Some sort of table (e.g., a hash table) is initialized as empty. Whenever the memoized function is applied, the argument is looked up in the table. If present, the corresponding result is returned. Otherwise, the original function is applied to the argument, and the result is stored in the table, keyed by the argument.
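As a concrete reference point, here is a minimal sketch of that imperative scheme in Haskell, using a Map held in an IORef. (The name memoIO and the Map representation are my illustrative choices, not a particular historical implementation.)

import Data.IORef (newIORef, readIORef, writeIORef)
import qualified Data.Map as Map

memoIO :: Ord a => (a -> b) -> IO (a -> IO b)
memoIO f = do
  ref <- newIORef Map.empty                      -- table starts out empty
  return $ \x -> do
    table <- readIORef ref
    case Map.lookup x table of
      Just y  -> return y                        -- hit: reuse cached result
      Nothing -> do
        let y = f x                              -- miss: apply the original function
        writeIORef ref (Map.insert x y table)    -- store the result, keyed by x
        return y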
Functional memoization
Can memoization be implemented functionally (without assignment)? One might argue that it cannot, considering that we want the table structure to get filled in destructively, as the memoized function is sampled.
However, this argument is flawed (like many informal arguments of impossibility). Although we want a mutation to happen, we needn’t ask for one explicitly. Instead, we can exploit the mutation that happens inside the implementation of laziness.
For instance, consider memoizing a function of booleans:
memoBool :: (Bool -> b) -> (Bool -> b)
In this case, the “table” can simply be a pair, with one slot for the argument False and one for True:
type BoolTable a = (a,a)
memoBool f = lookupBool (f False, f True)
lookupBool :: BoolTable b -> Bool -> b
lookupBool (f,_) False = f
lookupBool (_,t) True = t
For instance, consider this simple function and a memoized version:
f1 b = if b then 3 else 4
s1 = memoBool f1
The memo table will be (f1 False, f1 True), i.e., (4,3).
Checking that s1 is equivalent to f1:
s1 False ≡ lookupBool (4,3) False ≡ 4 ≡ f1 False
s1 True ≡ lookupBool (4,3) True ≡ 3 ≡ f1 True
Other argument types have other table representations, and these table types can be defined systematically and elegantly.
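For instance (a sketch of my own, not a definitive combinator), tables compose: a function of a pair of booleans can be memoized by currying, memoizing at each level with memoBool, and uncurrying again.

memoBoolPair :: ((Bool, Bool) -> r) -> ((Bool, Bool) -> r)
memoBoolPair f = uncurry (memoBool (memoBool . curry f))
-- curry f :: Bool -> Bool -> r; the outer function and each inner
-- function get their own pair-shaped memo table.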
Now, wait a minute! Building an entire table up-front doesn’t sound like the incremental algorithm Michie invented, especially considering that the domain type can be quite large and even infinite. However, in a lazy language, incremental construction of data structures is automatic and pervasive, and infinite data structures are bread & butter. So the computing and updating doesn’t have to be expressed imperatively.
While lazy construction can be helpful for pairs, it’s essential for infinite tables, as needed for domain types that are enormously large (e.g., Int), and even infinitely large (e.g., Integer, or [Bool]).
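A classic instance of this idea (my sketch, assuming non-negative arguments) uses an infinite lazy list as the memo table for naturals; cells are computed and cached on demand.

memoNat :: (Int -> r) -> (Int -> r)
memoNat f = (table !!)
  where table = map f [0 ..]   -- built lazily; each cell evaluated at most once

For example, memoNat speeds up a naively recursive function whose recursive calls revisit the same arguments, without any explicit mutation.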
However, laziness brings to memoization not only a gift, but also a difficulty, namely the challenge of correctly memoizing non-strict functions, as we’ll see next.
A problem with memoizing non-strict functions
The confirmation above that s1 ≡ f1 has a mistake: it fails to consider a third possible choice of argument, namely ⊥.
Let’s check this case now:
s1 ⊥ ≡ lookupBool (4,3) ⊥ ≡ ⊥ ≡ f1 ⊥
The ⊥ case does not show up explicitly in the definition of lookupBool, but is implied by the use of pattern-matching against False and True.
For the same reason (pattern matching in the definition of if-then-else), f1 ⊥ ≡ ⊥, so indeed s1 ≡ f1.
The key saving grace here is that f1 is already strict, so the strictness introduced by lookupBool is harmless.
To see how memoization adds strictness, consider memoizing a non-strict function of booleans:
f2 b = 5
s2 = memoBool f2
The memo table will be (f2 False, f2 True), i.e., (5,5).
Checking that s2 is equivalent to f2:
s2 False ≡ lookupBool (5,5) False ≡ 5 ≡ f2 False
s2 True ≡ lookupBool (5,5) True ≡ 5 ≡ f2 True
However,
s2 ⊥ ≡ lookupBool (5,5) ⊥ ≡ ⊥
The latter equality is due again to pattern matching against False and True in lookupBool.
In contrast, f2 ⊥ ≡ 5, so s2 ≢ f2, and hence memoBool does not correctly memoize.
Non-strict memoization
The bug in memoBool comes from ignoring one of the possible boolean values. In a lazy language, Bool has three possible values, not two.
A simple solution then might be for the memo table to be a triple instead of a pair:
type BoolTable a = (a,a,a)
memoBool h = lookupBool (h ⊥, h False, h True)
Table lookup needs one additional case:
lookupBool :: BoolTable a -> Bool -> a
lookupBool (b,_,_) ⊥ = b
lookupBool (_,f,_) False = f
lookupBool (_,_,t) True = t
I hope you read my posts with a good deal of open-mindedness, but also with some skepticism.
This revised definition of lookupBool is not legitimate Haskell code, and for a good reason.
If we could write and run this kind of code, we could solve the halting problem:
halts :: a -> Bool
halts ⊥ = False
halts _ = True
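To make the absurdity concrete, here is a hypothetical (and of course impossible) use, with loop as an illustrative divergent definition:

-- Not runnable: halts does not (and cannot) exist.
loop :: Integer
loop = loop            -- a divergent computation, i.e., ⊥

-- If halts existed, we’d have
--   halts loop ≡ False   -- deciding non-termination
--   halts 42   ≡ True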
The problem here is not just that ⊥ is not a legitimate Haskell pattern, but more fundamentally that equality with ⊥ is non-computable.
The revised lookupBool function and the halts function violate a fundamental semantic property, namely monotonicity (of information content).
Monotonicity of a function h means that
∀ a b. a ⊑ b ⟹ h a ⊑ h b
where “⊑” means has less (or equal) information content, as explained in Merging partial values.
In other words, if you tell h more about an argument, it will tell you more about the result, where “more” (really more-or-equal) includes compatibility (no contradiction of previous knowledge).
The halts function is nonmonotonic, since, for instance, ⊥ ⊑ 3, and halts ⊥ ≡ False and halts 3 ≡ True, but False ⋢ True. (False and True are incompatible, i.e., they contradict each other.)
Similarly, the function lookupBool (5,3,4) is nonmonotonic, which you can verify by applying it to ⊥ and to False. Writing h for lookupBool (5,3,4): although ⊥ ⊑ False, h ⊥ ≡ 5 and h False ≡ 3, but 5 ⋢ 3. Similarly, ⊥ ⊑ True, h ⊥ ≡ 5 and h True ≡ 4, but 5 ⋢ 4.
So this particular memo table gets us into trouble (nonmonotonicity).
Are there other memo tables (b,f,t) that lead to monotonic lookup? Re-examining the breakdown shows us a necessary and sufficient condition, which is that b ⊑ f and b ⊑ t.
Look again at the particular use of lookupBool in the definition of memoBool above, and you’ll see that
b ≡ h ⊥
f ≡ h False
t ≡ h True
so the monotonicity condition becomes h ⊥ ⊑ h False and h ⊥ ⊑ h True. This condition holds, thanks to the monotonicity of all computable functions h.
So the triple-based lookupBool can be semantically problematic outside of its motivating context, but never as used in memoBool. That is, the triple-based definition of memoBool correctly specifies the (computable) meaning we want, but isn’t an implementation. How might we correctly implement memoBool?
In Lazier function definitions by merging partial values, I examined the standard Haskell style (inherited from predecessors) of definition by clauses, pointing out how that style is teasingly close to a declarative reading in which each clause is a true equation (possibly conditional). I transformed the standard style into a form with modular, declarative semantics.
Let’s try transforming lookupBool into this modular form:
lookupBool :: BoolTable a -> Bool -> a
lookupBool (b,f,t) = (λ ⊥ → b) ⊔ (λ False → f) ⊔ (λ True → t)
We still have the problem with λ ⊥ → b (nonmonotonicity), but it’s now isolated. What if we broaden the domain from just ⊥ (for which we cannot dependably test) to all arguments, i.e., λ _ → b (i.e., const b)? This latter function is the least one (in the information ordering) that is monotonic and contains all the information present in λ ⊥ → b. (Exercise: prove.)
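(One proof sketch: suppose g is monotonic and (λ ⊥ → b) ⊑ g. Then b ≡ (λ ⊥ → b) ⊥ ⊑ g ⊥, and for any x, ⊥ ⊑ x gives g ⊥ ⊑ g x by monotonicity, hence b ⊑ g x for every x, i.e., const b ⊑ g. And const b is itself monotonic and contains all the information in λ ⊥ → b.)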
Dissecting this function:
const b ≡ (λ _ → b) ≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b)
So
const b ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ True → b) ⊔ (λ False → f) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → b) ⊔ (λ False → f) ⊔ (λ True → b) ⊔ (λ True → t)
≡ (λ ⊥ → b) ⊔ (λ False → (b ⊔ f)) ⊔ (λ True → (b ⊔ t))
≡ (λ ⊥ → b) ⊔ (λ False → f ) ⊔ (λ True → t )
under the condition that b ⊑ f and b ⊑ t, which does hold in the context of our use (again by monotonicity of the h in memoBool). Therefore, in this context, we can replace the nonmonotonic λ ⊥ → b with the monotonic const b, while preserving the meaning of memoBool.
Behind the dancing symbols in the proof above lies the insight that we can use the ⊥ case even for non-⊥ arguments, because the result will be subsumed by non-⊥ cases, thanks to the lubs (⊔).
The original two non-⊥ cases can be combined back into their more standard (less modular) Haskell form, and we can revert to our original strict table and lookup function. Our use of ⊔ requires the result type to be ⊔-able.
memoBool :: HasLub b => (Bool -> b) -> (Bool -> b)
type BoolTable a = (a,a)
memoBool h = const (h ⊥) ⊔ lookupBool (h False, h True)
lookupBool :: BoolTable b -> Bool -> b
lookupBool (f,_) False = f
lookupBool (_,t) True = t
So the differences between our original, too-strict memoBool and this correct one are quite small: the HasLub constraint and the “const (h ⊥) ⊔”.
The HasLub constraint on the result type warns us of a possible loss of generality. Are there types for which we do not know how to ⊔? Primitive types are flat, where ⊔ is equivalent to unamb; and there are HasLub instances for functions, sums, and products. (See Merging partial values.) HasLub could be derived automatically for algebraic data types (labeled sums of products) and trivially for newtype. Perhaps abstract types need some extra thought.
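For the newtype case, such a derived instance would presumably just reuse the underlying type’s ⊔. A hypothetical sketch, assuming Data.Lub’s flatLub (⊔ restricted to flat types):

newtype Count = Count Int deriving Eq

-- Hypothetical instance: Count, like Int, is flat, so its lub is flatLub.
instance HasLub Count where
  lub = flatLub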
Demo
First, import the lub package:
{-# LANGUAGE Rank2Types #-}
{-# OPTIONS -Wall #-}
import Data.Lub
Next, borrowing from Luke Palmer‘s MemoCombinators package, define a type of strict memoizers:
type MemoStrict a = forall r. (a -> r) -> (a -> r)
Now a strict memoizer for Bool, as above:
memoBoolStrict :: MemoStrict Bool
memoBoolStrict h = lookupBool (h False, h True)
where
lookupBool (f,_) False = f
lookupBool (_,t) True = t
Test out the strict memoizer. First on a strict function:
h1, s1 :: Bool -> Integer
h1 = \b -> if b then 3 else 4
s1 = memoBoolStrict h1
A test run:
*Main> h1 True
3
*Main> s1 True
3
Next on a non-strict function:
h2, s2 :: Bool -> Integer
h2 = const 5
s2 = memoBoolStrict h2
And test:
*Main> h2 undefined
5
*Main> s2 undefined
*** Exception: Prelude.undefined
Now define a type of non-strict memoizers:
type Memo a = forall r. HasLub r => (a -> r) -> (a -> r)
And a non-strict Bool memoizer:
memoBool :: Memo Bool
memoBool h = const (h undefined) `lub` memoBoolStrict h
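The transcript below applies a memoized version n2, whose definition doesn’t appear above; presumably it is:

n2 :: Bool -> Integer
n2 = memoBool h2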
Testing:
*Main> h2 undefined
5
*Main> n2 undefined
5
Success!
Beyond Bool
To determine how to generalize memoBool to types other than Bool, consider what properties of Bool mattered in our development.
- We know how to strictly memoize over Bool (i.e., what shape to use for the memo table and how to fill it).
- Bool is flat.
The first condition also holds (elegantly) for integral types, sums, products, and algebraic types. The second condition is terribly restrictive and fails to hold for sums, products and most algebraic types (e.g., Maybe and []).
Consider a Haskell function h :: (a,b) -> c. An element of type (a,b) is either ⊥ or (x,y), where x :: a and y :: b. We can cover the ⊥ case as we did with Bool, by ⊔-ing in const (h ⊥). For the (x,y) case, we can proceed just as in strict memoization, by uncurrying, memoizing the outer and inner functions (of a and of b respectively), and recurrying. For details, see Elegant memoization with functional memo tries.
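A sketch of that recipe (the name memoPair is my own, and the code assumes the HasLub instance for functions, so that the outer memoizer can be used at result type b -> r):

memoPair :: Memo a -> Memo b -> Memo (a, b)
memoPair ma mb h = const (h undefined) `lub` uncurry (ma (mb . curry h))
-- const (h undefined) covers the ⊥ pair; curry/uncurry let us memoize
-- the outer (a) and inner (b) functions separately.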
Similarly for sum types. (A value of type Either a b is either ⊥, or Left x or Right y, where x :: a and y :: b.)
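And a matching sketch for sums (again, memoEither is a hypothetical name of mine):

memoEither :: Memo a -> Memo b -> Memo (Either a b)
memoEither ma mb h = const (h undefined) `lub` either (ma (h . Left)) (mb (h . Right))
-- const (h undefined) covers ⊥; either dispatches Left/Right
-- to a memoized handler for each branch.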
And by following the treatment of products and sums, we can correctly memoize functions over any algebraic type.
Related work
Lazy Memo-functions
In 1985, John Hughes published a paper Lazy Memo-functions, in which he points out the laziness-harming property of standard memoization.
[…] In a language with lazy evaluation this problem is aggravated: since verifying that two data-structures are equal requires that each be completely evaluated, all memoised functions are completely strict. This means they cannot be applied to circular or infinite arguments, or to arguments which (for one reason or another) cannot yet be completely evaluated. Therefore memo-functions cannot be combined with the most powerful features of lazy languages.
John gives a laziness-friendlier alternative, which is to use the addresses rather than contents in the case of structured arguments. Since it does force evaluation on atomic arguments, I don’t think it preserves non-strictness. Moreover, it leads to redundant computation when structured arguments are equal but not pointer-equal.
Conclusion
Formulations of function memoization can be quite elegant and practical in a non-strict/lazy functional language. In such a setting, however, I cannot help but want to correctly handle all functions, including non-strict ones. This post gives a technique for doing so, making crucial use of the least upper bound (⊔) operator described in various other posts.
Despite the many words above, the modification to strict memoization is simple: for a function h, given an argument x, in addition to indexing a memo trie with x, also evaluate h ⊥, and merge the information obtained from these two attempts (conceptually run in parallel).
Indexing a memo trie forces evaluation of x, which is a problem when h is non-strict and x evaluates to ⊥. In exactly that case, however, h ⊥ is not ⊥, and so provides exactly the information we need.
Moreover, information-monotonicity of h (a property of all computable functions) guarantees that h ⊥ ⊑ h x, so the information being merged is compatible.
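That recipe can be captured generically (a sketch with a hypothetical name, reusing the Memo and MemoStrict types from the demo, and again assuming the HasLub instance for functions):

nonStrict :: HasLub r => MemoStrict a -> (a -> r) -> (a -> r)
nonStrict memo h = const (h undefined) `lub` memo h
-- memo h indexes the trie (forcing x); const (h undefined) supplies the
-- h ⊥ information; lub merges the two compatible answers.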
Note that this condition is even stronger than compatibility, so perhaps we could use a more restricted and more efficient alternative to the fully general least upper bound. The technique in Exact numeric integration also used this restricted form.
How does this method for correct, non-strict memoization work in practice? I guess the answer mainly depends on the efficiency and robustness of ⊔ (or of the restricted form mentioned just above). The current implementation could probably be improved considerably if brought into the runtime system (RTS) and implemented by an RTS expert (which I’m not).
Information ordering and ⊔ play a central role in the denotational semantics of programming languages.
Since first stumbling onto a use for ⊔ (initially in its flat form, unamb), I’ve become very curious about how this operator might impact programming practice as well as theory.
My impression so far is that it is a powerful modularization tool, just as laziness is (as illustrated by John Hughes in Why Functional Programming Matters).
I’m looking for more examples, to further explore this impression.
Nikolay Orlyuk (14 July 2010, 3:57 am):
There may be a problem with memoization of a nonstrict result: snd (s3 (undefined))

conal (14 July 2010, 8:36 am):
Hi Nikolay,
Let’s give it a try: […]
Looks good.

Sebastian Fischer (14 July 2010, 11:49 am):
Very interesting!
Your remarks under “Beyond Bool” are a little sketchy. I was missing a demo of memoizing a function like and . take 10 on repeat True and False:undefined. I guess it works as expected but it would be nice to be convinced by an example!

Job Vranish (14 July 2010, 12:25 pm):
Neat! Any idea what a memo table type for Int would look like? It seems like such a thing would have some pretty severe restrictions.

conal (14 July 2010, 2:51 pm):
Hi Job,
For integral types, even including Integer (unbounded), one can use a structure like a Patricia tree. See Chris Okasaki and Andy Gill’s paper, Fast Mergeable Integer Maps, or the Word instance of HasTrie in Elegant memoization with functional memo tries. For signed types (like Int and Integer), one has to handle negative and non-negative keys. You can find the details in Luke Palmer’s data-inttrie package.

conal (14 July 2010, 2:54 pm):
Sebastian: I was running out of energy, and the post was running long, and hence the sketchy section. I think I’ll release a new library soon, and at the least, I’ll mention it with your example as a comment here.