Update 3: Anonymous points out that Jim Weirich has an inspired exposition of the spreadsheet implementation in Ruby. Very nice.
Python and Ruby are excellent languages. There are just a few small cracks in these languages where if you look carefully through them toward Smalltalk and Lisp, you can see how truly gifted, or maybe just lucky, the designers of those original dynamic languages were. These are just small cracks though, as Jim pointed out in Ruby and Ian Bicking mentioned about Python, for example, that delaying some of the operators with Formulas is difficult or impossible.
This is a lesson for designers to be as regular as possible. (If you are into the theory of regularity in design: pdf)
End of Update 3
I've wondered several times in conversations why the concept of a spreadsheet is not more easily a first-class feature of popular programming languages. There is a long history of constraint programming of various flavors that show up typically as an API in an imperative language or as the dominant style of a new language. (Mozart/Oz comes to mind as a language that emphasizes constraints but does not make them dominant.)
Kimbly points to an approximation in Python...
>>> ss = SpreadSheet()
>>> ss['a1'] = '5'
>>> ss['a2'] = 'a1*6'
>>> ss['a3'] = 'a2*7'
>>> ss['a3']
210
What bothers me about this of course (?) is the "extra-lingual" expressions embedded as text, e.g.
'a1*6'
. A first-class implementation would be done entirely in the language per se. After the text 'a1*6' is passed around, which functions assume it is just text and which assume it is a constrained expression? (We like objects.)
When I have seen a "spreadsheet" implemented in a language, invariably the objects are exposed through a user interface primarily, and programmatically through an awkward API at best. (With certain exceptions, in particular Lisp, which make programmatic integration essentially transparent, everything looks the same.)
In Smalltalk, maybe, using points rather than the alphanumeric column/row convention...
ss := Spreadsheet new.
ss cell: 1@1 value: 5.
ss cell: 1@2 value: (ss cell: 1@1) * 6.
ss cell: 1@3 value: (ss cell: 1@2) * 7.
(ss cell: 1@3) value
210
Spreadsheets should be useful without a user interface. Constraints should be a part of contemporary programming languages. Why aren't they?
Update: Matthias Ernst asks in a comment...
Don't you rather want:
ss cell: 1@2 value: [(ss cell: 1@1) * 6]
Add cell value caching and invalidation after change.
The use of a block here is intended
to "delay" the computation until a change to a dependency forces a revaluation.
The problem then is the programmer has to know where to put the blocks to delay computation and where to force the revaluations. Instead of blocks, though, my intention is to use some new object, a Formula
, say, and to make those objects implicit as much as possible.
So the message Spreadsheet>>cell: aPoint value: aFormula
would store the formula at the cell's point in the spreadsheet. If aFormula
is not a Formula
, then it is assumed to be a constant, and so the spreadsheet would wrap the constant in a subclass of Formula
, say ConstantFormula
. (Probably the thing to do is to send the argument the message Object>>asFormula
which would simply return itself if it is already a formula.)
The result of the message Spreadsheet>>cell: aPoint
is a Formula
(which may be a ConstantFormula
) and so to get its value, send it the message Formula>>value
. (e.g. the last expression in the example returns 210
.
But what about (ss cell: 1@1) * 6
? The expression (ss cell: 1@1)
results in the formula at the point 1@1
. The result of multiplying a formula by a constant is a new formula that does the obvious multiplication when evaluated. (And since multiplication is commutative and Smalltalk allows you to redefine "system" methods, numbers should understand (through double dispatching) that the result of 6 * (ss cell: 1@1)
is also a formula.)
*"is not a formula" does not mean your code should test for the class of the object. Rather a message, say Object>>isFormula
, should be sent to the object. the default implementation for all objects returns false
. The implementation of Formula>>isFormula
returns true
. Other implementations may return true as well if they are intended to behave as formulas rather than as constants in a spreadsheet.
**Double Dispatching: Yes, multiple argument dispatching in CLOS is better, and yes, Python has this in 2.4. Maybe something should go into Smalltalk, but the good news is you are free to try it then tell the rest of us. (Dan Ingalls, Ralph Johnson, Andrew Black)
Update 2: Gavin McGovern writes in a comment...
One of my favorite spreadsheets is Levoy's "Spreadsheets for Images",
Bit backwards: he essentially put Tcl (+ image stuff) in a spreadsheet. But a neat example nonetheless of a alternative user interface.
Nice. Thanks. This is kind of like
Kurt Piersol's spreadsheet in Smalltalk he presented at OOPSLA '86. He uses image processing as an example of operating on non-numeric data. (Unfortunately or not, he also extends the syntax for operating on spreadsheets. From the examples above, I am not yet convinced this is necessary.)