JOST A MON

The idle ramblings of a Jack of some trades, Master of none

Welcome, folks, to the LXXVII edition of the Mathematics Blog Carnival. We have a wide-ranging litany of articles, although - despite our best efforts - not seventy-seven of them. Still, quite a few to whet an appetite or three.

According to custom, we must start with the oddities of the number. Instead, we'll just intersperse the facts amongst the various articles.

77 is a deficient number.

Sol Lederman presents Curve stitching with Mathematica posted at Playing With Mathematica.
Meanwhile, does anyone remember the NBC adventure series 'Tales of the 77th Bengal Lancers'?
Ever heard of curves with infinite perimeter and zero area? Read That's Impossible! One Giant Nerdgasm at Consumed By Wanderlust.
And, of course, Jesus of Nazareth is supposed to be of the 77th generation from Adam.
What does the Great Pyramid tell us about ancient Egyptian mathematics? Dave Richeson reveals an interesting consequence in Division by Zero.
77 is evil.
Alexander Bogomolny presents Areas on the Graphs of Power Functions posted at CTK Insights.
77 is also vile!
At Travels in a Mathematical World, Peter Rowlett presents a collection of podcasts and videos from the Math/Maths Week 2010, and from Young Researchers in Mathematics 2011.
77 is the number of digits of the 12th perfect number. Somewhat uncannily, 77 also is the number of integer partitions of the number 12.
Mike Croucher ponders whether graphical calculators have outlived their usefulness at Walking Randomly.
77 is the sum of three squares, 4² + 5² + 6², as well as the sum of the first eight prime numbers.
Speaking of calculators, did you know you could multiply on your fingers? I didn't, but Math and Multimedia reveals some tricks.
77 is the atomic number of the element iridium. Does anyone remember Motorola's ill-fated venture of the same name that was supposed to revolutionise global telecommunications?
While the little folk do elementary mathematics on their fingers, the powerhouses of the discipline get their breakthroughs in the most peculiar places and moments in time. Dick Lipton lists some of them in Gödel's Lost Letter and P = NP.
77 is the largest number that cannot be written as a sum of distinct numbers whose reciprocals sum to 1.
Pat Ballew highlights the quotation 'old mathematicians don't die, they just go off on a tangent', and illustrates nicely the properties of tangents to a cubic at Pat's Blog.
77 is not a sum of two squares - but it is a sum of 2 squares!
At Short Sharp Science, Catherine de Lange reveals how tattoos (unsightly at the best of times, heheh) become even unsightlier with age. Mathematicians have developed a model that describes the aging of tattoos. (Do you think the picture below looks like a tattoo of 77? No? Dash it.)
And IT History has a little piece on the beginnings of computer user groups - all the way back in 1952!


Speaking of beginnings, it's the centenary of IBM. Take a look at this celebratory post at Antipodes: Reflections from an Australian Expatriate in France.

"Wannabe professional gambler" Zac mixes up probability and ethical humanism in his post Gambling Theory at Zac Sky.

SquareCircleZ ponders what is the correct graph of arccot(x)?

Alex Bellos discovers that there is more to triangle centres than he had previously imagined (and revealed to us in his book Alex's Adventures in Numberland).

Roice has some clever Geodesic Saddles.

And Joe Manausa shows how Tallahassee residents need to wait till 2018 for their house prices to return to equilibrium in his case study. Long time to wait, eh?

And just so that we Anglospeakers don't feel too alone, we are pleased to reveal that the Spanish blogosphere has its own Carnival of Mathematics. The latest installment is by Juan Martínez-Tébar Giménez at Los Matemáticos no son gente seria, and it showcases entertaining pieces on, among a couple of dozen other things, Tartaglia and Cardano, the Nash conjecture, the decipherment of a wartime diary, and the centenary of the Royal Spanish Mathematical Society.

That's it for this month, people. Please do take a look at our sister carnival - Math Teachers at Play -  and also note that you can follow the Carnival of Math on Twitter: @Carnivalofmath. The next Carnival of Mathematics should come up around Jun 3, 2011. Please send in your submissions here.

What do you think of this sentence?
Ms. Ebadi is hardly afraid of jail, having spent time there, but she probably understands that what the West wants Muslim so-called moderates to say and to promote is merely a vision of a secular culture imported from the West, a vision that doesn't carry much weight with a people that is moving, albeit very slowly, to a democracy that is self-defined and that may not be recognizable to Westerners, accustomed to defining democracy as either liberal or not a democracy at all.
This is from Hooman Majd's rather good The Ayatollah Begs to Differ: The Paradox of Modern Iran.

Or how about this?
Although anonymity excuses the Persian from ta'arouf, and public speaking is the antithesis of anonymous behaviour, Ahmad Khatami and others who make such speeches are speaking on behalf of the nation (or the clerical establishment) and against another nation, and the collective 'we' makes them impersonal outbursts that some Iranian politicians today, with a sense of power that Shias haven't felt in centuries, believe appeal to the masses of their supporters who are more accustomed to being the downtrodden and oppressed majority of society than a people that can strike back against any injustice.
The book's full of sentences like this. Is it just me, or does Mr Majd expect a deep recursion stack in the brains of all his readers?

May 24, 2010

Encoding Man

At a party recently, I met an elderly gent called Bruce. An affable and well-spoken man, he happened to mention that he had worked in telecommunications and computer software from the 1950s onwards. In the US, he was employed by Marconi at a time when the first inter-exchange trunks were being established, and their signalling protocols were being designed. These served to connect local and regional telephone exchanges so that long-distance calls could be made without need for an operator. These days, telephone exchanges are connected with fibre-optic cables; microwave transmission is, I think, still common in less developed parts of the world. But Bruce worked at the time that predated even microwaves, he said. First VHF, then the 200MHz spectrum. Naturally I wanted to know all about it, and when he realised that I was a fellow telecomms type (at least in spirit these days), he was glad to talk. He travelled quite a bit, he said, all over the US, and, later, across Europe.

"Then I worked with the very first mainframes," he said. He spent many years with ICL, that British computing behemoth that fell apart, and was acquired by Fujitsu. Not that the union with Fujitsu was any more successful, of course. PCs were taking over the world, and even though many people at organisations such as Honeywell and Unisys and IBM and Fujitsu could see the writing on the wall, institutional inertia led to these mainframe manufacturers getting increasingly sidelined. But all that was in the future when in the 1960s he began working on COBOL.

COBOL! Who even mentions this programming language these days? The last I ever had anything to do with it was as part of a Programming Languages Lab at IISc, where the instructor was so bored and uninterested that I spent his classes playing tic-tac-toe with myself. That was in 1992. A few short years later, enterprising Indian engineers were writing clever parsers and translators to convert COBOL to more modern programming languages, in keeping with the large-scale effort on the part of mainframe owners to switch to PCs. They made more money than I can shake a stick at; then they parlayed all that experience into sorting out the Y2K bug for an increasingly panicked world.

Upon close questioning, Bruce revealed that he was instrumental in the internationalisation projects for COBOL. Essentially, the idea was to allow applications to provide seamless interfaces to users in various (human) languages without having to rewrite the underlying code. With all the expertise garnered from this effort, in later life Bruce became the editor for the first Unicode draft. He spent months, he said, going over every line of that 1000-page document. It aimed to standardise computer encodings for every written script on the planet. The ASCII codes for the Latin script could only provide 128 characters (256 with the various extensions), and so countries such as China, where the script runs to several thousand characters, developed their own encodings. Without a common system, a Chinese computer would suddenly find it impossible to understand, say, a Russian one. I speak extremely loosely, of course. But with Unicode, you could have a uniform encoding for all possible languages - Amharic and Mandarin and even North American Cree. Is Klingon also Unicoded? No, but Unicode is flexible enough to allow it.
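Just to make the point concrete, here is a trivial illustration of my own (in R, since that is the language this blog reaches for elsewhere): with one agreed-upon numbering of the world's scripts, an Ethiopic, a Chinese and a Cree character are simply different code points.

intToUtf8(0x1200)   # Ethiopic syllable HA, as used for Amharic
intToUtf8(0x4E2D)   # the Chinese character for 'middle'
intToUtf8(0x1401)   # Canadian Aboriginal Syllabics E, used for Cree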

"You can thank me for my sleepless nights," said Bruce.

Others have depended on Bruce for his abilities. Even if he doesn't speak or read all the world's languages, he can identify most of them from their writing. At his local barber's, the proprietor, an Italian woman, once showed him a book left to her by her father. He recognised it to be written in the Devanagari script. He asked a colleague - a cellist - in his orchestra if she could read it. She, a Bengali Scotswoman, couldn't.

It turned out to be a Hindi book of poetry published in Gorakhpur. The Italian barber was keen to know if it was a rare book; if so, she might make some money selling it. He had to tell her, rather regretfully, that it was far from rare. The print run had been 120,000 copies.

Feb 26, 2009

Goldbach's Comet

As I'm slightly at a loose end at the moment, I'm flexing my fingers with a bit of R coding. To those unaware of the wondrous possibilities offered by this programming language, I can only say it's good stuff. Its capabilities have been outlined here and there, most recently in the New York Times, and it is the programming environment of choice for statisticians and epidemiologists and others.

I'm intrigued by Goldbach's Comet, which - contrary to your astronomical visions - has less to do with the cosmos than with mathematics. In 1742, Christian Goldbach conjectured in a letter to Leonhard Euler that every even number greater than 2 can be expressed as the sum of two primes. In common with many conjectures in number theory, this is more easily stated than proved. In fact, the closest anyone has come to proving this assertion is Olivier Ramaré, who in 1995 showed that every even number is the sum of at most 6 primes.

We can introduce the Goldbach function (or partition) at this point. G(n) is defined as the number of ways the number n can be expressed as the sum of two primes. If n is odd, we force G(n) = 0. For even n, e.g., G(10) = 2 because 10 = 3 + 7 = 5 + 5.

If we now plot the even numbers against their Goldbach numbers, we get a graph that looks like a fine spray tapering to a point, as in the figure below. This is Goldbach's Comet.


Now, what's the easiest way to compute G(n)? The brute-force way, of course. A computer doesn't groan and creak at repetitive exercise, so code such as the following will do the job - for small numbers n.
library(gmp)      # for isprime()
library(gtools)   # for even()

goldbach <- function(n)
{
  count <- 0
  if (even(n)) {
    # try every odd i up to n / 2 and check whether both i and n - i are prime
    for (i in seq(3, n / 2, by = 2)) {
      if ((isprime(i) > 0) & (isprime(n - i) > 0)) {
        count <- count + 1
      }
    }
  }
  count
}

Note that I'm not really doing any bounds checking here: the function will collapse for n < 6.

Now, the thing about R is that it's a vectorised language, and works rather well on arrays. It even provides a routine called Vectorize() to convert a function such as goldbach() that takes a single argument into one that takes an array. So if you wanted to compute goldbach() for 6, 8, ..., 3000, you don't have to write a loop to do it (although you could); instead, you'd just do the following
n <- 3000
x <- seq(6, n, by = 2)
y <- Vectorize(goldbach)(x)
And then you can plot the comet
plot(x = x, y = y, type = "p", col = "red", lwd = 2)
The limitations of the brute-force approach are quite evident even with this toy example. It takes my machine about 10.5 seconds to compute G(n) for even numbers up to 3000. For every argument n, it generates all the odd numbers x up to n / 2, and checks if each of x and n - x is prime. The total number of candidate pairs tested for all even numbers up to n therefore grows as the square of n. The primality test is another computational hog (isprime() does trial divisions up to a certain size of n, and probabilistic tests beyond that). So what quick improvements can we make?

For one thing, we can pre-generate the list of primes up to n, and then, for every prime p no greater than n / 2, test whether (n - p) is also in the list. We immediately get rid of the multiple calls to isprime() in the original code. Here's the modified routine
goldbach2 <- function(n)
{
  count <- 0
  if (even(n)) {
    x <- 1 : n
    x <- x[isprime(x) > 0]   # all primes up to n
    p <- x[x <= n / 2]       # primes no greater than n / 2
    for (i in p) {
      if ((n - i) %in% x) {
        count <- count + 1
      }
    }
  }
  ifelse(n == 6, 1, count)
}
This does slightly better: 9.2 seconds, an improvement of about 12%. The operation that takes the longest is %in%, which searches the array of primes to see if every (n - i) is in it. It's not efficient to call it separately for every candidate in (n - p). It's optimised for vector operations, and here's where R's superb vector handling comes to the fore. We can replace the entire for-loop with just a couple of lines of code, to get
goldbach3 <- function(n)
{
  count <- 0
  if (even(n)) {
    x <- 1 : n
    x <- x[isprime(x) > 0]   # generate all primes up to n
    np <- x[x >= n / 2]      # primes from n / 2 up to n
    p  <- x[x <= n / 2]      # primes no greater than n / 2
    count <- sum((n - p) %in% np)
  }
  ifelse(n == 6, 1, count)
}
This does the same job in 3.3 seconds. Better, eh? Note here that (n - p) creates a vector of differences between the argument n and the list p of primes smaller than n / 2. The %in% operator is vectorised, as I said above, and it zips through the entire vector (n - p) to test for membership in the vector np. Wherever there's a match, R tags a TRUE (a boolean with numeric value 1) and everywhere else a FALSE (numerically 0). Summing the booleans gives us the count of primes that add up to n.
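Here is a tiny illustration of that boolean-summing idea, using the G(10) = 2 example from earlier (the vectors p and np are simply written out by hand):

n  <- 10
p  <- c(2, 3, 5)        # primes no greater than n / 2
np <- c(5, 7)           # primes from n / 2 up to n
(n - p) %in% np         # FALSE  TRUE  TRUE
sum((n - p) %in% np)    # 2, since 10 = 3 + 7 = 5 + 5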

But if R is such a deliciously vectorisable language, why call the function goldbach3() for every even number? We could, instead, generate the entire sequence of Goldbach partitions within the function call itself, and do without using any loops at all, not even the artificial Vectorize() routine that we used on each of the previous tests. We will create the list of primes smaller than n as before. We then create all possible sums of pairs of these primes. The R function outer() enables us to do this rapidly. Of course, outer() creates a matrix, but we are only interested in the upper triangle of it (it is a symmetric matrix, obviously), so we extract that bit by means of the upperTriangle() routine from the gdata package. We can then bash it back into an array and remove all entries that exceed n. Finally, the R function hist() can be used to compute the number of times each entry in this array repeats itself, and that becomes the required series of Goldbach partitions
library(gdata)   # for upperTriangle()

goldbach4 <- function(n)
{
  xx <- 1 : n
  xx <- xx[isprime(xx) > 0]   # all primes up to n
  xx <- xx[-1]                # drop 2: only odd primes can pair up for even n > 4
  # all pairwise sums of primes; the upper triangle avoids double-counting pairs
  z <- as.numeric(upperTriangle(outer(xx, xx, "+"), diag = TRUE))
  z <- z[z <= n]
  # tally the sums in bins of width 2: counts[k] is then G(4 + 2k)
  hist(z, plot = FALSE, breaks = seq(4, max(z), by = 2))$counts
}
This takes 0.01 seconds to run! It is an incredible speedup over the admittedly shoddy original code. Of course, I note that I'm now trading space (i.e. memory) for time: I'm creating a large matrix internally within the function. As n increases, I'll sooner or later hit the largest size that R can support. I tried running this bit of code for n = 100000, and it collapsed with the message that R cannot allocate space for a vector of size 390MB. But for n = 70000, the code runs in 7.4 seconds.
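If you want to reproduce the comparison yourself, something along these lines should do (assuming the functions above have already been defined; the absolute timings are, of course, machine-dependent):

n <- 3000
x <- seq(6, n, by = 2)
system.time(y1 <- Vectorize(goldbach)(x))   # the brute-force version
system.time(y4 <- goldbach4(n))             # the outer()/hist() version
all(y1 == y4)   # quick consistency check: both should give the same partitions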

I'm sure experienced R users will be able to improve this code still further. Mathematicians active in the area have even faster algorithms for the computation of Goldbach's function. For further details check out the papers below.

Richstein, J., Verifying the Goldbach Conjecture up to 4·10¹⁴, Mathematics of Computation, Vol. 70, No. 236, pp. 1745-1749.

Liang, W., et al., Fractals in the statistics of Goldbach partition, arXiv, March 2006.

I attended the latest useR! conference this week, which lasted three days (Aug 12-14) in Dortmund. We've been using R at work for about a year or so now, and there's much to learn about this incredible statistical language and its ancillary 'packages', tools developed by academicians and practitioners in a variety of disciplines, all available free and online, supported by a vast cast of enthusiasts and gurus. So here I was, toting my little bag of goodies, and ambling from room to room to listen to some very interesting presentations across the R user spectrum.

The conference was held at the Statistics department of the Technical University of Dortmund. Only two weeks earlier, torrential rains had flooded the large auditorium of the department. Stains were still visible, but a massive cleanup before our arrival meant that everything appeared as it should - ready for four hundred visitors.

Unlike Curving Normality, who blogged regularly and live from the various sessions he attended, I am doing this from back at home. I did take notes during the talks on a little pad that was given to us courtesy of Google, but my handwriting these days is much worse than it used to be, and I doubt even a pharmacist would be able to make much of it. Still, here goes - summaries of three of the more interesting talks I attended.

The first talk I attended was on Loss Functions by a professor of actuarial studies, Vincent Goulet. Actuaries are interested in such things as ruin, the distribution of insurance claims, and the probabilistic properties of insurance payouts. To model these events, Goulet introduced an R package named 'actuar', in which he provides several families of distributions, including censored ones. These, in particular, are interesting to insurance specialists. A client with a deductible is not going to make a claim if he thinks the damage would cost less than the deductible to make good. In such a situation, the distribution of claims would be left-censored. Likewise, insurance payouts are usually capped by the size of the cover. Payouts, therefore, follow a right-censored distribution.

This is of relevance to us in finance as well, especially if we want to model the effects of management fees on a portfolio. Because, irrespective of how well a manager performs, he always gets his management fee, the returns to a client are always that much less than the manager's overall performance. For the manager's revenue stream, on the other hand, there's an effective floor - and that can be modelled by a left-censored distribution.
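Here is a minimal sketch of the two insurance effects described above - not Goulet's code, nor anything from the actuar package, just made-up lognormal losses with an invented deductible and cover limit:

set.seed(1)
loss       <- rlnorm(10000, meanlog = 7, sdlog = 1)   # hypothetical ground-up losses
deductible <- 500
cover      <- 10000
claim  <- loss[loss > deductible]           # losses below the deductible never become claims
payout <- pmin(claim - deductible, cover)   # payouts are capped by the size of the cover
mean(payout == cover)                       # the fraction of payouts squashed against the cap
hist(payout, breaks = 50)                   # there should be a small spike at the cover limit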

An example of the multifarious uses of statistical tools came from Miriam Marusiakova, of the Charles University of Prague, who presented an R package 'forensic' to help in DNA fingerprinting. It is well known that the DNA composition of any two human beings (other than identical twins) is distinct. Since it is infeasible to compare DNA strands in their entirety from various possible sources, forensic scientists have isolated certain markers that can serve as witnesses for distinction. Unfortunately, these markers do not by themselves provide sufficient discrimination, and so a statistical analysis is required to determine how likely it is that the DNA found at a location came from one person or many.

Naturally, this is important! Let's say that a certain amount of DNA was found at a crime scene. There is a victim V and a suspect S. There are three possibilities: a portion of the DNA is known to be the victim's; there's only one type of DNA, suspected to be that of the offender; or there are several sources of DNA found. The prosecution's hypothesis is that some of the DNA came from S. The defence's hypothesis is that the remaining DNA came from persons unknown U. How to determine which of the hypotheses is the correct one?

Miriam explained that external information needs to feed into the statistics. For instance, certain genetic factors are present to a greater or lesser degree in various populations. Not incorporating these factors into the statistics leads to overstating the case against the defendant; the classical Hardy-Weinberg law of population genetics cannot simply be assumed to apply here.

Another twist arises when the offender and/or the victim and the defendant are related; usually, there is an assumption that they are independent. Miriam's package enables the analysis of all these possibilities.

As a concrete example, she showed the widely different conclusions that could be drawn from the O.J. Simpson murder trial of the early 1990s in California. If the various factors and match probabilities were not estimated correctly, it was as easy to argue that the DNA found at the crime scene was Simpson's as to argue that it was not. Lesson - estimate accurately and account for all possible variations.

The effervescent Janet Rosenbaum of Harvard University produced one of the most entertaining examples of research I've ever come across. She dealt with the notorious abstinence (virginity) pledge in the USA, and examined whether the sexual behaviour of teenagers who took the pledge was any different from that of those who didn't. (See here for some of her work, and this news report at the Washington Post.) I can do no better than quote from her very thorough abstract at the conference:

Objective: The US government spends over $200 million annually on abstinence-promotion programs, including virginity pledges, and measures abstinence program effectiveness as the proportion of participants who take a virginity pledge. Past research used non-robust regression methods. This paper examines whether adolescents who take virginity pledges are less sexually active than matched non-pledgers.

Previous researchers had compared the sexual behaviour of pledging teenagers against the general population of teenagers, and concluded that, indeed, the former were less likely to have had sex, and had a lower incidence of sexually transmitted disease. For US conservatives, this was brilliant news: it meant they could cut federal funding for contraception and women's sexual health, provide abstinence coaching instead, and use the number of pledgers as a metric of success.

But, of course, the comparison is not fair. The correct thing to do would be to match pledging teens with non-pledging teens who have similar backgrounds and ideologies. After all, the people who take the pledge are not average US teens. Many of them are from evangelical families, deeply religious, often born-again. When this matching is done, the results are quite clear:

Five years post-pledge, 84% of pledgers denied having ever pledged. Pledgers and matched non-pledgers did not differ in premarital sex, STDs, anal, and oral sex. Pledgers had 0.1 fewer past year partners, but the same number of lifetime sexual partners and age of first sex. Pledgers were 10 percentage-points less likely than matched non-pledgers to use condoms in the last year, and also less likely to use birth control in the past year and at last sex.

The behaviour of pledging and non-pledging teens is statistically indistinguishable! Worse, one to five years after having taken the pledge, 84% of those teens denied having pledged. Egregiously, many who had sex before taking the pledge declared themselves virgins shortly thereafter. To add insult to injury, pledgers were often more ignorant of contraception when they did succumb and have sex, and were thus less likely to protect themselves from disease or pregnancy before marriage.

Rosenbaum concluded that federal funds would be better spent in teaching effective birth and STD control than on abstinence measures.

Other interesting presentations:

  1. Tomoaki Nakatani, ccgarch: An R package for modelling multivariate GARCH with conditional correlations.
  2. Rory Winston, Real-Time Market Data Interfaces in R. (How to connect to Reuters from R)
  3. Susana Barbosa, ArDec: Autoregressive-based time series decomposition in R.
  4. Ray Brownrigg, Tricks and Traps for Young Players.
  5. Wei-Han Liu, A Closer Examination of Extreme Value Theory Modelling in Value-at-Risk Estimation.
  6. R. Ferstl, J. Hayden, Hedging Interest-Rate Risk with the Dynamic Nelson-Siegel Model.

Recently, I came across the slightly weird world of constrained writers. These are people who write essays and stories and poems under self-imposed restrictions. The univocalists write texts with only one vowel. Others pen compositions missing a given letter. Certain others concoct palindromic verse (or prose). Acrostics and anagrams appeal to a shadowy few.

It has been pointed out that much of classical literature has depended on arbitrary restraints. The sonnet is an example. Haiku is another. Under the aegis of Raymond Queneau and François Le Lionnais, the Ouvroir de Littérature Potentielle, or Oulipo, was founded on 24 November 1960 to promote and focus on writing subjected to restrictions, and indeed to imagine new constraints and find old examples. The founding example of the oeuvre remains Queneau's 100,000,000,000,000 Poems, which is a set of ten sonnets with interposable lines, so that any of the ten first lines can be followed by any of the ten second lines and so on, a combinatorial explosion that leads to the title of the work. At this particular instant, my brain is somewhat slower than usual, and I am unable to verify, with nCr or otherwise, whether this is indeed correct.
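(For what it's worth, it needs no nCr at all: ten interposable choices for each of a sonnet's fourteen lines give 10^14 combinations, which a single line of R will happily spell out.)

format(10^14, big.mark = ",", scientific = FALSE)   # "100,000,000,000,000"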

A famous lipogram, i.e. a text that excludes at least one letter of the alphabet, is that by Georges Perec, titled La disparition. This avoided the letter 'e', and was translated into English by Gilbert Adair as 'A Void', similarly avoiding 'e'. Ian Monk, a writer of considerable panache in this paradigm, took exception to Adair's piece. In a rambling piece (scroll down a bit), he wrote:

Although I would not go so far as to concur fully with Vladimir Nabokov's vision of a "gray Clio of translation", [...], for a lipogrammatic translator, an ability to find a way of saying what his original says, without addition or omission, in so far as his idiom allows him so to do. Adair is witty, and a good wordsmith, but his translation totally fails to do this

He goes on to provide examples of added verbosity on Adair's part, and complains that "a suspicion starts that Adair is simply showing off, without any thought for his original". He then concludes with a neat twist:

Although working without that most common symbol of all, writing paragraphs full of original insight and/or blatant rubbish is child's play. My fumbling dispatch amply displays this fact, I think.

Under severe constraints, David Shulman came up with this sonnet in 1936:

Washington Crossing the Delaware

A hard, howling, tossing, water scene:
Strong tide was washing hero clean.
"How cold!" Weather stings as in anger.
O silent night shows war ace danger!

The cold waters swashing on in rage.
Redcoats warn slow his hint engage.
When general's star action wish'd "Go!"
He saw his ragged continentals row.

Ah, he stands -- sailor crew went going,
And so this general watches rowing.
He hastens -- Winter again grows cold;
A wet crew gain Hessian stronghold.

George can't lose war with 's hands in;
He's astern -- so, go alight, crew, and win!

Can anyone figure out what the constraint here is?

Where there's Oulipo, there's Ou-x-po, where x stands for various combinations of letters that represent other genres. Oubapo - the version for comics, for example - is well served by Art Spiegelman. Palindromic comics, anyone? Interestingly, literary critics have already begun to study the construction and ramifications of this. Check out this particularly apropos analysis - taken entirely out of context, I hasten to add - by Jan Baetens:

"creation and re-creation, i.e. constraints which can be considered generative (they produce new works) and constraints which can be considered transformational (they modify existing works)" (Groensteen 1997: 17). Of course, one could scrutinise here the very distinction between creation and re-creation, but this is not what I intend to do. What interests me is a critical analysis of the notion of "production" (or "generation")...

That famous acrostician and humorist, Charles Lutwidge Dodgson, came up with this piece:

A boat, beneath a sunny sky
Lingering onward dreamily
In an evening of July -

Children three that nestle near,
Eager eye and willing ear,
Pleased a simple tale to hear -

Long has paled that sunny sky:
Echoes fade and memories die:
Autumn frosts have slain July.

Still she haunts me, phantomwise,
Alice moving under skies
Never seen by waking eyes.

Children yet, the tale to hear,
Eager eye and willing ear,
Lovingly shall nestle near.

In a Wonderland they lie,
Dreaming as the days go by,
Dreaming as the summers die:

Ever drifting down the stream -
Lingering in the golden gleam -
Life, what is it but a dream?

Surely it's clear what this one is about?

When read aloud in Modern Mandarin (though not in the older pronunciations of the Classical Chinese in which it is written), the Lion-Eating Poet in the Stone Den comprises 92 repetitions of the syllable 'shi'. Of course, Mandarin is a notoriously tonal language, so 'shi' is pronounced with different tones throughout this work:

《施氏食獅史》
石室詩士施氏,嗜獅,誓食十獅。
氏時時適市視獅。
十時,適十獅適市。
是時,適施氏適市。
氏視是十獅,恃矢勢,使是十獅逝世。
氏拾是十獅屍,適石室。
石室濕,氏使侍拭石室。
石室拭,氏始試食是十獅。
食時,始識是十獅,實十石獅屍。
試釋是事。

The advent of computing technology has broadened the choices available. Consider this. I am unable to find any minutes of this ACM/IEEE meeting on Constrained Poetry and Prose. Surely, though, this would have been a rollicking evening.

(Dammit, I lived barely two miles from Rusty Scupper at that time. If only I had known this was going on.)

A very clever bit of poetry, in the vein of Edgar Allan Poe's The Raven, is the following effort by Mike Keith, who organised that ACM/IEEE meeting I mentioned above. What's going on here?

Poe, E.
Near a Raven


Midnights so dreary, tired and weary.
Silently pondering volumes extolling all by-now obsolete lore.
During my rather long nap - the weirdest tap!
An ominous vibrating sound disturbing my chamber's antedoor.
"This", I whispered quietly, "I ignore".

Look at the number of letters in each word of the poem starting with its title. 3 1 4 1 5 9 2 6 ...? Ring a bell? What about the title of this post?
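It takes only a few lines of R (my own quick sketch, not anything of Keith's) to extract those word lengths from the stanza above - hyphens and punctuation split words, apostrophes are dropped, and a ten-letter word stands for the digit 0:

poem <- "Poe, E. Near a Raven
Midnights so dreary, tired and weary.
Silently pondering volumes extolling all by-now obsolete lore.
During my rather long nap - the weirdest tap!
An ominous vibrating sound disturbing my chamber's antedoor.
\"This\", I whispered quietly, \"I ignore\"."

words  <- unlist(strsplit(gsub("'", "", poem), "[^A-Za-z]+"))   # drop apostrophes, split on non-letters
words  <- words[words != ""]
digits <- nchar(words) %% 10      # the ten-letter 'disturbing' becomes 0
paste(digits, collapse = "")      # should print 314159265358979323846264338327950288419716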

This is all reminiscent of the obfuscated C program which, by suitable expansion of the hyphens, could improve the approximation of pi. I am unable to get it to format correctly here, so you might as well take a look at its own page. Hint: you need to compile it under the old Kernighan & Ritchie standard, not ANSI.

Here's another mathematically inspired piece of writing, in which the number of words in each sentence follows the Fibonacci sequence
Pen. Paper. Steady Hands. Write a word. Write another word after that. Then, write two words to make a sentence. Next, write a three word sentence, followed by a sentence of five words. Pretty soon you're finishing yr. eight word sentence, yr. thirteen word sentence, topping it off with a whopping twenty-one word sentence!

The next paragraph should be shorter than the last - you're going backwards, now. Thirteen, eight, five, three, two, one and one. It ends like it begins. With a word. You're finished. Done. Finito.
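Sceptics can count the words for themselves; a throwaway check (mine, with the opening sentences typed in by hand to dodge the tricky 'yr.' abbreviations) runs something like this:

sentences <- c("Pen.", "Paper.", "Steady Hands.", "Write a word.",
               "Write another word after that.",
               "Then, write two words to make a sentence.",
               "Next, write a three word sentence, followed by a sentence of five words.")
sapply(strsplit(sentences, " +"), length)   # 1 1 2 3 5 8 13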
But, of course, this bit of comic repartee is way better.

Enjoy!

It's been years since I last read the Communications of the ACM (the Association for Computing Machinery, perhaps the top organisation for the field of Computing). In the days before the internet, leather-bound volumes of past issues of this journal were one source of copious and top-notch information on every field of computing research. By the time I left IISc, the world-wide web was becoming quite the phenomenon, and access to cutting-edge research that much easier online. I had switched to telecomms in my professional life, and found little of relevance in the CACM until the explosive spread of mobile telephony and data prompted a series of superb articles on wireless communications and computing.

That was in the mid-to-late 1990s. In the new century, my interest in telecomms began to wilt and, concomitantly, my reading of the CACM. I would still look at the Turing Award stories in the magazine, thrilled to recognise a name of a computing science God being acknowledged by his peers. But now, in 2008, as I say, it has been years since I last looked at the magazine.

This year is its 50th anniversary, in honour of which the ACM has made a digital version of its January 2008 issue available on its website. In it, I found this
ODE TO CODE
(Stephen B. Jenkins)

Much have I travell'd in the realms of code,
And many goodly programs have I seen.
I've voyaged far to conferences umpteen,
Attending to the wisdom there bestowed.

Yet as I've moved along the winding road
Of my career (a journey not serene),
Only one source of knowledge has there been
Of worth enough to prompt of me an ode.

Communications has for 50 years,
Been there to help each of us on our way,
By giving us the things they had to say.
So, as the start of its sixth decade near
Please join me wishing it "Happy Birthday."

Oct 29, 2007

The Loss of Knowledge

The great libraries in the West, repositories of final reference and archives of everything published in their lands, are in deep trouble, and the cause is not money. As copyrighted materials explode in number, the likes of the Library of Congress, the Bodleian at Oxford, or the British Library find themselves lacking - shelf space. The issues at hand are how to store the terabytes of information that are spewed out by the recording and publishing industries, and how to make these terabytes accessible to the general public.

The microfiche has become the storage and retrieval mode of choice. Miniaturise texts, store them in microdots, index the fiches, and read them in a specialised device. Efficient? Not entirely. The comfort of holding a book is no longer available, and the scrolling mechanism is ungainly at scanning and searching. The last time I tried it, I got motion-sickness.

Lovely capabilities such as Turning the Pages are far too expensive to stem the tide. They work superbly at disseminating knowledge, but are not scalable enough for the millions of miles of print that have been produced over human history.

Up-to-the-moment techniques exist: compact discs, data DVDs, optical storage. Cheaply available large hard drives can store entire shelves' worth of books and music in a small space. Still, there are problems. In the case of music, recording techniques have changed dramatically over the years. Ethnomusicology archives, in particular, suffer from the shift from analog to digital. As playback systems such as reel-to-reel players and wax-cylinder phonographs face obsolescence, it becomes increasingly difficult to extract the content held in those formats. Similarly, old-style information storage devices become unreadable when the companies manufacturing the readers go out of business. As a case in point, imagine the effort involved even in transferring data from the old five-and-a-quarter-inch floppies to three-and-a-half-inch disks. Leave aside, then, the complexity of moving between mutually incompatible storage formats.

The book is a consummate invention, having evolved over millennia to become the convenient and appropriate method of dissemination of text. Even in this technocrazy world, the sales of audiobooks and e-books and online books are far outstripped by paper and binding. While we mourn the visible loss of human knowledge owing to the wilful destruction of libraries by war or fanaticism, we fail to realise that the repositories of our culture and heritage face oblivion because of something even more insidious. Sadly, it is technology, which we treat as the solution to these ills, that proves to be a prime contributor to the extirpation of our memory.

References:

Lucien Polastron, Books on Fire: discusses the long story of the wilful and careless destruction of books, by thuggish politicians, unsavoury religious fervour, and ignorance. As the Economist points out, the author addresses, almost by way of a coda, a development that threatens books today, very much alive in our major libraries. This is the replacement of real books by sometimes unreadable copies on microfiche.

Bruce Felman's Bytes, Copyright, and Info-Survival points out that the accelerating transfer of information from paper to computer files may be a boon to academia, but it is creating questions about who owns what data. Then there's the issue of making sure none of it gets lost in the ether.

It was recently pointed out to me that I don't seem to have any postings about computers in this blog. I replied that, while my background is in computer science, I do not find its mundane representation (to wit, software engineering and hardware) particularly interesting. I had caused a (very) minor scandal among my seniors in IISc when I told them I was not keen on programming. This lack of enthusiasm did not affect my career much: I became a fairly efficient programmer. But I strove to get out of the field as soon as I could: the mercenary in me was determined to find another occupation that paid at least as well.

This doesn't mean, of course, that I am not fascinated by the possibilities offered by the latest developments in computer science, theory and practice, especially when it comes to the dissemination of knowledge across the world. Not too long ago, Google caused a rumpus when it began its much heralded digitisation of library books to make them available online. Before them, the public-domain enterprise of the Gutenberg Project started to provide free textual access to (and many images of) out-of-copyright books, documents and records to anybody with internet access. Likewise, a backlash against slow-to-publish, reluctant-to-cede-copyright, and very expensive academic journals has resulted in free repositories such as ArXiv, the SSRN, and the much-accessed DefaultRisk.com, providing immediate access to papers in such varied domains as physics, economics, mathematical finance, and history.

Gratifyingly, English is not the only language served on the Web. The remarkable Russian archive of classical literature, complete with annotations by researchers, provides a similar function for students of that culture.

However, these efforts pale in comparison to the British Library's online and incredibly sophisticated readers of rare books in its collection. The system is called Turning the Pages. The virtuosity of this display is exemplified by the rendering of Leonardo's papers, as well as of selected writings by Jane Austen, the lovely Qur'an of Baybars and Iyasu's Ethiopian Bible. The brilliance of the solution is that it is portable, machine-independent, and will work with equal facility in any language.

An example of the portability is the Royal Society's exhibition of Robert Hooke's notebooks. You can turn pages, magnify particular sections of interest, study Hooke's writings in their original form, even extract the text for later study.

The underlying driver is Shockwave, famous in programming circles for its superb design facilities for gaming and other user interfaces, incorporating animation and film, special effects and decoration. This is fairly CPU- and bandwidth-intensive, which is possibly its only drawback.

And who are the creators of this magnificent exemplar of programming? Why, our own London local Armadillo Systems. I guess it can no longer be said that the nation of shopkeepers lags behind in software prowess.

Isn't this just the bee's knees? The field of archaeology is beginning to tap into the very latest of computing technology. Geographical Information Systems. GPS. Databases. Pattern matching algorithms. Amazing, what?