Not a post I'd have wanted to make on Christmas day, but that's how it goes sometimes.
Lars Arge just passed away, on Dec 23. For those of us who've been following his battles with cancer, this might not come as a total shock, but there was always hope, and that's no longer an option.
It's hard to imagine this in 2020, but there was a time not that long ago (at least in my mind) when "big data" wasn't really a thing. Companies were acquiring lots of data, and "GIGA byte" was a thing, but there was no real appreciation of the computational challenge associated with big data.
A paper by Aggarwal and Vitter in 1998 made the first step towards changing that, introducing the external memory model as a way to think about computations when you have memory access that are cheap (in RAM) and expensive (on disk).
It's a diabolically simple model: all main memory access is free, and any disk access costs 1 unit (but you can get a block of data of size B for that one unit of access). It's not meant to be realistic, but like the best computational models, it's meant to isolate the key operations that are expensive so that we can study how algorithm design needs to change.
Lars was one of the foremost algorithm designers for this new world of external memory. His Ph.D thesis laid out ideas for how to build data structures that are external memory efficient, and his research over the next many decades, in true Tarjan/Hopcroft form, built the fundamental structures and concepts one would need to even think about efficient algorithm design, with many clever ideas around batching queries, processing data in main memory to prepare for queries, and streaming access to disk when appropriate.
Formal algorithmic models are often misunderstood. They look simplistic, miss many of the details that seem relevant in practice, and appear to encourage theoretical game playing divorced from reality. But a formal model at its best does its work invisibly. It shifts the way we think about a framework. It fosters the design of new paradigms for efficient algorithms, and it allows us to layer optimizations on that move a system from theory to practice without ever having to compromise the underlying design principles.
Lars was a force of nature in this area. I first remember meeting him in 1998 at AT&T Labs when I was interning and he was visiting there. He had boundless energy for this space, and seemingly wanted to turn everything into an external memory algorithm, whether it was geometry, data structures, or even the most basic algorithms like sorting. His intuition was the best kind of algorithmic intuition: build up the core primitives, and the rest would follow.
And this is exactly what happened. The field exploded. For a while, "big data algorithms" WERE external memory algorithms. There was no other way to even talk about big data. And that spawned even more models. Streaming algorithms were inspired by external memory and the realization that a one pass stream was an effective way to work with large data. Cache-oblivious algorithms asked about what would happen if we took the same two-part hierarchy with main memory and disk and extended it to the cache. Semi-external memory models asked how we might modify the base model for graph computations. The MapReduce framework from the early 2000s generalized the external memory model to handle newer kinds of streaming/memory-limited architectures, in turn to be followed by Spark and so many other models.
I'd go as far as to say this: all of the conceptual developments we see today in big data computations at some level can be traced back to work on external memory algorithms, and that was driven by Lars (and his collaborators).
It wasn't just the papers he wrote. Lars was a leader in shaping the field. Early in the 2000s he moved back from Duke University to Aarhus University, and from there started to build what would become one of the foremost institutes for thinking about big data, first as a BRICS center and then as the appropriately named MADALGO Institute.
Many of us who had anything to do with big data visited MADALGO at some point in our careers. I spent one of the best summers of my life being hosted by him during my sabbatical - my children still remember that summer we spent in Aarhus and wish we could go back each year. He instinctively knew that the best way to foster the area was to facilitate a generation of researchers who would bring their own ideas to Aarhus, mix and exchange them, and then go away and share them with the world.
And he wasn't merely content with that. He wanted to demonstrate the power of his perspective beyond just the realm of academia. He started a company SCALGO that applied the principles of external memory algorithms (and so much more) to help with modeling geospatial data. I remember distinctly him telling me the first time he demonstrated SCALGO products in a forum with other companies doing GIS work and how the performance of their system blew the other products out of the water. For someone (at the time) deeply embedded in the theory of computer science, I was astounded and encouraged by this validation of formal thinking.
Lars was a giant in our field (his email address was always large@..., and this worked more appropriately than one would ever dream of). But he was also a giant both in real life and in his personality. He was the warmest, most fun person to be around. He seemed almost ego-free, and often downplayed his own accomplishments, claiming that his main talent was hanging around with smarter people. He was extremely generous with his time and resources (which is why so many of us were able to visit Aarhus and benefit from being at MADALGO)
He was the life of any party -- I still remember when he hosted the Symposium on Computational Geometry in Denmark. It felt like we were at a post-battle Viking celebration (and yes he got up on a table and shouted "SKÅL" over and over again while an actual pig was roasting on a spit nearby). I remember him taking me to a Denmark-Sweden soccer game and warning me not to wear anything with blue on it. I remember us going for go-kart racing and his stream of trash talking.
Lars was the entire package: a great person, a great researcher, a visionary leader, and a canny entrepreneur. I will miss him greatly.