Code queries are useful for enforcing coding conventions, navigating a large code base, and for identifying locations to refactor. The program understanding community has long advocated the use of a relational database to facilitate such code queries, but relational queries over code have not found widespread use.
We argue this is due to a lack of scalability (it takes too long to evaluate queries), and a lack of a modern query language with all the tool support that entails (writing code queries in SQL can take many pages). In this talk we demonstrate a new system that goes some way towards alleviating both problems.
First, we demonstrate IQL, an object-oriented query language. It allows the concise expression of complex queries; due to its object-oriented features, it is easy to extend and modify existing queries. Another benefit of object-orientation is that it gives natural tool support, for instance for auto-completion.
The semantics of IQL is defined by translation to Datalog. Datalog is a logic programming language like Prolog, but it lacks data structures. As a consequence, all queries in Datalog are guaranteed to terminate, and they have a very simple semantics, enabling aggressive optimisations.
While Datalog has been extensively studied in the theoretical database community, it lacks a proper type system. We introduce a type system with subtyping to account for the class hierarchy of IQL. Type inference is achieved via an abstract interpretation that maps each relation between runtime values to a relation between binding sites.
We implement Datalog itself via a compiler to procedural SQL; a configuration file allows us one
target different relational database systems as the backend. The compiler performs many
optimisations, ranging from straightforward constant propagation to a form of the well-known
`magic sets' transformation. It also performs type specialisation, based on the abstract
interpretation in the type inference algorithm.