© 2020 Strange Loop
Commercial and open source database systems consist of millions of lines of highly optimized C code. Yet, their performance on individual queries falls 10x or 100x short of what a hand-written, specialized, implementation of the same query can achieve.
In a recent joint project at Oracle Labs and the DATA Lab at EPFL, we have set out to implement a database query engine in Scala. With just about 3000 lines of Scala code, our prototype supports the full TPCH benchmark suite and runs queries several times as fast as highly-tuned commercial systems (> 10x peak speedup).
This talk will focus on the key design principle that sets the system apart from other DB engines: where other systems interpret query plans, operator by operator, we generate and compile low-level C code for whole queries using the LMS (Lightweight Modular Staging) framework.
In particular, the talk will discuss powerful generative programming patterns such as mixed-stage data structures that contains both static and dynamic parts (e.g. static schema and dynamic values for data records) and staged interpreters which can be mechanically turned into compilers (e.g. for SQL queries or regular expressions).
These generative programming techniques provide a high degree of abstraction without performance penalty, and make Scala and LMS a more productive alternative to C for systems-level programming where the last drop of performance matters.
Tiark Rompf is a researcher at Oracle Labs. His work focuses on runtime code generation, advanced compiler technology, and associated language support. From 2008 to 2014 he was a member of Martin Odersky's Scala team at EPFL where he developed the LMS compiler framework and made various contributions to the Scala language and toolchain (delimited continuations, efficient immutable data structures, compiler speedups, type system work). He is a regular speaker at conferences.