Strange Loop

2009 - 2023 / St. Louis, MO

Querying Big Data Rapidly and Robustly with Cascalog

Cascalog is a tool for querying data on Hadoop with Clojure in a concise, expressive, and highly readable manner. It combines two cutting-edge technologies, Clojure and Hadoop, and resurrects an older one, Datalog. Cascalog is high-performance, flexible, and robust.


Most query languages, such as SQL, Pig, and Hive, are custom languages, and this leads to huge amounts of accidental complexity. Constructing queries dynamically through string manipulation is haphazard and invites further problems such as SQL injection attacks. Because Cascalog is a domain-specific language embedded in Clojure, it avoids these accidental complexities and lets a programmer manipulate queries as first-class entities within the language. The Datalog syntax of Cascalog is simpler and more expressive than SQL-based languages.
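
The contrast between string-built queries and queries as first-class values can be sketched in a few lines. The example below is not Cascalog and does not use its API; it is a minimal Python analogy, and every name in it (PEOPLE, build_sql, where, run, older_than) is hypothetical and chosen only for illustration. It assumes a tiny in-memory table and shows that hostile input changes the meaning of a concatenated query string, while the same input stays an ordinary constant when the query is a data structure.

```python
# Hypothetical in-memory table; the rows are illustrative only.
PEOPLE = [
    {"name": "alice", "age": 28},
    {"name": "bob", "age": 33},
    {"name": "chris", "age": 40},
]


def build_sql(name):
    # String-built query: the input becomes part of the query text itself.
    return "SELECT * FROM people WHERE name = '" + name + "'"


# Comparison operators that a predicate may name.
OPS = {
    "=": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<": lambda a, b: a < b,
}


def where(field, op, value):
    # A predicate is plain data: (field, operator, constant).
    return (field, op, value)


def run(rows, predicates):
    # Keep the rows that satisfy every predicate in the list.
    return [r for r in rows if all(OPS[op](r[f], v) for f, op, v in predicates)]


def older_than(min_age):
    # A query fragment is an ordinary value that functions can return.
    return [where("age", ">=", min_age)]


malicious = "x' OR '1'='1"

# With concatenation, the input rewrites the WHERE clause.
injected_sql = build_sql(malicious)

# As data, the same input is only a constant to compare against.
safe_result = run(PEOPLE, [where("name", "=", malicious)])

# Fragments compose with ordinary list operations, with no string splicing.
result = run(PEOPLE, older_than(30) + [where("age", "<", 40)])
```

Here injected_sql ends in `name = 'x' OR '1'='1'`, a condition that is always true, whereas safe_result is empty because no row has that literal name. The composed query in result selects only the row for bob.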


Besides being a valuable tool in itself, Cascalog is a demonstration of the power of the Clojure programming language. Building an integrated query language like Cascalog is just not possible in any other language.


This talk will include a live demo of Cascalog.

Nathan Marz
