Nathan Marz

Storm: Twitter's scalable realtime computation system

Storm makes it easy to write and scale complex realtime computations on a cluster of computers, doing for realtime processing what Hadoop did for batch processing. Storm guarantees that every message will be processed. And it’s fast: you can process millions of messages per second with a small cluster. Best of all, you can write Storm topologies using any programming language.

Storm has a wide range of use cases. The basic use case is “stream processing”: processing a stream of new data and updating databases in realtime. Unlike the standard approach of doing stream processing with queues and workers, Storm is fault-tolerant and scalable.

Another use case is “continuous computation”: streaming the results of a query to clients to visualize in realtime. An example is streaming trending topics on Twitter into browsers.

A third use case is “distributed RPC”: computing an intense query on the fly in parallel. With distributed RPC, a Storm topology is a distributed function that you can invoke like a normal function.

In this talk, I’ll release Storm as open source. I’ll show how Storm’s simple programming model makes realtime computation easy, robust, and even fun.

Nathan Marz

Twitter
@nathanmarz

Bio:

Nathan was the lead engineer at BackType, which was acquired by Twitter in July 2011. He primarily programs in Clojure and is the author of numerous open-source projects, most notably Cascalog, ElephantDB, and Storm. Nathan enjoys speaking and has spoken about his work at conferences such as Cloud Connect, the Hadoop Summit, POSSCON, Gluecon, and Strange Loop. He writes a blog at http://nathanmarz.com.