Strange Loop

2009 - 2023 / St. Louis, MO

Immutable Data Science with Datomic, Spark and Kafka

We would like to present our company's approach to data science architecture, which is novel in its use of the Datomic database (in a microservices architecture) and its integration of Datomic with Spark. We leverage several unique properties of Datomic, Spark, and Kafka to achieve scalable, real-time analysis against production data without resorting to traditional ETL techniques.

This architecture is an alternative to the popular "lambda" and "kappa" architectures, and will be of primary interest to architects looking for innovation in modern technologies, engineers interested in Clojure, Datomic and Datalog, modelers hungry for data, and analysts making data-based decisions. In addition, we work with sensitive personal information that remains encrypted at rest, making our solution relevant for those interested in information security.

In summary, this solution has allowed us to avoid ETL / database synchronization pipelines while preserving scalability and isolation of transactional and analytical use cases.

Konrad Scorciapino

Nubank

Konrad Scorciapino is a data architecture engineer at Nubank. It was love at first sight with parentheses and his beautiful wife. He organizes the "Clojure São Paulo" and "Machine Learning São Paulo" user groups, and his 2015 New Year's resolution is to read 60 books (40 to go!).

Mauro Lopes

Nubank

Mauro is a data architecture engineer at Nubank. He holds a Master's degree in Graph Theory, plays badminton, and has an Erdős number of 3.