© 2009-2023 Strange Loop | Privacy Policy
 
    Slack currently uses Elasticsearch as its primary centralized log search platform. At our scale of one petabyte of logs per week, we face three major issues with our clusters: large spikes in volume delay our logs, limiting our real-time visibility into our systems; field conflicts often cause logs to fail ingestion; and, at 50 clusters, Elasticsearch is operationally complex.
We built KalDB, a new Lucene-based, cloud-native log store, to address the issues we experienced with Elasticsearch. During large spikes, KalDB prioritizes fresh logs over older ones to maintain real-time visibility, and it handles field conflicts automatically by using a schema on read. Its cloud-native aggregator/leaf/tailer architecture provides first-class support for Kubernetes and uses techniques such as S3-backed storage to reduce infrastructure cost and automate operations.
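To make the schema-on-read idea concrete, here is a toy Python sketch; it is not KalDB's implementation, which this abstract does not describe. The hypothetical `SchemaOnWriteIndex` fixes each field's type from the first document it sees and rejects later documents whose values disagree (a simplified stand-in for a fixed index mapping that ignores Elasticsearch's own coercion rules). The hypothetical `SchemaOnReadIndex` stores every document as-is and only interprets a field's type when a query asks for it, so conflicting documents are never dropped at ingest.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class SchemaOnWriteIndex:
    """Toy fixed mapping: the first type seen for a field wins (hypothetical)."""
    mapping: dict[str, type] = field(default_factory=dict)
    docs: list[dict[str, Any]] = field(default_factory=list)
    rejected: int = 0

    def ingest(self, doc: dict[str, Any]) -> bool:
        # Reject the whole document if any field conflicts with the mapping.
        for name, value in doc.items():
            expected = self.mapping.get(name)
            if expected is not None and not isinstance(value, expected):
                self.rejected += 1
                return False
        # Record types for fields we have not seen before.
        for name, value in doc.items():
            self.mapping.setdefault(name, type(value))
        self.docs.append(doc)
        return True


@dataclass
class SchemaOnReadIndex:
    """Toy schema-on-read store: no type checks at ingest (hypothetical)."""
    docs: list[dict[str, Any]] = field(default_factory=list)

    def ingest(self, doc: dict[str, Any]) -> bool:
        self.docs.append(dict(doc))  # store the raw document unchanged
        return True

    def query_equals(self, name: str, target: Any) -> list[dict[str, Any]]:
        """Coerce each stored value to the target's type at query time."""
        hits = []
        for doc in self.docs:
            if name not in doc:
                continue
            try:
                value = type(target)(doc[name])
            except (TypeError, ValueError):
                continue  # value cannot be read as this type; skip it
            if value == target:
                hits.append(doc)
        return hits


if __name__ == "__main__":
    docs = [{"status": 200}, {"status": "200"}, {"status": "OK"}]

    on_write = SchemaOnWriteIndex()
    print([on_write.ingest(d) for d in docs])  # later conflicting docs are dropped

    on_read = SchemaOnReadIndex()
    for d in docs:
        on_read.ingest(d)
    print(on_read.query_equals("status", 200))  # conflict resolved at read time
```

The trade-off in this sketch is that schema-on-read moves the cost of type interpretation from ingestion to query time: nothing is lost at ingest, but every query must decide how to read each stored value.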
 
    Suman Karumuri is a Sr. Staff Software Engineer and the tech lead for Observability at Slack. He is an expert in distributed tracing, was a tech lead of Zipkin, and is a co-author of the OpenTracing standard, a Linux Foundation project under the CNCF. Previously, he spent several years building and operating petabyte-scale log search, distributed tracing, and metrics systems at Pinterest, Twitter, and Amazon. In his spare time, he enjoys board games, hiking, and playing with his kids.