© 2020 Strange Loop
Few people know that Apache Kafka's binary protocol for publishing and retrieving messages hides a second protocol: a generic, extensible protocol for managing work assignments among multiple instances of a client application.
When multiple Kafka consumers in the same consumer group subscribe to a set of topics, Kafka knows how to assign a subset of the topic partitions to each consumer and how to handle failover automatically. What is less well known is that this assignment is computed by the consumer client itself, and that any application can use the same protocol for both leader election and task assignment.
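To make the division of labor concrete, here is a minimal sketch in Python. It is a toy model, not Kafka's wire protocol or client API: `ToyCoordinator`, `round_robin`, and the member and task names are all hypothetical, and the rule that the first member to join becomes leader is an assumption made for illustration. The point it illustrates is that the coordinator only tracks membership and names a leader, while the assignment algorithm is a pluggable function on the client side.

```python
from typing import Callable, Dict, List, Tuple

Assignment = Dict[str, List[str]]
Assignor = Callable[[List[str], List[str]], Assignment]


def round_robin(members: List[str], tasks: List[str]) -> Assignment:
    """Deal tasks out to members one at a time, in sorted order."""
    ordered = sorted(members)
    assignment: Assignment = {member: [] for member in ordered}
    for i, task in enumerate(sorted(tasks)):
        assignment[ordered[i % len(ordered)]].append(task)
    return assignment


class ToyCoordinator:
    """Hypothetical stand-in for the broker-side group coordinator.

    It tracks membership and names a leader, but has no opinion on
    how work is divided: that logic is passed in as `assignor`.
    """

    def __init__(self) -> None:
        self.members: List[str] = []

    def join(self, member_id: str) -> None:
        if member_id not in self.members:
            self.members.append(member_id)

    def leave(self, member_id: str) -> None:
        self.members.remove(member_id)

    def rebalance(self, tasks: List[str], assignor: Assignor) -> Tuple[str, Assignment]:
        if not self.members:
            raise ValueError("cannot rebalance an empty group")
        # Assumption for this sketch: the longest-standing member leads.
        leader = self.members[0]
        # In a real deployment the leader would run the assignor in its own
        # process and send the result back; here it is called inline.
        return leader, assignor(list(self.members), tasks)
```

With two members `c1` and `c2` and three tasks, `round_robin` gives `c1` the first and third task and `c2` the second. If `c1` then leaves and the group rebalances, `c2` becomes leader and picks up all three, which is the failover behavior described above. Swapping in a different `assignor` changes the algorithm without touching the coordinator.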
In this session we'll dive into the internals of this little-known assignment protocol: the binary network protocol and the Java APIs. We'll look in detail at how the Kafka consumer, Connect, and Streams APIs use this protocol for task management. Finally, we'll show how you too can extend this protocol to implement task assignment in your application with an algorithm of your choice, even if it doesn't use Kafka for anything else.
Gwen is a principal data architect at Confluent, helping customers achieve success with their Apache Kafka implementations. She has 15 years of experience working with code and customers to build scalable data architectures, integrating microservices, relational, and big data technologies. She currently specializes in building reliable real-time data processing pipelines using Apache Kafka. Gwen is an author of "Kafka: The Definitive Guide" and "Hadoop Application Architectures", and a frequent presenter at industry conferences. Gwen is also a committer on the Apache Kafka and Apache Sqoop projects. When Gwen isn't coding or building data pipelines, you can find her pedaling her bike, exploring the roads and trails of California and beyond.