Workflow
Apache Kafka organizes data into topics, and each topic is split into one or more partitions. A partition is an ordered sequence of messages in which each message is identified by its index, called an offset. Incoming messages are written at the end of a partition, and consumers read them from there in order. Durability is maintained by replicating the messages to different brokers.
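To make the idea of a partition and its offsets concrete, here is a minimal sketch in Python. It is not Kafka code: the `Partition` class and its `append` and `read` methods are hypothetical names chosen for illustration, and the log lives in memory with no replication.

```python
class Partition:
    """Toy append-only log: a message's offset is its index in the list."""

    def __init__(self):
        self._log = []

    def append(self, message):
        """Write a message at the end of the partition and return its offset."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset):
        """Return every message from the given offset to the end."""
        return self._log[offset:]


partition = Partition()
first = partition.append("order-created")
second = partition.append("order-paid")
print(first, second)       # 0 1
print(partition.read(1))   # ['order-paid']
```

Because messages are only ever appended, an offset keeps pointing at the same message, which is what lets a consumer resume from a saved position.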
Kafka provides both pub-sub and queue-based messaging in a fast, reliable, persistent, and fault-tolerant manner with zero downtime. In both cases, producers simply send messages to a topic, and consumers can choose either type of messaging system depending on their needs.
Workflow of Pub-Sub Messaging
In Apache Kafka, the step-by-step workflow of pub-sub messaging is as follows:
- Kafka producers send messages to a topic at regular intervals.
- The Kafka broker stores all messages in the partitions configured for that particular topic. It ensures the messages are shared evenly between the partitions. For example, if the producer sends two messages and there are two partitions, Kafka will store the first message in the first partition and the second message in the second partition.
- A Kafka consumer subscribes to a specific topic.
- Once the consumer subscribes to a topic, Kafka gives the current offset of the topic to the consumer and also saves the offset in the ZooKeeper ensemble.
- The consumer then requests new messages from Kafka at a regular interval (for example, every 100 ms).
- Kafka will forward the messages to the consumers as soon as they are received from the producers.
- The consumer will receive the message and process it.
- Once the messages are processed, the consumer will send an acknowledgement to the Kafka broker.
- Once Kafka receives the acknowledgement, it changes the offset to the new value and updates it in ZooKeeper. Since offsets are maintained in ZooKeeper, the consumer can read the next message correctly even during server outages.
- This flow repeats until the consumer stops requesting messages.
- The consumer can rewind or skip to any desired offset of a topic at any time and read all the subsequent messages.
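The steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: `Topic` and `Consumer` are hypothetical toy classes, not a Kafka client library, messages are spread round-robin as in the two-partition example, and a plain dictionary stands in for the offsets that the text describes as saved in ZooKeeper.

```python
from itertools import cycle


class Topic:
    """Toy topic: a fixed number of partitions, each an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        self._next = cycle(range(num_partitions))  # round-robin selector

    def send(self, message):
        """Store the message in the next partition; return (partition, offset)."""
        index = next(self._next)
        self.partitions[index].append(message)
        return index, len(self.partitions[index]) - 1


class Consumer:
    """Toy consumer tracking one committed offset per partition."""

    def __init__(self, topic):
        self.topic = topic
        # Assumption: this dict stands in for the offsets kept in ZooKeeper.
        self.committed = {p: 0 for p in range(len(topic.partitions))}

    def poll(self):
        """Return messages after the committed offsets, without committing."""
        return {p: log[self.committed[p]:]
                for p, log in enumerate(self.topic.partitions)}

    def commit(self, batch):
        """Acknowledge a processed batch by advancing the committed offsets."""
        for p, messages in batch.items():
            self.committed[p] += len(messages)

    def seek(self, partition, offset):
        """Rewind or skip to an arbitrary offset of one partition."""
        self.committed[partition] = offset


topic = Topic(num_partitions=2)
topic.send("m1")   # stored in partition 0
topic.send("m2")   # stored in partition 1

consumer = Consumer(topic)
batch = consumer.poll()      # {0: ['m1'], 1: ['m2']}
consumer.commit(batch)       # offsets become {0: 1, 1: 1}
empty = consumer.poll()      # {0: [], 1: []} because nothing new arrived
consumer.seek(0, 0)          # rewind partition 0 to its beginning
replayed = consumer.poll()   # {0: ['m1'], 1: []}
```

Keeping `poll` and `commit` separate mirrors the acknowledgement step: the offset only moves forward after the consumer has processed the batch, so an unacknowledged batch would be delivered again on the next poll.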
Workflow of Kafka Queue Messaging/Consumer Group
In a queue messaging system, instead of a single consumer, a group of consumers with the same Group ID subscribes to a topic. In simple terms, consumers that subscribe to a topic with the same Group ID are treated as a single group, and the messages are shared among them. This system's workflow is as follows:
- Kafka producers send messages to a topic at regular intervals.
- As in the earlier scenario, Kafka stores all messages in the partitions configured for that particular topic.
- A single consumer subscribes to a specific topic, say Topic-01, with the Group ID Group-1.
- Kafka interacts with this consumer in the same way as in pub-sub messaging until a new consumer subscribes to the same topic, Topic-01, with the same Group ID, Group-1.
- When the new consumer arrives, Kafka switches to share mode and divides the data between the two consumers. This sharing continues as more consumers join, until the number of consumers equals the number of partitions configured for that particular topic.
- Once the number of consumers exceeds the number of partitions, a new consumer will not receive any messages until one of the existing consumers unsubscribes. This happens because each consumer in the group needs at least one partition, so if no partition is free, new consumers have to wait.
- This arrangement is also called a Kafka Consumer Group. In this way, Apache Kafka offers the best of both systems in a simple and efficient manner.
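The sharing behaviour in a consumer group can be sketched with a small function. This is a simplified round-robin assignment written for illustration; `assign_partitions` and the consumer names `c1`, `c2`, `c3` are hypothetical, and Kafka's own assignment strategies may distribute partitions differently.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over the group's consumers round-robin.

    Consumers beyond the number of partitions end up with an empty list,
    which models the case where new consumers have to wait.
    """
    assignment = {consumer: [] for consumer in consumers}
    if not consumers:
        return assignment
    for i, partition in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(partition)
    return assignment


partitions = [0, 1]  # a topic with two partitions
one = assign_partitions(partitions, ["c1"])
two = assign_partitions(partitions, ["c1", "c2"])
three = assign_partitions(partitions, ["c1", "c2", "c3"])
print(one)    # {'c1': [0, 1]}
print(two)    # {'c1': [0], 'c2': [1]}
print(three)  # {'c1': [0], 'c2': [1], 'c3': []}
```

With two partitions, the third consumer receives nothing until one of the others leaves the group, at which point rerunning the assignment would hand it the freed partition.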
What is the role of ZooKeeper in Apache Kafka?
Apache ZooKeeper serves as the coordination interface between the Kafka brokers and consumers. It is a distributed configuration and synchronization service, and the ZooKeeper cluster shares its information with the Kafka servers. Kafka stores basic metadata in ZooKeeper, such as information about topics, brokers, consumer offsets (queue readers), and so on.
Since all the critical information is stored in ZooKeeper, which normally replicates this data across its ensemble, the failure of a Kafka broker or a ZooKeeper node does not affect the state of the Kafka cluster. Kafka restores the state once ZooKeeper restarts, which gives Kafka zero downtime. ZooKeeper is also used for leader election among the Kafka brokers in the event of a leader failure.
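As a rough illustration of leader election after a failure, the sketch below keeps a list of replica brokers per partition and promotes the next live replica when the current leader goes down. It is a toy model under stated assumptions, not Kafka's or ZooKeeper's actual election protocol: the `Coordinator` class, its `broker_failed` method, the broker IDs, and the partition names are all hypothetical.

```python
class Coordinator:
    """Toy stand-in for the coordination role the text gives to ZooKeeper."""

    def __init__(self, replicas):
        # replicas: partition name -> ordered list of broker ids holding a copy
        self.replicas = replicas
        self.live = {b for brokers in replicas.values() for b in brokers}
        # Assumption: the first replica in each list starts as the leader.
        self.leaders = {p: brokers[0] for p, brokers in replicas.items()}

    def broker_failed(self, broker):
        """Mark a broker as down and re-elect leaders for its partitions."""
        self.live.discard(broker)
        for partition, leader in self.leaders.items():
            if leader == broker:
                candidates = [b for b in self.replicas[partition]
                              if b in self.live]
                # With no live replica left, the partition has no leader.
                self.leaders[partition] = candidates[0] if candidates else None


coordinator = Coordinator({"Topic-01-0": [1, 2, 3], "Topic-01-1": [2, 3, 1]})
before = dict(coordinator.leaders)   # {'Topic-01-0': 1, 'Topic-01-1': 2}
coordinator.broker_failed(1)
after = dict(coordinator.leaders)    # {'Topic-01-0': 2, 'Topic-01-1': 2}
```

Only the partition led by the failed broker changes leader, and the other partition is untouched, which is why a single broker failure need not interrupt the rest of the cluster.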