Kafka Terminology
In this tutorial, we discuss Apache Kafka terminology. Kafka’s architecture rests on a few key terms, such as topics, producers, consumers, and brokers. To understand Apache Kafka in detail, we must first understand these terms.
Below is a list of the most prominent Kafka terms, which should help us build a strong foundation of Kafka knowledge.
1. Kafka Broker
An Apache Kafka cluster consists of one or more servers, and each of these servers is called a broker. The terms Kafka server, Kafka broker, and Kafka node are synonyms.
2. Kafka Topics
A topic is a category of messages in Kafka. Producers publish messages to topics, and consumers read messages from topics. All Kafka messages are organized into topics, which is where the data is stored. Each topic is divided into one or more partitions.
3. Kafka Partitions
Kafka topics are divided into a number of partitions, each of which contains messages in an immutable, ordered sequence. Each message in a partition is assigned a unique offset that identifies it. A topic can have multiple partition logs, as the click-topic does in the image to the right. This allows multiple consumers to read from a topic in parallel.
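As a minimal sketch of the idea above, the following Python models a topic as a list of append-only partitions. The `Partition` and `Topic` classes are hypothetical in-memory stand-ins, not part of any Kafka client library. The only point is that offsets are assigned per partition, so two partitions can both hold a message at offset 0.

```python
class Partition:
    """Hypothetical stand-in for one partition: an append-only list."""

    def __init__(self):
        self._messages = []

    def append(self, message):
        # The offset is simply the position at which the message lands.
        offset = len(self._messages)
        self._messages.append(message)
        return offset

    def read(self, offset):
        # Existing messages are never modified, only read back by offset.
        return self._messages[offset]


class Topic:
    """Hypothetical topic: a name plus a fixed number of partitions."""

    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]


click_topic = Topic("click-topic", num_partitions=3)
first = click_topic.partitions[0].append("click:home")
second = click_topic.partitions[0].append("click:cart")
other = click_topic.partitions[1].append("click:search")
print(first, second, other)  # 0 1 0
```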
4. Kafka Producers
Producers publish messages to one or more Kafka topics by sending data to Kafka brokers. Every time a producer publishes a message, the broker appends it to the end of a partition, specifically to that partition’s last segment file. The producer can also send messages to a partition of its choice.
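To make the partition choice concrete, here is a small sketch of how a producer might pick a partition. The function `choose_partition` and its parameters are hypothetical, and CRC32 is used only as a stand-in hash; real Kafka clients use their own hash function, so the actual partition numbers would differ.

```python
import zlib


def choose_partition(key, num_partitions, explicit_partition=None):
    """Pick a partition index for a message (illustrative, not Kafka's code)."""
    if explicit_partition is not None:
        # The producer named a partition itself, so honor it after a range check.
        if not 0 <= explicit_partition < num_partitions:
            raise ValueError("partition out of range")
        return explicit_partition
    # Otherwise hash the key so equal keys always map to the same partition.
    # Assumption: CRC32 stands in for the hash a real client would use.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


same_a = choose_partition("user-42", num_partitions=3)
same_b = choose_partition("user-42", num_partitions=3)
pinned = choose_partition("user-42", num_partitions=3, explicit_partition=2)
print(same_a == same_b, pinned)  # True 2
```

Hashing the key means all messages with the same key land in the same partition, which keeps them in order relative to each other.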
5. Kafka Consumers
Consumers read data from brokers. Consumers subscribe to one or more topics and consume published messages by pulling data from the brokers.
6. Kafka Offset
The offset is a unique identifier of a record within a partition. A consumer also tracks its position in the partition as an offset.
7. Kafka Consumer Group
Consumers can join a group called a consumer group. A consumer group is the set of consumer processes that subscribe to a specific topic. Each consumer in the group is assigned a set of partitions to consume from, so each one receives messages from a different subset of the topic’s partitions. Kafka guarantees that a message is read by only a single consumer in the group.
Consumers pull messages from topic partitions, and different consumers can be responsible for different partitions. Kafka can support a large number of consumers and retain large amounts of data with very little overhead. By using consumer groups, consumers can be parallelized so that multiple consumers read from multiple partitions of a topic, allowing very high message processing throughput. The number of partitions caps the parallelism of a consumer group, because each partition is read by at most one consumer in the group and any consumers beyond the partition count sit idle.
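The way partitions are spread over a group can be sketched with a simple round-robin assignment. The function `assign_partitions` and the consumer names are hypothetical, and Kafka’s own assignment strategies are more involved; this only illustrates that each partition has exactly one owner in the group and that surplus consumers receive nothing.

```python
def assign_partitions(partitions, consumers):
    """Deal partitions out to the consumers of one group, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one consumer")
    assignment = {consumer: [] for consumer in consumers}
    for index, partition in enumerate(partitions):
        # Each partition gets exactly one owner within the group.
        owner = consumers[index % len(consumers)]
        assignment[owner].append(partition)
    return assignment


balanced = assign_partitions([0, 1, 2, 3], ["c1", "c2"])
crowded = assign_partitions([0, 1], ["c1", "c2", "c3"])
print(balanced)  # {'c1': [0, 2], 'c2': [1, 3]}
print(crowded)   # {'c1': [0], 'c2': [1], 'c3': []}
```

In the second call the topic has only two partitions, so the third consumer is left without work.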
8. Kafka Log Anatomy
Another way to view a partition is as a log. A data source writes messages to the log and one or more consumers read from the log at the point in time they choose. In the diagram below a data source is writing to the log and consumers A and B are reading from the log at different offsets.
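The log view can be sketched in a few lines of Python. The `LogConsumer` class and the message values are hypothetical; the sketch assumes a plain list as the log and shows two consumers, A and B, each keeping its own position, so that reading by one does not affect the other.

```python
class LogConsumer:
    """Hypothetical consumer that remembers its own offset in a shared log."""

    def __init__(self, name, log, start_offset=0):
        self.name = name
        self.log = log
        self.position = start_offset  # offset of the next record to read

    def poll(self, max_records=1):
        # Read forward from the current position, then advance past what was read.
        records = self.log[self.position:self.position + max_records]
        self.position += len(records)
        return records


log = ["m0", "m1", "m2", "m3", "m4"]
consumer_a = LogConsumer("A", log, start_offset=1)
consumer_b = LogConsumer("B", log, start_offset=3)

a_records = consumer_a.poll(max_records=2)
b_records = consumer_b.poll(max_records=5)
print(a_records, b_records)  # ['m1', 'm2'] ['m3', 'm4']

log.append("m5")  # the data source keeps writing to the end of the log
b_next = consumer_b.poll()
print(b_next, consumer_a.position, consumer_b.position)  # ['m5'] 3 6
```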
9. Kafka Message Ordering and Client Acknowledgments
In Kafka, messages are delivered from a given partition in the same order in which that partition received them.
10. Node in Kafka
In the Apache Kafka cluster, a node is a single computer.
11. Kafka Cluster
A cluster is a group of computers acting together to achieve a common purpose. The term has the same meaning in Kafka: a group of computers, each running one instance of a Kafka broker.
12. Kafka Replicas
Here, the word replica refers to a backup: a replica of a partition is a “backup” of that partition. We use replicas to prevent data loss. Replicas that are not the leader do not normally serve client reads or writes.
13. Kafka Message
In one line, a message in Kafka is a piece of information that travels from a producer to a consumer through Apache Kafka.
14. Kafka Leader
The node responsible for all reads and writes for a given partition is called the Kafka leader. Every partition has exactly one server that acts as its leader.
15. Follower in Kafka
Simply put, a node that follows the leader’s instructions is called a follower. The purpose of a follower is that if the leader fails, one of the followers automatically becomes the new leader. Until then, a follower behaves like a normal consumer: it pulls messages from the leader and updates its own data store.
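The leader and follower roles can be sketched as follows. The `PartitionReplicas` class and the broker ids are hypothetical, and the failover rule (promote the first remaining follower) is a deliberate simplification; how a real cluster chooses the new leader is not modeled here.

```python
class PartitionReplicas:
    """Hypothetical model of one partition replicated across several brokers."""

    def __init__(self, broker_ids):
        self.leader = broker_ids[0]
        self.followers = list(broker_ids[1:])
        self.logs = {broker_id: [] for broker_id in broker_ids}

    def write(self, message):
        # Client writes go to the leader only.
        self.logs[self.leader].append(message)

    def replicate(self):
        # Each follower pulls whatever it is missing from the leader's log.
        leader_log = self.logs[self.leader]
        for follower in self.followers:
            follower_log = self.logs[follower]
            follower_log.extend(leader_log[len(follower_log):])

    def fail_leader(self):
        # Simplifying assumption: the first remaining follower takes over.
        if not self.followers:
            raise RuntimeError("no follower available to take over")
        failed = self.leader
        del self.logs[failed]
        self.leader = self.followers.pop(0)
        return failed


replicas = PartitionReplicas([1, 2, 3])
replicas.write("a")
replicas.write("b")
replicas.replicate()
failed = replicas.fail_leader()
print(failed, replicas.leader, replicas.logs[replicas.leader])  # 1 2 ['a', 'b']
```

Because the follower had already copied both messages, nothing is lost when it takes over as leader.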
16. Kafka Data Log
Kafka preserves messages for a considerable, configurable amount of time, which means consumers can read at their convenience. If Kafka is configured to keep messages for 24 hours and a consumer is down for longer than 24 hours, the consumer will lose messages. If the consumer’s downtime is only 60 minutes, however, it can resume reading from its last known offset.
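The two downtime scenarios can be worked through with a short sketch. The function `resume`, the message tuples, and the timestamps are hypothetical, and retention is applied per message here for simplicity, whereas a real broker removes data in larger units of the log.

```python
HOUR = 60 * 60
RETENTION_SECONDS = 24 * HOUR  # the 24-hour retention from the text


def resume(messages, last_offset, now, retention_seconds=RETENTION_SECONDS):
    """Split unread messages into (still readable, lost to retention).

    messages is a list of (offset, timestamp_seconds, value) tuples.
    """
    readable, lost = [], []
    for offset, timestamp, value in messages:
        if offset < last_offset:
            continue  # already consumed before the outage
        if now - timestamp <= retention_seconds:
            readable.append(value)
        else:
            lost.append(value)
    return readable, lost


# Three messages arrive 10, 20, and 30 minutes after the consumer stops at t=0.
messages = [(0, 600, "a"), (1, 1200, "b"), (2, 1800, "c")]

short_outage = resume(messages, last_offset=0, now=1 * HOUR)
long_outage = resume(messages, last_offset=0, now=25 * HOUR)
print(short_outage)  # (['a', 'b', 'c'], [])
print(long_outage)   # ([], ['a', 'b', 'c'])
```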
17. Kafka Connector API
The Connector API permits building and running reusable producers or consumers that connect existing applications or data systems to Kafka topics.
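As a rough illustration, a connector is usually described by a small configuration rather than by custom code. The sketch below only builds such a configuration as JSON and does not contact any server. The connector name, file path, and topic are placeholder values, and the keys follow the file source example distributed with Kafka Connect, which we assume is available in the installation.

```python
import json

# Placeholder connector name, file path, and topic; the keys follow the
# file source example distributed with Kafka Connect (an assumption here).
connector = {
    "name": "local-file-source",
    "config": {
        "connector.class": "FileStreamSource",
        "tasks.max": "1",
        "file": "test.txt",
        "topic": "connect-test",
    },
}

# A Connect worker's REST interface accepts a JSON body of this shape when a
# connector is created; this sketch only serializes and prints it.
payload = json.dumps(connector, indent=2)
print(payload)
```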