Kafka Theory
- Cluster
- Rack
- Broker
- Every broker in Kafka is a “bootstrap server”: each broker knows about all brokers, topics and partitions (metadata), so a Kafka client (producer, consumer, etc.) only needs to connect to one broker in order to reach the entire cluster.
- At all times, exactly one broker must be the controller of the cluster.
- Topic
- Kafka takes bytes as input and serves them back to consumers without loading them into application memory (that’s called zero copy).
- Brokers have defaults for all the topic configuration parameters
- Partition
- A topic can have one or more partitions.
- It is not possible to delete a partition of a topic once it is created.
- Order is guaranteed within a partition, and once data is written into a partition, it is immutable!
- If a producer writes at 1 GB/s and a single consumer reads at 250 MB/s, the topic requires at least 4 partitions (1 GB/s ÷ 250 MB/s = 4) so that 4 consumers can share the load!
- Segment
- Partitions are made of segments (.log files).
- At any time, only one segment is active (being written to) in a partition.
- log.segment.bytes=1GB (default): max size of a single segment in bytes.
- log.segment.ms=1 week (default): time Kafka will wait before closing the segment if it is not full.
- Segments come with two indexes (files):
  - An offset-to-position index (.index file): allows Kafka to find where to read a message.
  - A timestamp-to-offset index (.timeindex file): allows Kafka to find a message by its timestamp.
- log.cleanup.policy=delete (Kafka default for all user topics): delete data based on its age (default is 1 week).
- log.cleanup.policy=compact (Kafka default for the topic __consumer_offsets): delete based on the keys of your messages; old duplicate keys are deleted after the active segment is committed.
- Log cleanup happens on partition segments. Smaller/more segments means the log cleanup will happen more often!
- The cleaner checks for work every 15 seconds (log.cleaner.backoff.ms).
- log.retention.hours=168 (1 week, default): number of hours to keep data for.
- log.retention.bytes=-1 (infinite, default): max size in bytes for each partition.
- Old segments are deleted based on the log.retention.hours or log.retention.bytes rule.
- The offset of a message is immutable.
- Deleted records can still be seen by consumers for a period of delete.retention.ms=24 hours (default).
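Most of these settings can also be overridden per topic with kafka-configs.sh. A sketch, assuming a broker at localhost:9092; the topic name and the values are examples (older Kafka versions use --zookeeper localhost:2181 instead of --bootstrap-server):

```sh
kafka-configs.sh --bootstrap-server localhost:9092 --alter \
  --entity-type topics --entity-name my-first-topic \
  --add-config cleanup.policy=compact,segment.bytes=536870912,retention.ms=259200000
```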
- Offset
- Each partition has its own sequence of offsets, starting from 0.
- Topic Replication
- Replication factor = 3 and partitions = 2 means there will be 6 partition replicas in total distributed across the Kafka cluster. Each partition has 1 leader and 2 replicas (in-sync replicas, ISR).
- The broker holding the leader replica of a partition is called the leader for that partition, and only the leader can receive and serve data for the partition.
- The replication factor cannot be greater than the number of brokers in the Kafka cluster. If a topic has a replication factor of 3, each of its partitions will live on 3 different brokers.
- Producer
- Producers automatically recover from retriable errors: LEADER_NOT_AVAILABLE, NOT_LEADER_FOR_PARTITION, REBALANCE_IN_PROGRESS.
- Non-retriable errors: MESSAGE_TOO_LARGE.
- When producing to a topic that doesn’t exist and auto.create.topics.enable=true, Kafka creates the topic automatically with the broker settings num.partitions and default.replication.factor.
- Producer Acknowledgment
- acks=0: producer does not wait for an ack (possible data loss).
- acks=1: producer waits for the leader ack (limited data loss).
- acks=all: producer waits for leader + replica acks (no data loss).
- acks=all must be used in conjunction with min.insync.replicas (can be set at the broker or topic level).
- min.insync.replicas=2 implies that at least 2 brokers that are ISR (including the leader) must acknowledge.
- e.g. replication.factor=3, min.insync.replicas=2, acks=all can only tolerate 1 broker going down; otherwise the producer will receive a NOT_ENOUGH_REPLICAS exception on send.
- Safe Producer Config
- min.insync.replicas=2 (set at the broker or topic level).
- retries=MAX_INT: number of retries by the producer in case of transient failures/exceptions (default is 0).
- max.in.flight.requests.per.connection=5: number of producer requests that can be made in parallel (default is 5).
- acks=all
- enable.idempotence=true: the producer sends a producer id with each message so Kafka can detect duplicates on its end. When Kafka receives a duplicate message with a producer id it has already committed, it does not commit it again but still sends an ack to the producer (default is false).
- High Throughput Producer using compression and batching
- compression.type=snappy: value can be none (default), gzip, lz4 or snappy. Compression is enabled at the producer level and doesn’t require any config change in brokers or consumers. Compression is more effective when bigger batches of messages are sent to Kafka.
- linger.ms=20: number of milliseconds a producer is willing to wait before sending a batch out (default 0). Increasing linger.ms increases the chance of batching.
- batch.size=32KB or 64KB: maximum number of bytes that will be included in a batch (default 16KB). Any message bigger than the batch size will not be batched.
- The safe and high-throughput settings above are combined in the producer sketch below.
- Message Key
- A producer can choose to send a key with a message.
- If key = null, data is sent round robin across partitions.
- If a key is sent, then all messages for that key will always go to the same partition. This can be used to order the messages for a specific key, since order is guaranteed within a partition.
- Adding a partition to the topic breaks the guarantee that the same key goes to the same partition.
- Keys are hashed using the “murmur2” algorithm by default.
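A minimal Java producer sketch combining the safe, high-throughput and message-key settings above. The broker address and the topic name orders are assumptions for illustration; min.insync.replicas=2 would be set on the broker or topic:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class SafeProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // example address
        props.put("key.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                  "org.apache.kafka.common.serialization.StringSerializer");

        // Safe producer settings
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("max.in.flight.requests.per.connection", "5");
        props.put("enable.idempotence", "true");

        // High-throughput settings: compression and batching
        props.put("compression.type", "snappy");
        props.put("linger.ms", "20");
        props.put("batch.size", Integer.toString(32 * 1024));  // 32 KB

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Same key => same partition, so events for customer-42 stay ordered.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order created"));
            producer.send(new ProducerRecord<>("orders", "customer-42", "order shipped"));
            // key = null => round robin across partitions
            producer.send(new ProducerRecord<>("orders", null, "heartbeat"));
        }
    }
}
```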
- Consumer
- One consumer per thread is the rule: a consumer must not be shared across multiple threads.
- records-lag-max (monitoring metric): the maximum lag, in number of records, for any partition in this window. An increasing value over time is your best indication that the consumer group is not keeping up with the producers.
- Consumer Group
- Consumer Offset
- When a consumer in a group has processed data received from Kafka, it commits the offsets in the Kafka topic named __consumer_offsets. When a consumer dies, this lets it read back from where it left off.
- Delivery Semantics
- At most once: offsets are committed as soon as the message batch is received. If the processing goes wrong, the message is lost (it won’t be read again).
- At least once (default): offsets are committed after the message is processed. If the processing goes wrong, the message will be read again, which can result in duplicate processing. Make sure your processing is idempotent (i.e. re-processing the message won’t impact your systems). Most applications use this and ensure processing is idempotent.
- Exactly once: can only be achieved for Kafka => Kafka workflows using the Kafka Streams API. For Kafka => sink workflows, use an idempotent consumer.
- Consumer Offset commit strategy
- enable.auto.commit=true & synchronous processing of batches: with auto commit, offsets will be committed automatically for you at a regular interval (auto.commit.interval.ms=5000 by default) every time you call .poll(). If you don’t use synchronous processing, you will be in “at most once” behavior because offsets will be committed before your data is processed.
- enable.auto.commit=false & manual commit of offsets (recommended; see the consumer sketch below).
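A minimal at-least-once Java consumer sketch with manual offset commits. The broker address, topic and group names are example values:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class AtLeastOnceConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // example address
        props.put("group.id", "my-first-group");           // example group
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");          // manual commit (recommended)
        props.put("auto.offset.reset", "earliest");        // read from the start if no offset

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-first-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);                       // processing must be idempotent
                }
                consumer.commitSync();                     // commit only after processing
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("%s:%d:%d key=%s value=%s%n",
                record.topic(), record.partition(), record.offset(),
                record.key(), record.value());
    }
}
```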
- Consumer Offset reset behavior
- auto.offset.reset=latest: will read from the end of the log.
- auto.offset.reset=earliest: will read from the start of the log.
- auto.offset.reset=none: will throw an exception if no offset is found.
- Consumer offsets can be lost if a consumer hasn’t read new data in 7 days. This can be controlled by the broker setting offset.retention.minutes.
- Consumer Poll Behaviour
- fetch.min.bytes=1 (default): controls how much data you want to pull at least on each request. Helps improve throughput and decrease the number of requests, at the cost of latency.
- max.poll.records=500 (default): controls how many records to receive per poll request. Increase it if your messages are very small and you have a lot of available RAM.
- max.partition.fetch.bytes=1MB (default): maximum data returned by the broker per partition. If you read from 100 partitions, you will need a lot of memory (RAM).
- fetch.max.bytes=50MB (default): maximum data returned for each fetch request (covers multiple partitions). The consumer performs multiple fetches in parallel.
- Consumer Heartbeat Thread
- The heartbeat mechanism is used to detect if a consumer application is dead.
- session.timeout.ms=10 seconds (default): if no heartbeat is sent during that period, the consumer is considered dead. Set a lower value for faster consumer rebalances.
- heartbeat.interval.ms=3 seconds (default): heartbeats are sent every 3 seconds. Usually set to 1/3 of session.timeout.ms.
- Consumer Poll Thread
- The poll mechanism is also used to detect if a consumer application is dead.
- max.poll.interval.ms=5 minutes (default): max amount of time between two .poll() calls before declaring the consumer dead. If processing a message batch generally takes longer in your application, increase this interval.
- Kafka Guarantees
- Messages are appended to a topic-partition in the order they are sent.
- Consumers read messages in the order they are stored in a topic-partition.
- With a replication factor of N, producers and consumers can tolerate up to N-1 brokers being down.
- As long as the number of partitions for a topic remains constant (no new partitions), the same key will always go to the same partition.
- Client Bi-Directional Compatibility
- An older client (1.1) can talk to a newer broker (2.0).
- A newer client (2.0) can talk to an older broker (1.1).
- Kafka Connect
- Source connector: gets data from a common data source into Kafka.
- Sink connector: publishes data from Kafka to a common data store.
- Zookeeper
ZooKeeper servers will be deployed on multiple nodes. This is called an ensemble. An ensemble is a set of 2n + 1 ZooKeeper servers where n is any number greater than 0. The odd number of servers allows ZooKeeper to perform majority elections for leadership.
At any given time, there can be up to n failed servers in an ensemble and the ZooKeeper cluster will keep quorum. If at any time, quorum is lost, the ZooKeeper cluster will go down.
In a ZooKeeper multi-node configuration, initLimit and syncLimit govern how long follower ZooKeeper servers can take to initialize with the current leader and how long they can be out of sync with the leader.
If tickTime=2000, initLimit=5 and syncLimit=2, then a follower can take (tickTime × initLimit) = 10,000 ms to initialize and may be out of sync with the leader for up to (tickTime × syncLimit) = 4,000 ms.
In a ZooKeeper multi-node configuration, the server.* properties set the ensemble membership. The format is server.<myid>=<hostname>:<leaderport>:<electionport>. Some explanation:
myid is the server identification number. In a three-server ensemble, each server has a different myid with values 1, 2 and 3 respectively. The myid is set by creating a file named myid in the dataDir that contains a single integer in human-readable ASCII text. This value must match one of the myid values from the configuration file. If another ensemble member has already been started with a conflicting myid value, an error will be thrown upon startup.
leaderport is used by followers to connect to the active leader. This port should be open between all ZooKeeper ensemble members.
electionport is used to perform leader elections between ensemble members. This port should be open between all ZooKeeper ensemble members.
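Putting it together, a zoo.cfg sketch for a three-node ensemble with the values above (hostnames and dataDir are examples):

```
tickTime=2000
initLimit=5
syncLimit=2
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```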
Kafka CLI
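All commands below assume the Kafka bin/ scripts are on the PATH, ZooKeeper at localhost:2181 and a broker at localhost:9092; my-first-topic and my-first-group are example names. On Kafka versions before 2.2, the kafka-topics.sh commands take --zookeeper localhost:2181 instead of --bootstrap-server localhost:9092.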
- Start a zookeeper at default port 2181
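```sh
zookeeper-server-start.sh config/zookeeper.properties
```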
- Start a kafka server at default port 9092
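```sh
kafka-server-start.sh config/server.properties
```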
- Create a kafka topic with name my-first-topic
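```sh
# partition and replication-factor values are examples
kafka-topics.sh --bootstrap-server localhost:9092 --create --topic my-first-topic \
  --partitions 3 --replication-factor 1
```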
- List all kafka topics
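```sh
kafka-topics.sh --bootstrap-server localhost:9092 --list
```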
- Describe kafka topic my-first-topic
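```sh
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic my-first-topic
```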
- Delete kafka topic my-first-topic
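```sh
kafka-topics.sh --bootstrap-server localhost:9092 --delete --topic my-first-topic
```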
Note: This will have no impact if delete.topic.enable is not set to true
- Find out all the partitions without a leader
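```sh
kafka-topics.sh --bootstrap-server localhost:9092 --describe --unavailable-partitions
```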
- Produce messages to Kafka topic my-first-topic
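```sh
kafka-console-producer.sh --broker-list localhost:9092 --topic my-first-topic
```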
- Start Consuming messages from kafka topic my-first-topic
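```sh
# --from-beginning reads the full log; omit it to read only new messages
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-first-topic --from-beginning
```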
- Start Consuming messages in a consumer group from kafka topic my-first-topic
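```sh
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic my-first-topic --group my-first-group
```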
- List all consumer groups
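```sh
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list
```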
- Describe consumer group
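```sh
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --describe --group my-first-group
```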
- Reset offset of consumer group to replay all messages
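```sh
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-first-group \
  --reset-offsets --to-earliest --execute --topic my-first-topic
```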
- Shift offsets by 2 (forward) as another strategy
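```sh
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-first-group \
  --reset-offsets --shift-by 2 --execute --topic my-first-topic
```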
- Shift offsets by 2 (backward) as another strategy
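```sh
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --group my-first-group \
  --reset-offsets --shift-by -2 --execute --topic my-first-topic
```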
Default Ports
Zookeeper:
- Zookeeper client port: 2181
- Zookeeper peer port (followers connect to the leader): 2888
- Zookeeper leader election port: 3888
Kafka:
- Broker: 9092
- REST Proxy: 8082
- Schema Registry: 8081
- KSQL: 8088