1 year ago

#347484


Battle_Slug

Kafka streams changelog consumption rate drops during state rebuilding

I recently started working with Kafka, and I'm having a hard time debugging why the changelog consumption rate drops during the state rebuild.

TL;DR: After I delete the PVC and the pod and wait for the pod to start running again, the Grafana graph of the changelog lag has the shape below, which is not what I'd expect:

[Image: Grafana graph of the changelog lag during state rebuilding]

The graph shows the changelog lag being consumed quickly at first and then more and more slowly. The whole process stretches over 30 minutes for a 14GB changelog. Details of the most recent configuration:

  • Provider: AWS
  • storageClass: io1
  • storageSize: 3TB
  • podMemory: 25GB
  • JVM memory: 16GB
  • Update: 24 partitions, no data skew
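For scale, the most recent run works out to the average rates below. This is only a sketch of the arithmetic, assuming 1 GB = 1024 MB and an even spread of the 14GB over the 24 partitions (per "no data skew"); the class and method names are made up for illustration.

```java
/**
 * Back-of-the-envelope restore throughput from the numbers in the question.
 * Assumptions: 1 GB = 1024 MB, and the changelog is spread evenly over
 * all partitions ("no data skew"). Hypothetical helper, not a Kafka API.
 */
final class RestoreThroughput {

    /** Average throughput in MB/s for restoring sizeGb in the given minutes. */
    static double mbPerSecond(double sizeGb, double minutes) {
        return sizeGb * 1024.0 / (minutes * 60.0);
    }

    /** Average per-partition throughput in MB/s, assuming no data skew. */
    static double mbPerSecondPerPartition(double sizeGb, double minutes, int partitions) {
        return mbPerSecond(sizeGb, minutes) / partitions;
    }

    public static void main(String[] args) {
        // Most recent run: 14GB changelog, about 30 minutes, 24 partitions.
        System.out.printf("overall:       %.2f MB/s%n", mbPerSecond(14, 30));
        System.out.printf("per partition: %.3f MB/s%n", mbPerSecondPerPartition(14, 30, 24));
    }
}
```

Running it prints roughly 7.96 MB/s overall and 0.332 MB/s per partition, averaged over the whole run.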

RocksDB params:

  • writeBuffer: 2MB
  • blockSize: 32KB
  • max Write Buffer Number: 4
  • min Write Buffer Number To Merge: 2
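The RocksDB parameters above put an upper bound on memtable memory: write buffer size times max write buffer number, per RocksDB instance. A minimal sketch of that arithmetic, assuming (hypothetically, the question does not say) a single state store with one RocksDB instance per partition and all 24 partitions restored on one pod; block cache and index/filter blocks are not counted.

```java
/**
 * Upper bound on RocksDB memtable memory implied by the parameters above.
 * Assumptions (not from the question): one state store, one RocksDB instance
 * per partition, all 24 partitions on one pod. Block cache and index/filter
 * blocks are excluded. Hypothetical helper for illustration only.
 */
final class MemtableBudget {

    /** Max memtable bytes per RocksDB instance: writeBufferSize * maxWriteBufferNumber. */
    static long perInstanceBytes(long writeBufferBytes, int maxWriteBufferNumber) {
        return writeBufferBytes * maxWriteBufferNumber;
    }

    /** Max memtable bytes summed over all RocksDB instances. */
    static long totalBytes(long writeBufferBytes, int maxWriteBufferNumber, int instances) {
        return perInstanceBytes(writeBufferBytes, maxWriteBufferNumber) * instances;
    }

    public static void main(String[] args) {
        long writeBuffer = 2L * 1024 * 1024; // writeBuffer: 2MB
        int maxBuffers = 4;                  // max Write Buffer Number: 4
        int instances = 24;                  // assumption: 24 partitions x 1 store on this pod
        long total = totalBytes(writeBuffer, maxBuffers, instances);
        System.out.println("memtables, all instances: " + total / (1024 * 1024) + " MB");
    }
}
```

Under these assumptions the bound is 192 MB. RocksDB allocates this memory natively rather than on the JVM heap, so the JVM memory setting does not govern it.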

My test procedure is simply to delete the PVCs and the pods, then measure how long it takes for the pod to start running again and for the changelog topic's lag to return to 0.
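To put numbers on the slowdown instead of reading it off the graph, the lag samples behind the Grafana panel can be turned into a per-interval consumption rate. A minimal sketch, assuming the samples are exported as timestamps in seconds and lag in records; the class and method names are hypothetical.

```java
/**
 * Converts (timestamp, lag) samples into the consumption rate for each
 * interval between consecutive samples. Hypothetical helper; assumes
 * timestamps in seconds (strictly increasing) and lag in records.
 */
final class LagRate {

    /** Records consumed per second between each pair of consecutive samples. */
    static double[] ratesPerInterval(double[] timesSeconds, long[] lags) {
        if (timesSeconds.length != lags.length) {
            throw new IllegalArgumentException("times and lags must have the same length");
        }
        double[] rates = new double[Math.max(0, lags.length - 1)];
        for (int i = 1; i < lags.length; i++) {
            double dt = timesSeconds[i] - timesSeconds[i - 1];
            if (dt <= 0) {
                throw new IllegalArgumentException("timestamps must be strictly increasing");
            }
            // Lag shrinks as records are restored, so consumed = previous lag - current lag.
            // A negative value would mean the lag grew during that interval.
            rates[i - 1] = (lags[i - 1] - lags[i]) / dt;
        }
        return rates;
    }
}
```

A series of rates that falls steadily from one interval to the next corresponds to the flattening curve described above.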

Results of my tuning sessions:

  • increased the storage size from 750GB to 3TB. Result: rebuilding state for the 14GB topic went from 68 min to 50 min; the graph shape did not change;
  • changed the storage class from gp2 to io1. Result: rebuilding state for the 14GB topic went from 50 min to 30 min; the graph shape did not change;
  • changed RocksDB max Write Buffer Number from 2 to 4 and min Write Buffer Number To Merge from 1 to 2. Result: no change in either speed or graph shape;
  • changed pod memory from 14GB to 25GB and JVM memory from 9GB to 16GB. Result: no change in either speed or graph shape.
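The relative effect of each change can be read off the restore times listed above. A small sketch of that arithmetic only (it ignores run-to-run variance); it assumes the 750GB to 3TB step was done on gp2, since the next step switched from gp2 to io1, and the class name is made up.

```java
/**
 * Relative speed-up of each tuning step, from the restore times listed
 * above for the 14GB changelog. Arithmetic only; hypothetical helper.
 */
final class TuningSpeedup {

    /** How many times faster the new restore time is than the old one. */
    static double speedup(double oldMinutes, double newMinutes) {
        return oldMinutes / newMinutes;
    }

    public static void main(String[] args) {
        // Restore times in minutes, taken from the tuning results above.
        System.out.printf("750GB -> 3TB (assumed gp2): %.2fx%n", speedup(68, 50));
        System.out.printf("gp2 -> io1:                 %.2fx%n", speedup(50, 30));
        System.out.printf("overall:                    %.2fx%n", speedup(68, 30));
    }
}
```

This prints 1.36x, 1.67x and 2.27x. Both gains came from storage changes, while the RocksDB and memory changes left the time unchanged; that is at least consistent with restoration being limited by disk I/O, given that gp2 baseline IOPS grow with volume size.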

The situation looks to me like memory saturation, but garbage collection time stays under 5%, and increasing the memory didn't help at all. So where else should I look? Thank you!

apache-kafka

apache-kafka-streams

rocksdb

0 Answers
