⚠️ In zio-kafka versions 2.2 up to 2.5.0 it may also be necessary to increase the `runloopTimeout` setting.
When no stream is processing data for this amount of time (while new data is available), the consumer will halt with a
failure. In zio-kafka 2.5.0, `runloopTimeout` defaults to 4 minutes, slightly lower than the default `max.poll.interval.ms`.
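
For the affected versions, a minimal configuration sketch could look like the following. It assumes `withRunloopTimeout`
is available on `ConsumerSettings` in your zio-kafka version; the bootstrap server, group id, and timeout values are
placeholders, not recommendations.

```scala
import zio._
import zio.kafka.consumer.ConsumerSettings

// Increase the Kafka property and zio-kafka's own runloop timeout together,
// keeping runloopTimeout a little below max.poll.interval.ms.
val settings: ConsumerSettings =
  ConsumerSettings(List("localhost:9092"))
    .withGroupId("my-group")
    .withProperty("max.poll.interval.ms", "600000") // 10 minutes
    .withRunloopTimeout(9.minutes)
```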
## Using metrics to tune the consumer
Zio-Kafka exposes [metrics](metrics.md) that can be used to further tune the consumer. To interpret these metrics, you
need to know how zio-kafka works internally.

The runloop is at the heart of every zio-kafka consumer.
It creates a zstream for each partition; this is eventually the zstream your application consumes from.
When the zstream starts, and every time the records queue is empty, it sends a request for data to the runloop.
The request causes the runloop to resume the partition so that the next poll may receive records.
Any received records are put in the records queue.
When the queue reaches a certain size (as determined by the configured `FetchStrategy`), the partition is paused.
Meanwhile, the zstream reads from the queue and emits the records to your application.
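
These per-partition zstreams are visible directly in the consumer API. As an illustration, here is a minimal sketch of
consuming with one stream per partition via `Consumer.partitionedStream` (the topic name and logging are placeholders;
this shows the user-facing API, not the runloop internals):

```scala
import zio._
import zio.kafka.consumer._
import zio.kafka.serde.Serde

// Each inner stream is fed from its own records queue by the runloop.
val consume: ZIO[Consumer, Throwable, Unit] =
  Consumer
    .partitionedStream(Subscription.topics("my-topic"), Serde.string, Serde.string)
    .flatMapPar(Int.MaxValue) { case (topicPartition, partitionStream) =>
      partitionStream
        .tap(record => ZIO.logDebug(s"$topicPartition: ${record.value}"))
        .map(_.offset)
    }
    .aggregateAsync(Consumer.offsetBatches)
    .mapZIO(_.commit)
    .runDrain
```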
An optimally configured consumer has the following properties:
- the zstreams never have to wait for new records (to get high throughput),
111
+
- most of the time, the record queues are empty (to get low latency and low heap usage).

The following strategy can help you get to this state:
1. First make sure that `pollTimeout` and `max.poll.records` make sense for the latency and throughput requirements
of your application.
2. Configure `partitionPreFetchBufferLimit` to `0` (see the settings sketch after this list).
3. Observe metric `ziokafka_consumer_queue_polls` which gives the number of polls during which records are idling in
the queue.
4. Increase `partitionPreFetchBufferLimit` in steps until most measurements of the `ziokafka_consumer_queue_polls`
histogram are in the `0` bucket.
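
As a starting point for steps 1 and 2, a settings sketch could look like this (assuming a zio-kafka version that has
`withPartitionPreFetchBufferLimit`; the broker address, group id, and values are examples only):

```scala
import zio._
import zio.kafka.consumer.ConsumerSettings

// Step 1: size pollTimeout and max.poll.records for your latency and
//         throughput requirements (example values below).
// Step 2: disable pre-fetching, then raise the limit in steps while
//         watching the ziokafka_consumer_queue_polls histogram.
val tuningSettings: ConsumerSettings =
  ConsumerSettings(List("localhost:9092"))
    .withGroupId("my-group")
    .withPollTimeout(50.millis)
    .withProperty("max.poll.records", "500")
    .withPartitionPreFetchBufferLimit(0)
```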
During this process, it is useful to observe metric `ziokafka_consumer_queue_size` (number of records in the queues) to
see if the queues are indeed increasing in size.

When many (hundreds of) partitions need to be consumed, the metric `ziokafka_consumer_all_queue_size` should also be
observed, since increasing `partitionPreFetchBufferLimit` can lead to high heap usage. (See 'High number of partitions'
above.)