Home
What factors determine the connection count? (#brokers, #topics, #partitions, #client consumer instances, other?)
Refer to: https://github.com/edenhill/librdkafka/wiki/FAQ#number-of-broker-tcp-connections
The number of open connections is determined by the number of brokers. The client writes data to, and reads data from, the broker that is the leader for the partition of interest, so a client will commonly require connections to all brokers.
The worst-case number of connections held open by librdkafka is `cnt(bootstrap.servers) + cnt(brokers in Metadata response)`. The minimum number of connections held open is `cnt(brokers in Metadata response)`. Currently, librdkafka holds connections open to all brokers whether or not they are needed. In the future, we plan to enhance librdkafka so that disused connections are not maintained.
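As a rough illustration of the two bounds above, the following sketch computes them for a single client. The function name `connection_bounds` and the sample broker addresses are hypothetical and not part of librdkafka; the only inputs are the `bootstrap.servers` string and the number of brokers reported in the Metadata response.

```python
from typing import Tuple


def connection_bounds(bootstrap_servers: str, metadata_brokers: int) -> Tuple[int, int]:
    """Return (minimum, worst_case) open connections for one client.

    bootstrap_servers: comma-separated host:port list, as in bootstrap.servers.
    metadata_brokers:  number of brokers listed in the Metadata response.
    """
    # cnt(bootstrap.servers): count the non-empty entries in the list.
    bootstrap_count = len([s for s in bootstrap_servers.split(",") if s.strip()])
    # Minimum: one connection per broker in the Metadata response.
    minimum = metadata_brokers
    # Worst case: the bootstrap connections are held open in addition.
    worst_case = bootstrap_count + metadata_brokers
    return minimum, worst_case


# Hypothetical example: two bootstrap servers, 12 brokers in the cluster.
print(connection_bounds("kafka1:9092,kafka2:9092", 12))  # (12, 14)
```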
Currently, we have N topics. We are creating a consumer instance in the application for each topic. Is that acceptable?
It's more efficient to use fewer clients:
- Each client maintains open connections to all brokers and internally creates 1+(# connections) threads. Involuntary context switches may start to introduce significant client-side overhead when the number of clients is large.
- Each broker connection has a small server-side cost. As a rough indication of the magnitude of this, in a recent benchmark we saw end-to-end latencies reduced by about half as the number of (producer) connections was varied from ~200 to ~25000 on a 12-broker cluster, all else equal.
- There is a small fixed cost per broker request (client and server side) and using a single client allows librdkafka to combine all topics into a single request.
- There is additional memory overhead in using separate clients.
On the other hand, the API isn't designed for frequent updates to the subscription set. If you want to change the subscription set dynamically, you'll probably be better off with multiple consumers.
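To make the per-client overhead listed above more concrete, here is a minimal sketch that compares one consumer per topic against a single shared consumer. It assumes each client holds one connection to every broker (the minimum case) and uses the 1+(# connections) thread count mentioned above. The function name `client_overhead` and the figures of 50 topics and 12 brokers are hypothetical.

```python
from typing import Dict


def client_overhead(num_clients: int, connections_per_client: int) -> Dict[str, int]:
    """Estimate total broker connections and client-side threads.

    Assumes every client creates 1 + (# connections) internal threads.
    """
    threads_per_client = 1 + connections_per_client
    return {
        "connections": num_clients * connections_per_client,
        "threads": num_clients * threads_per_client,
    }


# Hypothetical setup: 50 topics on a 12-broker cluster.
per_topic = client_overhead(num_clients=50, connections_per_client=12)
shared = client_overhead(num_clients=1, connections_per_client=12)

print(per_topic)  # {'connections': 600, 'threads': 650}
print(shared)     # {'connections': 12, 'threads': 13}
```

These are estimates only. Actual counts depend on how many bootstrap connections remain open and on the librdkafka version in use.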