delivery callbacks for batch production #88

Closed
chessai opened this issue Mar 15, 2019 · 6 comments

@chessai (Contributor) commented Mar 15, 2019

When producing a message batch, there's no way to have a delivery callback that operates on the entire batch. Currently, callbacks only operate on a single DeliveryReport, which assumes single-message production. I'd still like to know about the individual messages, but I'd also like the callback to tell me the size of the batch, or maybe even more sophisticated statistics. I can compute these before sending, but most of the time I don't care about them and would rather only see them upon some kind of delivery failure.

@AlexeyRaga (Member)

I don't know this for sure, but if I remember correctly, there is no such thing as a batch callback in librdkafka.

We can set (and we do) this one:

```c
void rd_kafka_conf_set_dr_cb(rd_kafka_conf_t *conf,
                             void (*dr_cb) (rd_kafka_t *rk,
                                            void *payload, size_t len,
                                            rd_kafka_resp_err_t err,
                                            void *opaque, void *msg_opaque));
```

which gives us info about individual messages, but it doesn't carry any information about the original batch.

This is probably due to the async communication between librdkafka and the broker: librdkafka has its own internal buffer, a retry policy, etc.
Even if we use rd_kafka_produce_batch, I believe it can still treat messages independently internally (because they can go to different partitions), so there is no such thing as "send a batch, get one callback" at the librdkafka level.
At least to my knowledge.

Also, we had a brief discussion about rd_kafka_produce_batch being limited to a certain extent, and then I kind of gave up on it:
[screenshot omitted]

@chessai (Contributor, Author) commented Mar 16, 2019

Ah, I see. That's a little inconvenient. I'm curious: why is batch production pointless without "extreme throughput"? What does @edenhill classify as "extreme"? That would be useful to know, because at work we may end up switching to single-message production depending on our needs. Batch production was just slightly easier in our case.

@AlexeyRaga (Member)

OK, here is my understanding.

I think it just means that librdkafka is so fast, even when sending single messages, that it is unlikely to be a bottleneck in real life. Single-message production is only an API front end; internally, librdkafka still batches requests to the brokers. And sending a message is never a blocking operation: librdkafka just puts the message into its internal queue and returns control immediately.
That's why we need to watch the delivery reports: calling send in the client doesn't mean that the message has actually been sent over the wire and accepted by the Kafka cluster.

As I said, librdkafka optimises for throughput and will internally batch messages before sending them to the brokers, so it makes little difference whether we enqueue messages individually.
One difference with batch producing is that when you produce a batch and specify the partition ID for that batch, it runs "faster" because it doesn't need to run the partitioning algorithm on each message (hashing message keys and potentially routing to different leaders); the whole batch goes at once.
However, when you don't specify the partition, or have it assigned randomly, the partitioner still runs against every message in the batch.

There could be other micro-optimisations for batches. But the point is that when our applications do other work as well, they are rarely limited by the difference between single- and batch-producing.
I don't know the general numbers, but I measured it once for one of my apps, and it made no difference at all. To be honest, I didn't expect it to, since that app came nowhere close to saturating the upstream connection to Kafka (it didn't produce that many messages).
So I decided that if I ever need to push many millions of messages as quickly as possible, I'll revisit batching and measure again to see what it really buys. But I haven't been in that situation yet :)
And working with individual messages is more convenient, so I haven't touched batch producing since...

@AlexeyRaga (Member) commented Mar 18, 2019

@chessai if you have high throughput, it would be interesting to see the difference between batching and single messages. If you end up measuring it, please let me know your results! :)

@edenhill

@AlexeyRaga is completely right: the batch API is just a slightly more optimized enqueuing interface for some specific use cases (large chunks of pre-partitioned messages, for instance). For the common case (using the partitioner) it is no faster than the single-message API, and it lacks support for newer field types such as headers and timestamps (see rd_kafka_producev()).

@chessai (Contributor, Author) commented Mar 18, 2019

@AlexeyRaga @edenhill awesome, thanks for the thorough explanations.

Labels: none yet · Projects: none yet · No branches or pull requests · 3 participants