-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SimpleProducer randomization of initial round robin ordering #139
Conversation
of partitions to prevent the first message from always being published to partition 0.
I'm having troubles getting the integration test running in order to verify this on my own; however, I suspect it's due to the fact the test assumes that the offset will have been committed against partition 0, except now it's been randomized. |
the initial round-robin ordering of partitions have been randomized
I've modified the integration tests to work with the random round-robin change. Please let me know if anything else is needed. |
bump.. any thoughts on this? |
I agree that starting on partition 0 every time is a problem in many circumstances. But I wonder if this would be better as a different class? Perhaps RandomProducer ? or RandomCycleProducer? |
Are there any use-cases where always starting with partition 0 is useful? |
easier for people who don't understand kafka yet to grok, that's about it. I'm not sure that the SimpleProducer should be expected to solve the even distribution problem. FWIW, at work we address this problem internally by using a custom producer class. |
Thanks for the response, I'm a bit puzzled as to what the SimpleProducer should be used for. If it's merely there as an example then perhaps it should be more appropriately named ExampleProducer as a precaution against using it in production. If the intended use-case is to consistently publish to the first batch of messages to the first partition, then adding comments on how it should be used would be helpful. If the issue is with the implementation of randomly shuffling the list of partitions, then perhaps we could just randomize the first initial starting partition. e.g. if partitions = [0,1,2,3], we could randomly pick to start on partition 2, then cycle 3,4,0,1... As an aside, I like the idea of a RandomProducer which could just randomly pick a partition each time: |
I think originally SimpleProducer was intended to mirror the SimpleConsumer in, well, simplicity. SimpleConsumer was a mirror of the similarly named class in the reference java client. But the java client has no SimpleProducer (partly because production is always partition-specific, not topic), so I think the api was sort of made up. Nonetheless, perhaps a suitable compromise would be to include an optional parameter like 'random_start' that, if set to True, would randomly select the first partition but then cycle as normal from there. |
What is the down side of having the initial partition selection be On Mon, Mar 31, 2014 at 7:54 PM, Dana Powers notifications@github.comwrote:
|
I'm up for adding a random_start boolean to the SimpleConsumer if that's what it'll take; however, I'm more inclined to agree with @wizzat in terms of the batteries-included sensible defaults (i.e. random_start=true) or better yet remove the option completely if there's no valid use-case for always starting with partition 0. I can code up a prototype and update this pull request. Cheers, |
…ion of the sorted list of partition rather than completely randomizing the initial ordering before round-robin cycling the partitions
…ation of the initial partition messages are published to
I've added in a random_start parameter which defaults to false for the time being. Please let me know if anything else is needed. |
bump... Any thoughts on this one? I think the current SimpleProducer suffers from a thundering herd style problem and would like to see it mitigated. If anyone else has a different implementation I'm completely fine with that. |
I think this should be merged as of commit a81be57 (the initial pass with full pseudorandom partition order and passing tests) |
Just so everyone is on the same page: any process which starts up and sends a single message to Kakfa will currently overload partition 0 to the exclusion of all other partitions. A common use case of this would be web servers/microframeworks and many command line utilities. The default of not randomizing the partitions by default is literally dangerous. |
+1 looks good to me. If there are no objections I will commit this later this week. |
bump...Any news on this? There's currently randomization used when picking a host to connect to: It's not that different from wanting to randomize which partition is initially selected for publishing messages to. |
SimpleProducer randomization of initial round robin ordering
Co-authored-by: drewdogg <drewdogg@users.noreply.github.com>
Co-authored-by: drewdogg <drewdogg@users.noreply.github.com>
Hello Folks,
I've noticed that the simple producer will always publish to partition 0, then 1, etc.
To achieve a better uniform distribution of messages per partition, would you consider
randomizing the list of partitions first? This would be especially helpful for cases
where the publisher only publishes a single message before destroying the SimpleProducer instance.