PIP-255: Make the partition assignment strategy pluggable #19806
Motivation
With all of Pulsar's existing load-balancing algorithms, it is difficult to balance the load across the nodes of a Pulsar cluster. It often happens that some nodes are heavily loaded while others sit idle, and CPU usage varies widely from broker to broker.
There are three reasons why the existing approach can leave the cluster unbalanced:
Scope
In this PIP, we are trying to solve the first problem.
Goal
API Changes
During a lookup, a partition is assigned to a bundle along this call chain:
Lookup -> NamespaceService#getBrokerServiceUrlAsync -> NamespaceService#getBundleAsync -> NamespaceBundles#findBundle
Consistent hashing is currently used in NamespaceBundles#findBundle to assign partitions to bundles.
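As a simplified sketch of that existing behavior (the boundary layout and the hash function below are illustrative assumptions, not the actual Pulsar implementation):

```java
import java.util.Arrays;

// Simplified sketch of consistent-hash bundle lookup in the spirit of
// NamespaceBundles#findBundle: hash the full topic name and locate the
// bundle whose hash range contains that value.
public class BundleLookupSketch {
    // Sorted hash-range boundaries; bundle i covers [boundaries[i], boundaries[i+1]).
    private final long[] boundaries;

    public BundleLookupSketch(long[] boundaries) {
        this.boundaries = boundaries;
    }

    public int findBundle(String fullTopicName) {
        // Stand-in hash; the broker uses its own configured hash function.
        long hash = Integer.toUnsignedLong(fullTopicName.hashCode());
        int idx = Arrays.binarySearch(boundaries, 0, boundaries.length - 1, hash);
        // When the hash is not an exact boundary, binarySearch returns
        // -(insertionPoint) - 1; the containing bundle starts one slot earlier.
        return idx >= 0 ? idx : -(idx + 2);
    }
}
```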
We should add a configuration item, topicBundleAssignmentStrategy, so that different partition assignment strategies can be configured dynamically. The existing strategy will be used as the default (topicBundleAssignmentStrategy=ConsistentHashingTopicBundleAssigner.class).
Implementation
Define a strategy interface, TopicBundleAssignmentStrategy.
Add a factory to create the implementations.
In addition to the existing bundleOwnershipListeners, we need to add a bundleSplitListener:
a. Trigger the listeners when a bundle is split.
b. A strategy that needs to perceive changes in the bundles will register the corresponding listener.
c. When NamespaceBundles is initialized, the implementation class will be created through the factory class.
Implementation for demonstration:
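A minimal sketch of what these pluggable pieces could look like; every method name and signature below is an assumption for illustration, not the PIP's final API:

```java
// Sketch only: method names and signatures are assumptions, not the final API.
// (In a real code base each type would live in its own file.)

// The strategy interface this PIP introduces. Given a topic and the current
// bundle hash-range boundaries, it decides which bundle owns the topic.
interface TopicBundleAssignmentStrategy {
    int findBundleIndex(String fullTopicName, long[] bundleBoundaries);
}

// Hypothetical listener paralleling the existing bundleOwnershipListeners;
// a strategy that must perceive bundle changes registers one of these and
// is notified when a bundle is split.
interface BundleSplitListener {
    void onBundleSplit(String namespace, String bundleRange);
}

// Factory that instantiates the strategy class named by the broker
// configuration item topicBundleAssignmentStrategy, via reflection.
// It would be invoked when NamespaceBundles is initialized.
final class TopicBundleAssignmentFactory {
    static TopicBundleAssignmentStrategy create(String strategyClassName) {
        try {
            return (TopicBundleAssignmentStrategy) Class.forName(strategyClassName)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "Cannot load topic bundle assignment strategy: " + strategyClassName, e);
        }
    }
}
```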
Goal
Implementation
The client sends messages to a multi-partition topic using round-robin routing by default.
Therefore, we assume that the load across the partitions of a topic is balanced.
We assign the partitions of a topic to bundles by round-robin.
This way, the number of partitions carried by any two bundles differs by at most 1.
Since we consider the load of each partition of the same topic to be balanced, the load carried by each bundle is also balanced.
Operation steps:
a. Partition 0 finds a starting bundle through the consistent hash algorithm; assuming it is bundle0, we start from this bundle.
b. Proceeding round-robin, assign partition 1 to the next bundle (bundle1), partition 2 to the one after (bundle2), and so on (see the sketch after these steps).
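A sketch of these steps; the helper names are assumptions, and the starting-bundle hash stands in for the existing consistent-hash lookup:

```java
// Sketch of the round-robin assignment described above; helper names are assumptions.
public class RoundRobinAssignmentSketch {
    // Partition 0 is placed on the bundle chosen by consistent hashing of the
    // base topic name; partition i lands i bundles further along, wrapping around.
    public int assignPartition(String baseTopicName, int partitionIndex, int numBundles) {
        int startBundle = startBundleFor(baseTopicName, numBundles);
        return (startBundle + partitionIndex) % numBundles;
    }

    // Stand-in for the consistent-hash starting point
    // (NamespaceBundles#findBundle in the existing code path).
    private int startBundleFor(String baseTopicName, int numBundles) {
        return Math.floorMod(baseTopicName.hashCode(), numBundles);
    }
}
```

For example, with 3 bundles and a topic whose hash puts partition 0 on bundle1, partitions 0 through 4 land on bundles 1, 2, 0, 1, 2, so no bundle carries more than one partition more than any other.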
If the number of partitions is less than the number of bundles, will some bundles have a high load?
Since the starting bundle is determined by consistent hashing, each topic starts at a different point, which prevents the lower-numbered bundles from becoming hotspots.
When the number of bundles changes, will all partitions be reassigned?
All partitions under a namespace are reassigned only when the number of bundles changes.
Changing the number of brokers or partitions will not trigger reassignment.
We only split a bundle when it becomes a hotspot.
The proposed partition assignment keeps the load of each bundle approximately balanced, so a bundle split will not be triggered unless it is performed manually.
We have also measured the worst-case time to assign all partitions in an entire namespace.
Test scenario: 6 brokers (4C32GB each), each broker at about 60% CPU, with 50,000 partitions under the namespace being assigned at the same time; this takes about 30 s.
Test report
We tested several scenarios, each with the three algorithms mentioned above, and every node remained well balanced.
Machines: 6 × 4C32G
Test scenario: 200 partitions; after the cluster is stable, restart one of the brokers
Even after the node restarts, the load difference between nodes does not exceed 10%
Test scenario: Restart multiple nodes in a loop, and observe the final cluster load
Even with nodes repeatedly restarting, the load difference between nodes does not exceed 10%
Test scenario: Add a new Broker
Eventually, the load difference between nodes does not exceed 10%
Test scenario: a single-partition topic so heavily loaded that even unloading its bundle makes the receiving broker a new hotspot; observe whether the algorithm unloads frequently