
Hatch rate in distributed mode spawns users in batches equal to number of slaves #896

Closed
tortila opened this issue Oct 1, 2018 · 12 comments · Fixed by #1621

Comments

@tortila

tortila commented Oct 1, 2018

Description of issue / feature request and actual behavior

It looks like the hatch rate behavior depends heavily on the number of slaves in Locust's distributed mode.

As an example:
I'm running Locust in distributed mode with a master node and 10 slave nodes. I set the test execution to spawn 100 users with a hatch rate of 1. It seems that instead of spawning 1 user per second, 10 users (1 on each slave) are spawned at once, in batches.

[Screenshot: 2018-10-01 at 14:25:10]

If I add 5 more slaves (summing up to 15 slave nodes in total) and start a new test with the same values - 100 users with a hatch rate of 1 - users are now spawned in batches of 15:

[Screenshot: 2018-10-01 at 14:38:49]
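
For clarity, the arithmetic behind this batching can be sketched as follows (illustrative Python only, not Locust source code):

    # Illustrative arithmetic only - not Locust source code.
    total_users = 100
    total_hatch_rate = 1.0   # users per second, as entered on the master
    slaves = 10

    users_per_slave = total_users // slaves           # 10
    hatch_rate_per_slave = total_hatch_rate / slaves  # 0.1 users/s per slave

    # Each slave spawns one user every 1 / hatch_rate_per_slave seconds, and
    # all slaves start at the same moment, so the cluster spawns a batch of
    # `slaves` users every `slaves` seconds instead of 1 user every second.
    seconds_between_batches = 1 / hatch_rate_per_slave
    print(f"batch of {slaves} users every {seconds_between_batches:.0f} s")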

Expected behavior

I would expect the hatch rate to behave independently of the number of slaves. In the example above, I expect a smooth increase of 1 user every second.

Environment settings (for bug reports)

  • OS: Debian Stretch
  • Python version: 3.6
  • Locust version: 0.9.0

Steps to reproduce (for bug reports)

As described above

@tortila tortila changed the title from "Hatch rate < 1 in distributed mode spawns users in unexpected manner" to "Hatch rate in distributed mode spawns users in batches equal to number of slaves" Oct 1, 2018
@heyman
Member

heyman commented Oct 22, 2019

Yes, your description matches the current implementation: the slave nodes are unaware of each other, and each gets an instruction to launch X users at Y hatch rate.

This should only be a potential issue if you have a very low hatch rate (lower than the number of slave nodes), which I don't think is very common.

Could be fixed but it would add quite a bit of extra complexity, which I currently don't think is justified.

@tortila
Author

tortila commented Oct 22, 2019

@heyman thank you for responding.

This should only be a potential issue if you have a very low hatch rate (lower than the number of slave nodes), which I don't think is very common.

When I filed this issue it was indeed the case - we used to run Locust in setups with 300 slaves. The reason behind it was that we aimed for a very large scale and wanted to ramp up slowly, ideally without changing the number of slaves on the fly, as that was very problematic (but that's another story). With this setup, the smallest possible number of users spawned at once was 300, which was not small enough: 300 users already generated a significant amount of load. So to sum up, this feature matters for a narrow use case, but I still think it's important to guarantee a smooth and gradual ramp-up. On top of that, I also find the current behaviour surprising and unintuitive - so if it won't be fixed, it at least deserves proper documentation.

Maybe you can also take a look at #724, as the issue described there is somewhat connected to how users are distributed between slaves.

@heyman
Member

heyman commented Oct 22, 2019

The reason behind it was that we aimed for a very large scale and wanted to ramp up slowly

Ah, that's a use case I hadn't considered, and I guess it might not be too uncommon. Depending on the implementation, maybe it could be worth fixing after all. And I agree that if we don't fix it, or until we do, the documentation should have a note about it.

@heyman
Member

heyman commented Oct 22, 2019

Documentation updated in d6d87b4

@max-rocket-internet
Contributor

we used to run Locust in setups with 300 slaves

We are also doing this. We run on k8s, and it's more cost-effective to scale out with many smaller slaves, as opposed to fewer larger slaves.

The current implementation is that each slave just receives a client count and hatch rate that are simply the total client count and hatch rate divided by the number of connected slaves.
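
A rough sketch of what that split amounts to on the master side (simplified, with illustrative names - not the actual Locust source):

    # Simplified sketch of the current behaviour: the master divides the totals
    # evenly and sends every connected slave the same "hatch" instruction.
    def send_hatch_messages(slave_clients, num_clients, hatch_rate):
        slave_count = len(slave_clients)
        slave_num_clients = num_clients // slave_count      # users per slave
        slave_hatch_rate = float(hatch_rate) / slave_count  # hatch rate per slave
        for client in slave_clients:
            client.send({
                "type": "hatch",
                "num_clients": slave_num_clients,
                "hatch_rate": slave_hatch_rate,
            })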

There are quite a few issues that would be resolved by allowing the Locust master to have much tighter control over the number of users running on slaves. For example, it would enable autoscaling slaves (#1100, #1066, karol-brejna-i/locust-experiments#13) and custom load patterns (#1001).

@heyman
Member

heyman commented Oct 22, 2019

There are quite a few issues that would be resolved by allowing the Locust master to have much tighter control over the number of users running on slaves.

I'm not opposed to fixing this if we can come up with a good implementation. Here's an idea off the top of my head:

  • Change the "hatch" message from master to slaves so that it specifies the number of users to simulate for each Locust class, as well as an optional initial wait time that the slave should sleep before starting to hatch (which can be used to even out hatch-rate spikes).

  • Implement a function that calculates a "plan" - one that respects the weight attributes - for how many instances of the different Locust classes each node should run (a rough sketch follows below).

    I'm thinking of an API similar to this:

    >>> get_run_plan([User1, User2, User3], user_count=5, runner_count=3)
    [{User1: 1, User2: 1}, {User1: 1, User2: 1}, {User3: 1}]
  • LocustRunner.weight_locusts could be partly replaced by the plan calculation function.

  • In MasterLocustRunner.start_hatching() get the plan and then send out the corresponding hatch messages to slaves.

Like I said, it's off the top of my head, and there might be problems with it that I haven't thought of, or there might be a better way to implement it.
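
To make the plan function a bit more concrete, here is one possible sketch (hypothetical code: the weight handling and the exact per-runner grouping may differ from the example output above):

    # Hypothetical sketch of get_run_plan - not an agreed-upon implementation.
    def get_run_plan(locust_classes, user_count, runner_count):
        # Expand the classes according to their weight attributes
        # (weight defaults to 1 here for simplicity).
        weighted = []
        for cls in locust_classes:
            weighted.extend([cls] * getattr(cls, "weight", 1))

        # Pick user_count instances by cycling over the weighted list.
        instances = [weighted[i % len(weighted)] for i in range(user_count)]

        # Deal the instances out evenly across the runners.
        plan = [{} for _ in range(runner_count)]
        for i, cls in enumerate(instances):
            bucket = plan[i % runner_count]
            bucket[cls] = bucket.get(cls, 0) + 1
        return plan

Each slave would then receive only its own entry of the returned plan in the "hatch" message.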

Thoughts?

@max-rocket-internet
Contributor

That sounds like a good start!

It would be great if the master periodically ran the calculation for the number of slaves currently connected and then sent the messages out. Then the number of slaves could be more dynamic, i.e. autoscaled.

It would also be great if the plan function could be provided to Locust by advanced users who want to replicate traffic shapes that go up and down at specific rates. For example, we are interested in reproducing a shape like our live environment:

[Screenshot: 2019-10-24 at 15:40:48]

Would also solve #974

@heyman
Member

heyman commented Oct 24, 2019

It would be great if the master periodically ran the calculation for the number of slaves currently connected and then sent the messages out.

Yes, this could be done every time a new slave node connects or disconnects, if the tests are running. (Maybe with some kind of delay, just to let more nodes connect when many are started at the same time, to avoid rebalancing multiple times in a row.)
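
A minimal sketch of that kind of delayed rebalancing, assuming a gevent-based master and a hypothetical rebalance callback:

    import gevent

    REBALANCE_DELAY = 5  # seconds to wait for further (dis)connects

    class RebalanceScheduler:
        """Debounce slave connect/disconnect events into a single rebalance."""

        def __init__(self, rebalance):
            # rebalance: callable that recomputes the plan and re-sends the
            # hatch messages (hypothetical, not an existing Locust API).
            self._rebalance = rebalance
            self._pending = None

        def notify_change(self):
            # Called on every slave connect/disconnect while a test is running.
            if self._pending is not None:
                self._pending.kill()  # drop the previously scheduled rebalance
            self._pending = gevent.spawn_later(REBALANCE_DELAY, self._rebalance)

The rebalance callable would be whatever recomputes the plan for the current set of slaves and re-sends the corresponding hatch messages on the master.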

It would also be great if the plan function could be provided to Locust by advanced users

Good idea.

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Apr 11, 2021
@github-actions

This issue was closed because it has been stalled for 10 days with no activity.

@cyberw cyberw reopened this Apr 22, 2021
@cyberw cyberw removed the stale label Apr 22, 2021
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Jun 22, 2021
@mboutet
Contributor

mboutet commented Jun 22, 2021

/remove-lifecycle stale

@cyberw cyberw removed the stale label Jun 22, 2021