
Hatch rate in distributed mode spawns users in batches equal to number of slaves #896

Closed
tortila opened this issue Oct 1, 2018 · 12 comments · Fixed by #1621

Comments

@tortila

tortila commented Oct 1, 2018

Description of issue / feature request and actual behavior

It looks like the hatch rate behavior depends heavily on the number of slaves in Locust's distributed mode.

As an example:
I'm running Locust in distributed mode with a master node and 10 slave nodes. I set the test execution to spawn 100 users with a hatch rate of 1. It seems that instead of spawning 1 user per second, 10 users (1 on each slave) are spawned at once, in batches.

[Screenshot: 2018-10-01 at 14:25:10]

If I add 5 more slaves (summing up to 15 slave nodes in total) and start a new test with the same values - 100 users with a hatch rate of 1 - users are now spawned in batches of 15:

[Screenshot: 2018-10-01 at 14:38:49]
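
For clarity, the arithmetic behind this batching can be sketched as follows (illustrative Python only, not Locust source code):

    # Illustrative arithmetic only - not Locust source code.
    total_users = 100
    total_hatch_rate = 1.0   # users per second, as entered on the master
    slaves = 10

    users_per_slave = total_users // slaves           # 10
    hatch_rate_per_slave = total_hatch_rate / slaves  # 0.1 users/s per slave

    # Each slave spawns one user every 1 / hatch_rate_per_slave seconds, and
    # all slaves start at the same moment, so the cluster spawns a batch of
    # `slaves` users every `slaves` seconds instead of 1 user every second.
    seconds_between_batches = 1 / hatch_rate_per_slave
    print(f"batch of {slaves} users every {seconds_between_batches:.0f} s")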

Expected behavior

I would expect the hatch rate to behave independently of the number of slaves. In the example above, I expect a smooth increase of 1 user every second.

Environment settings (for bug reports)

  • OS: Debian Stretch
  • Python version: 3.6
  • Locust version: 0.9.0

Steps to reproduce (for bug reports)

As described above

@tortila tortila changed the title from "Hatch rate < 1 in distributed mode spawns users in unexpected manner" to "Hatch rate in distributed mode spawns users in batches equal to number of slaves" Oct 1, 2018
@heyman
Member

heyman commented Oct 22, 2019

Yes, your description matches the current implementation: the slave nodes are unaware of each other, and each gets an instruction to launch X users at Y hatch rate.

This should only be a potential issue if you have a very low hatch rate (lower than the number of slave nodes), which I don't think is very common.

Could be fixed but it would add quite a bit of extra complexity, which I currently don't think is justified.

@tortila
Author

tortila commented Oct 22, 2019

@heyman thank you for responding.

This should only be a potential issue if you have a very low hatch rate (lower than the number of slave nodes), which I don't think is very common.

When I filed this issue it was indeed the case - we used to run Locust in setups with 300 slaves. The reason behind it was that we aimed for a very large scale and wanted to ramp up slowly, ideally without changing the number of slaves on the fly, as that was very problematic (but that's another story). With this setup, the smallest possible number of users spawned at once was 300, which was not small enough: 300 users already generated a significant amount of load. So to sum up, this feature matters for a narrow use case, but I still think it's important to guarantee a smooth and gradual ramp-up. On top of that, I also find the current behaviour surprising and unintuitive - so if it won't be fixed, it at least deserves proper documentation.

Maybe you can also take a look at #724, as the issue described there is somewhat connected to how users are distributed between slaves.

@heyman
Member

heyman commented Oct 22, 2019

The reason behind it was that we aimed for a very large scale and wanted to ramp up slowly

Ah, that's a use case I hadn't considered, and I guess it might not be too uncommon. Depending on the implementation, maybe it could be worth fixing after all. And I agree that if we don't fix it, or until we do, the documentation should have a note about it.

@heyman
Member

heyman commented Oct 22, 2019

Documentation updated in d6d87b4

@max-rocket-internet
Contributor

we used to run Locust in setups with 300 slaves

We are also doing this. We run on k8s, and it's more cost-effective to scale out with many smaller slaves, as opposed to fewer larger slaves.

The current implementation is that each slave just receives a client count and hatch rate that are simply the total client count and hatch rate divided by the number of connected slaves.
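
A rough sketch of what that split amounts to on the master side (simplified, with illustrative names - not the actual Locust source):

    # Simplified sketch of the current behaviour: the master divides the totals
    # evenly and sends every connected slave the same "hatch" instruction.
    def send_hatch_messages(slave_clients, num_clients, hatch_rate):
        slave_count = len(slave_clients)
        slave_num_clients = num_clients // slave_count      # users per slave
        slave_hatch_rate = float(hatch_rate) / slave_count  # hatch rate per slave
        for client in slave_clients:
            client.send({
                "type": "hatch",
                "num_clients": slave_num_clients,
                "hatch_rate": slave_hatch_rate,
            })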

There are quite a few issues that would be resolved by allowing the Locust master to have much tighter control over the number of users running on slaves. For example, it would enable autoscaling slaves (#1100, #1066, karol-brejna-i/locust-experiments#13) and custom load patterns (#1001).

@heyman
Member

heyman commented Oct 22, 2019

There are quite a few issues that would be resolved by allowing the Locust master to have much tighter control over the number of users running on slaves.

I'm not opposed to fixing this if we can come up with a good implementation. Here's an idea off the top of my head:

  • Change the "hatch" message from master to slaves so that it specifies the number of users to simulate for each Locust class, as well as an optional initial wait time that the slave should sleep before starting to hatch (which can be used to even out hatch-rate spikes).

  • Implement a function that calculates a "plan" - one that respects the weight attributes - for how many instances of the different Locust classes each node should run (a rough sketch follows below).

    I'm thinking of an API similar to this:

    >>> get_run_plan([User1, User2, User3], user_count=5, runner_count=3)
    [{User1: 1, User2: 1}, {User1: 1, User2: 1}, {User3: 1}]
  • LocustRunner.weight_locusts could be partly replaced by the plan calculation function.

  • In MasterLocustRunner.start_hatching() get the plan and then send out the corresponding hatch messages to slaves.

Like I said, it's off the top of my head, and there might be problems with it that I haven't thought of, or there might be a better way to implement it.
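
To make the plan function a bit more concrete, here is one possible sketch (hypothetical code: the weight handling and the exact per-runner grouping may differ from the example output above):

    # Hypothetical sketch of get_run_plan - not an agreed-upon implementation.
    def get_run_plan(locust_classes, user_count, runner_count):
        # Expand the classes according to their weight attributes
        # (weight defaults to 1 here for simplicity).
        weighted = []
        for cls in locust_classes:
            weighted.extend([cls] * getattr(cls, "weight", 1))

        # Pick user_count instances by cycling over the weighted list.
        instances = [weighted[i % len(weighted)] for i in range(user_count)]

        # Deal the instances out evenly across the runners.
        plan = [{} for _ in range(runner_count)]
        for i, cls in enumerate(instances):
            bucket = plan[i % runner_count]
            bucket[cls] = bucket.get(cls, 0) + 1
        return plan

Each slave would then receive only its own entry of the returned plan in the "hatch" message.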

Thoughts?

@max-rocket-internet
Contributor

That sounds like a good start!

It would be great if the master periodically ran the calculation for the number of slaves currently connected and then sent the messages out. Then the number of slaves could be more dynamic, i.e. autoscaled.

It would also be great if the plan function could be provided to Locust by advanced users who want to replicate traffic shapes that go up and down at specific rates. For example, we are interested in reproducing a shape like our live environment:

[Screenshot: 2019-10-24 at 15:40:48]

Would also solve #974

@heyman
Member

heyman commented Oct 24, 2019

It would be great if the master periodically ran the calculation for the number of slaves currently connected and then sent the messages out.

Yes, this could be done every time a new slave node connects or disconnects, if the tests are running. (Maybe with some kind of delay, just to let more nodes connect when many are started at the same time, to avoid rebalancing multiple times in a row.)
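
A minimal sketch of that kind of delayed rebalancing, assuming a gevent-based master and a hypothetical rebalance callback:

    import gevent

    REBALANCE_DELAY = 5  # seconds to wait for further (dis)connects

    class RebalanceScheduler:
        """Debounce slave connect/disconnect events into a single rebalance."""

        def __init__(self, rebalance):
            # rebalance: callable that recomputes the plan and re-sends the
            # hatch messages (hypothetical, not an existing Locust API).
            self._rebalance = rebalance
            self._pending = None

        def notify_change(self):
            # Called on every slave connect/disconnect while a test is running.
            if self._pending is not None:
                self._pending.kill()  # drop the previously scheduled rebalance
            self._pending = gevent.spawn_later(REBALANCE_DELAY, self._rebalance)

The rebalance callable would be whatever recomputes the plan for the current set of slaves and re-sends the corresponding hatch messages on the master.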

It would also be great if the plan function could be provided to Locust by advanced users

Good idea.

@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Apr 11, 2021
@github-actions

This issue was closed because it has been stalled for 10 days with no activity.

@cyberw cyberw reopened this Apr 22, 2021
@cyberw cyberw removed the stale label Apr 22, 2021
@github-actions

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 10 days.

@github-actions github-actions bot added the stale label Jun 22, 2021
@mboutet
Contributor

mboutet commented Jun 22, 2021

/remove-lifecycle stale

@cyberw cyberw removed the stale label Jun 22, 2021