Warn if CPU usage is too high (>90%) #1161 #1236
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master    #1236      +/-   ##
==========================================
+ Coverage   78.64%   79.43%   +0.79%
==========================================
  Files          20       20
  Lines        1962     2086     +124
  Branches      312      371      +59
==========================================
+ Hits         1543     1657     +114
- Misses        333      353      +20
+ Partials       86       76      -10
Continue to review full report at Codecov.
In the future we could (should):
But let's leave that for another day...
locust/runners.py
Outdated
if current_cpu > 90:
    self.cpu_threshold_exceeded = True
    logging.warning("Loadgen CPU usage above 90%! This may constrain your throughput and even give inconsistent response time measurements! See https://docs.locust.io/en/stable/running-locust-distributed.html for how to distribute the load over multiple CPU cores or machines")
gevent.sleep(2.0)
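For context, here is a minimal sketch of the kind of monitoring loop the excerpt above sits in, meant to be spawned as a gevent greenlet. It is an illustration, not the actual Locust code; the constant names CPU_WARNING_THRESHOLD and CPU_MONITOR_INTERVAL are assumptions, and the interval value is the one discussed below.

import logging

import gevent
import psutil

CPU_WARNING_THRESHOLD = 90   # percent; assumed name/value for illustration
CPU_MONITOR_INTERVAL = 5.0   # seconds; the interval debated below

def monitor_cpu():
    process = psutil.Process()
    while True:
        # cpu_percent() without an interval returns usage since the last call,
        # so the loop itself defines the sampling window.
        current_cpu = process.cpu_percent()
        if current_cpu > CPU_WARNING_THRESHOLD:
            logging.warning(
                "Loadgen CPU usage above %d%%! This may constrain your throughput "
                "and give inconsistent response time measurements.",
                CPU_WARNING_THRESHOLD,
            )
        gevent.sleep(CPU_MONITOR_INTERVAL)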
Maybe we should increase the interval to something like 10 seconds? That would prevent a warning if you have a single task that does something really CPU heavy for a few seconds, and it shouldn't really matter if you get a warning after 2 or 10 seconds.
Also, do we know how much overhead (if any) Process.cpu_percent() introduces?
I was thinking about increasing the interval, but decided against it, because even a short spike can cause incorrect measurements. I figured it is better to warn a few times too many than a few times too few...
I don't know exactly how much overhead it introduces, but I did a few tests and the call itself never took more than 0.2ms to run on my 2018 MacBook Pro. I think it's just reading a counter somewhere...
If you want, I can bump the interval to 5 seconds.
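For reference, a rough way to reproduce the informal overhead measurement described above, assuming psutil is installed; the exact numbers will of course vary by machine.

import time

import psutil

process = psutil.Process()
process.cpu_percent()  # prime the counter; the first call always returns 0.0

samples = []
for _ in range(1000):
    start = time.perf_counter()
    process.cpu_percent()
    samples.append(time.perf_counter() - start)

print("max call time: %.3f ms" % (max(samples) * 1000))
print("avg call time: %.3f ms" % (sum(samples) / len(samples) * 1000))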
Ok, five seconds sounds good!
When implementing this in the web UI, I think it would make sense to add the current CPU usage for each node to the "Slaves" tab. Therefore, maybe it would make sense to add …
Hmm... yes, having the logic on the master side makes sense. The only problem is that if we want to log a value instead of just a flag of whether the threshold was exceeded, then we may need to synchronize sending with measuring (or we might detect high usage but never send that particular metric, and thus miss a spike). To fix this (without adding complex logic), we could move the CPU checking for slaves to the heartbeat method and only run cpu_monitoring_greenlet on master + standalone? I can make it so that we only update the CPU usage on every 5th heartbeat.
Since the heartbeat interval is less than the measurement interval (at least if we go with 5 seconds), I think it should be fine to have the …
Ok, I'll do it that way!
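A hedged sketch of the approach agreed on here: the slave samples its own CPU usage, attaches it to each heartbeat payload, and the master decides whether the warning threshold was exceeded. The function names, field names, and constants below are illustrative stand-ins, not the actual identifiers in locust/runners.py.

import gevent
import psutil

HEARTBEAT_INTERVAL = 1.0    # seconds; assumption for illustration
CPU_WARNING_THRESHOLD = 90  # percent; assumption for illustration

_process = psutil.Process()

def build_heartbeat_payload(state="running"):
    # Payload a slave could attach to each heartbeat message.
    return {
        "state": state,
        "current_cpu_usage": _process.cpu_percent(),
    }

def master_check_heartbeat(payload, warning_emitted=False):
    # Master-side check: warn once if a slave reports CPU above the threshold.
    if payload["current_cpu_usage"] > CPU_WARNING_THRESHOLD and not warning_emitted:
        print("Slave CPU usage above %d%%" % CPU_WARNING_THRESHOLD)
        return True
    return warning_emitted

if __name__ == "__main__":
    warned = False
    for _ in range(3):
        warned = master_check_heartbeat(build_heartbeat_payload(), warned)
        gevent.sleep(HEARTBEAT_INTERVAL)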
…ave and check on the master if the threshold was exceeded.
LGTY? I haven't re-tested distributed mode yet, but I'll do it before I merge...
locust/runners.py
Outdated
self.current_cpu_usage = 0
self.cpu_threshold_exceeded = False
self.slave_cpu_threshold_exceeded = False
gevent.spawn(self.monitor_cpu)
Hmm, since we don't store any reference to this greenlet, it won't get killed. We should probably change it so that LocalLocustRunner.greenlet is a gevent.pool.Group instance (just like MasterLocustRunner and SlaveLocustRunner) and then make sure the CPU monitor greenlet is spawned from this group.
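To illustrate the suggestion, here is a small self-contained sketch of spawning a runner's background greenlets through a gevent.pool.Group so they can all be killed together. The RunnerSketch class is hypothetical and only mirrors the shape of the proposal, not the real LocalLocustRunner.

import gevent
from gevent.pool import Group

class RunnerSketch:
    def __init__(self):
        self.greenlet = Group()
        # Spawning through the group keeps a reference to the greenlet,
        # unlike a bare gevent.spawn(), so it can be killed with the rest.
        self.greenlet.spawn(self.monitor_cpu)

    def monitor_cpu(self):
        while True:
            gevent.sleep(1.0)  # stand-in for the real CPU check

    def quit(self):
        # Killing the group stops every greenlet spawned through it.
        self.greenlet.kill(block=True)

runner = RunnerSketch()
gevent.sleep(0.1)
runner.quit()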
Hmm... I'm not sure how to do that. LocustRunner is supposed to be a singleton, so it shouldn't really matter, right? (not that we want to be sloppy :)
If you think this is important, would you mind taking a look yourself?
At the moment, it's a singleton when started normally (through main.py). But we do create multiple runner instances within the tests.
Also, it's been proposed, and I think it's a good idea, to work towards an API where one can run Locust programmatically, in which case I think the design will be much cleaner if we group the spawned greenlets together (which one can join) and make sure they are killed together.
I can definitely take a look!
Thanks!
I've now changed it so that we spawn all greenlets through the runner instance's greenlet attribute, which is a gevent.pool.Group instance.
I also added a test for the CPU warning (and changed it so that we use a constant for the monitoring interval, so that I could decrease the run time of the test).
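A self-contained sketch of the testing trick mentioned above: because the monitoring interval lives in a single constant, a test can override it to keep the run time short. The CPUMonitorSketch class and its constant are hypothetical stand-ins, not the real names in locust.runners.

import unittest
from unittest import mock

import gevent

class CPUMonitorSketch:
    CPU_MONITOR_INTERVAL = 5.0  # production default; assumed value

    def __init__(self):
        self.warnings = 0
        self.greenlet = gevent.spawn(self._monitor)

    def _monitor(self):
        while True:
            self.warnings += 1  # stand-in for the real CPU check + warning
            gevent.sleep(self.CPU_MONITOR_INTERVAL)

class TestCPUWarning(unittest.TestCase):
    def test_monitor_runs_quickly_with_short_interval(self):
        # Shrinking the interval keeps the test run time low.
        with mock.patch.object(CPUMonitorSketch, "CPU_MONITOR_INTERVAL", 0.01):
            monitor = CPUMonitorSketch()
            gevent.sleep(0.1)
            monitor.greenlet.kill()
        self.assertGreater(monitor.warnings, 1)

if __name__ == "__main__":
    unittest.main()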
…ending whether the threshold has been exceeded or not.
…t() method. We achieve this by spawning all greenlets using the runner’s greenlet attribute which is an instance of gevent.pool.Group().
…t to decrease test run time)