
Warn if CPU usage is too high (>90%) #1161 #1236

Merged
6 commits merged into master on Jan 22, 2020

Conversation

cyberw
Collaborator

@cyberw cyberw commented Jan 20, 2020

No description provided.

@cyberw cyberw requested a review from heyman January 20, 2020 10:51
@codecov

codecov bot commented Jan 20, 2020

Codecov Report

Merging #1236 into master will increase coverage by 0.79%.
The diff coverage is 76%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1236      +/-   ##
==========================================
+ Coverage   78.64%   79.43%   +0.79%     
==========================================
  Files          20       20              
  Lines        1962     2086     +124     
  Branches      312      371      +59     
==========================================
+ Hits         1543     1657     +114     
- Misses        333      353      +20     
+ Partials       86       76      -10
Impacted Files               Coverage Δ
locust/main.py               34.28% <0%> (-0.15%) ⬇️
locust/runners.py            80.9% <79.16%> (+2.6%) ⬆️
locust/core.py               87.28% <0%> (-0.43%) ⬇️
locust/contrib/fasthttp.py   92.59% <0%> (+1.84%) ⬆️


@cyberw
Collaborator Author

cyberw commented Jan 20, 2020

In the future we could (should):

  • make the threshold configurable
  • add GUI support

But let's leave that for another day...

if current_cpu > 90:
    self.cpu_threshold_exceeded = True
    logging.warning("Loadgen CPU usage above 90%! This may constrain your throughput and even give inconsistent response time measurements! See https://docs.locust.io/en/stable/running-locust-distributed.html for how to distribute the load over multiple CPU cores or machines")
gevent.sleep(2.0)
Member

Maybe we should increase the interval to something like 10 seconds? That would prevent a warning if you have a single task that does something really CPU heavy for a few seconds, and it shouldn't really matter if you get a warning after 2 or 10 seconds.

Also, do we know how much overhead (if any) Process.cpu_percent() introduces?

@cyberw cyberw (Collaborator Author) Jan 20, 2020

I was thinking about increasing the interval, but decided against it, because even a short spike can cause incorrect measurements. I figured it is better to warn a few times too many than a few times too few...

I don't know exactly how much overhead it introduces, but I did a few tests and the call itself never took more than 0.2 ms to run on my 2018 MacBook Pro. I think it's just reading a counter somewhere...

If you want, I can bump the interval to 5 seconds.
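For reference, a rough back-of-the-envelope check of that overhead (not part of the PR; just an illustrative micro-benchmark, assuming psutil is installed):

import time
import psutil

process = psutil.Process()
process.cpu_percent()  # the first call primes the counters and returns 0.0

calls = 1000
start = time.perf_counter()
for _ in range(calls):
    process.cpu_percent()
elapsed = time.perf_counter() - start
print("average cost per cpu_percent() call: %.3f ms" % (elapsed / calls * 1000))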

Member

Ok, five seconds sounds good!
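For context, a minimal sketch of what such a monitoring greenlet could look like with the agreed 5-second interval (CPU_MONITOR_INTERVAL and CPU_WARNING_THRESHOLD are illustrative names, not necessarily what the PR ends up using):

import logging
import gevent
import psutil

CPU_MONITOR_INTERVAL = 5.0  # seconds, per the discussion above
CPU_WARNING_THRESHOLD = 90  # percent

def monitor_cpu():
    # Periodically sample this process's CPU usage and warn if it is too high.
    process = psutil.Process()
    while True:
        current_cpu = process.cpu_percent()
        if current_cpu > CPU_WARNING_THRESHOLD:
            logging.warning(
                "Loadgen CPU usage above 90%! This may constrain your throughput "
                "and even give inconsistent response time measurements!"
            )
        gevent.sleep(CPU_MONITOR_INTERVAL)

# spawned once per runner, e.g.:
# gevent.spawn(monitor_cpu)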

@heyman
Member

heyman commented Jan 20, 2020

When implementing this in the web UI, I think it would make sense to add the current CPU usage for each node to the "Slaves" tab. Therefore, maybe it would make sense to add cpu_usage to locust.runners.SlaveNode, and then have all the logic for emitting warnings run in the master?

@cyberw
Collaborator Author

cyberw commented Jan 20, 2020

When implementing this in the web UI, I think it would make sense to add the current CPU usage for each node to the "Slaves" tab. Therefore, maybe it would make sense to add cpu_usage to locust.runners.SlaveNode, and then have all the logic for emitting warnings run in the master?

Hmm... yes, having the logic on the master side makes sense.

The only problem is that if we want to log a value instead of just a flag of whether the threshold was exceeded, then we may need to synchronize sending with measuring (or we might detect high usage but never send that particular metric, and thus miss a spike).

To fix this (without adding complex logic) we could move the CPU checking for slaves into the heartbeat method and only run cpu_monitoring_greenlet on the master and in standalone mode?

I can make it so that we only update the cpu usage on every 5th heartbeat.

@heyman
Member

heyman commented Jan 20, 2020

Since the heartbeat interval is less than the measurement interval (at least if we go with 5 seconds), I think it should be fine to have the cpu_monitoring_greenlet store the value on the runner instance, and then include that value in the heartbeat message.

@cyberw
Collaborator Author

cyberw commented Jan 20, 2020

ok, I'll do it that way!
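A rough sketch of that division of labour (hypothetical method bodies; attribute and message-field names are illustrative, and helpers like Message, HEARTBEAT_INTERVAL and self.client come from the surrounding runner code):

# --- slave side: the monitor greenlet just records the latest sample on the runner
def monitor_cpu(self):
    process = psutil.Process()
    while True:
        self.current_cpu_usage = process.cpu_percent()
        gevent.sleep(CPU_MONITOR_INTERVAL)

# --- slave side: the existing heartbeat loop piggybacks that value
def heartbeat(self):
    while True:
        self.client.send(Message("heartbeat", {
            "state": self.slave_state,
            "current_cpu_usage": self.current_cpu_usage,
        }, self.client_id))
        gevent.sleep(HEARTBEAT_INTERVAL)

# --- master side: update the SlaveNode and warn once if any node exceeds 90%
def on_heartbeat(self, client_id, data):
    node = self.clients[client_id]
    node.cpu_usage = data["current_cpu_usage"]
    if node.cpu_usage > 90 and not self.slave_cpu_threshold_exceeded:
        self.slave_cpu_threshold_exceeded = True
        logging.warning("Slave %s exceeded CPU threshold (>90%%)", client_id)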

…ave and check on the master if the threshold was exceeded.
@cyberw
Collaborator Author

cyberw commented Jan 20, 2020

LGTY? I haven't re-tested distributed yet, but I'll do it before I merge...

self.current_cpu_usage = 0
self.cpu_threshold_exceeded = False
self.slave_cpu_threshold_exceeded = False
gevent.spawn(self.monitor_cpu)
Member

Hmm, since we don't store any reference to this greenlet, it won't get killed. We should probably change so that the LocalLocustRunner.greenlet is a gevent.pool.Group instance (just like MasterLocustRunner and SlaveLocustRunner) and then make sure the CPU monitor greenlet is spawned from this group.

Collaborator Author

Hmm... I'm not sure how to do that. LocustRunner is supposed to be a singleton, so it shouldn't really matter, right? (not that we want to be sloppy :)

If you think this is important, would you mind taking a look yourself?

Member

At the moment, it's a singleton when started normally (through main.py). But we do create multiple runner instances within the tests.

Also, it's been proposed (and I think it's a good idea) to work towards an API where one can run Locust programmatically, in which case I think the design will be much cleaner if we group the spawned greenlets together (so that one can join them) and make sure they are killed together.

I can definitely take a look!

Collaborator Author

Thanks!

Member

I've now changed it so that we spawn all greenlets through the runner instance's greenlet attribute, which is a gevent.pool.Group instance.

I also added a test for the CPU warning (and changed the code to use a constant for the monitoring interval, so that I could decrease the run time of the test).
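For illustration, the lifecycle pattern described above boils down to something like this simplified sketch (the constant and the empty monitor body are placeholders, not the final implementation):

import gevent
import gevent.pool
import psutil

CPU_MONITOR_INTERVAL = 5.0  # extracted into a constant so tests can shrink it

class LocalLocustRunner:
    def __init__(self):
        # One Group for all of the runner's greenlets, mirroring what
        # MasterLocustRunner and SlaveLocustRunner already do.
        self.greenlet = gevent.pool.Group()
        self.current_cpu_usage = 0
        self.greenlet.spawn(self.monitor_cpu)

    def monitor_cpu(self):
        process = psutil.Process()
        while True:
            self.current_cpu_usage = process.cpu_percent()
            gevent.sleep(CPU_MONITOR_INTERVAL)

    def quit(self):
        # Killing the group also takes the CPU monitor greenlet down with it,
        # which matters when multiple runner instances are created in tests.
        self.greenlet.kill(block=True)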

cyberw and others added 4 commits January 21, 2020 08:55
…ending whether the threshold has been exceeded or not.
…t() method.

We achieve this by spawning all greenlets using the runner’s greenlet attribute which is an instance of gevent.pool.Group().
@cyberw cyberw merged commit 680cf52 into master Jan 22, 2020