feat: thread autoscaling #1266
base: main
Conversation
Static binary failure seems to be because of crazywhalecc/static-php-cli#577
I added some load-test simulations. Not sure yet if I want to keep them in the repo; it sure would require fixing a lot of linting errors.
Scaling currently works like this:
Here are my findings from running a few load-test scenarios. I decided to just simulate load and latency via a PHP script; an authentic load test would have always involved setting up an unbiased cloud environment, which might be something for a different day. Keep in mind that the VUs were adjusted for 20 CPU cores.

**Hello world**

The hello world scenario tests raw server performance. It ended up being the only scenario in which a server with a lower number of threads was able to handle more requests. I guess I overestimated the impact of CPU contention in the other cases.

**Database simulation**

This test simulates 1-2 DB queries with 1-10ms latency under load similar to a Laravel request, probably a very common pattern for a lot of APIs. What surprised me most is that in this scenario 5x CPU cores ended up being the ideal number of threads, which is why I would probably recommend a default setting that scales to at least 5x CPU cores. The reason 'scaling' handled fewer requests than 'ideal' is that it takes some time to catch up to the ideal thread count. The overhead of scaling itself is relatively negligible and doesn't even appear in the flamegraphs.

**External API simulation**

This test leans more into big latencies. A lot of applications access external APIs or microservices with much higher response times than databases (the test ran with 10ms-150ms). The main learning here is that if you know latencies are this high, it might not be unreasonable to spawn 30x CPU cores. Threads are generally more lightweight than FPM processes, though how many workers could reasonably run on 1GB of RAM is something I haven't tested yet.

**Computation heavy**

This test goes to the other extreme and does almost no IO. The main learning here: if the server is not IO bound, then anything above 20 CPU cores behaves pretty similarly. In this case scaling did not go over 27 threads due to high CPU usage. This is the only test where checking for CPU usage was beneficial, since we save memory by not spawning more threads.

**Hanging server**

This test introduces a 2% chance of a 'hanging' request that takes 15s to complete. I chose this ratio on purpose, since it will sometimes already make the server hang completely with default settings. Interestingly, scaling performed better here than spawning a fixed high number of threads. In some situations, being able to spawn 'fresh' threads seems to be beneficial.

**Timeouts**

This is another resilience simulation. An external resource becomes unavailable every other 10s and causes timeouts for all requests. Again, a server with a higher number of threads performs much better in this situation and can recover faster. For very severe hanging it might also make sense to terminate and respawn threads (something for a future PR).
@AlliBalliBaba I spot some improvements that can be made (I think -- needs some testing), but trying to explain them in a review would probably take too much back-and-forth. Is this branch stable enough to just open a PR against your PR?
```go
func (admin *FrankenPHPAdmin) threads(w http.ResponseWriter, r *http.Request) error {
	if r.Method == http.MethodPut {
		// ...
```
Nit: you could use a switch here.
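For illustration, a minimal sketch of that switch (the DELETE branch and the `count` values are assumptions based on the PUT/DELETE endpoints mentioned in the PR description, and `showThreads` is a hypothetical name for the read-only path):

```go
func (admin *FrankenPHPAdmin) threads(w http.ResponseWriter, r *http.Request) error {
	switch r.Method {
	case http.MethodPut:
		return admin.changeThreads(w, r, 1) // add a thread
	case http.MethodDelete:
		return admin.changeThreads(w, r, -1) // remove a thread
	default:
		return admin.showThreads(w, r) // hypothetical read-only handler
	}
}
```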
```go
func (admin *FrankenPHPAdmin) changeThreads(w http.ResponseWriter, r *http.Request, count int) error {
	if !r.URL.Query().Has("worker") {
		// ...
```
Nit: you could store the result of `Query()` in a variable to prevent parsing the query twice. You could even directly get the value and check if it is the zero value here.
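Roughly like this minimal sketch (only the `worker` query parameter is taken from the snippet above; the branch bodies are placeholders):

```go
// Parse the query string once and reuse the result.
query := r.URL.Query()
workerName := query.Get("worker") // Get returns the zero value "" if the param is absent
if workerName == "" {
	// scale regular (non-worker) threads
} else {
	// scale the threads of the named worker
}
```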
```go
adminUrl := "http://localhost:2999/frankenphp/"
r, err := http.NewRequest(method, adminUrl+path, nil)
if err != nil {
	panic(err)
```
`assert.NoError()`?
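That is, something along these lines, assuming a `*testing.T` is available to the helper (whether testify is already imported in this test file is an assumption):

```go
r, err := http.NewRequest(method, adminUrl+path, nil)
assert.NoError(t, err) // fail the test instead of panicking
```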
```go
}
if expectedBody == "" {
	_ = tester.AssertResponseCode(r, expectedStatus)
} else {
	// ...
```
Nit: could be an early return instead.
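A sketch of the early-return shape, assuming the `else` branch asserts on the body via caddytest's `AssertResponse` (an assumption about what the hidden lines do):

```go
if expectedBody == "" {
	_ = tester.AssertResponseCode(r, expectedStatus)
	return
}
_, _ = tester.AssertResponse(r, expectedStatus, expectedBody)
```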
```go
func getAdminResponseBody(tester *caddytest.Tester, method string, path string) string {
	adminUrl := "http://localhost:2999/frankenphp/"
	r, err := http.NewRequest(method, adminUrl+path, nil)
	if err != nil {
		// ...
```
Could be `NoError`
```go
resp := tester.AssertResponseCode(r, http.StatusOK)
defer resp.Body.Close()
bytes, err := io.ReadAll(resp.Body)
if err != nil {
	// ...
```
Same
```go
	return d.ArgErr()
}

v, err := strconv.Atoi(d.Val())
```
Maybe we should use `ParseUint()` to prevent negative values? The "problem" already exists with `NumThreads`.
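A hedged sketch of that variant (the error message wording is made up; `d` is the caddyfile dispenser from the snippet above):

```go
// ParseUint rejects negative input at parse time; Atoi would accept it.
v64, err := strconv.ParseUint(d.Val(), 10, 32)
if err != nil {
	return d.Errf("max_threads must be a non-negative integer: %v", err)
}
v := int(v64)
```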
```
@@ -128,6 +128,16 @@ A workaround to using this type of code in worker mode is to restart the worker

The previous worker snippet allows configuring a maximum number of requests to handle by setting an environment variable named `MAX_REQUESTS`.

### Restart Workers manually
```
Suggested change:
```diff
-### Restart Workers manually
+### Restart Workers Manually
```
I originally wanted to just create a PR that allows adding threads via the admin API, but after letting threads scale automatically, that PR kind of didn't make sense anymore by itself.
So here is what this PR does:
It adds 4 Caddy admin endpoints
Additionally, the PR also introduces a new directive in the config: `max_threads`. If it's bigger than `num_threads`, worker and regular threads will attempt to autoscale after a request based on a few different conditions.

This is all still a WIP. I'm not yet sure if `max_threads` is the best way to configure autoscaling, or if it's even necessary to have the PUT/DELETE endpoints. Maybe it would also make sense to determine `max_threads` based on available memory. I'll conduct some benchmarks showing that this approach performs better than default settings in a lot of different scenarios (and makes people worry less about thread configuration).

With regard to recent issues, spawning and destroying threads would also make the server more stable when we're experiencing timeouts (not sure yet how to safely destroy running threads).
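To make the intended flow concrete, here is a heavily simplified sketch of an "after a request" scaling check. Every name and condition below is illustrative; the PR's actual conditions are not reproduced here:

```go
// autoScaleThreads is a hypothetical decision function run after a request:
// scale up only if every thread was busy and we are still below max_threads.
func autoScaleThreads(busyThreads, totalThreads, maxThreads int) bool {
	if busyThreads < totalThreads {
		return false // spare capacity left, no need to scale
	}
	if totalThreads >= maxThreads {
		return false // configured ceiling reached
	}
	// A real implementation would also consider CPU usage here, as the
	// "Computation heavy" benchmark above suggests.
	return true
}
```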