Optimize v2/groups request performance #5973
Conversation
Introduced dedicated execution contexts for the `StorageModule`, `SchedulerActions` and `GroupManagerModule` components and reduced the amount of work done on the futures.
I'm building your change at jenkins-marathon-pipelines-PR-5973-1.
✗ Build of #5973 failed.
See the logs and test results for details.
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
I'm building your change at jenkins-marathon-pipelines-PR-5973-2.
✗ Build of #5973 failed.
See the logs and test results for details.
Error message:
Stage Compile and Test failed.
(๑′°︿°๑)
I'm building your change at jenkins-marathon-pipelines-PR-5973-3.
✔ Build of #5973 completed successfully.
See details at jenkins-marathon-pipelines-PR-5973-3.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/marathon-1.6.0-pre-299-gd8338be.tgz",
"sha1": "327099a92157ef863287b936441bb18399e5b623"
You can run system integration test changes of this PR against Marathon master by triggering this Jenkins job with the Pull_Request_id 5973. The job then reports back to this PR.
\\ ٩( ᐛ )و //
LGTM! Good call to name pooled threads, e.g. `app-info-module-thread-3`.
I'm building your change at jenkins-marathon-pipelines-PR-5973-4.
✔ Build of #5973 completed successfully.
See details at jenkins-marathon-pipelines-PR-5973-4.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/marathon-1.6.0-pre-300-g658e102.tgz",
"sha1": "83869c6e353ee6242927b3ab4d51845de240daee"
You can run system integration test changes of this PR against Marathon master by triggering this Jenkins job with the Pull_Request_id 5973. The job then reports back to this PR.
\\ ٩( ᐛ )و //
I'm building your change at jenkins-marathon-pipelines-PR-5973-5.
✔ Build of #5973 completed successfully.
See details at jenkins-marathon-pipelines-PR-5973-5.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/marathon-1.6.0-pre-301-gf3b44f4.tgz",
"sha1": "a0c3a032e865bf308ed9b306ec4999a69b051c52"
You can run system integration test changes of this PR against Marathon master by triggering this Jenkins job with the Pull_Request_id 5973. The job then reports back to this PR.
\\ ٩( ᐛ )و //
LGTM
Preference for more readable variable names, but nothing blocking.
```
@@ -21,8 +21,11 @@ class AppInfoModule @Inject() (
    healthCheckManager: HealthCheckManager,
    marathonSchedulerService: MarathonSchedulerService,
    taskFailureRepository: TaskFailureRepository) {

  val ec = NamedExecutionContext.fixedThreadPoolExecutionContext(8, "app-info-module")
```
`ec`? Are we trying to stay within a linter line-length rule :) Could we make this more readable please?
There are a few akka/scala-specific and accepted "short names", e.g. `ec` for `ExecutionContext`, `mat` for `Materializer`, etc. It falls into the same category as the loop counter `for(i=0,..)`.
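For illustration, a minimal snippet of that naming convention (not code from this PR; the class name is a stand-in):

```scala
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import scala.concurrent.ExecutionContext

// Illustrative component using the conventional short names:
// `ec` for the ExecutionContext, `mat` for the Materializer.
class InfoFetcher(system: ActorSystem) {
  implicit val ec: ExecutionContext = system.dispatcher
  implicit val mat: ActorMaterializer = ActorMaterializer()(system)
}
```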
Should this thread pool ideally be sized relative to the number of available threads? Should we make it configurable (even if via an undocumented environment variable)? It seems like we should ideally create the execution context in the module and share it with the pods status controller?
`ec` is fine for me, for the reason @zen-dog pointed out. I agree with @timcharper that the number of threads could also be configurable.
```
  private[this] val log = LoggerFactory.getLogger(getClass)
  implicit val ec = NamedExecutionContext.fixedThreadPoolExecutionContext(8, "default-info-service")
```
ditto
Nice work!
What do you think about those questions?
```
@@ -199,10 +200,11 @@ class CoreModuleImpl @Inject() (

  // GROUP MANAGER

  val groupManagerExecutionContext = NamedExecutionContext.fixedThreadPoolExecutionContext(8, "group-manager-module")
```
Why 8 threads? Did we measure that 8 is a meaningful number? I'm not trying to overcomplicate things, but it seems to me that making these configurable would allow us to also easily run different tests to verify that the number makes sense.
8 seems like a low enough number that it shouldn't matter. More important is the fact that we separate the thread pools from each other.
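To illustrate the isolation argument, a minimal sketch under assumed names (`appInfoEc` and `groupManagerEc` are illustrative, not Marathon's actual fields): a backlog of slow work submitted to one pool can only exhaust that pool's eight threads, while the other pool keeps serving its own work.

```scala
import java.util.concurrent.Executors
import scala.concurrent.{ExecutionContext, Future}

object PoolIsolationSketch {
  // Two independent fixed pools of 8 threads each.
  val appInfoEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))
  val groupManagerEc: ExecutionContext =
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(8))

  // Slow app-info lookups can only tie up appInfoEc's threads...
  def slowAppInfo(appId: String): Future[String] =
    Future { Thread.sleep(500); s"info($appId)" }(appInfoEc)

  // ...while group-manager work keeps running on its own pool.
  def groupUpdate(groupId: String): Future[String] =
    Future(s"updated($groupId)")(groupManagerEc)
}
```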
```
 * Returns an execution context backed by a fixed thread pool. All threads in the pool are prefixed with `namePrefix`
 * e.g. `slow-io-pool-thread-1`.
 *
 * @param numThreads number of threads in the pol
```
nit: s/pol/pool
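For context, a minimal sketch of how a factory like `NamedExecutionContext.fixedThreadPoolExecutionContext` could be implemented with a naming `ThreadFactory`; this is an assumption about the helper's shape, not Marathon's actual source:

```scala
import java.util.concurrent.atomic.AtomicInteger
import java.util.concurrent.{Executors, ThreadFactory}
import scala.concurrent.ExecutionContext

object NamedExecutionContextSketch {
  /** Fixed-size pool whose threads are named `<namePrefix>-thread-<n>`. */
  def fixedThreadPoolExecutionContext(numThreads: Int, namePrefix: String): ExecutionContext = {
    val counter = new AtomicInteger(0)
    val threadFactory = new ThreadFactory {
      override def newThread(runnable: Runnable): Thread = {
        val thread = new Thread(runnable, s"$namePrefix-thread-${counter.incrementAndGet()}")
        thread.setDaemon(true)
        thread
      }
    }
    ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(numThreads, threadFactory))
  }
}
```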
Easily the greatest change ever. I do have a question on the usage of the fixed worker thread pool, but it should not block this PR. Thanks a ton!
```
  private[this] val log = LoggerFactory.getLogger(getClass)
  implicit val ec = NamedExecutionContext.fixedThreadPoolExecutionContext(8, "default-info-service")
```
I wonder if this really has an impact if app info and the group manager have their own thread pools. I'd like to understand where we benefit from a fixed worker thread pool and where we should be using the fork join thread pool.
I tested the scenario with one thread pool and it adds roughly 20% to the response time. Not that much, but I think dividing them still makes sense - Dīvide et Imperā!
I don't think there is much value in setting it to
A shared execution context showed 20% bigger response times (for whatever reason).
I'm building your change at jenkins-marathon-pipelines-PR-5973-6.
✔ Build of #5973 completed successfully.
See details at jenkins-marathon-pipelines-PR-5973-6.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/marathon-1.6.0-pre-311-ge500dcc.tgz",
"sha1": "84c29052bc915820edf9ce47b83b901606253078"
You can run system integration test changes of this PR against Marathon master by triggering this Jenkins job with the Pull_Request_id 5973. The job then reports back to this PR.
\\ ٩( ᐛ )و //
```
trait AppInfoConfig extends ScallopConf {

  lazy val appInfoModuleExecutionContextSize = opt[Int](
```
Let's base the default on `Runtime.getRuntime().availableProcessors() * 2`? (Or... just the number of availableProcessors?)
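A minimal sketch of how such a Scallop option could look with that default; the flag name and description are assumptions, not the PR's final wording:

```scala
import org.rogach.scallop.ScallopConf

trait AppInfoConfigSketch extends ScallopConf {
  // Default derived from the host's core count; still overridable on the command line.
  lazy val appInfoModuleExecutionContextSize = opt[Int](
    name = "app_info_module_executor_threads",
    descr = "Number of threads in the app-info-module thread pool",
    default = Some(Runtime.getRuntime.availableProcessors() * 2),
    noshort = true
  )
}
```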
I'm building your change at jenkins-marathon-pipelines-PR-5973-7.
✔ Build of #5973 completed successfully.
See details at jenkins-marathon-pipelines-PR-5973-7.
You can create a DC/OS with your patched Marathon by creating a new pull
request with the following changes in buildinfo.json:
"url": "https://s3.amazonaws.com/downloads.mesosphere.io/marathon/builds/marathon-1.6.0-pre-313-ga4ee6d2.tgz",
"sha1": "5127a89c0a07c72c70b6c15fd0192ead1f749abf"
You can run system integration test changes of this PR against Marathon master by triggering this Jenkins job with the Pull_Request_id 5973. The job then reports back to this PR.
\\ ٩( ᐛ )و //
Backport of the original PR #5973. JIRA: COPS-2224
Summary: This replaces the Akka stream with a simple `Future.sequence` and uses the same `AppInfoBaseData` instance for each pod. This way we utilize the cached instances and do not limit ourselves to eight parallel threads. To not overload Marathon we use the fixed-size thread pool from #5973. However, even with #5973 we might allocate a lot of futures and runnables.

For 100 pods with 1 instance each I got the following:

## Master

```
› wrk -c 10 -t 10 -d 2m http://localhost:8080/v2/pods/::status
Running 2m test @ http://localhost:8080/v2/pods/::status
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   136.52ms   44.93ms 859.05ms   93.64%
    Req/Sec     7.71      2.49    20.00     69.81%
  8918 requests in 2.00m, 1.34GB read
Requests/sec:     74.26
Transfer/sec:     11.45MB
```

## This change

```
Running 2m test @ http://localhost:8080/v2/pods/::status
  10 threads and 10 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    62.27ms   23.24ms 220.39ms   70.17%
    Req/Sec    15.94      6.09    50.00     69.12%
  19345 requests in 2.00m, 2.91GB read
Requests/sec:    161.10
Transfer/sec:     24.85MB
```

JIRA issues: MARATHON-8563
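For illustration, a minimal sketch of that `Future.sequence` shape; apart from `Future.sequence` itself, the type and method names are hypothetical stand-ins rather than Marathon's actual API:

```scala
import scala.concurrent.{ExecutionContext, Future}

object PodStatusSketch {
  // Hypothetical stand-ins for Marathon's pod and info types.
  trait PodDef { def id: String }
  trait PodStatus
  trait AppInfoBaseData { def podStatus(pod: PodDef): Future[PodStatus] }

  // One shared AppInfoBaseData instance so its cached lookups are reused across pods;
  // Future.sequence fans the work out on the (bounded) implicit execution context.
  def allPodStatuses(pods: Seq[PodDef], baseData: AppInfoBaseData)(
      implicit ec: ExecutionContext): Future[Seq[PodStatus]] =
    Future.sequence(pods.map(baseData.podStatus))
}
```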
Improved `v2/groups?embed...` request performance by introducing dedicated thread pools for fetching info instead of using the `ForkJoinPool`. Additionally introduces dedicated execution contexts for the `GroupManagerModule`, `StorageModule` and `SchedulerActions` components.

Benchmark results master:

This branch:

Summary: 10x throughput, 5x latency and 5x std. deviation improvement.

JIRA: COPS-2224