Add support for compressing payloads sent over the message bus #5241
Conversation
Force-pushed from 174417b to bee9972
orquesta tests. We only need that for integration tests.
to limited resources on CI.
Force-pushed from 55bb8b8 to 3aa60ba
This should not affect results in any way since no concurrency is involved in those benchmarks, but it's done for consistency reasons.
and are in no way related to extreme.
Also hooked up micro benchmarks to a nightly CI, added new micro benchmarks for MongoDB transport level compression, etc.
Couple small things.
@@ -189,6 +189,8 @@ redirect_stderr = False
[messaging]
# URL of all the nodes in a messaging service cluster.
cluster_urls = # comma separated list allowed here.
# Compression algorithm to use for compressing the payloads which are sent over the message bus. Valid values include: zstd, lzma, bz2, gzip. Defaults to no compression.
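For reference, with this change the relevant section of st2.conf might look like the snippet below. This is a sketch: the `compression` option name is an assumption based on the comment in this hunk (only the comment line is visible in the diff), and the `cluster_urls` value is a placeholder.

```ini
[messaging]
cluster_urls = amqp://guest:guest@rabbitmq:5672//
# Hypothetical option name; valid values: zstd, lzma, bz2, gzip.
# Leaving it unset keeps the backward compatible default of no compression.
compression = zstd
```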
do we need extra libraries when using the various compression algorithms?
Only for zstandard, but we bundle that with the StackStorm virtual environment inside the deb and rpm packages.
This is going to sound weird: this looks fine and I trust your code, but I would still introduce this option as "experimental" to give ourselves some flexibility later on. Hear me out...

I think our goal should be to enable one of these options by default, but right now we're not sure which one/s are going to be useful, or the most useful for end users. So, for the 3.5 release, we include this option as experimental, and we ask users to try out one or more of the compression settings. Then, between 3.5 and 3.6, we run a "2021 User Survey" where we ask users which option/s they tried, which ones worked well, which ones did not work well for them, which one they ended up running with, how much of a speedup they measured/noticed, and how much faster StackStorm felt (eg: quantitative and qualitative analyses).

Right now I have a suspicion that zstd will be the preferred compression algorithm, but nobody has any real world data to support or reject that opinion. If we get some consistent feedback, we can decide which options to remove, if any, when we finally release this option "for real". Let's say the gzip option was universally the fastest but didn't compress enough for most people, the bz2 option compressed really well for everybody but was too slow for some users, and basically nobody settled on those two options in production - we could just remove those two options. And if there's a clear winner, we can make that the default setting for 3.6. Ideally there would be a single winner, and it would always be better than no compression at all. If that's the case, we can just always turn on compression and remove the option with a minimized deprecation period, since we advertised it as experimental instead of fully supported.

Doing all of that will allow us to minimize the number of compression algorithms we support long-term, which lowers the complexity, simplifies support for end users, and gives us real world data to make StackStorm instances that run with default settings faster and better. If we advertise this as experimental, even if there ends up being no clear consensus around which algorithms, if any, are good to turn on or to use as the default, we can just say that all options are fully supported in 3.6 and beyond, without having to change any code whatsoever.

On the other hand, if we release this option right now as production-ready in 3.5, we're signing ourselves up for supporting all of these options for the foreseeable future, and that could drastically complicate bug reports. If we eventually want to remove any or all of these options, we would also have to give a good long deprecation period, which could be frustrating for both our users and us developers.

So I think hedging our bets on this option, for now, could help us avoid frustration for everybody, keep the codebase and configuration as simple as they reasonably can be, and give us a good performance boost if/once we select a default compression algorithm. The only downside I can see is that we would kinda need to run a user survey this summer, but that's definitely something we should do anyway, to gauge how we're doing so far and to give us a good idea of where to take the project in the future.
In general I'm +1 on such things (keeping things simple), but here supporting additional algorithms doesn't really mean additional overhead or complexity for us, since it's handled out of the box by kombu (and all those libraries are part of the Python stdlib except zstandard, but we bundle that now). Having said that, I would also be fine with supporting or publicly documenting only one option (e.g. zstandard, to be consistent with MongoDB transport level compression) - in fact, that's what I wanted to do initially. But since kombu supports all of those out of the box, it doesn't mean much overhead for us, and some users may have a valid reason to pick a different algorithm for their workflow.
See my comments and the PR description. Compression is always a trade-off and it's use-case / data specific, so it would be very hard to find a good default which works for everyone and for different scenarios. That's also the reason why the functionality is disabled by default and opt-in - so power users can test and see how it affects things with their workflows and then decide if they want to enable it or not (that's also documented in st2docs). Micro benchmarks also showed (as expected) that compression adds some overhead (which may or may not be worth it, depending on many different factors and requirements). But that was of course done in an isolated micro benchmark scenario; the actual end to end story is much more complex and affected by a lot of factors.

In short, I agree about simplicity where possible, but I also don't think this adds much additional complexity, especially since it's opt-in and in line with many other knobs we have which are also somewhat workload / use case specific. Overall, I'm actually +1 for simple software without many configuration options, but I wonder how that fits into StackStorm, since in a sense StackStorm is a generic platform for doing many different things (for better or worse) which will have different running requirements. But yeah, that's also more of a wider question on the project direction and philosophy. Today, for better or worse, we use many different libraries which support pluggable backends and have tons of config options (think of all the oslo stuff - coordination backends, kombu, etc.). And even if the libraries we use support something, that of course doesn't mean we directly or officially support it (aka advertise it in the docs).
One suggested change just in case Kombu adds support for more compression algorithms in the future, but this is also fine without it.
Okay, yeah, wow, this isn't that big of a change. Almost all of this PR is adding/tweaking tests and adding the microbenchmarks. If all this is doing is passing the compression option to kombu, then you're right, the complexity won't live in our codebase (aside from the tests and microbenchmarks, but those don't face end users). Since kombu might add more compression options in the future, it might be useful to not only explicitly list common valid compression options, but to also link them to the
Not an issue - we bundle all the dependencies.
This pull request adds support for compressing the payloads which are sent over the message bus (right now those are pickled object instances).
For backward compatibility reasons, the default value is no compression.
The code utilizes the existing kombu mechanism for compression, which means it's very simple and also backward / forward compatible, since the compression algorithm used to compress the message payload is stored in the message metadata / headers.

Resolves #5239.
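To illustrate why storing the algorithm in the message headers keeps things backward / forward compatible, here is a minimal stdlib-only sketch. This is not StackStorm or kombu code: the `pack`/`unpack` helpers, the codec registry, and the `compression` header key are all hypothetical stand-ins for what kombu does internally.

```python
import bz2
import gzip
import lzma

# Hypothetical registry mapping an algorithm name to (compress, decompress).
# zstandard is third-party, so the stdlib algorithms stand in for it here.
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def pack(payload: bytes, algorithm: str):
    """Compress the payload and record the algorithm used in a header dict."""
    compress, _ = CODECS[algorithm]
    return compress(payload), {"compression": algorithm}


def unpack(body: bytes, headers: dict) -> bytes:
    """Pick the decompressor from the header, so consumers need no shared config."""
    _, decompress = CODECS[headers["compression"]]
    return decompress(body)


# Repetitive, JSON-like payload standing in for a pickled execution object.
payload = b'{"status": "succeeded", "result": "..."}' * 100
body, headers = pack(payload, "gzip")
assert unpack(body, headers) == payload
assert len(body) < len(payload)
```

Because the consumer reads the algorithm from each message's headers, producers can switch (or disable) compression without coordinating a config change on the consumer side.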
Impact on performance
How this change affects performance really depends on the end user's workflow and resources.
Compression is never free - it basically means trading CPU cycles spent on compression / decompression for a reduction in the number of bytes sent over the network.
In a lot of cases, even though additional CPU cycles are spent compressing and decompressing the payload, this may result in overall increased throughput (due to network setup, etc.).
Microbenchmarks showed, as expected, that compression adds some overhead.
This functionality may come in especially handy for users who have the generic action trigger enabled and have executions with large (textual) results - we send the whole execution object over the message bus, including this result (which usually compresses very well, since it's textual, log-like data).
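As a rough illustration of the claim that large, log-like textual results compress very well, the sketch below compares compressed sizes for synthetic repetitive data using the stdlib algorithms (zstandard is a third-party library and is omitted; real results vary with the actual payload):

```python
import bz2
import gzip
import lzma

# Synthetic stand-in for a large textual execution result.
result = ("2021-03-01 12:00:00 INFO task succeeded on host web-01\n" * 2000).encode()

sizes = {
    "none": len(result),
    "gzip": len(gzip.compress(result)),
    "bz2": len(bz2.compress(result)),
    "lzma": len(lzma.compress(result)),
}

for name, size in sizes.items():
    print(f"{name}: {size} bytes ({size / sizes['none']:.1%} of original)")
```

On highly repetitive data like this, all three algorithms shrink the payload dramatically; the trade-off in CPU time between them is what the micro benchmarks in this PR measure.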
TODO