
Add support for compressing payloads sent over the message bus #5241

Merged
merged 28 commits into from
Apr 22, 2021

Conversation

Kami
Member

@Kami Kami commented Apr 19, 2021

This pull request adds support for compressing the payloads which are sent over the message bus (right now those are pickled object instances).

For backward compatibility reasons, the default value is no compression.

The code utilizes the existing kombu mechanism for compression, which means it's very simple and also backward / forward compatible, since the compression algorithm used to compress the message payload is stored in the message metadata / headers.
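For illustration only (this is not StackStorm's or kombu's actual code; all names here are hypothetical), the header-based scheme described above can be sketched with stdlib codecs — the algorithm name travels with the message, which is what makes the approach backward / forward compatible:

```python
import bz2
import gzip
import lzma

# Hypothetical sketch of the kombu-style mechanism: the algorithm used to
# compress a payload is recorded in the message headers, so any consumer can
# pick the matching decompressor - or pass the body through untouched when
# no compression header is present (e.g. messages from older publishers).
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def compress_payload(body, algorithm=None):
    """Return (body, headers); headers record the algorithm used, if any."""
    if algorithm is None:
        return body, {}
    compress, _ = CODECS[algorithm]
    return compress(body), {"compression": algorithm}


def decompress_payload(body, headers):
    """Reverse compress_payload() based purely on the message headers."""
    algorithm = headers.get("compression")
    if algorithm is None:
        return body
    _, decompress = CODECS[algorithm]
    return decompress(body)
```

Because the consumer keys off the header rather than a config value, publishers and consumers can disagree about the configured algorithm (or about whether compression is enabled at all) and messages still round-trip correctly.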

Resolves #5239.

Impact on performance

How this change affects performance really depends on the end user's workflow and resources.

Compression is never free: you are trading CPU cycles spent on compression / decompression for a reduction in the number of bytes sent over the network.

In a lot of cases, even though additional CPU cycles are spent compressing and decompressing the payload, this may result in overall increased throughput (due to network setup, etc.).

Microbenchmarks showed that, as expected, compression does add some overhead.

This functionality may come in especially handy for users who have the generic action trigger enabled and have executions with large (textual) results - we send the whole execution object over the message bus, including this result (which usually compresses very well, since it's textual, log-like data).
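As a rough illustration of that trade-off (this is not the PR's actual microbenchmark; the sample payload is made up), repetitive log-like output compresses dramatically with any of the stdlib algorithms, while each one costs a measurable amount of CPU time:

```python
import bz2
import gzip
import lzma
import time

# Synthetic stand-in for a large textual execution result: repetitive,
# log-like data of the kind the PR description says compresses very well.
payload = b"2021-04-19 12:00:00,123 INFO action succeeded, result=OK\n" * 2000

for name, compress in [
    ("gzip", gzip.compress),
    ("bz2", bz2.compress),
    ("lzma", lzma.compress),
]:
    start = time.perf_counter()
    compressed = compress(payload)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Smaller ratio = better compression; higher elapsed = more CPU overhead.
    ratio = len(compressed) / len(payload)
    print(f"{name}: ratio={ratio:.3f}, time={elapsed_ms:.1f} ms")
```

Whether the byte savings outweigh the CPU cost depends on the network and workload, which is exactly why the option is opt-in.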

TODO

@Kami Kami added this to the 3.5.0 milestone Apr 19, 2021
@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines. Requires some effort to review. label Apr 19, 2021
@Kami Kami changed the title [WIP] Add support for compressing payloads sent over the message bus Add support for compressing payloads sent over the message bus Apr 19, 2021
@pull-request-size pull-request-size bot added size/XL PR that changes 500-999 lines. Consider splitting work into several ones that easier to review. and removed size/L PR that changes 100-499 lines. Requires some effort to review. labels Apr 20, 2021
@Kami Kami force-pushed the message_bus_compression_support branch from 174417b to bee9972 Compare April 20, 2021 21:57
@Kami Kami force-pushed the message_bus_compression_support branch from 55bb8b8 to 3aa60ba Compare April 21, 2021 09:21
@Kami
Member Author

Kami commented Apr 21, 2021

Also hooked up micro benchmarks to a nightly CI, added new micro benchmarks for MongoDB transport level compression, etc.

@nmaludy nmaludy (Member) left a comment

Couple small things.

@@ -189,6 +189,8 @@ redirect_stderr = False
[messaging]
# URL of all the nodes in a messaging service cluster.
cluster_urls = # comma separated list allowed here.
# Compression algorithm to use for compressing the payloads which are sent over the message bus. Valid values include: zstd, lzma, bz2, gzip. Defaults to no compression.
Member

do we need extra libraries when using the various compression algorithms?

Member Author

Only for zstandard, but we bundle that with the StackStorm virtual environment inside the deb and rpm packages.
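Assuming the new option lands under the existing `[messaging]` section of st2.conf (the option name and value shown here are illustrative; the default remains no compression), enabling it would look roughly like:

```ini
[messaging]
# URL of all the nodes in a messaging service cluster.
cluster_urls =
# Compression algorithm to use for compressing the payloads which are sent
# over the message bus. Valid values include: zstd, lzma, bz2, gzip.
# Leave unset for the default of no compression.
compression = zstd
```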

CHANGELOG.rst Outdated
@blag
Contributor

blag commented Apr 21, 2021

This is going to sound weird: this looks fine and I trust your code, but I would still introduce this option as "experimental" to give ourselves some flexibility later on. Hear me out...

I think our goal should be to enable one of these options by default, but right now we're not sure which one/s are going to be useful, or the most useful for end users. So, for the 3.5 release, we include this option as experimental, and we ask users to try out one or more of the compression settings. Then, between 3.5 and 3.6, we run a "2021 User Survey" where we ask users which option/s they tried, which ones worked well, which ones did not work well for them, which one they ended up running with, how much of a speedup they measured/noticed, and how much faster StackStorm felt (eg: quantitative and qualitative analyses). Right now I have a suspicion that zstd will be the preferred compression algorithm, but nobody has any real world data to support or reject that opinion.

If we get some consistent feedback we can decide which options to remove, if any, when we finally release this option "for real". Like let's say the gzip option was universally the fastest but didn't compress enough for most people and the bz2 option compressed really well for everybody but was too slow for some users, and basically nobody settled on those two options in production - we can just remove those two options. And if there's a clear winner, we can make that the default setting for 3.6. Ideally there would be a single winner, and it would always be better than no compression at all. If that's the case, we can just always turn on compression and remove the option with a minimized deprecation period, since we advertised it as experimental instead of fully supported.

Doing all of that will allow us to minimize the number of compression algorithms we support long-term, which lowers the complexity, and simplifies support for end users, and it will give us real world data to make StackStorm instances that run with default settings faster and better.

If we advertise this as experimental, even if there ends up being no clear consensus around which algorithm, if any, are good to turn on or to use as default, we can just say that all options are fully supported in 3.6 and beyond, without having to change any code whatsoever.

But, on the other hand, if we release this option right now as production-ready in 3.5, we're signing ourselves up for supporting all of these options for the foreseeable future, and that could drastically complicate bug reports. If we eventually want to remove any or all of these options we would also have to give a good long deprecation period as well, which could be frustrating for both our users and for us developers.

So I think hedging our bets on this option, for now, could help us avoid frustration for everybody, keep the codebase and configuration as simple as it reasonably can be, and give us a good performance boost if/once we select a default compression algorithm. The only downside that I can see is that we would kinda need to run a user survey this summer, but that's definitely something that I think we should do anyway, to gauge how we're doing so far, and to give us a good idea on where to take the project in the future.

@Kami
Member Author

Kami commented Apr 21, 2021

Doing all of that will allow us to minimize the number of compression algorithms we support long-term, which lowers the complexity, and simplifies support for end users, and it will give us real world data to make StackStorm instances that run with default settings faster and better.

In general I'm +1 on such things (keeping things simple), but here supporting additional algorithms doesn't really mean additional overhead or complexity for us, since it's handled out of the box by kombu (and all those libraries are part of the Python stdlib except zstandard, but we bundle that now).

Having said that, I would also be fine with supporting or publicly documenting only one option (e.g. zstandard, to be consistent with MongoDB transport level compression) - in fact, that's what I wanted to do initially, but since kombu supports all those out of the box it doesn't really mean much overhead for us, and some users may have a valid reason to pick a different algorithm for their workflow.

I think our goal should be to enable one of these options by default, but right now we're not sure which one/s are going to be useful, or the most useful for end users.

See my comments and PR description.

Compression is always a trade off and it's use-case / data specific so it would be very hard to find a good default which works for everyone and different scenarios. That's also the reason why the functionality is disabled by default and opt-in - so power users can test and see how it affects things with their workflows and then decide if they want to enable it or not (that's also documented in st2docs).

Micro benchmarks also showed (as expected) that compression adds some overhead (which may or may not be worth it, depending on many different factors and requirements). But that was of course done in an isolated micro benchmark scenario - the actual end to end story is much more complex and affected by a lot of factors.

In short, I agree about simplicity where possible, but I also don't think that adds much additional complexity, especially since it's opt-in and in-line with many other knobs we have which are also somewhat workload / use case specific.

Overall, I'm actually +1 for simple software without many configuration options, but I wonder how that fits into StackStorm, since in a sense StackStorm is a generic platform for doing many different things (for better or worse), which will have different running requirements.

But yeah, that's also more of a wider question on the project direction and philosophy. Today, for better or worse, we use many different libraries which support pluggable backends and have tons of config options (think all the oslo stuff - coordination backends, kombu, etc.).

But yeah, even if the libraries we use support something, that of course doesn't mean we directly or officially support it (aka advertise it in the docs).

@blag blag (Contributor) left a comment
One suggested change just in case Kombu adds support for more compression algorithms in the future, but this is also fine without it.

st2common/st2common/config.py
@blag
Contributor

blag commented Apr 21, 2021

Okay, yeah, wow, this isn't that big of a change. Almost all of this PR is adding/tweaking tests and adding the microbenchmarks.

If all this is doing is passing the compression option to kombu, then you're right, the complexity won't live in our codebase (aside from the tests and microbenchmarks, but those don't face end users).

Since Kombu might add more compression options in the future, it might be useful to not only explicitly list common valid compression options, but to also link them to the kombu.compression module where compression algorithms are...not quite listed...but are included - see my suggested change.

@Kami Kami dismissed nmaludy’s stale review April 22, 2021 11:05

Not an issue - we bundle all the dependencies.

@Kami Kami merged commit 78ae28e into master Apr 22, 2021
@Kami Kami deleted the message_bus_compression_support branch April 22, 2021 11:05
Labels
rabbitmq size/XL PR that changes 500-999 lines. Consider splitting work into several ones that easier to review.
Development

Successfully merging this pull request may close these issues.

Benchmark and prototype compressing message bus payloads
3 participants