gzip filter: make sure the stream ends properly #3025
Conversation
This patch adds an additional call with Z_FINISH to properly end the gzip stream. Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio I'll take a look at it. Thank you for fixing it!
Nice, thanks for fixing. Will defer to @gsagula for review, but some quick comments.
```cpp
const int result = deflate(zstream_ptr_.get(), Z_FINISH);
updateOutput(output_buffer);
if (result == Z_OK || result == Z_BUF_ERROR) {
  finish(output_buffer);
```
nit: prefer to do this in a loop vs recursively (even though I know the recursion depth will be small).
You could do something similar to deflateNext():

```cpp
while (deflateNext(Z_FINISH)) {
  if (zstream_ptr_->avail_out == 0) {
    updateOutput(output_buffer);
  }
}
```

... and then handle the logic inside deflateNext(..):

```cpp
const int result = deflate(zstream_ptr_.get(), flush_state);
switch (flush_state) {
case Z_FINISH:
  if (result != Z_OK && result != Z_BUF_ERROR) {
    RELEASE_ASSERT(result == Z_STREAM_END);
    return false;
  }
  // fall through: Z_OK/Z_BUF_ERROR at finish get the default handling below.
default:
  if (result == Z_BUF_ERROR && zstream_ptr_->avail_in == 0) {
    return false; // This means that zlib needs more input, so stop here.
  }
  RELEASE_ASSERT(result == Z_OK);
}
return true;
```
Just a suggestion, but it is up to you :)
Thanks. This is nicer. 👍
```diff
@@ -139,11 +139,15 @@ FilterDataStatus GzipFilter::encodeData(Buffer::Instance& data, bool end_stream)
   if (end_stream) {
-    compressor_.flush(compressed_data_);
+    compressor_.finish(compressed_data_);
```
I haven't looked at this code much at all, but it seems like flush/finish can be logically combined? Shouldn't finish implicitly flush before finishing?
@dio My apologies for missing the GH notifications about the issue, and thank you again for this fix! It looks great, just a few comments.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
One problem that I found when implementing this extension using Z_FINISH in combination with chunked transfer-encoding is that large payloads cause the stream to fail. The only way that I could make this work was using Z_SYNC_FLUSH, or buffering the entire stream, compressing it, and applying Z_FINISH. I'm not one hundred percent sure why, but I would recommend trying to stream something over 100MB just to be safe. I'll try testing this use-case a bit later if I have a chance. Thanks!
```cpp
// While Z_OK, Z_BUF_ERROR (covered in the case above) and Z_STREAM_END are the
// acceptable values, Z_STREAM_ERROR is only possible if the stream is not
// initialized properly. Ref: https://zlib.net/zlib_how.html.
RELEASE_ASSERT(result == Z_STREAM_END);
```
I think it's important to have testing coverage here too.
```diff
@@ -82,5 +82,28 @@ void ZlibCompressorImpl::updateOutput(Buffer::Instance& output_buffer) {
   zstream_ptr_->next_out = chunk_char_ptr_.get();
 }

+void ZlibCompressorImpl::reset() {
+  const int result = deflateReset(zstream_ptr_.get());
+  RELEASE_ASSERT(result == Z_OK);
```
I think we need some death tests to exercise this logic.
OK, will try to add it.
```diff
@@ -81,6 +81,18 @@ class ZlibCompressorImpl : public Compressor {
    */
   uint64_t checksum();

+  /**
+   * This is required to re-init the stream after calling finish().
```
I think deflateReset() does not deallocate memory. It just resets the counters and returns the stream results. Can you add a line to the comment explaining this?

From the zlib manual:

> This function is equivalent to deflateEnd followed by deflateInit, but does not free and reallocate the internal compression state. The stream will leave the compression level and any other attributes that may have been set unchanged.
Got it.
@gsagula yes, that came to mind while proposing this PR. Will try to have a test for it.
Confirmed to be working when streaming a file > 100MB. Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@gsagula, so I have this simple test server in node (sorry for some JS):

```js
const Server = require("http").Server;
const fs = require("fs");

const server = new Server((req, res) => {
  return fs
    .createReadStream(req.url == "/book" ? "book.pdf" : "a.txt")
    .pipe(res);
});

server.listen(1234);
```

Then, ... Seems fine.
@dio That's awesome! I will do a final pass and some more manual testing. P.S. (Js and Crypto) are the two words that make things popular these days :)
```diff
@@ -41,7 +84,14 @@ class ZlibCompressorImplDeathTest : public ZlibCompressorImplTest {
 TEST_F(ZlibCompressorImplDeathTest, CompressorTestDeath) {
   EXPECT_DEATH(compressorBadInitTestHelper(100, 8), std::string{"assert failure: result >= 0"});
   EXPECT_DEATH(compressorBadInitTestHelper(31, 10), std::string{"assert failure: result >= 0"});
-  EXPECT_DEATH(unitializedCompressorTestHelper(), std::string{"assert failure: result == Z_OK"});
+  EXPECT_DEATH(uninitializedCompressorTestHelper(), std::string{"assert failure: result == Z_OK"});
```
Thanks for adding more death tests.
```cpp
                    compressed_slices[num_comp_slices - 1].len_);
// FOOTER four-byte sequence (sync flush)
EXPECT_EQ("0000ffff", footer_hex_str.substr(footer_hex_str.size() - 8, 10));
expectValidCompressedBuffer(output_buffer);
```
This looks much cleaner!
LGTM, I will defer to @dnoe if he also wants to take a quick look. Thanks for the fix.
```diff
@@ -68,18 +68,41 @@ void ZlibCompressorImpl::process(Buffer::Instance& output_buffer, int64_t flush_
 }

 if (flush_state == Z_SYNC_FLUSH) {
```
nit/question: IMO it's a bit strange to finish in the context of a sync_flush operation. They are distinct operations from the deflate perspective. Since I think now when we call flush() we actually want to finish(), should we remove the flush() method for now and just add a direct finish() method?
@mattklein123 yes, from my test it works fine to just call finish() from the filter.

Since flush() would not be used anymore, I think I need to update the tests as well, since the current tests are built around Z_SYNC_FLUSH.

Or should we just keep the flush() method in ZlibCompressorImpl?
I would prefer we delete methods we don't actually need. We can always add them back later. @gsagula WDYT?
@mattklein123 I'm in favor of deleting any method that is not being used. @dio It makes sense to replace flush() with finish(), but we still use Z_SYNC_FLUSH internally, correct? FYI, as soon as I get a chance I would like to revisit this entire implementation anyway, so don't worry too much about the design.
> It makes sense to replace flush() with finish(), but we still use Z_SYNC_FLUSH internally, correct?

I think without Z_SYNC_FLUSH-ing it is fine. Z_SYNC_FLUSH adds that 0000ffff right before crc32 and isize, which is probably not necessary? Will push a commit and let you know.

> FYI, as soon as I get a chance I would like to revisit this entire implementation anyway, so don't worry too much about the design.

Fantastic! Probably we can include deflate and deflateRaw as well 🙂
> I think without Z_SYNC_FLUSH-ing it is fine. Z_SYNC_FLUSH adds that 0000ffff right before crc32 and isize, which probably it is not necessary?
@dio Good point, but I think we still need sync for byte padding (I need to look into it). Here is what others have done. It seems that Z_SYNC_FLUSH is still broadly used.
> Fantastic! Probably we can include deflate and deflateRaw as well
I think doing one for Brotli as well would be cool. Some interesting use-cases can be explored when multiple compression types are available :)
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
@dio Did you see my comment about Z_SYNC_FLUSH? I'm not sure if we can remove it. I'll need to look into it.
@gsagula sorry for missing that. Will revert back and investigate.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
```diff
@@ -42,10 +42,10 @@ void ZlibCompressorImpl::compress(Buffer::Instance& buffer, bool trailer) {
   zstream_ptr_->avail_in = input_slice.len_;
   zstream_ptr_->next_in = static_cast<Bytef*>(input_slice.mem_);
   process(buffer, Z_NO_FLUSH);
   buffer.drain(input_slice.len_);
```
I see that you found a way to use no_flush. Did you check if it works with multiple slices?
I have a test here: a291b5c.
Thanks @dio! Just a couple of comments. I would also make sure that the following use cases work:
- stream a large text file and save it in .gz format (the actual bug)
- stream large text to clients (browser / curl --compressed)
- stream a non-compressible file (it actually expands)
```cpp
      RELEASE_ASSERT(result == Z_STREAM_END);
      return false;
    }
  default:
```
Would this logic work with any other zlib flush state, e.g. Z_PARTIAL_FLUSH?
Will definitely check this out.
@gsagula Z_PARTIAL_FLUSH works, confirmed by a quick manual test for small and big file streams from upstream. (The unit tests obviously need to be adapted; do you want me to make it an option, i.e. either SYNC_FLUSH or PARTIAL_FLUSH?)
No, I was just wondering if other states also work, since the current logic will default to anything that is not Z_FINISH. Thanks for testing it.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
I have manually tested it, and I think it passed.
I can see that currently the filter by default compresses the stream (if accept-encoding is present and the response is transferred in chunks). I think we should not compress by default and only handle whitelisted content-types. I could do detection (of the magic numbers) but it seems too much and would introduce false positives?
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
I'm not sure I follow this one. Would you mind elaborating, please?
I think checking for the magic number is still not reliable. One common idiom is to buffer the entire response, compress it, and verify that it didn't expand. However, this approach would have other side effects such as latency and memory. For now, we just compress anything and assume that most of the time this filter will be receiving compressible content.
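For concreteness, a magic-number sniff of the kind being discussed would look something like this (`looksLikeGzip` is a made-up helper, not Envoy code): a gzip stream starts with the two fixed ID bytes 0x1f 0x8b, followed by the compression method (8 = deflate). As noted above, a sniff like this can always false-positive on unlucky binary data.

```cpp
#include <cstddef>

// True if the buffer plausibly starts a gzip stream (RFC 1952 header bytes).
bool looksLikeGzip(const unsigned char* data, size_t len) {
  return len >= 3 && data[0] == 0x1f && data[1] == 0x8b && data[2] == 8;
}
```

Checking three bytes instead of two slightly reduces false positives, but cannot eliminate them, which is why relying on Content-Type is the safer default.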
@gsagula e.g. having this:

```yaml
http_filters:
- name: envoy.gzip
  config:
    memory_level: 9
    compression_level: BEST
```

when a client asks for ...
I don't think so, it validates against the default list if none is specified in the config: ...
@gsagula Hum, it seems my test server doesn't send Content-Type. Haha. So it is being hit by ... Sorry for the false alarm. So yeah, I think I have your concerns tested now.
```cpp
ASSERT_EQ(0, buffer.length());

compressor.compress(buffer, Compressor::State::Finish);
ASSERT_GE(10, buffer.length());
```
That's a nice helper! Can you add a small comment explaining where this 10 is coming from?
I think this one is part of my debugging artifacts. I guess it should be 6, but I will remove it for now to avoid confusion.
```cpp
    sample.c_str(), sample.size(),
    [](const void*, size_t, const Buffer::BufferFragmentImpl* frag) { delete frag; });

buffer.addBufferFragment(*frag);
```
I think we should use this for all the other tests too, but no worries for now. We can do that in a separate PR when revisiting this implementation.
OK.
```cpp
 * @param bytes input bytes.
 * @return Type flip-ordered bytes.
 */
template <class Type> static Type flipOrder(const Type& bytes) {
```
Where is this being used?
https://github.com/dio/envoy/blob/3f3a1757b08128dcd7f21c9a6980c5d4c9f33e69/test/common/compressor/zlib_compressor_impl_test.cc#L60 — to check the correctness of the stored ISIZE after a buffer is Z_FINISH-ed.
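A hypothetical re-creation of such a flipOrder-style helper (this is a sketch, not the exact Envoy test utility): it reverses the byte order of an integral value, e.g. so the little-endian ISIZE field from the gzip trailer can be compared against big-endian hex output.

```cpp
#include <algorithm>
#include <cstdint>

// Reverse the byte order of any trivially-copyable integral value.
template <class Type> static Type flipOrder(const Type& bytes) {
  Type result;
  const auto* src = reinterpret_cast<const unsigned char*>(&bytes);
  auto* dst = reinterpret_cast<unsigned char*>(&result);
  std::reverse_copy(src, src + sizeof(Type), dst);
  return result;
}
```

Per RFC 1952, ISIZE is the size of the original input modulo 2^32, stored little-endian in the last four bytes of the gzip stream, hence the need for a byte flip when comparing against big-endian hex strings.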
Oh sorry, I didn't see.
LGTM, thanks for correcting it. Just one minor comment.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
LGTM, just a small question/comment and also needs a master merge.
```diff
-   * @param input_buffer supplies the buffer with data to be compressed.
-   * @param output_buffer supplies the buffer to output compressed data.
+   * Compresses data buffer.
+   * @param buffer supplies the reference to data to be compressed.
```
Can you specify that the buffer contents will be replaced inline with compressed data?
```diff
 for (const Buffer::RawSlice& input_slice : slices) {
   zstream_ptr_->avail_in = input_slice.len_;
   zstream_ptr_->next_in = static_cast<Bytef*>(input_slice.mem_);
-  process(output_buffer, Z_NO_FLUSH);
+  process(buffer, Z_NO_FLUSH);
```
Can you add an ASSERT into process that confirms that in the case of Z_NO_FLUSH no data is added to the buffer? Is that always guaranteed? I'm still a little concerned about the in/out buffer behavior here. Otherwise, can you add more comments?
It turns out that with RLE, SPEED and 1 as the memory level, I can see a case where data is added to the buffer when flush_state == Z_NO_FLUSH. Will try to debug it.
Hi @mattklein123, you probably saw something that we didn't. Question: even if data is added to the buffer, what would the consequences be?
I think it will work, because you add data to the end, and you drain from the beginning. Also, you snap the slices before you start adding, so you should never drain compressed data. However, it's not super intuitive how this works, so if you keep it this way I would add a bunch of comments (I think it's probably fine w/ comments).
I agree that this is not the most intuitive design. I'm planning on revisiting this implementation as soon as I get some stuff out of my way. What @dio and I agreed, was to go for correctness first. We will add the comments though. Thanks for the input.
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Signed-off-by: Dhi Aurrahman <dio@rockybars.com>
Thanks, @mattklein123! I have added the comments and done a master merge.
We suspect that the gzip filter is underperforming due to excessive flushes. The changes introduced in envoyproxy#3025 -- which were an attempt to fix envoyproxy#2909 -- are causing too many flushes (one per encodeData() call) to ensure the compression is properly finished. I think at that point it was possible for encodeData() to be called with end_stream=false while encodeTrailers() wasn't handled, so Z_FINISH could be missed. This isn't the case anymore, so there's no need to flush every time we encode a chunk of the response. We could make this configurable, but given it's a considerable perf regression _and_ that we'd still be doing the correct thing by just calling Z_FINISH when end_stream=true or from encodeTrailers(), I don't think it's worth it. This also adds stats for visibility. Helps with envoyproxy#8448. Signed-off-by: Raul Gutierrez Segales <rgs@pinterest.com>
gzip filter: make sure the stream ends properly
This patch adds an additional call with Z_FINISH to properly end the gzip stream. It also provides reset functionality, since an ended stream needs to be re-initialized.
Risk Level: Low
Testing:
Docs Changes:
N/A
Release Notes:
N/A
Fixes #2909
Signed-off-by: Dhi Aurrahman dio@rockybars.com