-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUMM-1679 Compress HTTP body using deflate
(ETF RFC 1950)
#626
Conversation
9e949eb
to
cbdb5c8
Compare
// |CMF|FLG| | ||
// +---+---+ | ||
// ref. https://datatracker.ietf.org/doc/html/rfc1950#section-2.2 | ||
let header = Data([0x78, 0x5e]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are these flags randomly generated ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is coming from mw99/DataCompression which is using the same algorithm defined in the Apple's Compression
framework: COMPRESSION_ZLIB
return nil | ||
} | ||
|
||
var bytes = UInt32(checksum).bigEndian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed ? is adler32
using littleEndian
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM is most likely little-endian. In anyways, .bigEndian
will only reverse bytes if necessary and this is needed to convert this UInt32 in network byte order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I am not the expert here :)
c5d1d6d
to
d0c7231
Compare
d0c7231
to
90b101c
Compare
|
||
// The Adler-32 checksum should be initialized to 1 as described in | ||
// https://datatracker.ietf.org/doc/html/rfc1950#section-8 | ||
return zlib.adler32(1, ptr, uInt(data.count)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this method seems to be macOS only
how does it work in iOS? where does it come from? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This documentation link is for libkernel
which also includes an implementation of zlib.
Here, I'm using adler32
method from zlib
library which is documented here. Both have the same signature, and probably the same implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great job, very well explained and super clear 👌. I left few comments, with the test Data
randomness being the most important one.
Out of curiosity, do we have any benchmarks on the compression stuff? What could be the estimated time impact on building the request?
return mockRepeating(byte: 0x41, times: Int(size)) | ||
} | ||
|
||
static func mockRandom<Size>(ofSize size: Size) -> Data where Size: BinaryInteger { | ||
return mockRepeating(byte: .random(in: 0x00...0xFF), times: Int(size)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "randomness" of this Data
is very low as it just repeats a single, random byte given number of times. It doesn't seem enough to reliably test compression/decompression flow. How about having something more fancy here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely, good catch!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ncreated, I've been adding a better randomisation using SecRandomCopyBytes BUT all compression started failing.. So I've been reading more about the subject and discovered that random data (especially cryptographically strong random numbers) cannot be compressed. In this wikipedia article you can read:
The primary encoding algorithms used to produce bit sequences are Huffman coding (also used by the deflate algorithm) and arithmetic coding. Arithmetic coding achieves compression rates close to the best possible for a particular statistical model, which is given by the information entropy, whereas Huffman compression is simpler and faster but produces poor results for models that deal with symbol probabilities close to 1.
In particular, files of random data cannot be consistently compressed by any conceivable lossless data compression algorithm; indeed, this result is used to define the concept of randomness in Kolmogorov complexity.[18]
It is probably impossible to create an algorithm that can losslessly compress any data
In conclusion, a good randomised array of bytes won't be compressible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can create such random strings by selecting randomElement()
s from a pool of keywords
along these lines:
let keywords = ["foo", "bar", ...]
let randomString = (1...100).map { _ in keywords.randomElement()! } // this should be much longer than keywords.count
let compressed = zip(randomString)
// do assertions
that's also closer to the actual payloads where the same keys are usually repeated multiple times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice finding @maxep , I didn't know that at all 🙂. And I agree with @buranmert - we can significantly improve it by just picking a random byte N times, instead of picking a random byte and repeating it N times.
We have some existing basic character sets we can reuse if we want to focus on Strings
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
picking a random byte N times, instead of picking a random byte and repeating it N times.
Yes, but that's still fails most of the time. Instead, I've added an object to encode with random properties. This perform a good randomisation with enough redundancy to allow compression.
Tests/DatadogTests/Datadog/Core/Upload/DataCompressionTests.swift
Outdated
Show resolved
Hide resolved
a7ddc95
to
745f83b
Compare
@ncreated In regards of a benchmark, we have 2 metrics to consider: the compression rate and the compression speed. In any case, The I'm not able to find more info from Apple, the only resource I found is this wwdc2015 sessions where they introduce |
I was thinking about some basic stuff, like measuring time-to-create-request - just for sanity checking that it doesn't take alarmingly more time now 🙂. Here I ran
I can see it's up to 5x slower - seems reasonable 👍. |
b907a4f
to
5cbb1e3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 💯
What and why?
Apply HTTP body compression using
deflate
data format as described in IETF RFC 1950How?
Using the
zlib
structure (defined in IETF RFC 1950) with theDEFLATE
compression algorithm (defined in IETF RFC 1951).The
DEFLATE
implementation usescompression_encode_buffer(_:_:_:_:_:_:)
from theCompression
framework by allocating a destination buffer of source size and copying the result into aData
structure. In the worst possible case, where the compression expands the data size, the destination buffer becomes too small and deflation returnsnil
.The
Adler32
checksum is performed by using theadler32(_:_:_)
fromzlib
library.After successful compression,
RequestBuilder
will set the request'shttpBody
with the deflated data and set theContent-Encoding
header value todeflate
. In case of compression failure, thehttpBody
will fallback to the uncompressed data.Unit Tests
ServerMock
is now provided withunzip
andinflate
methods to automatically decompresshttpBody
during tests.Integration Tests
The
http-server-mock
python server now inflates request body when necessary.Review checklist