Blocklists are sent uncompressed by Cloudflare reverse proxy #570

Closed
afontenot opened this issue Sep 20, 2022 · 7 comments

@afontenot

Following up on an issue I originally pointed out here: #564 (comment)

The lack of compression means users on metered connections have to download more than twice as much data as would otherwise be required for blocklists.

Test:

curl -sI --compressed https://download.rethinkdns.com/trie | grep content-encoding

Copying the blocklist file to a domain I control with gzip enabled in the Nginx config confirms the command works: there it prints a content-encoding header, while the RethinkDNS URL prints nothing, i.e. the response is sent uncompressed.

The issue seems to be Cloudflare's process for deciding which files to compress, which relies on mimetype. Since the blocklist file (correctly) doesn't have any mimetype indicating compressibility, Cloudflare will decompress the file sent from the origin server and send it to the client uncompressed.
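
You can check the mimetype the same way (I'd expect a generic binary type such as application/octet-stream here, which Cloudflare treats as incompressible):

curl -sI https://download.rethinkdns.com/trie | grep -i content-type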

Some possible workarounds:

  • If all RethinkDNS versions and platforms support HTTP compression (likely), then you can set cache-control: no-transform on your origin server, and Cloudflare should cache the gzipped file unaltered. You might also look at the gzip_static module for Nginx, which lets you serve pre-compressed files at the best possible ratios, e.g. with Zopfli. (A config sketch follows this list.)

  • You could rely on decompression in the client application rather than at the protocol level. Something like zstd would give better compression ratios than gzip, and its extremely fast decompression might actually make it faster than protocol-level compression.
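
A minimal sketch of the first option, assuming an Nginx origin built with ngx_http_gzip_static_module (the location path is illustrative):

# Serve a pre-compressed trie.bin.gz (e.g. produced by Zopfli) to clients
# that send Accept-Encoding: gzip, instead of compressing on the fly.
gzip_static on;

location /trie {
    # Ask intermediaries (here, Cloudflare) not to re-encode the response.
    add_header Cache-Control "no-transform";
}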

@ignoramous
Collaborator

ignoramous commented Sep 21, 2022

Hm, now that I recall... it got tricky to serve gzip / br compressed application/octet-stream from Cloudflare, so we opted against it. I don't remember exactly what the issue was, but it may have been related to forcing HTTP clients to treat the blob as a "file download".

Either way, try https://download.rethinkdns.com/trie?compressed, which streams the blob as application/wasm that Cloudflare auto-compresses with br (it didn't compress much in my runs).


On my PC, gzip compresses the trie by about 50% (61M -> 32M):

➜  ls -hltr trie*
-rw-rw-r-- 1 32M Sep 21 22:38 trie.bin.gz
-rw-rw-r-- 1 61M Sep 21 22:38 trie.bin

@ignoramous
Collaborator

Btw, there's no server serving these downloads; a Cloudflare Worker streams the blob from AWS CloudFront backed by Amazon S3 (soon moving to Cloudflare R2).

@afontenot
Author

Either way, try https://download.rethinkdns.com/trie?compressed which streams the blob as a application/wasm that is auto br compressed by Cloudflare (it didn't compress much in my runs).

Works for me; I see brotli compression (at the protocol level). If changing the mimetype doesn't cause any issues, I say go for it.

(it didn't compress much in my runs)

This "on the fly" compression always gets a worse ratio than statically compressing every time, because you have to worry about compression time. Nevertheless, download size is reduced from 63.4 MB to 32.3 MB, which is almost 50%. I'd say that's substantial savings for those on metered connections. (Here in the US, 60 MB would cost me $0.60 over LTE, making it cost prohibitive to download regular blocklist updates - say, every 24 hours.)

Statically, you can do much better:

$ brotli -q 11 trie.bin
$ gzip -9 --keep trie.bin
$ xz -9 -k trie.bin
$ zstd --ultra -22 trie.bin
$ ls -hl --si trie.bin*
-rw-r--r-- 1 afontenot afontenot 64M Sep  5 06:31 trie.bin
-rw-r--r-- 1 afontenot afontenot 24M Sep  5 06:31 trie.bin.br
-rw-r--r-- 1 afontenot afontenot 33M Sep  5 06:31 trie.bin.gz
-rw-r--r-- 1 afontenot afontenot 24M Sep  5 06:31 trie.bin.xz
-rw-r--r-- 1 afontenot afontenot 25M Sep  5 06:31 trie.bin.zst

So brotli, for example, goes from 32M (on the fly) to 24M (static), a pretty huge win for static compression. All of them (except xz, included only for reference) decompress quite fast. Zstd is the fastest by a fair margin.
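
One way to compare decompression speeds (a sketch; decoding to /dev/null so only decode time is measured):

$ time gzip -dc trie.bin.gz > /dev/null
$ time brotli -dc trie.bin.br > /dev/null
$ time xz -dc trie.bin.xz > /dev/null
$ time zstd -dc trie.bin.zst > /dev/null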

@ignoramous
Collaborator

Nevertheless, download size is reduced from 63.4 MB to 32.3 MB, which is almost 50%.

For some reason, the download size remains the same for me. There's no change... How are you testing this? I tried looking at the data transfer in Firefox's Network console and with wget -a, both of which indicate 60M+ transfers (or maybe I've been reading the outputs incorrectly all along).

Zstd is the fastest by a fair margin.

We'd have to decompress these files on a myriad of Android devices... which is what worries me (even if that worry is misplaced).

@afontenot
Author

afontenot commented Sep 21, 2022

Nevertheless, download size is reduced from 63.4 MB to 32.3 MB, which is almost 50%.

For some reason, the download size remains the same for me. There's no change... How are you testing this?

I think you are reading the outputs incorrectly. Testing in Firefox, I get a compressed file. My internet is only 100 Mbps, so that's slow enough to see clearly what's happening.

When the file is compressed on the fly, the content-length header isn't set (because the server doesn't know in advance what the final size of the response will be). When you download without ?compressed, the header is sent, and you therefore get a working progress bar ("5 seconds remaining") in Firefox. With ?compressed, Firefox just says "unknown time remaining" and inserts the real size of the file (60.47 MiB) only after the download finishes. How can you tell compression is working? In my case, the reported download speed is > 20 MB/sec, almost double the maximum possible transfer rate for my connection. So instead of showing the true download speed and the compressed size, Firefox shows an inflated speed and the original file size.

You mention wget, but plain wget doesn't work: by default it doesn't send the Accept-Encoding header the server uses to determine that the client supports protocol-level compression. You need wget --compression=auto. With that flag, I see the compressed size of the download while it's in progress (the opposite of Firefox's approach).

My preferred testing method is curl -JORL --compressed -vv, which also prints the headers. In this case I see content-encoding: br in the response headers (indicating brotli compression). You can also see this in the Firefox network console.

Zstd is the fastest by a fair margin.

We'd have to uncompress these files on a myraid of Android's...which is what worries me (even if misplaced).

If you enable protocol-level compression ("content encoding"), you pay that price anyway, since the HTTP library on the client does the decompression for you. Zstd supports streaming decompression, so you can decompress the bytes as they arrive over the wire. If anything, I'd expect it to be faster than HTTP with gzip streaming compression on basically all devices.
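
As a rough shell illustration of the streaming property (the .zst URL is hypothetical; an app would do the equivalent through a zstd library):

# zstd reads the compressed bytes from the pipe as they arrive,
# so the full compressed file is never buffered anywhere.
curl -s https://example.com/trie.bin.zst | zstd -d > trie.bin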

@ignoramous
Collaborator

ignoramous commented Sep 22, 2022

My internet is only 100 Mbps

100Mbps ought to be enough for anybody ;)

When the file is compressed on the fly, the content-length header isn't set (because the server doesn't know what the final size of the file will be). When you download without the ?compressed, it is sent and therefore you get a working progress bar ("5 seconds remaining") in Firefox.

Yep. I can't remember why, but I believe the absence of content-length caused issues with Android's Download Manager or some such, which is why we moved to application/octet-stream.

Thanks. I now see the compressed transfer with wget (around ~30M), which, unlike Firefox, doesn't lie about it.

wget --compression=auto "https://download.rethinkdns.com/trie?compressed=true" -a /tmp/wget.log && less /tmp/wget.log

--2022-09-22 01:42:07--  https://download.rethinkdns.com/trie?compressed
HTTP request sent, awaiting response... 200 OK
Length: 63409270 (60M) [application/wasm]
Saving to: 'trie?compressed'
...
...
...
31750K .......... .......... .......... ....                  10.8M=8.0s

2022-09-22 21:45:01 (3.87 MB/s) - 'trie?compressed' saved [63409270]

Zstd supports streaming decompression so you can just decompress the bytes as they're downloaded over the wire.

The problem is the native zstd lib (we'd have to bundle one for each arch), the JNI overhead, and the additional code we'd have to write and maintain... :D The AOSP repositories do have zstd built from source, but it is unclear if one can dynamically link to it.

I'll switch to downloading from the ?compressed endpoint on the app for v053k (the next release due in a few days) to see how it goes.

@ignoramous
Collaborator

Part of #573

Blocklists will be downloaded from the ?compressed endpoint from here on. Thanks @afontenot for pushing us to reevaluate our choices and helping Rethink be better! Appreciate it.
