Optimize EBML encoding #2743
(This note doesn't really belong here, but it was what got me on the subject during bug triage.) By the way, I don't know which EBML specification we claim to adhere to, but the link at the top of ebml.rs points to the v1.0 RFC for EBML (dating from 2004). That document is not only typo-ridden and (IMO) difficult to read but, most importantly, badly out of date, as described in this matroska-devel email from September 2012. Unfortunately, that email did not actually recommend which spec to use.
(Or maybe it does not matter which variant of the spec we use; the more recent specs seem closely tied to Matroska specifics, while the v1.0 spec explicitly says it should not be used for an implementation.)
Visiting for bug triage, email from 2013-08-26. Nothing to add, though I still wonder about my spec question from two months ago.
I tend to think there is no reason to use EBML at all. I would prefer to just switch to a very compact encoding.
Accepted for P-low.
EBML and similar formats (msgpack, Protocol Buffers, etc.) are nice, but if we move away from EBML, it might be interesting to look at something similar to what Cap'n Proto uses: everything is aligned and unpacked (in some cases allowing a plain memcpy), and a second pass applies a simple compression scheme that removes the zero bytes. http://kentonv.github.io/capnproto/encoding.html
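For reference, the zero-removal pass can be sketched as follows. This is a simplified, hypothetical version of Cap'n Proto's packing scheme, assuming the input is already a sequence of 8-byte words: each word becomes one tag byte (bit *i* set when byte *i* is nonzero) followed by only the nonzero bytes. The real scheme additionally run-length encodes runs of all-zero and all-nonzero words (tags 0x00 and 0xFF), which this sketch omits; `pack` and `unpack` are illustrative names, not part of any library.

```rust
/// Simplified Cap'n Proto-style packing (run-length special cases omitted).
/// Each 8-byte word is emitted as a tag byte, where bit i is set when byte i
/// of the word is nonzero, followed by only the nonzero bytes.
fn pack(words: &[u8]) -> Vec<u8> {
    assert!(words.len() % 8 == 0, "input must be whole 8-byte words");
    let mut out = Vec::new();
    for word in words.chunks(8) {
        let mut tag = 0u8;
        for (i, &b) in word.iter().enumerate() {
            if b != 0 {
                tag |= 1 << i;
            }
        }
        out.push(tag);
        // Zero bytes are implied by the cleared tag bits, so drop them.
        out.extend(word.iter().copied().filter(|&b| b != 0));
    }
    out
}

/// Inverse of `pack`: expand each tag byte back into an 8-byte word.
fn unpack(packed: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < packed.len() {
        let tag = packed[i];
        i += 1;
        for bit in 0..8 {
            if tag & (1 << bit) != 0 {
                out.push(packed[i]);
                i += 1;
            } else {
                out.push(0);
            }
        }
    }
    out
}

fn main() {
    // One word holding the little-endian u64 value 5, then one all-zero word.
    let mut data = 5u64.to_le_bytes().to_vec();
    data.extend_from_slice(&[0u8; 8]);
    let packed = pack(&data);
    println!("{:?}", packed); // [1, 5, 0]
    assert_eq!(unpack(&packed), data);
}
```

Here 16 bytes of mostly-zero data pack into 3 bytes, while decoding stays a simple linear scan.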
Triage: removing the libs label, since we aren't exposing this publicly.
This is a series of individual but related changes to the metadata format. The changes are significant enough that this (finally) bumps the metadata encoding version. Together, they reduce the total size of stage1 binaries by 27% (!!!!). Almost every piece of low-hanging fruit has been considered and addressed; see the individual commits for details. Detailed library (not just metadata) size changes for x86_64-unknown-linux-gnu stage1 binaries (baseline: 3a96d6a):

````
   before     after  delta path
--------- --------- ------ --------------------------------
  1706146   1050412  38.4% liballoc-4e7c5e5c.rlib
   398576    152454  61.8% libarena-4e7c5e5c.rlib
    71441     56892  20.4% libarena-4e7c5e5c.so
 14424754   5084102  64.8% libcollections-4e7c5e5c.rlib
 39143186  14743118  62.3% libcore-4e7c5e5c.rlib
   195574    188150   3.8% libflate-4e7c5e5c.rlib
   153123    152603   0.3% libflate-4e7c5e5c.so
   477152    215262  54.9% libfmt_macros-4e7c5e5c.rlib
    77728     66601  14.3% libfmt_macros-4e7c5e5c.so
  1216936    684104  43.8% libgetopts-4e7c5e5c.rlib
   207846    181116  12.9% libgetopts-4e7c5e5c.so
   349722    147530  57.8% libgraphviz-4e7c5e5c.rlib
    60196     49197  18.3% libgraphviz-4e7c5e5c.so
   729842    259906  64.4% liblibc-4e7c5e5c.rlib
   349358    247014  29.3% liblog-4e7c5e5c.rlib
    88878     83163   6.4% liblog-4e7c5e5c.so
  1968508    732840  62.8% librand-4e7c5e5c.rlib
  1968204    696326  64.6% librbml-4e7c5e5c.rlib
   283207    206589  27.1% librbml-4e7c5e5c.so
 72369394  46401230  35.9% librustc-4e7c5e5c.rlib
 11941372  10498483  12.1% librustc-4e7c5e5c.so
  2717894   1983272  27.0% librustc_back-4e7c5e5c.rlib
   501900    464176   7.5% librustc_back-4e7c5e5c.so
    15058     12588  16.4% librustc_bitflags-4e7c5e5c.rlib
  4008268   2961912  26.1% librustc_borrowck-4e7c5e5c.rlib
   837550    785633   6.2% librustc_borrowck-4e7c5e5c.so
  6473348   6095470   5.8% librustc_driver-4e7c5e5c.rlib
  1448785   1433945   1.0% librustc_driver-4e7c5e5c.so
 95483688  94779704   0.7% librustc_llvm-4e7c5e5c.rlib
 43516815  43487809   0.1% librustc_llvm-4e7c5e5c.so
   938140    817236  12.9% librustc_privacy-4e7c5e5c.rlib
   182653    176563   3.3% librustc_privacy-4e7c5e5c.so
  4390288   3543284  19.3% librustc_resolve-4e7c5e5c.rlib
   872981    831824   4.7% librustc_resolve-4e7c5e5c.so
  1817642  14795426  18.6% librustc_trans-4e7c5e5c.rlib
  3657354   3480026   4.8% librustc_trans-4e7c5e5c.so
 16815076  13868862  17.5% librustc_typeck-4e7c5e5c.rlib
  3274439   3123898   4.6% librustc_typeck-4e7c5e5c.so
 21372308  14890582  30.3% librustdoc-4e7c5e5c.rlib
  4501971   4172202   7.3% librustdoc-4e7c5e5c.so
  8055028   2951044  63.4% libserialize-4e7c5e5c.rlib
   958101    710016  25.9% libserialize-4e7c5e5c.so
 30810208  15160648  50.8% libstd-4e7c5e5c.rlib
  6819003   5967485  12.5% libstd-4e7c5e5c.so
 58850950  31949594  45.7% libsyntax-4e7c5e5c.rlib
  9060154   7882423  13.0% libsyntax-4e7c5e5c.so
  1474310   1062102  28.0% libterm-4e7c5e5c.rlib
   345577    323952   6.3% libterm-4e7c5e5c.so
  2827854   1643056  41.9% libtest-4e7c5e5c.rlib
   517811    452519  12.6% libtest-4e7c5e5c.so
  2274106   1761240  22.6% libunicode-4e7c5e5c.rlib
--------- --------- ------ --------------------------------
499359187 363465583  27.2% total
````

Some notes:

* Uncompressed metadata compacts very well. The effect is less visible for compressed metadata, but it still achieves about a 5 to 10% reduction.
* *Every* commit is designed to reduce the metadata in one way or another. None of the changes has a negative impact (which is why the table above contains no negative delta).
* I've confirmed that this compiles through `make all`, so it is most likely correct. Other platforms still have to be tested, though.
* I'll rebase this as soon as I have spare time, but I guess it needs an extensive review anyway.
* I haven't rigorously checked encoder and decoder performance. I tried to minimize the impact (some encodings are actually simpler than the original ones), but I'm not sure.

Fixes #2743, #9303 (partially) and #21482.
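The `delta` column in the size table appears to be the relative reduction (before − after) / before; recomputing it for a couple of rows reproduces the listed values. A minimal sketch, where `reduction_pct` is a hypothetical helper introduced only for this check:

```rust
/// Relative size reduction in percent, assuming delta = (before - after) / before.
fn reduction_pct(before: u64, after: u64) -> f64 {
    (before as f64 - after as f64) / before as f64 * 100.0
}

fn main() {
    // The `total` row of the size table.
    println!("{:.1}%", reduction_pct(499_359_187, 363_465_583)); // 27.2%
    // The liballoc-4e7c5e5c.rlib row.
    println!("{:.1}%", reduction_pct(1_706_146, 1_050_412)); // 38.4%
}
```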
What is std::ebml?
@LucasZanella this bug is from 2012. `std::ebml` is something that existed then but does not exist now.
I was searching for EBML libraries for Rust. Was this one?
simplify path joining code a bit
"optionally perform 'relaxations' on end_tag to more efficiently encode sizes; this is a fixed point iteration" (from `std::ebml`)
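As background for the quoted comment: in EBML, each element's data size is stored as a variable-length integer whose width (1 to 8 bytes) is signalled by the position of the first 1 bit, and a wider-than-necessary encoding of the same value is also valid. The sketch below computes the minimal-width encoding and the fixed-width form a writer might reserve before an element's length is known. It assumes the current EBML rules (where the all-ones value is reserved for "unknown size"), and `encode_size_minimal` / `encode_size_with_width` are hypothetical names, not the `std::ebml` API. Shrinking a reserved field shifts the bytes after it, which can in turn shrink the sizes of enclosing elements; that dependency is presumably why the comment describes the relaxation as a fixed point iteration.

```rust
/// Encode `size` as an EBML variable-size integer using the fewest bytes.
/// With width n (1..=8), the first byte carries n-1 leading zero bits and a
/// marker 1 bit; the remaining 7*n bits hold the value. The all-ones value
/// is reserved ("unknown size"), so width n holds at most 2^(7n) - 2.
fn encode_size_minimal(size: u64) -> Vec<u8> {
    for n in 1..=8u32 {
        let max = (1u64 << (7 * n)) - 2;
        if size <= max {
            return encode_size_with_width(size, n);
        }
    }
    panic!("size too large for an 8-byte EBML vint");
}

/// Encode `size` with exactly `n` bytes (wider than minimal is allowed),
/// e.g. to fill a size field reserved before the content was written.
fn encode_size_with_width(size: u64, n: u32) -> Vec<u8> {
    assert!((1..=8).contains(&n), "width must be 1..=8");
    assert!(size <= (1u64 << (7 * n)) - 2, "size does not fit in {} bytes", n);
    // Set the marker bit just above the 7*n value bits.
    let marked = (1u64 << (7 * n)) | size;
    // Keep the low n bytes in big-endian order.
    marked.to_be_bytes()[(8 - n as usize)..].to_vec()
}

fn main() {
    assert_eq!(encode_size_minimal(5), vec![0x85]);
    // 127 is the reserved all-ones value for width 1, so it needs 2 bytes.
    assert_eq!(encode_size_minimal(127), vec![0x40, 0x7F]);
    // The same small size written into a fixed 4-byte placeholder field.
    assert_eq!(encode_size_with_width(5, 4), vec![0x10, 0x00, 0x00, 0x05]);
}
```

Relaxing the 4-byte placeholder for a size of 5 down to its 1-byte minimal form would save 3 bytes in this element and, by extension, in the recorded size of every element that contains it.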