Optimize EBML encoding #2743

Closed
catamorphism opened this issue Jun 28, 2012 · 9 comments · Fixed by #22971
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. I-compiletime Issue: Problems and improvements with respect to compile times. P-low Low priority

Comments

@catamorphism
Contributor

"optionally perform 'relaxations' on end_tag to more efficiently encode sizes; this is a fixed point iteration" (from std::ebml)
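For context, a minimal sketch of what such a relaxation could look like, assuming EBML-style variable-length sizes of 1 to 4 bytes where a size field is first reserved at the full 4-byte width and later rewritten at its minimal width. The function names `minimal_width` and `encode_vint` are hypothetical and are not taken from std::ebml. One plausible reading of the "fixed point" remark is that shrinking a nested element's size field shortens its parent's body, which may in turn allow the parent's size field to shrink, so the process repeats until nothing changes.

```rust
// Sketch only: names and limits are assumptions, not the std::ebml API.

/// Smallest width (1..=4 bytes) that can hold `n`, or None if `n` is too large.
/// The all-ones payload of each width is treated as reserved.
fn minimal_width(n: u32) -> Option<usize> {
    (1..=4usize).find(|&w| n < (1u32 << (7 * w)) - 1)
}

/// Encode `n` as a big-endian variable-length integer of exactly `width` bytes.
/// The marker bit that announces the width sits at bit position 7 * width.
fn encode_vint(n: u32, width: usize) -> Option<Vec<u8>> {
    if !(1..=4).contains(&width) {
        return None;
    }
    let marker = 1u32 << (7 * width);
    if n >= marker - 1 {
        return None; // does not fit in the requested width
    }
    Some((marker | n).to_be_bytes()[4 - width..].to_vec())
}
```

With this sketch, a size of 100 written at the reserved width, `encode_vint(100, 4)`, takes the four bytes `10 00 00 64`, while the relaxed form `encode_vint(100, 1)` is the single byte `e4`.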

@pnkfelix
Member

(This note doesn't really belong here, but it was what got me on the subject during bug triage.)

By the way, I don't know which EBML specification we claim to adhere to, but the link at the top of ebml.rs points to the v1.0 RFC for EBML (dating from 2004). That RFC is not only typo-ridden and (IMO) difficult to read but also, most importantly, way out of date, as described in this matroska-devel email from September 2012.

Unfortunately, that email post did not actually provide a recommendation for which spec to use.

  • Here's one option: http://ebml.sourceforge.net/specs/
  • This pdf seems like it might provide an accurate discussion of EBML; it was last updated in 2009, which is five years better than the v1.0 RFC. But it also is broader than EBML, since it covers the whole Matroska spec.

(Or maybe it does not matter what variant of the spec we use; the more recent specs seem quite tied to Matroska specifics, while the v1.0 spec explicitly says it should not be used for an implementation.)

@pnkfelix
Member

visiting for bug triage, email from 2013-08-26.

nothing to add, though I still wonder about my spec question from two months ago.

@nikomatsakis
Contributor

I tend to think there is no reason to use EBML at all. I would prefer to just shift to a very compact encoding.

@pnkfelix
Member

Accepted for P-low.

@esummers

esummers commented Mar 6, 2014

EBML and similar formats (msgpack, Protocol Buffers, etc.) are nice, but if there is a move away from EBML, it might be interesting to look at something similar to what Cap'n Proto uses. Everything is aligned and unpacked (in some cases reading it may be just a memcpy), and a second pass uses a simple compression algorithm to remove the zeros. http://kentonv.github.io/capnproto/encoding.html
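To illustrate the "second pass that removes the zeros", here is a simplified sketch of word-wise zero packing in the spirit of Cap'n Proto's packed encoding. It is an assumption-laden illustration, not the real wire format: Cap'n Proto additionally run-length encodes the tag values 0x00 and 0xff, which is omitted here, and the names `pack` and `unpack` are hypothetical.

```rust
// Sketch only: simplified zero packing, not wire-compatible with Cap'n Proto.

/// Pack 8-byte words: each word becomes one tag byte, whose bit i is set when
/// byte i of the word is nonzero, followed by only the nonzero bytes.
fn pack(input: &[u8]) -> Vec<u8> {
    assert!(input.len() % 8 == 0, "input must be a whole number of words");
    let mut out = Vec::new();
    for word in input.chunks(8) {
        let tag_pos = out.len();
        out.push(0u8); // placeholder for the tag, filled in below
        for (i, &b) in word.iter().enumerate() {
            if b != 0 {
                out[tag_pos] |= 1u8 << i;
                out.push(b);
            }
        }
    }
    out
}

/// Inverse of `pack`. Returns None if the packed input is truncated.
fn unpack(packed: &[u8]) -> Option<Vec<u8>> {
    let mut out = Vec::new();
    let mut it = packed.iter().copied();
    while let Some(tag) = it.next() {
        for i in 0..8 {
            let byte = if (tag & (1u8 << i)) != 0 { it.next()? } else { 0 };
            out.push(byte);
        }
    }
    Some(out)
}
```

In this sketch an all-zero word shrinks from eight bytes to a single tag byte, which is why aligned, zero-padded data can still end up compact after the second pass.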

@steveklabnik
Member

triage: removing libs, since we aren't exposing this publicly.

bors added a commit that referenced this issue Mar 3, 2015
This is a series of individual but correlated changes to the metadata format. The changes are significant enough that it (finally) bumps the metadata encoding version. In brief, they altogether reduce the total size of stage1 binaries by 27% (!!!!). Almost every low-hanging fruit has been considered and fixed; see the individual commits for details.

Detailed library (not just metadata) size changes for x86_64-unknown-linux-gnu stage1 binaries (baseline being 3a96d6a):

````
   before     after  delta path
--------- --------- ------ --------------------------------
  1706146   1050412  38.4% liballoc-4e7c5e5c.rlib
   398576    152454  61.8% libarena-4e7c5e5c.rlib
    71441     56892  20.4% libarena-4e7c5e5c.so
 14424754   5084102  64.8% libcollections-4e7c5e5c.rlib
 39143186  14743118  62.3% libcore-4e7c5e5c.rlib
   195574    188150   3.8% libflate-4e7c5e5c.rlib
   153123    152603   0.3% libflate-4e7c5e5c.so
   477152    215262  54.9% libfmt_macros-4e7c5e5c.rlib
    77728     66601  14.3% libfmt_macros-4e7c5e5c.so
  1216936    684104  43.8% libgetopts-4e7c5e5c.rlib
   207846    181116  12.9% libgetopts-4e7c5e5c.so
   349722    147530  57.8% libgraphviz-4e7c5e5c.rlib
    60196     49197  18.3% libgraphviz-4e7c5e5c.so
   729842    259906  64.4% liblibc-4e7c5e5c.rlib
   349358    247014  29.3% liblog-4e7c5e5c.rlib
    88878     83163   6.4% liblog-4e7c5e5c.so
  1968508    732840  62.8% librand-4e7c5e5c.rlib
  1968204    696326  64.6% librbml-4e7c5e5c.rlib
   283207    206589  27.1% librbml-4e7c5e5c.so
 72369394  46401230  35.9% librustc-4e7c5e5c.rlib
 11941372  10498483  12.1% librustc-4e7c5e5c.so
  2717894   1983272  27.0% librustc_back-4e7c5e5c.rlib
   501900    464176   7.5% librustc_back-4e7c5e5c.so
    15058     12588  16.4% librustc_bitflags-4e7c5e5c.rlib
  4008268   2961912  26.1% librustc_borrowck-4e7c5e5c.rlib
   837550    785633   6.2% librustc_borrowck-4e7c5e5c.so
  6473348   6095470   5.8% librustc_driver-4e7c5e5c.rlib
  1448785   1433945   1.0% librustc_driver-4e7c5e5c.so
 95483688  94779704   0.7% librustc_llvm-4e7c5e5c.rlib
 43516815  43487809   0.1% librustc_llvm-4e7c5e5c.so
   938140    817236  12.9% librustc_privacy-4e7c5e5c.rlib
   182653    176563   3.3% librustc_privacy-4e7c5e5c.so
  4390288   3543284  19.3% librustc_resolve-4e7c5e5c.rlib
   872981    831824   4.7% librustc_resolve-4e7c5e5c.so
 18176426  14795426  18.6% librustc_trans-4e7c5e5c.rlib
  3657354   3480026   4.8% librustc_trans-4e7c5e5c.so
 16815076  13868862  17.5% librustc_typeck-4e7c5e5c.rlib
  3274439   3123898   4.6% librustc_typeck-4e7c5e5c.so
 21372308  14890582  30.3% librustdoc-4e7c5e5c.rlib
  4501971   4172202   7.3% librustdoc-4e7c5e5c.so
  8055028   2951044  63.4% libserialize-4e7c5e5c.rlib
   958101    710016  25.9% libserialize-4e7c5e5c.so
 30810208  15160648  50.8% libstd-4e7c5e5c.rlib
  6819003   5967485  12.5% libstd-4e7c5e5c.so
 58850950  31949594  45.7% libsyntax-4e7c5e5c.rlib
  9060154   7882423  13.0% libsyntax-4e7c5e5c.so
  1474310   1062102  28.0% libterm-4e7c5e5c.rlib
   345577    323952   6.3% libterm-4e7c5e5c.so
  2827854   1643056  41.9% libtest-4e7c5e5c.rlib
   517811    452519  12.6% libtest-4e7c5e5c.so
  2274106   1761240  22.6% libunicode-4e7c5e5c.rlib
--------- --------- ------ --------------------------------
499359187 363465583  27.2% total
````

Some notes:

* Uncompressed metadata compacts very well. The effect is less visible for compressed metadata, but it still achieves a reduction of about 5~10%.
* *Every* commit is designed to reduce the metadata in one way. There is absolutely no negative impact associated with the changes (that's why the table above doesn't contain a negative delta).
* I've confirmed that this compiles through `make all`, making it almost correct. Other platforms have to be tested though.
* Oh, I'll rebase this as soon as I have spare time, but I guess this needs an extensive review anyway.
* I haven't rigorously checked the encoder and decoder performance. I tried to minimize the impact (some encodings are actually simpler than the original), but I'm not sure.

Fixes #2743, #9303 (partially) and #21482.
@lattice0

What is std::ebml?

@steveklabnik
Member

@LucasZanella this bug is from 2012. It's something that existed then, but does not exist now.

@lattice0

> @LucasZanella this bug is from 2012. It's something that existed then, but does not exist now.

I was searching for EBML libraries for Rust. Was this one of them?

RalfJung pushed a commit to RalfJung/rust that referenced this issue Dec 28, 2022
Aaron1011 pushed a commit to Aaron1011/rust that referenced this issue Jan 6, 2023
celinval pushed a commit to celinval/rust-dev that referenced this issue Jun 4, 2024
6 participants