Hxb #11504
* use identity for field params * support partial generic expansion for fields * take off @:generic * try something for enum constructor type params * gencommon says no * communicate field type parameters by index again * surely anons are just a list of anon field refs * fix generic type param names * let's get dangerous * handle CBOs in HxbRestore too * well * chdir * handle warnings as CBOs * don't try to hxb import.hx because there's nothing in there * don't roundtrip on Cross and Eval way too many false positives in tests from printing * add CFLR, don't resolve fields like a crazy person * fix but it's still broken * stop the roundtrip nonsense for now, enough other problems * decode binary modules in check_display_file * remove flush_fields, should not be needed with CFLR * maybe fix generic type parameter naming * add ENFR too * pull stricter tanon changes and remove some debug * unused open * fix overloads a bit more but it's still not 100% working * fix generic type param naming a bit more * check for cf_overloads instead of CfOverloads The flag isn't set for ghetto overloads, so this should catch all relevant cases. * fix overloads even more * add enum when adding enum field ref * fix overloads for real * small optimization for the VERY common depth = 0 case * properly initialize typedef monos * map anons before identifying them * write tpp paths instead of names * remove some debuggery from the writer * embrace ttp * don't follow away Null<Void> in function returns * encode basic types directly I'm not sure if I want to keep this optimization, but the changes to com.basic seem like a good idea regardless. At the moment it might be possible to get a hold of the ominous `m` monomorph in Common.create, and binding that one to anything would lead to interesting results. * add ANFR for anon fields * Happy new year! 
* [typer] fix functional interface type parameter leak see HaxeFoundation#11390 * ANFR earlier * only write cf_expr_unoptimized if it differs from cf_expr * write texpr positions are bit smarter * add field contexts, hashcons expression type instances * [display] populate server/modules from hxb since there's only that atm * remove cf_expr_unoptimized optimization again because the hashconsing has already broken it We need a different solution for this silly field because it takes up a lot of space and time for very little gain. Maybe something like a binary diff to cf_expr would make sense here. * forward declare locals in texpr way less Hashtbling * be less awkward around ttps They're your friend! * purge some debug from the reader We never want to write to stderr because that hangs the display tests. * don't generate TLazy and remove random bool Also add comment about empty anons * make api an argument of the read function This is a prerequisite for making the reader reentrant, which should now be possible. * demo how continuations could work * reorder tables to something that makes sense to me * separate chunk classes more * add cl_type to avoid hundreds of ClassStatics anons * don't set anon fields too early * properly deal with ClassStatics in unification and field typing also convert unify_anons to something human-readable * bring back missing checks * add small ring cache per type kind * add more elaborate stats behind -D hxb.stats * move expr stuff out of the way because I keep scrolling past it * change mono byte to 1, avoid extra 1 byte for immediate type instance values * encode simple type instances immediately * potentially avoid some GC write barriers by splitting up pos * use less ignorant fast_eq * warnings * basic reader stats * minor cleanup * fix class field scoping * infer nested status from stack Did you know there can be anon fields inside enum fields? Crazy... 
* field type parameters 2.0 * put all texpr type instances in an array * change -bcp to --hxb-lib * create directory before creating file... * format stats output nicer because that's very important * rename hxml so I find it easier * encode Void directly because it actually appears a lot * reorganize type instance bytes, inline common list lengths * warnings * write blocks less awkwardly also don't read arrays only to turn them into lists * don't write i32 if it can be helped * pass current chunk to pos writer * add texpr stats * optimize TCall * optimize static calls * refactor pos writer slightly * optimize this.field * fix generator printing * enable disabled tests * remove overly fancy position optimization * write TInt as leb too * Revert "write TInt as leb too" This reverts commit ce6c8de. * let's see where we're at * check context before checking cache for display files see HaxeFoundation#11480 * stop leaving debug prints in your commits ffs * adjust test Calling exclude before saving means that the class is extern on the next iteration, and thus not generated. * dodge arcane sourcemap test problem * dodge more * change chunk to module in the writer * use SimnBuffer * remove roundtrip * avoid some object overhead in the reader * use correct v_id when reading from .hxb * [hxb] use cache hxb for --hxb too * don't generate aux output (including hxb) during diagnostics * warnings * only cache context when using compilation server * write chunk name before length, and remove crc The 4 bytes of the name don't contribute to the length, so this makes more sense. 
* minor refactor to make interface a bit saner * skip hxb generation on display too * warnings * move module_cache to HxbData That might not be the ideal place for it either but it's better than tType.ml * rework reader interface a bit as well * read metadata in forward data see go2hx/go2hx#174 * read anon field meta too * persist ordered list of chunks instead of one big bytes thing * remove feature nonsense and revert meta change * warnings * fix server/module, get all data from hxb cache directly * align chunk order * [hxb] don't overwrite taint reason with check_display_file * [server] we're sending cacheState, not 'dirty' * [tests] update test * [hxb] generate proper warnings for unbound type parameters * Simplify unbound ttp positions; not like they're often useful anyway... * ocaml syntax is hard * anon fields can have @:overload too... * use singular empty anon * slightly improve field vs. overloads handling * remove some debug because hxb is perfect * [TYPF] skip module name, only write pack when different from current module (private types) * no need to complicate things * [tests] add test for HaxeFoundation#11480 * add CFEX, but don't generate anything into it yet * move local type params to expressions * don't write type parameter length twice * write ttp host * write nested status so the reader doesn't have to track this * more CFEX work * rename chunks * move all field references in front of definitions There's currently no data dependency which requires this, but it's more future-proof in case there's ever something along the lines of dependent types. * rename chunk reader functions too * sanity catch Error when resolving types * more sanity * forward declare module type parameters in MTF We need to know their identity early so the instance builder can work. Constraints and defaults are set later. 
* fix overload handling again * remove manual cl_build call, add enable_field_access to api see HaxeFoundation#11493 * add marker chunks, read hxb modules delayed see HaxeFoundation#11493 * work around java.Init issue so we can run JVM roundtrip on CI see HaxeFoundation#11493 * fail nicer if we can't write the archive * write cl_flags in MTF We want to know early if a class is an interface * remove enable_field_access, rely on typing passes instead see HaxeFoundation#11493 * only load a native lib once we actually need it * generate extern modules too * activate EXD, handle field type parameters awfully for now * don't write empty chunks * reduce diff against development * bucket class fields by their name to avoid long linear lookups * remove rings Not worth it anymore because the type instance simple/not_simple distinction is fast enough. * comment out stats writes because this might be an observable overhead * time writer per-module so we can find pathological modules * lose some more byte-level OOP overhead * remove some debug noise * merge IOChunk and Chunk, remove string_pool * no comment * simplify type instance writing again Reuse a single chunk that we reset when writing a texpr type instance. * add custom StringPool implementation * optimize locals a bit more * maybe deal with duplicate var declaration * revert local changes * try local optimization again * invert texpr order: kind, type, pos This will allow us to omit some type instances in case they can be inferred from the kind. 
* introduce implicit texpr type instances * walk back a little * don't double match type params * avoid some EXD reading * small cleanup * avoid some list reading for things that aren't lists * reverse field order and create both list and PMap at the same time * Generate dump even with --no-output * avoid write_list for anon fields too Folding a PMap to a list only to then get its length (which is a linear operation) and iterate over it isn't the power move that I thought it was. * write pos pairs where applicable * delay IO.input creation, add per-chunk timer * avoid IO in reader too * don't look if you want to sleep at night * less nightmares * remove abstract reader * do less in display requests CI exploding in 3... 2... 1... * start working on delayed expression reading see HaxeFoundation#11498 * Add hxb.stats to ignored defines for signature * put the dodge in place, activate see HaxeFoundation#11498 * adjust to cl_init changes * used hashed identity pool for anon fields too * less classes * less classes * no classes * avoid some pointless DynArray to List operations * more DynArray * add MDR to deal with import to @:keep * write CLR and friends after CFR CFR might add more CLR. Also add sanity checks to make sure we don't modify pools after exporting them. * actually install cf_type * change field type parameter storing to something more awkward and efficient * avoid some unnecessary byte blitting * fix HxbId insertion on fields without expression * Warnings * [server-hxb] only read expressions eagerly when full typing * [hxb] write type for TMeta * [hxb] whitespace nazi * tanon identification / tunification: stricterer EqStricter * Why was this debug even pushed.. 
* hacktoberfest * [display] server/type: only restore hxb headers * check if 3725 is still a problem * remove TODO comment * Remove TODOs * clean up unit hxmls * don't null-terminate bytes * [tests] add display test for static field completion Also run again a couple tests after caching type * die when writing out empty module or type name The reader would die anyway --------- Co-authored-by: Rudy Ges <k@klabz.org>
Interested to know, does this rework the existing HXCPP compile cache system? I was wondering about using hxb for compilation caching, but if the PR does that already, then that reduces the amount of work required.
This PR adds the writer and reader for hxb, the new Haxe binary format.
What is a binary format?
When Haxe compiles a program, it first parses syntax into an abstract syntax tree (AST) and then types that into a typed expression representation. This representation is handed off to the generators, which then produce the desired output. With a binary format, we can export that representation and reuse it later. Given a `Main.hx` in `src`, the export looks like this:
`haxe -cp src -main Main --js export/hello-hxb.js --hxb export/hello-hxb.zip`
This will generate `export/hello-hxb.zip`, which contains, among others, a file `js/Main.hxb` that looks like this in a hex editor:
[hex editor view of `js/Main.hxb`]
The good news is that you don't have to understand this in order to use it! The generated data can now be consumed with the `--hxb-lib` argument:
`haxe --hxb-lib export/hello-hxb.zip -main Main --js export/hello-hxb-2.js`
Note how we are not adding `-cp src` here. It's not necessary because the compiler doesn't need any source files and can instead completely rely on the binary data provided via `--hxb-lib export/hello-hxb.zip`. We can also easily confirm that the generated JavaScript output is equal.
How do I use this?
At the moment, there are exactly two command line arguments for hxb:
* `--hxb path.zip` generates the binary data for all compiled types in the specified .zip file. This is auxiliary output, like `--xml` and `--json`.
* `--hxb-lib path.zip` reads the binary data from the specified .zip file. It can be understood as an equivalent to `-cp`, but with binary data instead of .hx source files.
Additionally, the Haxe compilation cache uses hxb as its internal representation, exclusively for now. While we ultimately want to implement a hybrid memory/hxb cache for optimal performance, we need proper testing of the hxb part before we can look into that.
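The two arguments can be exercised together as a round trip: export with `--hxb`, rebuild from the archive with `--hxb-lib`, then compare the two JavaScript files. The following is only a minimal sketch of such a check, not part of this PR. The helper names (`export_command`, `import_command`, `hxb_entries`, `same_output`) are hypothetical, the paths are the ones from the example above, and filtering archive entries by the `.hxb` suffix is an assumption based on the `js/Main.hxb` entry mentioned earlier.

```python
import hashlib
import zipfile
from pathlib import Path


def export_command(class_path, main, js_out, hxb_zip):
    """Arguments for compiling from source while also writing an hxb archive."""
    return ["haxe", "-cp", class_path, "-main", main,
            "--js", js_out, "--hxb", hxb_zip]


def import_command(main, js_out, hxb_zip):
    """Arguments for compiling from the archive alone; note the missing -cp."""
    return ["haxe", "--hxb-lib", hxb_zip, "-main", main, "--js", js_out]


def hxb_entries(hxb_zip):
    """Sorted names of the archive entries that end in .hxb (assumed suffix)."""
    with zipfile.ZipFile(hxb_zip) as archive:
        return sorted(name for name in archive.namelist()
                      if name.endswith(".hxb"))


def same_output(first, second):
    """True if both generated files are byte-for-byte identical."""
    def digest(path):
        return hashlib.sha256(Path(path).read_bytes()).hexdigest()
    return digest(first) == digest(second)
```

With Haxe installed, each command list can be passed to `subprocess.run(cmd, check=True)` in order, followed by `same_output("export/hello-hxb.js", "export/hello-hxb-2.js")` to confirm that both builds agree.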
Limitations
* `--hxb` generates all known types to the .zip file. We will add the option to specify what exactly should be generated once we've designed how exactly to do this. Track "[hxb] Filter for --hxb output" (#11490) for further information.
* `--hxb-lib` takes priority over `-cp`, `-lib` and any native library include like `--java-lib`. There are plans to make it respect the order of appearance in the command line arguments, but this is going to require some additional work. See "[hxb] --hxb-lib vs. -cp vs. --java-lib" (#11494).
* `-main` is not stored in the generated .zip file yet, which means we don't have runnable archives. See "[hxb] Add -main information" (#11502).
Why should I use this?
Hxb is particularly interesting when you're working with portions of code that rarely, if ever, change. In that case, pre-compiling such code to hxb can give great performance benefits. One very nice case study for this is go2hx, which deals with the entire Go standard library. Here is the `--times` output posted by @PXshadow:
Without hxb: [`--times` output]
With hxb: [`--times` output]
This is, of course, an extreme case, but it shows the potential of hxb very nicely! Thanks to @AlexHaxe, we have also modified our benchmark runs to use hxb for the formatter benchmarks:
[formatter benchmark bar chart]
The way it works is that `formatter` uses `--hxb` and `formatter_noio` then has `--hxb-lib` to use that output. You can see the overhead of the former in the higher blue bar, and the benefit of the latter in the (much) lower one.
Closing thoughts
There will be bugs, for sure, so please report them! This has been the most complex addition to the compiler I've worked on so far, simply because of how integrated into the type-loading it is, and the type-loading itself is already the most complex part of the compiler. Fortunately, there has already been quite a bit of related "cleanup fallout" from the changes we're making here, so this PR's diff itself is not actually terribly big.
Rudy has been busy testing (and fixing) this with some real-life codebases, most importantly from Shiro Games. From what we can tell, their codebases work well with hxb, which already is a decently high bar. Thanks to him, I feel somewhat confident in this change now, although there's always a chance that even hello world won't work under some specific circumstance.
Thank you and happy hxb-testing!