Hxb #11504

Simn · 2024-01-23T10:46:10Z

This PR adds the writer and reader for hxb, the new Haxe binary format.

What is a binary format?

When Haxe compiles a program, it first parses the source into an abstract syntax tree (AST) and then types that into a typed expression representation. This representation is handed off to the generators, which then produce the desired output. With a binary format, we can export that representation and reuse it later. Consider a Main.hx like this:

/**
	Can you spot this documentation in the output?
**/
function main() {
	trace("Hello hxb");
}

haxe -cp src -main Main --js export/hello-hxb.js --hxb export/hello-hxb.zip

This will generate export/hello-hxb.zip, which contains, among other files, js/Main.hxb. Opened in a hex editor, that file is just a dense stream of bytes.
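The hex view is easy to reproduce without a GUI editor. The following is a minimal sketch, assuming Info-ZIP's unzip and the POSIX od utility are available. hex_head is a hypothetical helper, and nothing here interprets the format; it only dumps raw bytes.

```shell
# Hypothetical helper: print the first N bytes (default 64) read from stdin
# as hexadecimal, with hexadecimal offsets on the left.
hex_head() {
  head -c "${1:-64}" | od -A x -t x1
}

# Usage with Info-ZIP unzip (-p extracts an archive entry to stdout):
#   unzip -p export/hello-hxb.zip js/Main.hxb | hex_head 128
```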

The good news is that you don't have to understand this in order to use it! The generated data can now be consumed with the --hxb-lib argument:

haxe --hxb-lib export/hello-hxb.zip -main Main --js export/hello-hxb-2.js

Note how we are not adding -cp src here. It's not necessary because the compiler doesn't need any source files and can instead rely completely on the binary data provided via --hxb-lib export/hello-hxb.zip. We can also easily confirm that the generated JavaScript output is identical to that of the first compilation.
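One way to run that comparison from a shell is a small wrapper around cmp. This is a sketch assuming a POSIX environment; same_output is a hypothetical name, not something Haxe provides.

```shell
# Hypothetical helper: report whether two generated files are byte-for-byte
# identical. cmp -s prints nothing and only sets its exit status
# (0 = identical, non-zero = different or unreadable).
same_output() {
  if cmp -s "$1" "$2"; then
    echo "identical"
  else
    echo "different"
  fi
}

# Usage, after running both haxe commands above:
#   same_output export/hello-hxb.js export/hello-hxb-2.js
```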

How do I use this?

At the moment, there are exactly two command line arguments for hxb:

--hxb path.zip writes the binary data for all compiled types into the specified .zip file. This is auxiliary output, like --xml and --json.
--hxb-lib path.zip reads the binary data from the specified .zip file. It can be understood as the equivalent of -cp, but with binary data instead of .hx source files.

Additionally, the Haxe compilation cache uses hxb as its internal representation - exclusively for now. While we ultimately want to implement a hybrid memory/hxb cache for optimal performance, we need proper testing of the hxb part before we can look into that.

Limitations

At the moment, --hxb writes all known types into the .zip file. We will add an option to specify exactly what should be generated once we've designed how to do this. See [hxb] Filter for --hxb output #11490 for further information.
For the time being, --hxb-lib takes priority over -cp, -lib and any native library include like --java-lib. There are plans to make it respect the order of appearance in the command line arguments, but this is going to require some additional work. See [hxb] --hxb-lib vs. -cp vs. --java-lib #11494.
The information specified by -main is not stored in the generated .zip file yet, which means we don't have runnable archives and -main still has to be passed alongside --hxb-lib. See [hxb] Add -main information #11502.
All hxb output is target-specific and, in the general case, cannot be used across multiple targets. In fact, the output is bucketed by target name in the generated .zip file and won't even be found when using a different target. This limitation is likely to be permanent due to the way Haxe's typing works.
The format itself isn't quite stable yet, and probably won't be entirely stable until the release of Haxe 5. This affects users only insofar as an archive generated with Haxe commit N might not be compatible with Haxe commit N + 1.
Even after the format is stable, we won't be able to guarantee binary compatibility between minor Haxe versions, because any internal change we make has to be reflected in the binary format. We will make an effort to handle changes so that the output from the newest Haxe version is compatible with older Haxe versions, too.
There's no other way to put this: the hxb-powered server cache will never be as fast as the current in-memory cache. This is simply because we're now doing something (reading hxb) instead of nothing, and there will always be a cost associated with that. Whether or not that cost is significant, or even observable, is a different question though!
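The target bucketing mentioned in these limitations can be inspected by listing the archive's entry names and reducing each to its first path component. This sketch assumes Info-ZIP's unzip (its -Z1 mode prints one entry name per line) and assumes, based on the js/Main.hxb path mentioned earlier, that each entry's top-level directory is the target name. hxb_targets is a hypothetical helper.

```shell
# Hypothetical helper: read archive entry names from stdin, one per line,
# and print each distinct top-level directory (the target bucket) once.
# Assumes entries look like "js/Main.hxb", i.e. "<target>/<path>".
hxb_targets() {
  cut -d/ -f1 | sort -u
}

# Usage with Info-ZIP unzip (-Z1 lists entry names only):
#   unzip -Z1 export/hello-hxb.zip | hxb_targets
```

For the archive generated in the example above, the output should include js; a --hxb-lib build for any other target would look in its own bucket and not find those modules.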

Why should I use this?

Hxb is particularly interesting when you're working with portions of code that rarely, if ever, change. In that case, pre-compiling such code to hxb can yield large performance gains. One very nice case study for this is go2hx, which deals with the entire Go standard library. Here is the --times output posted by @PXshadow:

Without hxb:

name        | time(s) |   % |  p% |      # | info
--------------------------------------------
analyzer    |   2.904 |  28 |  28 | 816305 |
filters     |   2.468 |  24 |  24 | 133831 |
typing      |   1.517 |  15 |  15 |   2876 |
generate    |   1.502 |  15 |  15 |      2 |
  hl        |   1.502 |  15 | 100 |      2 |
    write   |   0.452 |   4 |  30 |      1 |
macro       |   1.038 |  10 |  10 |   3532 |
  jit       |   0.039 |   0 |   4 |   1348 |
parsing     |   0.883 |   9 |   9 |    382 |
finalize    |   0.015 |   0 |   0 |      1 |
--------------------------------------------
total       |  10.328 | 100 | 100 | 956931 |

With hxb:

name                          | time(s) |   % |  p% |     # | info
-------------------------------------------------------------
generate                      |   1.329 |  55 |  55 |     2 |
  hl                          |   1.329 |  55 | 100 |     2 |
    write                     |   0.343 |  14 |  26 |     1 |
filters                       |   0.339 |  14 |  14 | 10953 |
typing                        |   0.315 |  13 |  13 |     2 |
hxb                           |   0.204 |   8 |   8 |  3860 |
  read                        |   0.204 |   8 | 100 |  3860 |
    EXD                       |   0.092 |   4 |  45 |   219 |
      stdgo_Go                |   0.036 |   1 |  39 |     2 |
      stdgo_math_bits_Bits    |   0.010 |   0 |  11 |     1 |
    CFD                       |   0.045 |   2 |  22 |   236 |
      stdgo_strconv_Strconv   |   0.015 |   1 |  34 |     1 |
    STR                       |   0.031 |   1 |  15 |   267 |
    MTF                       |   0.015 |   1 |   7 |   267 |
hxblib                        |   0.164 |   7 |   7 |   268 |
  get bytes                   |   0.161 |   7 |  98 |   267 |
macro                         |   0.056 |   2 |   2 |   777 |
  jit                         |   0.033 |   1 |  59 |   762 |
finalize                      |   0.017 |   1 |   1 |     1 |
analyzer                      |   0.010 |   0 |   0 |  5748 |
-------------------------------------------------------------
total                         |   2.436 | 100 | 100 | 21614 |
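The gap between the two runs can be quantified directly from the tables. The sketch below divides each "without hxb" time by the matching "with hxb" time; speedup is a hypothetical helper, and the figures are copied from the --times output above.

```shell
# Hypothetical helper: ratio of the time without hxb to the time with hxb,
# rounded to two decimals. awk does the floating-point division because
# POSIX shell arithmetic is integer-only.
speedup() {
  awk -v before="$1" -v after="$2" 'BEGIN { printf "%.2f\n", before / after }'
}

speedup 10.328 2.436  # total:   prints 4.24
speedup 1.517 0.315   # typing:  prints 4.82
speedup 2.468 0.339   # filters: prints 7.28
```

So the whole compilation ran roughly 4.2 times faster. The parsing row (0.883 s) disappears entirely in the hxb run, while the new hxb and hxblib rows (0.204 s + 0.164 s = 0.368 s) are the price paid for reading the archive.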

This is, of course, an extreme case, but it shows the potential of hxb very nicely! Thanks to @AlexHaxe, we have also modified our benchmark runs to use hxb for the formatter benchmarks.

The way it works is that formatter generates hxb output with --hxb, and formatter_noio then consumes that output with --hxb-lib. In the benchmark chart, the overhead of the former shows up as the higher blue bar, and the benefit of the latter as the (much) lower one.
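The same two-phase idea can be applied to any code that rarely changes: regenerate the archive only when its sources have changed, and otherwise compile against it. This is a minimal sketch, assuming the sources live in a single directory and using file modification times as the staleness signal; needs_rebuild and the paths in the usage comment are hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) when the hxb archive $2 must be
# regenerated from the .hx sources under directory $1.
needs_rebuild() {
  # No archive yet: it has to be generated.
  [ -f "$2" ] || return 0
  # find -newer prints every .hx file modified after the archive;
  # any output at all means the archive is stale.
  [ -n "$(find "$1" -name '*.hx' -newer "$2")" ]
}

# Usage (paths hypothetical):
#   if needs_rebuild src export/app.zip; then
#     haxe -cp src -main Main --js export/app.js --hxb export/app.zip
#   else
#     haxe --hxb-lib export/app.zip -main Main --js export/app.js
#   fi
```

Modification times do not capture everything: as noted under Limitations, an archive produced by one Haxe commit may not work with another, so a compiler upgrade should also trigger a rebuild.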

Closing thoughts

There will be bugs, for sure, so please report them! This has been the most complex addition to the compiler I've worked on so far, simply because of how deeply it is integrated into type loading, which is itself already the most complex part of the compiler. Fortunately, quite a bit of related "cleanup fallout" from this work has already landed, so this PR's diff itself is not actually terribly big.

Rudy has been busy testing (and fixing) this with some real-life codebases, most importantly from Shiro Games. From what we can tell, their codebases work well with hxb, which is already a decently high bar. Thanks to him, I now feel somewhat confident in this change, although there's always a chance that even hello world won't work under some specific circumstance.

Thank you and happy hxb-testing!


* use identity for field params
* support partial generic expansion for fields
* take off @:generic
* try something for enum constructor type params
* gencommon says no
* communicate field type parameters by index again
* surely anons are just a list of anon field refs
* fix generic type param names
* let's get dangerous
* handle CBOs in HxbRestore too
* well
* chdir
* handle warnings as CBOs
* don't try to hxb import.hx because there's nothing in there
* don't roundtrip on Cross and Eval
  way too many false positives in tests from printing
* add CFLR, don't resolve fields like a crazy person
* fix but it's still broken
* stop the roundtrip nonsense for now, enough other problems
* decode binary modules in check_display_file
* remove flush_fields, should not be needed with CFLR
* maybe fix generic type parameter naming
* add ENFR too
* pull stricter tanon changes and remove some debug
* unused open
* fix overloads a bit more
  but it's still not 100% working
* fix generic type param naming a bit more
* check for cf_overloads instead of CfOverloads
  The flag isn't set for ghetto overloads, so this should catch all relevant cases.
* fix overloads even more
* add enum when adding enum field ref
* fix overloads for real
* small optimization for the VERY common depth = 0 case
* properly initialize typedef monos
* map anons before identifying them
* write tpp paths instead of names
* remove some debuggery from the writer
* embrace ttp
* don't follow away Null<Void> in function returns
* encode basic types directly
  I'm not sure if I want to keep this optimization, but the changes to com.basic seem like a good idea regardless. At the moment it might be possible to get a hold of the ominous `m` monomorph in Common.create, and binding that one to anything would lead to interesting results.
* add ANFR for anon fields
* Happy new year!
* [typer] fix functional interface type parameter leak
  see HaxeFoundation#11390
* ANFR earlier
* only write cf_expr_unoptimized if it differs from cf_expr
* write texpr positions are bit smarter
* add field contexts, hashcons expression type instances
* [display] populate server/modules from hxb since there's only that atm
* remove cf_expr_unoptimized optimization again because the hashconsing has already broken it
  We need a different solution for this silly field because it takes up a lot of space and time for very little gain. Maybe something like a binary diff to cf_expr would make sense here.
* forward declare locals in texpr
  way less Hashtbling
* be less awkward around ttps
  They're your friend!
* purge some debug from the reader
  We never want to write to stderr because that hangs the display tests.
* don't generate TLazy and remove random bool
  Also add comment about empty anons
* make api an argument of the read function
  This is a prerequisite for making the reader reentrant, which should now be possible.
* demo how continuations could work
* reorder tables to something that makes sense to me
* separate chunk classes more
* add cl_type to avoid hundreds of ClassStatics anons
* don't set anon fields too early
* properly deal with ClassStatics in unification and field typing
  also convert unify_anons to something human-readable
* bring back missing checks
* add small ring cache per type kind
* add more elaborate stats behind -D hxb.stats
* move expr stuff out of the way because I keep scrolling past it
* change mono byte to 1, avoid extra 1 byte for immediate type instance values
* encode simple type instances immediately
* potentially avoid some GC write barriers by splitting up pos
* use less ignorant fast_eq
* warnings
* basic reader stats
* minor cleanup
* fix class field scoping
* infer nested status from stack
  Did you know there can be anon fields inside enum fields? Crazy...
* field type parameters 2.0
* put all texpr type instances in an array
* change -bcp to --hxb-lib
* create directory before creating file...
* format stats output nicer because that's very important
* rename hxml so I find it easier
* encode Void directly because it actually appears a lot
* reorganize type instance bytes, inline common list lengths
* warnings
* write blocks less awkwardly
  also don't read arrays only to turn them into lists
* don't write i32 if it can be helped
* pass current chunk to pos writer
* add texpr stats
* optimize TCall
* optimize static calls
* refactor pos writer slightly
* optimize this.field
* fix generator printing
* enable disabled tests
* remove overly fancy position optimization
* write TInt as leb too
* Revert "write TInt as leb too"
  This reverts commit ce6c8de.
* let's see where we're at
* check context before checking cache for display files
  see HaxeFoundation#11480
* stop leaving debug prints in your commits ffs
* adjust test
  Calling exclude before saving means that the class is extern on the next iteration, and thus not generated.
* dodge arcane sourcemap test problem
* dodge more
* change chunk to module in the writer
* use SimnBuffer
* remove roundtrip
* avoid some object overhead in the reader
* use correct v_id when reading from .hxb
* [hxb] use cache hxb for --hxb too
* don't generate aux output (including hxb) during diagnostics
* warnings
* only cache context when using compilation server
* write chunk name before length, and remove crc
  The 4 bytes of the name don't contribute to the length, so this makes more sense.
* minor refactor to make interface a bit saner
* skip hxb generation on display too
* warnings
* move module_cache to HxbData
  That might not be the ideal place for it either but it's better than tType.ml
* rework reader interface a bit as well
* read metadata in forward data
  see go2hx/go2hx#174
* read anon field meta too
* persist ordered list of chunks instead of one big bytes thing
* remove feature nonsense and revert meta change
* warnings
* fix server/module, get all data from hxb cache directly
* align chunk order
* [hxb] don't overwrite taint reason with check_display_file
* [server] we're sending cacheState, not 'dirty'
* [tests] update test
* [hxb] generate proper warnings for unbound type parameters
* Simplify unbound ttp positions; not like they're often useful anyway...
* ocaml syntax is hard
* anon fields can have @:overload too...
* use singular empty anon
* slightly improve field vs. overloads handling
* remove some debug because hxb is perfect
* [TYPF] skip module name, only write pack when different from current module (private types)
* no need to complicate things
* [tests] add test for HaxeFoundation#11480
* add CFEX, but don't generate anything into it yet
* move local type params to expressions
* don't write type parameter length twice
* write ttp host
* write nested status so the reader doesn't have to track this
* more CFEX work
* rename chunks
* move all field references in front of definitions
  There's currently no data dependency which requires this, but it's more future-proof in case there's ever something along the lines of dependent types.
* rename chunk reader functions too
* sanity catch Error when resolving types
* more sanity
* forward declare module type parameters in MTF
  We need to know their identity early so the instance builder can work. Constraints and defaults are set later.
* fix overload handling again
* remove manual cl_build call, add enable_field_access to api
  see HaxeFoundation#11493
* add marker chunks, read hxb modules delayed
  see HaxeFoundation#11493
* work around java.Init issue so we can run JVM roundtrip on CI
  see HaxeFoundation#11493
* fail nicer if we can't write the archive
* write cl_flags in MTF
  We want to know early if a class is an interface
* remove enable_field_access, rely on typing passes instead
  see HaxeFoundation#11493
* only load a native lib once we actually need it
* generate extern modules too
* activate EXD, handle field type parameters awfully for now
* don't write empty chunks
* reduce diff against development
* bucket class fields by their name to avoid long linear lookups
* remove rings
  Not worth it anymore because the type instance simple/not_simple distinction is fast enough.
* comment out stats writes because this might be an observable overhead
* time writer per-module so we can find pathological modules
* lose some more byte-level OOP overhead
* remove some debug noise
* merge IOChunk and Chunk, remove string_pool
* no comment
* simplify type instance writing again
  Reuse a single chunk that we reset when writing a texpr type instance.
* add custom StringPool implementation
* optimize locals a bit more
* maybe deal with duplicate var declaration
* revert local changes
* try local optimization again
* invert texpr order: kind, type, pos
  This will allow us to omit some type instances in case they can be inferred from the kind.
* introduce implicit texpr type instances
* walk back a little
* don't double match type params
* avoid some EXD reading
* small cleanup
* avoid some list reading for things that aren't lists
* reverse field order and create both list and PMap at the same time
* Generate dump even with --no-output
* avoid write_list for anon fields too
  Folding a PMap to a list only to then get its length (which is a linear operation) and iterate over it isn't the power move that I thought it was.
* write pos pairs where applicable
* delay IO.input creation, add per-chunk timer
* avoid IO in reader too
* don't look if you want to sleep at night
* less nightmares
* remove abstract reader
* do less in display requests
  CI exploding in 3... 2... 1...
* start working on delayed expression reading
  see HaxeFoundation#11498
* Add hxb.stats to ignored defines for signature
* put the dodge in place, activate
  see HaxeFoundation#11498
* adjust to cl_init changes
* used hashed identity pool for anon fields too
* less classes
* less classes
* no classes
* avoid some pointless DynArray to List operations
* more DynArray
* add MDR to deal with import to @:keep
* write CLR and friends after CFR
  CFR might add more CLR. Also add sanity checks to make sure we don't modify pools after exporting them.
* actually install cf_type
* change field type parameter storing to something more awkward and efficient
* avoid some unnecessary byte blitting
* fix HxbId insertion on fields without expression
* Warnings
* [server-hxb] only read expressions eagerly when full typing
* [hxb] write type for TMeta
* [hxb] whitespace nazi
* tanon identification / tunification: stricterer EqStricter
* Why was this debug even pushed..
* hacktoberfest
* [display] server/type: only restore hxb headers
* check if 3725 is still a problem
* remove TODO comment
* Remove TODOs
* clean up unit hxmls
* don't null-terminate bytes
* [tests] add display test for static field completion
  Also run again a couple tests after caching type
* die when writing out empty module or type name
  The reader would die anyway

---------

Co-authored-by: Rudy Ges <k@klabz.org>

EliteMasterEric · 2024-01-26T01:02:29Z

> Additionally, the Haxe compilation cache uses hxb as its internal representation - exclusively for now. While we ultimately want to implement a hybrid memory/hxb cache for optimal performance, we need proper testing of the hxb part before we can look into that.

I'm interested to know: does this rework the existing HXCPP compile cache system? I was wondering about using hxb for compilation caching, but if this PR already does that, it reduces the amount of work required.

Simn and others added 30 commits December 29, 2023 13:19

Merge branch 'development' into hxb_server_cache_simn_cleanup

4daaac2

Merge branch 'development' into hxb_server_cache_simn_cleanup

445af5c

use identity for field params

1c91911

support partial generic expansion for fields

1c9970e

take off @:generic

d39538a

Merge branch 'development' into hxb_server_cache_simn_cleanup

2c47591

try something for enum constructor type params

01fb137

gencommon says no

425c012

communicate field type parameters by index again

eb130d5

surely anons are just a list of anon field refs

cb98ec4

Merge branch 'development' into hxb_server_cache_simn_cleanup

585c635

fix generic type param names

fc022fd

let's get dangerous

f28fa7d

Merge branch 'development' into hxb_server_cache_simn_cleanup

5393274

handle CBOs in HxbRestore too

775ca62

well

c07ba44

chdir

b59166e

handle warnings as CBOs

fec033b

don't try to hxb import.hx because there's nothing in there

22c98eb

don't roundtrip on Cross and Eval

4753ed5

way too many false positives in tests from printing

add CFLR, don't resolve fields like a crazy person

c4bc0d1

fix but it's still broken

86a929a

stop the roundtrip nonsense for now, enough other problems

4bc65f2

decode binary modules in check_display_file

947312f

remove flush_fields, should not be needed with CFLR

a22623e

maybe fix generic type parameter naming

684f4ce

add ENFR too

3facff7

pull stricter tanon changes and remove some debug

d3268ab

unused open

84efe05

fix overloads a bit more

d35eb2a

but it's still not 100% working

Simn and others added 16 commits January 22, 2024 11:52

fix HxbId insertion on fields without expression

f9b02fe

Warnings

679f12d

[server-hxb] only read expressions eagerly when full typing

c4e07fa

[hxb] write type for TMeta

6df6d4b

[hxb] whitespace nazi

e3d2826

tanon identification / tunification: stricterer EqStricter

47ad4bc

Why was this debug even pushed..

bfe0837

hacktoberfest

4ed30fa

[display] server/type: only restore hxb headers

1189e3a

check if 3725 is still a problem

7713b39

remove TODO comment

a517968

Remove TODOs

db1c97f

Merge branch 'development' into hxb_server_cache_simn_cleanup

5cff609

clean up unit hxmls

fe75341

don't null-terminate bytes

4bd8b05

Merge branch 'development' into hxb

9017bec

skial mentioned this pull request Jan 23, 2024

Haxe Roundup 701 skial/haxe.io#1135 (closed)

Simn and others added 5 commits January 23, 2024 22:25

Merge branch 'development' into hxb

8cbf396

[tests] add display test for static field completion

d85f0cd

Also run again a couple tests after caching type

Merge branch 'development' into hxb

5903203

die when writing out empty module or type name

9417cee

The reader would die anyway

Merge branch 'development' into hxb

bba9a34

kLabz merged commit fe395ef into development Jan 25, 2024
0 of 12 checks passed

back2dos mentioned this pull request Jan 30, 2024

Populate cache from last context rather than hxb #11519 (open)

kLabz mentioned this pull request Mar 19, 2024

[macro] Remove CompilationServer.setModuleCheckPolicy options #11615 (merged)

Yanrishatum mentioned this pull request Apr 22, 2024

Add heaps_disable_res_completion compilation flag HeapsIO/heaps#1209 (open)

kLabz mentioned this pull request May 15, 2024

Hxb #11266 (closed)

kLabz deleted the hxb branch May 28, 2024 09:33
