Adapt JLD2 for isbits union fields #35

Closed
simonster opened this issue Aug 20, 2017 · 2 comments

@simonster
Member

As implemented in JuliaLang/julia#22441. At the moment, this is impossible, because Julia does not correctly construct these objects. But once it does, there is the question of how these should be written in the file. Obvious options are:

  • Write union-typed fields the same way we do now, by saving the field content in its own dataset. This might be easier for non-JLD2 implementations to consume, since it's a closer match with the HDF5 data model, but writing a dataset has substantial overhead in terms of both space and time vs. how Julia handles this in memory. It would be nice if the cost of saving/loading data with JLD2 were closely related to the cost of working with said data in Julia.
  • Create datatypes for isbits unions. HDF5 doesn't actually support unions, but we could save a datatype structured like:
DATATYPE "00000002" H5T_COMPOUND {
         H5T_STD_I64LE "Int64";
         H5T_IEEE_F64LE "Float64";
         H5T_STD_U8LE "uniontype";
      }

where only one of the value fields will be initialized, and the uniontype field will say which one. In principle this wastes some space in the file, since only the Int64 or the Float64 field will contain data, but the storage overhead is smaller than that of creating a new dataset. In the likely common case of Union{T,Null}, we would just be storing an extra byte to signal whether the value was null or not.
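The tagged-record idea can be sketched outside of HDF5. The following is a minimal illustration in Python (standard-library struct only), not JLD2's implementation: the tag values 0 and 1, the function names, and the unpadded 17-byte layout are all assumptions made for the example.

```python
import struct

# Assumed layout: little-endian, no padding.
# 8-byte Int64 slot, 8-byte Float64 slot, 1-byte tag ("uniontype").
RECORD = struct.Struct("<qdB")

# Hypothetical tag values; the issue does not specify an encoding.
TAG_INT64 = 0
TAG_FLOAT64 = 1


def pack_union(value):
    """Pack an int or a float into one fixed-size tagged record."""
    if isinstance(value, bool):
        raise TypeError("bool is not part of this union")
    if isinstance(value, int):
        # Only the Int64 slot carries data; the Float64 slot is filler.
        return RECORD.pack(value, 0.0, TAG_INT64)
    if isinstance(value, float):
        # Only the Float64 slot carries data; the Int64 slot is filler.
        return RECORD.pack(0, value, TAG_FLOAT64)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def unpack_union(buf):
    """Read a record back, using the tag to pick the live slot."""
    int_slot, float_slot, tag = RECORD.unpack(buf)
    if tag == TAG_INT64:
        return int_slot
    if tag == TAG_FLOAT64:
        return float_slot
    raise ValueError(f"unknown tag: {tag}")


# A Union{Int64,Null}-style field needs only the value slot plus the tag.
OPTIONAL_INT64 = struct.Struct("<qB")
```

Under these assumptions every record occupies RECORD.size bytes regardless of which member is live, so an array of such values stays a flat, fixed-stride buffer, and the optional-Int64 variant costs one byte more than a plain Int64.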

This would be a breaking change in that older versions of JLD2 wouldn't be able to read files created with newer versions, although newer versions of JLD2 should still be able to handle files created with older versions.

@alyst

alyst commented Nov 28, 2017

Is it the same thing that causes Vector{Union{Float64, Missing}} not to be compressed (Julia 0.6.1, JLD2 0.0.4)?
I guess the check that disables compression is elseif f.compress && isleaftype(T) && isbits(T) in write_dataset()

In my case, the gzipped serialized data is ~8 MB, while the JLD2 file written with compress=true is ~75 MB.

@JonasIsensee
Collaborator

Taken care of in #221.
