Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Separate TZData module into separate package #359

Closed
omus opened this issue Sep 19, 2021 · 17 comments · Fixed by #441
Closed

RFC: Separate TZData module into separate package #359

omus opened this issue Sep 19, 2021 · 17 comments · Fixed by #441
Assignees

Comments

@omus
Copy link
Member

omus commented Sep 19, 2021

The Problem

The current implementation of TimeZones.jl builds tzdata into an internal representation using the following high level steps:

  1. Retrieve the IANA tzdata files using the Pkg artifacts system
  2. Parse the tzdata into VariableTimeZone and FixedTimeZone instances
  3. Serialize the VariableTimeZone and FixedTimeZone data using Julia's Seralization.jl and store the data on disk

There have been a few issues with this setup such as:

Reference

Proposal

NOTE: These details need to be expanded upon and are likely to change

  • Create a separate JLL style package which each release is tied to an IANA tzdata release. These releases would include artifacts to the raw tzdata and also a pre-parsed artifact ready to use by TimeZones.jl
    • Allows switching of tzdata versions using the standard Pkg tooling
    • Will be using the tzfile format, or a slightly modified version, as the serialization format. This should provide interoperability between Julia versions with low overhead
    • Allows for the tzdata to be pre-processed eliminating the need for a build step in TimeZones.jl
    • Allows for the use of non-lazy artifacts and better compability with PackageCompiler
@omus
Copy link
Member Author

omus commented Sep 19, 2021

I'm out of time for today. Will be trying to flush out he proposal over the next few days if I have time

@StefanKarpinski
Copy link

Julia serialization is stable in 1.x versions in the sense that data save by Julia 1.x will be readable by Julia 1.y for x ≤ y, so it's fine to use Julia serialization for this. If the format needs to change for whatever reason, we can put compat bounds on the package versions. With that approach it seems like it would be best to pre-process the tzdata into the right format as an automated process and then just have the package consume the pre-processed artifacts.

Another potential approach would be to download the tzdata in raw form as an artifact and then use scratch spaces for the processed version. The only issue with this is that I think it would happen at package load time since I don't think we have a hook for scratch spaces as install/build time. This also means that everyone ends up doing the same work locally over and over again, whereas the other approach does it once centrally and then everyone just loads the pre-processed timezone data.

@ericphanson
Copy link
Contributor

Julia serialization is stable in 1.x versions in the sense that data save by Julia 1.x will be readable by Julia 1.y for x ≤ y, so it's fine to use Julia serialization for this.

In practice, I don't think this is true if the code in question depends on types defined in packages, because packages can change the representation of types in a way that doesn't break their public API but does break Serialization. And one often cannot help but change the package environment slightly when changing Julia versions, because e.g. Manifest's are not compatible between Julia versions, and therefore changing Julia versions often does end up breaking serialization, at least in my experience. (And that's not even mentioning anonymous functions, though those aren't relevant in this case).

Serialize the VariableTimeZone and FixedTimeZone data using Julia's Seralization.jl and store the data on disk

FixedTimeZone uses InlineString15 which depends on InlineStrings.jl. I think it's possible InlineStrings changes the representation of InlineString15 in a non-breaking way; for example, JuliaStrings/InlineStrings.jl#6 makes such a proposal. I believe that would break Serialization-based deserialization of InlineString15's. Therefore, I don't think TimeZones should use Serialization for this.

@StefanKarpinski
Copy link

What about creating a simple binary serialization format just for this data then?

@omus
Copy link
Member Author

omus commented Mar 14, 2022

What about creating a simple binary serialization format just for this data then?

The plan is to use the binary tzfile format. Maybe a slightly modified version to work around some minor limitations

@StefanKarpinski
Copy link

It looks like the data types that need to be representable are Zone, Rule and TZSource are the major things that need to be representable. They all seem like they could have fairly simple representations that we could make easy to load into memory. The only really hard field is the on field which is an "Anonymous boolean function to determine day" but I'm assuming we could eliminate that with some work or replace it with an enumeration of specific functions.

@StefanKarpinski
Copy link

Ah, ok. There's already a format. That seems like a good approach.

@ViralBShah
Copy link

Seeing CI failures due to TZdata every once in a while. It would be really nice to get this done.

@omus
Copy link
Member Author

omus commented May 19, 2022

Seeing CI failures due to TZdata every once in a while. It would be really nice to get this done.

I've been working on this issue during my free time as I've been unable to allocate working hours to this. The limited amount of time to work on this has made progress slow but it's still progress!

@omus
Copy link
Member Author

omus commented May 26, 2022

I've been working on this issue during my free time as I've been unable to allocate working hours to this.

Just to clarify to avoid any misinterpretation: this is something we're dedicating time to at Beacon (it's in the current sprint, in fact). I just didn't have the time to work on it in a previous sprint due to some competing product development priorities.

@omus
Copy link
Member Author

omus commented Jul 8, 2022

A quick status update. As of TimeZones 1.8.0 we have support for reading/writing to the new TZJFile serialization format (this is based off of the tzfile format but with some slight modifications for our particular use case). For the TimeZones 1.9.0 release we'll switch over to using the TZJFile format as the default and also switch over to using scratch spaces for the build directory. We'll still have a build step for now but this lets us thoroughly validate our new format and solve a few issues like #343.

@omus
Copy link
Member Author

omus commented Jul 8, 2022

I'm planning on releasing TimeZones 1.9.0 early next week, ideally Monday.

@StefanKarpinski
Copy link

Thanks for doing this, @omus. I guess if TZJFile format works out in the future we could switch to pre-building the files and generating artifacts? Or does that not make sense?

@omus
Copy link
Member Author

omus commented Jul 8, 2022

I guess if TZJFile format works out in the future we could switch to pre-building the files and generating artifacts? Or does that not make sense?

That is definitely still the plan. I originally was hoping to have this for the 1.9.0 release but I have more work to do there and I'd like to avoid holding up these changes until I get everything perfect.

@omus
Copy link
Member Author

omus commented Jul 11, 2022

Did some work over the weekend to get the 1.9.0 release ready. I've just registered the release and it should be available shortly.

@quinnj
Copy link
Collaborator

quinnj commented Jul 24, 2022

Just want to pop in and say I'm excited for the direction headed here! Thanks @omus!

@omus
Copy link
Member Author

omus commented Aug 22, 2023

PR #441 finally closed this by utilizing precompiled time zone artifacts provided by the TZJData.jl package. Included in TimeZones release 1.12

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants