Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

time: add time/tzdata package and timetzdata tag to embed tzdata in program #38017

Closed
ianlancetaylor opened this issue Mar 23, 2020 · 32 comments
Closed

Comments

@ianlancetaylor
Copy link
Contributor

Go programs currently rely on the system to provide timezone information in the tzdata database, or rely on GOROOT being accessible at run time in order to read $GOROOT/lib/timezoneinfo.zip. This is not always true, particularly on Windows (#21881) but also in some other environments (#38013).

Therefore, it is desirable for Go program to have some mechanism to directly embed tzdata information into the program itself, so that they don't have to rely on system timezone information that may be absent.

I make the following assumptions, which I think are reasonable:

  • adding the timezone information into a program should be optional, as it increases the size of the program by approximately 800K, and most programs do not need timezone information other than for the local system
  • it should be possible to embed timezone information when building an arbitrary program
  • it should be possible for a program to always embed timezone information without requiring any special build step
  • the program should always prefer data from the system if available, as it is likely to be more up to date than that included in the program

Given those assumptions, I propose adding a new package timezone/tzdata. Importing this package (as import _ "time/tzdata") will cause timezone information to be embedded in the program. Also, building with the build tag timetzdata will force the package to be imported, and therefore will cause timezone information to be embedded in the program. The embedded timezone information will be used if and when no system information is available for the timezone.

There is a sample implementation of this approach at https://golang.org/cl/224588.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/224588 mentions this issue: time/tzdata: new package

@leighmcculloch
Copy link
Contributor

I think this idea of supporting it as a compiler flag and as a package is great.

However, I'm wondering about whether the operating system tzdata should always be preferred if present as it may make an application unpredictable in cases where only using embedded data is highly preferred.

Would it make sense to make the time/tzdata work like certificate pools and root CAs work? Any app can embed its own set of root CAs by creating a cert pool and adding that cert pool to the http package's default transport. The Go stdlib could still have the tzdata package, and still hook up an embedded tzdata, but I think the way it gets hooked up in the code-imports-a-package version could be less magical than an anonymous import. We could require code explicitly setting something like time.DefaultLoadTzinfo = tzdata.LoadTzinfo variable which would allow a developer to control exactly when or how the tzdata gets used and it communicates what's happening to a reader much clearer.

It would support both cases:

  • Making it a backup source of tzdata (replace DefaultLoadTzinfo with a function that tries calling the default DefaultLoadTzinfo and if it it comes back with nothing, then calls tzdata's)
  • Making it an authoritative source (replace DefaultLoadTzinfo with tzdata's function)

Some of this comment was originally posted at #21881 (comment).

@zikaeroh
Copy link
Contributor

zikaeroh commented Mar 23, 2020

Given those assumptions, I propose adding a new package timezone/tzdata. Importing this package (as import _ "time/tzdata") will cause timezone information to be embedded in the program. Also, building with the build tag timetzdata will force the package to be imported, and therefore will cause timezone information to be embedded in the program.

I would support the latter (the build tag), but I'm not sure about offering the former (the _ imported package). I can foresee some time helper package that wants to "make sure" there's timezone information including this, and it ending up in people's dependency trees with no way to exclude it other than to pester the library author or fork.

The _ package is common to load dependencies like database drivers, but I think the binary size impact of those packages is lower than the 800K blob in comparison.

@glycerine
Copy link

glycerine commented Mar 23, 2020

I really like the proposal. A small caveat about that last assumption,

the program should always prefer data from the system if available, as it is likely to be more up to date than that included in the program

Like @leighmcculloch 's comment, I would prefer to give priority to the embedded data always if it is available, so that I don't have deal with platform specific heisenbugs due to data differences or if the user has messed up their system timezone files.

But either way, this is a 1000% improvement on having to hack the Go runtime after every release.

@mvdan
Copy link
Member

mvdan commented Mar 23, 2020

I wonder if we should wait for more progress on #35950, given that one could think of tzdata as an opt-in asset file. Either way, I agree that a solution to this issue would be very useful.

@ianlancetaylor
Copy link
Contributor Author

The kinds of problems that people describe with using system timezone data first are problems that people would see today, where we only use system timezone data. Have people actually seen, in practice, problems across different systems other than missing timezone information?

My concern with using embedded timezone data first is that it is guaranteed to be out of date within six months. While most programs are rebuilt within that time frame, there are certainly programs that are built and deployed once. Those programs may embed timezone information because they run in restricted environments, but they should still use updated timezone information as it becomes available. I have seen, in practice, problems with out of date timezone information, though it was with out of date timezone information on the system rather than in the program.

I don't particularly want to give programs a choice here. That leads to a more complicated API with benefits for few.

@ianlancetaylor
Copy link
Contributor Author

@zikaeroh I think we can address that issue with documentation.

@glycerine
Copy link

It is difficult to make predictions. Especially about the future. ~ Danish proverb

I've reconsidered. I think it is fine as proposed.

@leighmcculloch
Copy link
Contributor

leighmcculloch commented Mar 24, 2020

My concern with using embedded timezone data first is that it is guaranteed to be out of date within six months.

That's a great point. 👏 I think this approach is good then for solving this problem.

Reflecting on when I would use this feature, I struggle to see myself using it. In most cases I make sure tzdata is installed and my operating system stays up-to-date as a part of general system health in the same way that root CAs need to be present. The very rare case I needed consistent tzdata led me to 4d63.com/tz but I no longer have that problem. The issue referenced in the PR description of how we get tzdata in Windows will be addressed some other way (#21881). I realize relying on the operating system is not the workflow everyone uses given #38013 and so for those use cases this seems good.

@alexbrainman
Copy link
Member

@ianlancetaylor

Over the years, I never needed time zone information in my program. I believe majority of Go users are in the same situation as me.

If this proposal is accepted, I am concerned that all my programs will become 800K larger. Same for many other users. I think some package developers will just add import _ "time/tzdata" to one of their source files, and all executables that use that package will silently become 800K larger.

Go executables are large as they are - #6853 - and a lot of developers spend their time making executables smaller. Every 1 K counts. And here we are just wasting 800K of their efforts on something that users won't even use.

  • adding the timezone information into a program should be optional, as it increases the size of the program by approximately 800K, and most programs do not need timezone information other than for the local system

How do you propose I will control if 800K is added to my executable or not?

The only reasonable solution I see here is to restrict import _ "time/tzdata" to main package only.

Thank you.

Alex

PS: Some runt. If this issue is to please users complaining about #21881, then someone should just spent some time and implement #21881 (comment) or #21881 (comment)

@leighmcculloch
Copy link
Contributor

The only reasonable solution I see here is to restrict import _ "time/tzdata" to main package only.

This could be an argument to make it a build tag only and leave out the anonymous import.

someone should just spent some time and implement #21881 (comment)

Maybe we could wait for #21881 (comment) to be implemented, and then evaluate if the remaining use cases remaining warrant it?

@embeddedgo
Copy link

How about support very light timezone data that doesn't contain the whole history of changes?

Something like: https://github.com/embeddedgo/x/tree/master/time/tz

@rsc
Copy link
Contributor

rsc commented Mar 25, 2020

@embeddedgo, if you don't know the rules for different years, you can't print a time from that year. Like if you didn't have the US history you wouldn't know to apply different rules for 1995 vs 2015.

Overall it sounds like people are in favor of this, and it would be a small change. Sure would be nice to have embedded file support. :-)

Adding to proposal minutes but this seems headed for likely accept.

@rasky
Copy link
Member

rasky commented Mar 26, 2020

Is tzdata the right name? I see that timezone.xml in Windows is only 400Kb (and it's a XML file, so it's got overhead). I think that we may want to switch the bundled database format in future, and "tzdata" is a specific database format. Maybe something like "time/zonedb" is both format-agnostic and also more clear to people not fully into timezone issues?

@alexbrainman
Copy link
Member

Adding to proposal minutes but this seems headed for likely accept.

@rsc what about my comment #38017 (comment) ?

Thank you.

Alex

@komuw
Copy link
Contributor

komuw commented Mar 29, 2020

If one of my transitive dependencies imports time/tzdata does it also mean that now my binary will become 800K larger?
If so, that is too much weight to be pulled in especially when done implicitly without my knowing.

@embeddedgo
Copy link

I am especially interested in this topic because additional 800 KB can destroy my efforts to use Go in the niche of embedded systems.

The mainline 32-bit MCUs have at most 1 MB of Flash. Of this, I now have 400 KB for user code,
quite enough for many applications.

@ianlancetaylor
Copy link
Contributor Author

@alexbrainman @komuw @embeddedgo As I said above (#38017 (comment)), I think the issue of importing this package outside of the main package can be addressed with documentation. There is sample documentation in https://golang.org/cl/224588.

@ianlancetaylor
Copy link
Contributor Author

@rasky I don't have an opinion on whether the package should be called tzdata or zonedb. Perhaps people who have a preference could use emoji voting on @rasky's comment above (#38017 (comment)), thumbs up for zonedb, thumbs down for tzdata.

@leighmcculloch
Copy link
Contributor

Is tzdata the right name? I see that timezone.xml in Windows is only 400Kb (and it's a XML file, so it's got overhead). I think that we may want to switch the bundled database format in future, and "tzdata" is a specific database format.

I think the name tzdata is appropriate. Right now the package bundles that format, and it seems most clear for the package to indicate with its name what data it contains.

@rsc
Copy link
Contributor

rsc commented Apr 1, 2020

I did a Google search for tzdata and turned up the time zone database.
I searched for zonedb and turned up an unrelated DNS database.
time/tzdata seems like the right answer.

Except for the name, people seem to be in agreement about adding this
(again, with docs that make clear that it should only be imported in a package main).

This seems like a likely accept.

@rsc
Copy link
Contributor

rsc commented Apr 8, 2020

No change in consensus, so accepted.

@rsc rsc modified the milestones: Proposal, Backlog Apr 8, 2020
@rsc rsc changed the title proposal: time: add time/tzdata package and timetzdata tag to embed tzdata in program time: add time/tzdata package and timetzdata tag to embed tzdata in program Apr 8, 2020
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/228100 mentions this issue: time/tzdata: for test, permit extra elements in zone

@dmitshur
Copy link
Contributor

Re-opening as CL 224588 was rolled back in CL 228200.

@dmitshur dmitshur reopened this Apr 13, 2020
@gopherbot
Copy link
Contributor

Change https://golang.org/cl/228101 mentions this issue: time/tzdata: new package

@xtonyjiang
Copy link

Awesome! Would importing this package also solve #24844?

@ianlancetaylor
Copy link
Contributor Author

@xtonyjiang No, it won't. The CL prefers the information on disk if it is available, as that information is typically more correct for the current system than the embedded information. The embedded information is a fallback to use when nothing else is available.

@EdSchouten
Copy link

EdSchouten commented May 8, 2020

Hey Ian!

I have something to say on this topic that may be interesting to share!

  • adding the timezone information into a program should be optional, as it increases the size of the program by approximately 800K, and most programs do not need timezone information other than for the local system

The reason why the time zone information on UNIX is so big is because zic flattens the information that is provided in the official tzdata files. The tzdata files define the time zones in a layered approach, but each file under /usr/share/zoneinfo is made to be independent, causing the layering to be lost. There is no reuse. This is intentional, because programs generally only load a single time zone file from disk.

At some point in the past I wanted to achieve the same thing (self-contained binaries that included all the time zones without blowing up the size), and discovered that it's possible to come up with solutions that are a lot more efficient. It is possible to come up with algorithms that don't require the data to be flattened, while still having very decent time complexity. I think that in the end I managed to store both the entire dataset and the algorithm in ~110 KB. Here's what I did:

  1. I wrote an ugly script that takes all of the IANA tzdata text files and parses them. In addition to that, it computes some implicit properties that are needed for our algorithm to run quickly (e.g., the amount of DST that was being used right before a location moved from one time zone to another).
  2. This script stomps out this C header file containing all of the unflattened definitions in a compilable format. The definitions for each of the struct types used can be found here.
  3. Then I wrote implementations of mktime and localtime that work on top of these data structures (common subroutines).
  4. Last but not least, I wrote some scripts around zdump -v to generate long lists of unit tests. The time zones used in these tests are the ones that I discovered were problematic in some sense (e.g., countries that moved across the international date line, British Double Summer Time).

Unfortunately, hardly anyone uses this code. Maybe it was a waste of time for me to write it in the first place? Or maybe not, because now it may serve as an inspiration for someone to do something similar for Go?

@ianlancetaylor
Copy link
Contributor Author

@EdSchouten Thanks, nice idea.

@thomasf
Copy link

thomasf commented Aug 8, 2020

Just started to read through the current version of the go 1.15 release notes and this one seems strange to me. Since tzdata becomes stale, why not have this data as a go module that can update independently of the standard library?

To me a golang.org/x/tzdata go module that updates whenever the upstream tzdb changes feels a lot more fitting solution to the problem.

I guess it's way too late to change any of this now but I would like to understand the reasoning for doing it this way, the proposal does not go into this aspect in any detail so that is why I am asking.

@ianlancetaylor
Copy link
Contributor Author

@thomasf First, the point of time/tzdata is to modify the behavior of the standard library time package, so it requires a tight coupling to the standard library. It's hard to do that reliably and maintainably in an external module.

Second, one of the assumptions mentioned in the original proposal statement is: "it should be possible to embed timezone information when building an arbitrary program." The intent here is that a user of a Go program, not the original author, can take an existing program and deploy it to an environment without a timezone database and still have it work correctly. That is done by building the program with -tags timetzdata. Any approach requiring an external module would inevitably be more complex and harder to use.

Thanks for the comment.

@EdSchouten
Copy link

Maybe it would make sense to add a generic flag to go build to request linking in additional libraries by import path? Could also be useful to link binaries with custom database drivers, etc.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests