-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support parsing tz timezone data #10456
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make sure you add .tzif to exclude_extensions
in build.zig
FYI I wrote a tzfile parser in lua over at https://github.com/daurnimator/luatz/blob/master/luatz/tzfile.lua
I'd like to think its pretty easy to read.
lib/std/tz.zig
Outdated
const version = data2[4]; | ||
if (!std.mem.eql(u8, magic, "TZif")) return error.BadHeader; | ||
if (version != '2' and version != '3') return error.BadVersion; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should verify that it is followed by 15 nulls
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC8536 doesn't actually say that they should be null, just that they are reserved.
It's even implied that you shouldn't care about them - the spec depends on that legacy parsers that doesn't understand the "version" field not bailing when it sees a '2' or '3' in that byte, otherwise the "fat" version of the files wouldn't work.
|
||
pub const Timetype = struct { | ||
offset: i32, | ||
flags: u8, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why have this as flags
instead of bool
fields for each?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes Timetype
pack better; 12 bytes instead of 16. I can change it if the optimization isn't worth the complexity of having getters for each value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Though without algorithms that use the data it's all a bit useless right?
I assume you aim to have a followup PR in mind to make use of this code?
I want to write some Date/Calendar utilities, yeah. Even if that doesn't go anywhere splitting up the task into smaller subtasks (like the timezone parsing here) means some of the work will already be done for the next person who picks up the torch. |
I added support for parsing tz timezone database files. This will be useful for future date support in the standard library.
Some caveats:
It does not support the legacy pre-2005 format. That one stored timestamps as 32-bit signed integers, and as such will stop working completely in 2038. It does support "thick" tz files that have both legacy and modern data in the same file, by just skipping over the legacy cruft.I prepared the leap second data structure to fit into 8 bytes (instead of 16, with padding), which strictly speaking violates the spec but will never matter in practice; it can still represent leap seconds nine million years in the future.
The parser rejects all "MUST NOT" from rfc8536, and one SHOULD NOT - timezone type names (e.g. EST, CET...) cannot be more than 6 characters, which is banned by POSIX anyway.