fix(prost-types): Converting DateTime to Timestamp is fallible #1095

Merged
merged 1 commit into from
Jul 19, 2024

Conversation

caspermeijn
Collaborator

The conversion from the private type DateTime to the public type Timestamp is fallible when the DateTime is invalid. All code paths that used DateTime::into::<Timestamp>() first check whether the DateTime is valid and then do the conversion, so this problem was not visible. #893 points out that some conversions panic; however, all of these DateTime values are invalid.

Solution: Replace impl From<DateTime> for Timestamp with TryFrom and remove the manual is_valid checks before the conversion.

I added the test cases from the issue and roundtrip test between DateTime and Timestamp.
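
A minimal sketch of the shape of this change, using simplified stand-ins rather than the crate's actual code. The field types (for example the `i32` year, chosen so the arithmetic cannot overflow), the `is_valid` rules, the `days_from_civil` helper, and the `InvalidDateTime` variant name are all assumptions made for illustration.

```rust
use std::convert::TryFrom;

/// Simplified stand-in for the public `Timestamp` type.
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct Timestamp {
    pub seconds: i64,
    pub nanos: i32,
}

/// Simplified stand-in for the private `DateTime` type (assumed field types).
#[derive(Debug, PartialEq, Clone, Copy)]
pub struct DateTime {
    pub year: i32,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
    pub minute: u8,
    pub second: u8,
    pub nanos: u32,
}

/// Assumed error type with a single variant for this conversion.
#[derive(Debug, PartialEq)]
pub enum TimestampError {
    InvalidDateTime,
}

fn is_leap_year(year: i64) -> bool {
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
}

fn days_in_month(year: i64, month: u8) -> u8 {
    match month {
        1 | 3 | 5 | 7 | 8 | 10 | 12 => 31,
        4 | 6 | 9 | 11 => 30,
        2 if is_leap_year(year) => 29,
        2 => 28,
        _ => 0, // out-of-range month: no day can be valid
    }
}

impl DateTime {
    fn is_valid(&self) -> bool {
        (1..=12).contains(&self.month)
            && self.day >= 1
            && self.day <= days_in_month(i64::from(self.year), self.month)
            && self.hour < 24
            && self.minute < 60
            && self.second < 60
            && self.nanos < 1_000_000_000
    }
}

/// Days since 1970-01-01 for a valid proleptic Gregorian date.
fn days_from_civil(year: i64, month: u8, day: u8) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let m = i64::from(month);
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + i64::from(day) - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

impl TryFrom<DateTime> for Timestamp {
    type Error = TimestampError;

    fn try_from(dt: DateTime) -> Result<Self, Self::Error> {
        // The validity check lives inside the conversion, so callers
        // cannot forget it and invalid input yields an error, not a panic.
        if !dt.is_valid() {
            return Err(TimestampError::InvalidDateTime);
        }
        let days = days_from_civil(i64::from(dt.year), dt.month, dt.day);
        let secs_of_day =
            i64::from(dt.hour) * 3600 + i64::from(dt.minute) * 60 + i64::from(dt.second);
        Ok(Timestamp {
            seconds: days * 86_400 + secs_of_day,
            nanos: dt.nanos as i32, // fits: validated to be below 1e9
        })
    }
}
```

With these stand-ins, `Timestamp::try_from` on a `DateTime` whose month is 13 returns `Err(TimestampError::InvalidDateTime)` instead of reaching the arithmetic at all.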

@caspermeijn
Collaborator Author

@mumbleskates bilrost solved this problem differently. Do you agree with this solution as well?

Contributor

@mumbleskates mumbleskates left a comment

this seems sensible to me. i'm tempted to recommend an error type for the TryFrom that can distinguish between denormalized inputs and out-of-range inputs, but i don't really think it's actually worth making that distinction.

the fix i wrote in bilrost is enough to avoid the crash in year_to_seconds, but it isn't actually correct yet for denormalized datetimes either (i haven't put it through a fuzzer yet, and i've found that at minimum it's possible to crash in month_to_seconds with an out-of-range month).

year: 2020,
month: 2,
day: 29,
hour: 1,
minute: 2,
second: 3,
nanos: 0,
}),
})
.unwrap(),
Contributor

we can remove .unwrap() on both sides of this assert

Contributor

ah maybe we can't, since these have different error types. would it be sensible to use TimestampError for the TryFrom?

Collaborator Author

I think it is a bit lame that it can only return the one enum variant, but I agree that it is sensible. And it is an internal conversion, so we can change it if that makes sense at a later time.
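
To illustrate the point about error types: two `Result` values can only be compared with `assert_eq!` when their error types match. The sketch below uses a hypothetical shared `TimestampError` and two hypothetical helper functions (`parse_seconds`, `days_to_seconds`) that are not part of prost-types.

```rust
/// Hypothetical shared error type for every timestamp-related failure.
#[derive(Debug, PartialEq)]
pub enum TimestampError {
    ParseFailure,
    InvalidDateTime,
}

/// Hypothetical parser: accepts a decimal number of seconds.
pub fn parse_seconds(s: &str) -> Result<i64, TimestampError> {
    s.parse::<i64>().map_err(|_| TimestampError::ParseFailure)
}

/// Hypothetical conversion: whole days to seconds, failing on overflow.
pub fn days_to_seconds(days: i64) -> Result<i64, TimestampError> {
    days.checked_mul(86_400)
        .ok_or(TimestampError::InvalidDateTime)
}
```

Because both functions return `Result<i64, TimestampError>`, a test can write `assert_eq!(parse_seconds("86400"), days_to_seconds(1))` without unwrapping either side.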

@mumbleskates
Contributor

i took a little time today to hook up a fuzzer to the timestamp <--> string machinery and i'm finding all kinds of problems.

so far: parse_two_digit_numeric can panic in split_at if the input is short. after fixing that, it appears that not only does the string "19+0-01-11" parse as the timestamp "1900-01-10T00:00:00Z", but that string parses as the timestamp for "1900-01-09T00:00:00Z" (with the formatting output matching the timestamp value, but the parsing side regressing by 1 day). This error occurs somewhere inside date_time_to_seconds :(
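
For context, `str::split_at` panics when the index is past the end of the string, so `"5".split_at(2)` panics. A hedged sketch of a non-panicking alternative follows; `split_two_digits` and its signature are hypothetical and are not the fix that was applied in prost or bilrost.

```rust
/// Splits two leading ASCII digits off `s` and returns their numeric value
/// plus the remainder. Returns `None` instead of panicking when the input
/// is too short or does not start with two digits.
pub fn split_two_digits(s: &str) -> Option<(u8, &str)> {
    let bytes = s.as_bytes();
    if bytes.len() < 2 || !bytes[0].is_ascii_digit() || !bytes[1].is_ascii_digit() {
        return None;
    }
    let value = (bytes[0] - b'0') * 10 + (bytes[1] - b'0');
    // Both leading bytes are ASCII, so byte index 2 is a char boundary
    // and this slice cannot panic.
    Some((value, &s[2..]))
}
```

Checking for digits explicitly also rejects inputs such as "+0", which a sign-tolerant integer parser would accept.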

@mumbleskates
Contributor

year_to_seconds incorrectly reports that the year 1900 specifically is a leap year 😔
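
For reference, the Gregorian rule makes a year a leap year when it is divisible by 4, except for century years that are not divisible by 400, which is why 1900 is not a leap year but 2000 is. A standalone sketch of the rule (not the `year_to_seconds` code itself):

```rust
/// Gregorian leap-year rule, applied proleptically to any year.
pub fn is_leap_year(year: i64) -> bool {
    // `%` may return a negative remainder for negative years, but the
    // comparisons against zero are still correct.
    year % 4 == 0 && (year % 100 != 0 || year % 400 == 0)
}
```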

mumbleskates added a commit to mumbleskates/bilrost that referenced this pull request Jul 7, 2024
@caspermeijn
Collaborator Author

> i took a little time today to hook up a fuzzer to the timestamp <--> string machinery and i'm finding all kinds of problems.
>
> so far: parse_two_digit_numeric can panic in split_at if the input is short. after fixing that, it appears that not only does the string "19+0-01-11" parse as the timestamp "1900-01-10T00:00:00Z", but that string parses as the timestamp for "1900-01-09T00:00:00Z" (with the formatting output matching the timestamp value, but the parsing side regressing by 1 day). This error occurs somewhere inside date_time_to_seconds :(

I have been playing with the idea of improving compatibility with chrono and deprecating the string to/from timestamp conversion. I feel that this type of string operation is a bad fit for a library like prost. We should focus on our strengths and use the strengths of others. What do you think of replacing this conversion with the chrono crate?

@mumbleskates
Contributor

mumbleskates commented Jul 12, 2024

possibly. one of the original reasons for having it, though, was that prost_types supports the full range of i64 seconds. chrono only supports the range of i64 milliseconds, for some reason; others may be even less. chrono's string parsing is even more constrained, only supporting positive 4-digit years.

indeed, perhaps it is more sensible to support from-string conversion only via pivoting through chrono, though this may effectively make Timestamp's to-string conversion fallible. i understand the rationale behind dropping this functionality, as it is messy and was never really validated before, but it feels really bad losing the infallibility to me.

i can contribute that, with the 3 fixes i include or mention in #1096, fuzz testing in bilrost with cross-validation against chrono soaked up everything i have been able to throw at it (many cpu-days) with no problems. it's also (now, after fixing the year 1900 bug) exhaustively correct for each individual minute from 0000-01-01 through 9999-12-31 (the full range of chrono's rfc3339 support), which gives me a reasonable amount of confidence that it is fully correct.

@caspermeijn
Collaborator Author

> possibly. one of the original reasons for having it, though, was that prost_types supports the full range of i64 seconds. chrono only supports the range of i64 milliseconds, for some reason; others may be even less. chrono's string parsing is even more constrained, only supporting positive 4-digit years.
>
> indeed, perhaps it is more sensible to support from-string conversion only via pivoting through chrono, though this may effectively make Timestamp's to-string conversion fallible. i understand the rationale behind dropping this functionality, as it is messy and was never really validated before, but it feels really bad losing the infallibility to me.
>
> i can contribute that, with the 3 fixes i include or mention in #1096, fuzz testing in bilrost with cross-validation against chrono soaked up everything i have been able to throw at it (many cpu-days) with no problems. it's also (now, after fixing the year 1900 bug) exhaustively correct for each individual minute from 0000-01-01 through 9999-12-31 (the full range of chrono's rfc3339 support), which gives me a reasonable amount of confidence that it is fully correct.

Ok, let's work towards a fuzz tested implementation. Could you review and approve this PR? Then I will look at yours.

What I find interesting is that the protobuf comments on the Timestamp type specifically mention years 0001 through 9999. So you could argue that the chrono implementation is good enough. But I agree that we should not settle for good enough and should make it infallible.

@mumbleskates
Contributor

mumbleskates commented Jul 13, 2024

fuzz testing for some pretty reasonable bounds is available in bilrost as of here. checked for round tripping, checked that accepted inputs match a reasonable regex, and checked that any date in the supported range of chrono matches its interpretation exactly. i've run it for >3 billion iterations and it's stopped making progress so this should at least be a good starting point 👍
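
The round-trip idea can be sketched without a fuzzer or chrono. The code below is a std-only illustration, not the bilrost fuzz target: it assumes the well-known days-from-civil and civil-from-days formulas for the proleptic Gregorian calendar, and all function names are hypothetical. It works at day granularity and walks a range of day numbers exhaustively instead of generating random inputs.

```rust
/// Days since 1970-01-01 for a valid proleptic Gregorian date.
pub fn days_from_civil(year: i64, month: u8, day: u8) -> i64 {
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400);
    let yoe = y.rem_euclid(400);
    let m = i64::from(month);
    let doy = (153 * (if m > 2 { m - 3 } else { m + 9 }) + 2) / 5 + i64::from(day) - 1;
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    era * 146_097 + doe - 719_468
}

/// Inverse of `days_from_civil`: (year, month, day) for a day number.
pub fn civil_from_days(days: i64) -> (i64, u8, u8) {
    let z = days + 719_468;
    let era = z.div_euclid(146_097);
    let doe = z.rem_euclid(146_097);
    let yoe = (doe - doe / 1460 + doe / 36_524 - doe / 146_096) / 365;
    let doy = doe - (365 * yoe + yoe / 4 - yoe / 100);
    let mp = (5 * doy + 2) / 153;
    let day = (doy - (153 * mp + 2) / 5 + 1) as u8;
    let month = (if mp < 10 { mp + 3 } else { mp - 9 }) as u8;
    let year = yoe + era * 400 + i64::from(month <= 2);
    (year, month, day)
}

/// Returns the first day number in `first..=last` that does not survive
/// day -> date -> day, or `None` if every day round-trips.
pub fn first_round_trip_failure(first: i64, last: i64) -> Option<i64> {
    (first..=last).find(|&d| {
        let (y, m, day) = civil_from_days(d);
        days_from_civil(y, m, day) != d
    })
}
```

Calling `first_round_trip_failure(days_from_civil(0, 1, 1), days_from_civil(9999, 12, 31))` covers every day from 0000-01-01 through 9999-12-31. A check like this only shows that the two directions agree with each other, which is why cross-validation against an independent implementation, as described above, adds confidence.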

@caspermeijn caspermeijn added this pull request to the merge queue Jul 19, 2024
Merged via the queue into tokio-rs:master with commit e5cc951 Jul 19, 2024
16 checks passed
@caspermeijn caspermeijn deleted the date_time_fallible branch July 19, 2024 13:27
2 participants