Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster time parsing (~93% faster) #3860

Merged
merged 6 commits into from
Mar 16, 2023
Merged

Conversation

tustvold
Copy link
Contributor

@tustvold tustvold commented Mar 14, 2023

Which issue does this PR close?

Closes #3919

Rationale for this change

Faster time parsing

9:50                    time:   [5.1079 ns 5.1315 ns 5.1503 ns]
                        change: [-92.180% -92.146% -92.111%] (p = 0.00 < 0.05)
                        Performance has improved.

09:50                   time:   [5.1957 ns 5.2014 ns 5.2086 ns]
                        change: [-92.186% -92.139% -92.077%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 19 outliers among 100 measurements (19.00%)
  1 (1.00%) low mild
  2 (2.00%) high mild
  16 (16.00%) high severe

09:50 PM                time:   [6.0355 ns 6.0371 ns 6.0387 ns]
                        change: [-93.754% -93.746% -93.739%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 8 outliers among 100 measurements (8.00%)
  5 (5.00%) high mild
  3 (3.00%) high severe

9:50:12 AM              time:   [6.0481 ns 6.0591 ns 6.0733 ns]
                        change: [-95.110% -95.105% -95.098%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  3 (3.00%) high mild
  8 (8.00%) high severe

09:50:12 PM             time:   [6.4839 ns 6.4934 ns 6.5036 ns]
                        change: [-94.865% -94.853% -94.838%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 14 outliers among 100 measurements (14.00%)
  6 (6.00%) high mild
  8 (8.00%) high severe

09:50:12.123456789      time:   [10.475 ns 10.503 ns 10.526 ns]
                        change: [-92.510% -92.485% -92.459%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 27 outliers among 100 measurements (27.00%)
  23 (23.00%) low severe
  1 (1.00%) low mild
  1 (1.00%) high mild
  2 (2.00%) high severe

9:50:12.123456789       time:   [9.9941 ns 10.036 ns 10.082 ns]
                        change: [-92.728% -92.701% -92.673%] (p = 0.00 < 0.05)
                        Performance has improved.

09:50:12.123456789 PM   time:   [11.016 ns 11.037 ns 11.060 ns]
                        change: [-93.423% -93.410% -93.396%] (p = 0.00 < 0.05)
                        Performance has improved.

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the arrow Changes to the arrow crate label Mar 14, 2023
@tustvold tustvold marked this pull request as ready for review March 14, 2023 18:53
@tustvold tustvold marked this pull request as ready for review March 15, 2023 12:31
@alamb
Copy link
Contributor

alamb commented Mar 15, 2023

I plan to review this carefully tomorrow if no on else had had a chance to do so

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very cool to me -- nice work @tustvold

I had several more suggested test cases but otherwise I think this PR looks 👨‍🍳 👌

It is hard to beat 97% improvement ❤️

("13:00 PM", "%I:%M %P"),
("1:00:30.123456 PM", "%I:%M:%S%.f %P"),
("1:00:30.123456789 PM", "%I:%M:%S%.f %P"),
("1:00:30.123456789123 PM", "%I:%M:%S%.f %P"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the experience with #3859, I think it would be valuable to include a time that has 4 and 7 significant digit (aka not a multiple of three)

            ("1:00:30.1234 PM", "%I:%M:%S%.f %P"),
            ("1:00:30.123456 PM", "%I:%M:%S%.f %P"),

Perhaps?

#[test]
fn test_string_to_time_invalid() {
let cases = [
"25:00",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend also testing space and leading colon

            " 9:00:00",
            ":09:00",
```

Perhaps also with a leading T (in case someone splits a RFC timestamp incorrectly)

```
            "T9:00:00",
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also just AM

            "AM",

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, very long subseconds:

            ("1:00:30.123456789123456789 PM")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also intermediate non digits:

"1:00:30.12F456 PM",


_ => &[],
let (am, bytes) = match bytes.get(bytes.len() - 3..) {
Some(b" AM") => (Some(true), &bytes[..bytes.len() - 3]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are Am and Pm valid? If not, perhaps we can add a test showing that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out this is yet another case of chrono doing something differently from what it is documented to do.... Will fix...

@tustvold tustvold merged commit 0df2188 into apache:master Mar 16, 2023
Some(b" am") => (Some(true), &bytes[..bytes.len() - 3]),
Some(b" PM") => (Some(false), &bytes[..bytes.len() - 3]),
Some(b" pm") => (Some(false), &bytes[..bytes.len() - 3]),
Some(b" AM" | b" am" | b" Am" | b" aM") => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve speed of parsing string to Times
2 participants