Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[pkg/ottl] Add support for localized time parsing into the timeutils #34353

Merged

Conversation

edmocosta
Copy link
Contributor

Description:
Added support for localized time parsing into the coreinternal/timeutils package.

The tracking issue is a following up to #32140, and the added function (ParseLocalizedStrptime) can be used later to add locale support to the ottl Time function.

As discussed in the related issues, the plan is to have a similar support as implemented by the library monday, making it possible to parse non-english time strings into time.Time objects.
Elastic has built a new OSS library for that same purpose (lunes), that considering the discussed requirements, seems to be a better option than monday. Both libraries are just wrapper to the time.Parse and time.ParseInLocation features. They work by translating the foreign language value to English before calling the standard parsing functions.

The main lunes differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse operations:

BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op

Given ParseLocalizedStrptime uses lunes.Translate under the hood, its performance is similar to the existing ParseStrptime, adding an extra overhead of ~303 ns/op for translating the value before parsing:

BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op

2 - Translations are based on the CLDR project, and it does support 900+ locales (vs 45+), including locales in draft stage. Those lunes translations are generated from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard time.format_test.go test cases, parsing foreign language values with and without layout replacements in all available locales and supported layouts (https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

Link to tracking Issue: #32977

Testing:

  • Added unit tests

For now, the only way of manually testing this functionality is by invoking the ParseLocalizedStrptime function manually through tests.

Documentation:

Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 16, 2024
@edmocosta
Copy link
Contributor Author

Hi @atoulme! Any thoughts on this PR? Thanks!

@github-actions github-actions bot removed the Stale label Aug 20, 2024
Copy link
Contributor

github-actions bot commented Sep 3, 2024

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 3, 2024
@edmocosta
Copy link
Contributor Author

Hey @TylerHelmuth! Considering you were part of this functionality's discussion at #32140, I would appreciate your feedback and guidance here. This PR adds a new function into the timeutils that supports parsing time string in foreign language (3rd party library). The purpose of this function is later adding locale support to the ottl time function. Thanks!

@github-actions github-actions bot removed the Stale label Sep 4, 2024
@TylerHelmuth
Copy link
Member

@edmocosta take care of the conflicts then we'll be good to merge.

@edmocosta
Copy link
Contributor Author

@edmocosta take care of the conflicts then we'll be good to merge.

Hi @TylerHelmuth! I've updated the branch and resolved the conflicts, but now the CI is failing, it seems to be missing a make gotidy , should I execute it and commit the 104 files it changes?

@TylerHelmuth
Copy link
Member

@edmocosta yes. It is likely that adding the new package to internal/coreinternal will result in a lot of indirects

@edmocosta
Copy link
Contributor Author

@edmocosta yes. It is likely that adding the new package to internal/coreinternal will result in a lot of indirects

Done! thanks for reviewing, @TylerHelmuth!

@TylerHelmuth TylerHelmuth merged commit 25cb194 into open-telemetry:main Sep 6, 2024
156 checks passed
@github-actions github-actions bot added this to the next release milestone Sep 6, 2024
f7o pushed a commit to f7o/opentelemetry-collector-contrib that referenced this pull request Sep 12, 2024
…pen-telemetry#34353)

**Description:** <Describe what has changed.>
Added support for localized time parsing into the
`coreinternal/timeutils` package.

The [tracking
issue](open-telemetry#32977)
is a following up to
open-telemetry#32140,
and the added function (`ParseLocalizedStrptime`) can be used later to
[add locale
support](open-telemetry#32978)
to the ottl `Time` function.

As discussed in the related issues, the plan is to have a similar
support as implemented by the library
[monday](https://github.com/goodsign/monday), making it possible to
parse non-english time strings into `time.Time` objects.
Elastic has built a new OSS library for that same purpose
([lunes](https://github.com/elastic/lunes)), that considering the
discussed requirements, seems to be a better option than `monday`. Both
libraries are just wrapper to the `time.Parse` and
`time.ParseInLocation` features. They work by translating the foreign
language value to English before calling the standard parsing functions.

The main `lunes` differences are:

1 - Performance is considerably better, ~13x faster for complete .Parse
operations:

```
BenchmarkParseLunes-10                2707008	       429.7 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseMonday-10                212086	      5630   ns/op	    3753 B/op	     117 allocs/op

BenchmarkParseInLocationLunes-10      2777323	       429.4 ns/op	     220 B/op	       5 allocs/op
BenchmarkParseInLocationMonday-10      210357	      5596   ns/op	    3754 B/op	     117 allocs/op
```

Given `ParseLocalizedStrptime` uses `lunes.Translate` under the hood,
its performance is similar to the existing `ParseStrptime`, adding an
extra overhead of ~303 ns/op for translating the value before parsing:

```
BenchmarkTranslate-10            3591234	       302.4 ns/op	     220 B/op	       5 allocs/op
```

```
BenchmarkParseLocalizedStrptime-10    	  821572	    1405 ns/op	     429 B/op	       9 allocs/op
BenchmarkParseStrptime-10             	 1000000	    1082 ns/op	     186 B/op	       6 allocs/op
```

2 - Translations are based on the [CLDR](https://cldr.unicode.org/)
project, and it does support 900+ locales (vs 45+), including locales in
draft stage. Those lunes translations are
[generated](https://github.com/elastic/lunes/blob/main/generator.go)
from a CLDR core file, and does not require manually changes.

3 - Replicates all the relevant standard `time.format_test.go` test
cases, parsing foreign language values with and without layout
replacements in all available locales and supported layouts
(https://github.com/elastic/lunes/blob/main/lunes_test.go#L154).

4 - It is actively maintained (hosted under elastic repo).

**Link to tracking Issue:**
open-telemetry#32977

**Testing:** 
- Added unit tests

For now, the only way of manually testing this functionality is by
invoking the `ParseLocalizedStrptime` function manually through tests.

**Documentation:**
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants