Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a human readable representation of duration #14028

Merged
merged 3 commits into from
Dec 4, 2024

Conversation

josevalim
Copy link
Member

Without this function, projects like Phoenix.HTML, Postgrex, and similar would have to implement this function over and over.

@josevalim
Copy link
Member Author

cc @tfiedlerdejanze

Copy link
Contributor

@tfiedlerdejanze tfiedlerdejanze left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! glad we have support for ISO + human readable durations, thank you!

@eksperimental
Copy link
Contributor

eksperimental commented Dec 3, 2024

My only concern with this is that we are tying the English language to a human representation, something that we are not doing with any other kind of data from what I can remember (ie. the other representations are language-agnostic).

@wojtekmach
Copy link
Member

Similar to comment above, I think when there is a to_string/1 function there tends to be a complementary function that parses from string into that data type, and it would probably be an overkill to add one for such human friendly representation.

I think this is a good addition but I’m not sure about calling it to_string but no strong opinion.

@josevalim
Copy link
Member Author

josevalim commented Dec 3, 2024

I agree that’s a concern point but, at the same time, the ISO representation (which we follow throughout) is also very English biased. P1Y3M5D only makes sense if you know English.

@kipcole9
Copy link
Contributor

kipcole9 commented Dec 3, 2024

I agree that’s a reservation but, at the same time, the ISO representation (which we follow throughout) is also very English biased. P1Y3M5D only makes sense if you know English.

Whilst true, the standard says:

The purpose of this document is to provide a standard set of date and time format representations for information interchange, in order to minimize the risk of misinterpretation, confusion and their consequences.

This document specifies a set of date and time format representations utilizing numbers, alphabets and symbols defined in ISO/IEC 646. These representations are meant to be both human recognizable and machine readable.

Which, to my reading, emphasises the machine readable interchange as the priority.

Without this function, projects like Phoenix.HTML, Postgrex, and similar would have to implement this function over and over.

I hope that Phoenix.HTML would implement its own function (if at all), using Gettext, as it does for other presentation strings. And Ecto too if its required.

@eksperimental
Copy link
Contributor

eksperimental commented Dec 3, 2024

Similar to comment above, I think when there is a to_string/1 function there tends to be a complementary function that parses from string into that data type, and it would probably be an overkill to add one for such human friendly representation.

I think this is a good addition but I’m not sure about calling it to_string but no strong opinion.

Maybe we could introduce a new function convention as to_natural_language/2 which accepts a second argument as a string such as "en" or an anonymous function that can do the conversion to the desired language. It could play well with i18n functions.

@josevalim
Copy link
Member Author

We already have some foundation for localization in strftime. Maybe we should follow that instead.

@josevalim
Copy link
Member Author

Although I think I should clarify the goal here is to not provide full localization. In the same way that Decimal.to_string needs to choose a default representation, we need a similar one here. I think we can all agree the ISO one is not sufficient. :(

@tfiedlerdejanze
Copy link
Contributor

tfiedlerdejanze commented Dec 3, 2024

We already have some foundation for localization in strftime. Maybe we should follow that instead.

since the set of words that need to be localized is fairly small, smaller than for strftime even, a similar way to allow for localization seems appropriate (if at all imho). Something like the following, allowing to specify singular + plural words for each duration unit.

Duration.to_string(duration, unit_names: [year: {"an", "ans"}, month: {"mois", "mois"}, week: {"semaine", "semaines"}, ...])

unlike Calendar.strftime/3 this function isn't necessarily meant to be used for "end-users" afaict (matter of definition i guess :)), but rather for logging and DX which tends to be in english.

as for the function name, i agree with some of the concerns. How about Duration.format/2?

@kipcole9
Copy link
Contributor

kipcole9 commented Dec 3, 2024 via email

@josevalim
Copy link
Member Author

We cannot assume languages have only two plural forms :)

@tfiedlerdejanze
Copy link
Contributor

We cannot assume languages have only two plural forms :)

right, also 0 plural, 1 singular, N plural will of course also not work for every language... As we're not dealing with names, we'd have to allow a formatter function to allow for localization like @eksperimental suggested.

The need for localization is still arguable though. If not for something like format/2, we should stick to the english representation as presented in the PR, use ISO given it is "a standard" or as an alternative idea, use something more humanly readable than ISO while still using internationally accepted symbols, e.g.:

iex> Duration.to_string(Duration.new!(day: 40, hour: 12, minute: 42, second: 12))
"40d, 12h, 42m, 12s"

FWIW here is what other languages are doing:

@eksperimental
Copy link
Contributor

eksperimental commented Dec 3, 2024

right, also 0 plural, 1 singular, N plural will of course also not work for every language... As we're not dealing with names, we'd have to allow a formatter function to allow for localization like @eksperimental suggested.

some languages have a distinction between dual and plural (plural version of 2 is not the same as plural version of 3+). I would leave it up to the localized function the implementation details.

@kipcole9
Copy link
Contributor

kipcole9 commented Dec 4, 2024

some languages have a distinction between dual and plural

Plural rules can be complex and definitely don't seem to fit into Elixir core - and that's before considering grammatical inflection.

To me the strftime/2 approach is the most pragmatic is UX is a requirement. If it's for DX (logging for example, as noted above), then just format the struct as a keyword list - potentially sorting the keys first. Then no issue of localisation at all.

Given the overlap I expect it could even be as simple as an enhancement to Calendar.strftime/2 rather than a new function.

@josevalim
Copy link
Member Author

josevalim commented Dec 4, 2024

Thanks everyone for the valuable input. So let's start with the parts we can agree on: all Calendar data types have a to_string function, which includes a straight-forward representation of the data type. This representation does not necessarily follow a standard:

iex(4)> NaiveDateTime.to_string ~N[2024-12-04 08:58:44]
"2024-12-04 08:58:44"
iex(5)> NaiveDateTime.to_iso8601 ~N[2024-12-04 08:58:44]
"2024-12-04T08:58:44"

This information is localized but, because we use different separators for dates and times, we can get away with localizing it without relying on english words.

I believe it is important for us to have a simple and accessible representation of durations, for consistency. However, durations are more complex because we don't have a generally expected format for durations, such as using - for date components and : for time components, so we need to find a representation we all agree on.


So, what are our options here?

  1. Use english, as in this PR.

  2. Use ISO8601. Unfortunately, it is quite hard to read as a human: P3Y6M4W3DT12H30M5S.

  3. Use a "simplified" ISO8601, as in the other types, removing the P and adding some spacing:3Y6M4W3D 12H30M5S. The issue with this approach is: how do we show 5 months and 3 minutes? 5M 3M, which is ugh. ISO8601 uses single-letter uppercase for all units.

  4. Use ISO80000-3, which defines units for all components of a duration: 3a 6mo 4wk 3d 12h 30min 5s. The only potential source of confusion is using a for year, but that may be an acceptable trade-offs. We could also have units as an option, in case someone wants to customize it (and since it is only a unit, we don't have to worry about pluralization).

  5. Use ISO80000-3, as above, but default to unit: [year: :y]. I am not sure if it makes sense to choose a standard only to not follow it though.

Feel free to comment on your preferences or alternatives.

@tfiedlerdejanze
Copy link
Contributor

tfiedlerdejanze commented Dec 4, 2024

i'd be in favour of 4). I am torn about allowing to diverge from the standard, but also see no harm in adding a units option so it could be configured to read like initially proposed. Supported options could be :units and :separator.

Using ISO80000-3 for Duration.to_string/1 is good as it achieves its intention (to be more humanly readable than ISO8601 or a keyword list) but still follows a standard. I also wouldn't change the default as suggested in 5)

@josevalim
Copy link
Member Author

PR updated to follow ISO 80000-3, so we can look at it as a reference.

Copy link
Contributor

@sabiwara sabiwara left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue with this approach is: how do we show 5 months and 3 minutes? 5M 3M, which is ugh.

I understand we probably won't go with ISO8601, but trying to answer based on wikipedia:

To resolve ambiguity, "P1M" is a one-month duration and "PT1M" is a one-minute duration (note the time designator, T, that precedes the time value)

...which feels too confusing for our purpose indeed. So LGTM for the proposed format and implementation.

Unrelated question, should we implement String.Chars for consistency with other structs defining a to_string/1?

lib/elixir/lib/calendar/duration.ex Outdated Show resolved Hide resolved
@eksperimental
Copy link
Contributor

I think option 4 is the standard way to go. The options accepted by the proposed implementation also makes it flexible to most needs.

@eksperimental
Copy link
Contributor

If we go with ISO 80000-3 I think we should also implement Duration.from_iso80000_3/1 and Duration.from_iso80000_3!/1

@josevalim
Copy link
Member Author

We should not implement from_iso80000_3 because it doesn't specify anything about durations, it only specifies units. It doesn't say anything about how they should be considered together, about separators, etc.

Co-authored-by: Jean Klingler <sabiwara@gmail.com>
@josevalim
Copy link
Member Author

If @kipcole9 and @wojtekmach are happy, I will ship this.

@kipcole9
Copy link
Contributor

kipcole9 commented Dec 4, 2024

Thanks for asking @josevalim. Looks good to me.

Copy link
Member

@wojtekmach wojtekmach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a great compromise. I just have one question about negative durations.

lib/elixir/lib/calendar/duration.ex Show resolved Hide resolved
@josevalim josevalim merged commit 4328b09 into main Dec 4, 2024
22 of 24 checks passed
@josevalim
Copy link
Member Author

💚 💙 💜 💛 ❤️

@josevalim josevalim deleted the jv-add-duration-to-string branch December 4, 2024 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

6 participants