Spec needs an update / 0.3 #522

masklinn · 2022-05-26T15:35:48Z

The spec as it stands can't be used as the basis for an implementation: following its descriptions / instructions means it's impossible to pass the test suite, and leads some implementations to use weird hacks.

There are two minor and one major issue in the 0.2 spec.

minor issue: `UserAgent#patch_minor`

patch_minor was added to the test file and some regexes starting at #322 but it was never added to the spec, so do that at least. It should also be added to the reference implementation eventually, probably.

minor issue: `Device#model_replacement` can't be required

The 0.2 spec states that the model field is required (either $1 should have matched content, or model_replacement should be non-empty:

If in the regex no first match (within normal brackets) is given the device_replacement together with the model_replacement shall be specified!

But doing so fails the following test case:

- user_agent_string: 'Opera/9.80 (BlackBerry; Opera Mini/7.0.31437/28.3030; U; en) Presto/2.8.119 Version/11.10'
  family: 'BlackBerry'
  brand: 'BlackBerry'
  model:

That is because this is matched by the following item:

- regex: 'Black[Bb]erry;'
  device_replacement: 'BlackBerry'
  brand_replacement: 'BlackBerry'

The item has no capturing group at all, and has no model_replacement. The item goes against spec.

major issue: os needs to be fully templated

The spec doesn't currently have a concept for the difference, but in this text I will use the following lingo:

"non-templating" is when a replacement field can not use any match group / placeholder, and is always literal
"restricted templating" is when a replacement field can only use the corresponding match group, aka what the 0.2 spec describes for UAs and OS, where the first field is group 1 and the first replacement field (resp. family_replacement and os_replacement) can only use the placeholder $1
"full templating" is when a replacement field can use any match group / placeholder between $1 and $9

The 0.2 spec specifies

the os and ua fields as being restrict-templated
the "device" fields as being fully templated
none of the fields as being non-templated

In its specification of the OS category, it defines os_replacement as matching group 1 and having $1 available, os_v1_replacement as matching group 2 and having $2 available, etc...

However OS has multiple items which are defined with os_v1_replacement using group 2, and more:

- regex: 'Mac OS X\s.{1,50}\s(\d+).(\d+).(\d+)'
  os_replacement: 'Mac OS X'
  os_v1_replacement: '$1'
  os_v2_replacement: '$2'
  os_v3_replacement: '$3'
- regex: 'Win(?:dows)? ?(95|98|3.1|NT|ME|2000|XP|Vista|7|CE)'
  os_replacement: 'Windows'
  os_v1_replacement: '$1'
- regex: '^Box.{0,200}Windows/([\d.]+);'
  os_replacement: 'Windows'
  os_v1_replacement: '$1'

This means the corresponding test cases will necessarily fail if the spec is implemented as stated. Furthermore while the implementations I've checked have extended the spec in order to handle these cases they have done so inconsistently:

the reference, python, and go implementation have simply fully templated the OS category
however the C# implementation has instead added a bunch of special casing where it shuffles the substituted groups around based on hardcoded value

The latter is extremely brittle, and I think full templating of the OS category is the proper response.

fully templating user agent

For coherence, simplicity, and clarity, I think full templating should also be extended to the user agent category, even if there is at this point no use case for it.

From what I've seen, that already seems to be the behaviour of the python and go implementations.

Furthermore far from extending the spec the reference implementation doesn't even implement the spec on the user agent category: it supports restricted templating for family_replacement, but the v1, v2, and v3 fields are non-templated when per-spec they should be restrict-templated.

unhappy with the draft

I've set forth a proposal rewrite of the spec organised thus:

introducing regexes.yaml and its component
presenting the common points of categories
a. how categories are defined
b. the parsing algorithm (as it's the same for all categories)
c. description of the regex field(s)
d. description of templated (replacement) fields and "full templating"
definition of each category
updated webidl suggestion (old definition seems to be webidl-inspired but not actually work?)

However while I think the narrative arc is fine I'm having real trouble with the writing so far, I'm far from happy about how it's written up, and would be glad for assistance.

UA spec I feel has major issues, especially that respecting it makes it impossible for tests to pass, leading to implementations having to copy one another or do their own random hack (as the C# implementation does). There are two major issues in the 0.2 spec: Replacement fields templating ----------------------------- It states that the OS fields have individual replacements, that is match `$1` is only used in `os_replacement`, match `$2` is only used in `os_v1_replacement`, etc... However there are multiple test suites which use `$1` in `os_v1_replacement` (macos, win, Box.Windows), and macOS also uses `$2` and `$3` in shifted position. The reference implementation handle this by just making the OS fields into "full" templates (that is all groups are available to all replacement fields). The C# implementation instead tries to mess around with different orders for v1 and v2 based on what it finds there. Obviously it makes sense to standardise the behaviour of the standard implementation instead of the hack of the C# lib. But for uniformity and to allow for less redundant explanations, I think it also makes sense to make the `user_agent_parsers` fields into "full" templates (that is, all groups are available) even if no parser currently uses that. In fact that's how the Python and Go implementations behave already. Side-note: far from extending the spec, the reference implementation doesn't even implement it in full as it only supports replacing `$1` in `family_replacement`, it has no support for templating at all in `v1_replacement`, `v2_replacement`, or `v3_replacement`. `Device#model_replacement` can't be required -------------------------------------------- Despite what the spec currently says, one of the user agents (`Opera/9.80 (BlackBerry; Opera Mini/7.0.31437/28.3030; U; en)`) has no capturing group and no `model_replacement`, so it's not possible to parse it per-expectation if `model_replacement` is required. As such, only `Device#device_replacement` should be required. Add `patch_minor` to user agent ------------------------------- `patch_minor` was added to the test file and some regexes starting at ua-parser#322 but it was never added to the spec, so do that at least. It should also be added to the reference implementation eventually, probably.

masklinn force-pushed the 0.3 branch from 89316dc to dd6c80e Compare April 20, 2024 19:35

masklinn marked this pull request as ready for review April 20, 2024 19:37

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spec needs an update / 0.3 #522

Spec needs an update / 0.3 #522

masklinn commented May 26, 2022 •

edited

Loading

Spec needs an update / 0.3 #522

Are you sure you want to change the base?

Spec needs an update / 0.3 #522

Conversation

masklinn commented May 26, 2022 • edited Loading

minor issue: UserAgent#patch_minor

minor issue: Device#model_replacement can't be required

major issue: os needs to be fully templated

fully templating user agent

unhappy with the draft

masklinn commented May 26, 2022 •

edited

Loading

minor issue: `UserAgent#patch_minor`

minor issue: `Device#model_replacement` can't be required