Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spec needs an update / 0.3 #522

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

masklinn
Copy link
Contributor

@masklinn masklinn commented May 26, 2022

The spec as it stands can't be used as the basis for an implementation: following its descriptions / instructions means it's impossible to pass the test suite, and leads some implementations to use weird hacks.

There are two minor and one major issue in the 0.2 spec.

minor issue: UserAgent#patch_minor

patch_minor was added to the test file and some regexes starting at #322 but it was never added to the spec, so do that at least. It should also be added to the reference implementation eventually, probably.

minor issue: Device#model_replacement can't be required

The 0.2 spec states that the model field is required (either $1 should have matched content, or model_replacement should be non-empty:

If in the regex no first match (within normal brackets) is given the device_replacement together with the model_replacement shall be specified!

But doing so fails the following test case:

- user_agent_string: 'Opera/9.80 (BlackBerry; Opera Mini/7.0.31437/28.3030; U; en) Presto/2.8.119 Version/11.10'
  family: 'BlackBerry'
  brand: 'BlackBerry'
  model:

That is because this is matched by the following item:

- regex: 'Black[Bb]erry;'
  device_replacement: 'BlackBerry'
  brand_replacement: 'BlackBerry'

The item has no capturing group at all, and has no model_replacement. The item goes against spec.

major issue: os needs to be fully templated

The spec doesn't currently have a concept for the difference, but in this text I will use the following lingo:

  • "non-templating" is when a replacement field can not use any match group / placeholder, and is always literal
  • "restricted templating" is when a replacement field can only use the corresponding match group, aka what the 0.2 spec describes for UAs and OS, where the first field is group 1 and the first replacement field (resp. family_replacement and os_replacement) can only use the placeholder $1
  • "full templating" is when a replacement field can use any match group / placeholder between $1 and $9

The 0.2 spec specifies

  • the os and ua fields as being restrict-templated
  • the "device" fields as being fully templated
  • none of the fields as being non-templated

In its specification of the OS category, it defines os_replacement as matching group 1 and having $1 available, os_v1_replacement as matching group 2 and having $2 available, etc...

However OS has multiple items which are defined with os_v1_replacement using group 2, and more:

- regex: 'Mac OS X\s.{1,50}\s(\d+).(\d+).(\d+)'
  os_replacement: 'Mac OS X'
  os_v1_replacement: '$1'
  os_v2_replacement: '$2'
  os_v3_replacement: '$3'
- regex: 'Win(?:dows)? ?(95|98|3.1|NT|ME|2000|XP|Vista|7|CE)'
  os_replacement: 'Windows'
  os_v1_replacement: '$1'
- regex: '^Box.{0,200}Windows/([\d.]+);'
  os_replacement: 'Windows'
  os_v1_replacement: '$1'

This means the corresponding test cases will necessarily fail if the spec is implemented as stated. Furthermore while the implementations I've checked have extended the spec in order to handle these cases they have done so inconsistently:

  • the reference, python, and go implementation have simply fully templated the OS category
  • however the C# implementation has instead added a bunch of special casing where it shuffles the substituted groups around based on hardcoded value

The latter is extremely brittle, and I think full templating of the OS category is the proper response.

fully templating user agent

For coherence, simplicity, and clarity, I think full templating should also be extended to the user agent category, even if there is at this point no use case for it.

From what I've seen, that already seems to be the behaviour of the python and go implementations.

Furthermore far from extending the spec the reference implementation doesn't even implement the spec on the user agent category: it supports restricted templating for family_replacement, but the v1, v2, and v3 fields are non-templated when per-spec they should be restrict-templated.

unhappy with the draft

I've set forth a proposal rewrite of the spec organised thus:

  1. introducing regexes.yaml and its component
  2. presenting the common points of categories
    a. how categories are defined
    b. the parsing algorithm (as it's the same for all categories)
    c. description of the regex field(s)
    d. description of templated (replacement) fields and "full templating"
  3. definition of each category
  4. updated webidl suggestion (old definition seems to be webidl-inspired but not actually work?)

However while I think the narrative arc is fine I'm having real trouble with the writing so far, I'm far from happy about how it's written up, and would be glad for assistance.

UA spec I feel has major issues, especially that respecting it makes
it impossible for tests to pass, leading to implementations having to
copy one another or do their own random hack (as the C# implementation
does).

There are two major issues in the 0.2 spec:

Replacement fields templating
-----------------------------

It states that the OS fields have individual replacements, that is
match `$1` is only used in `os_replacement`, match `$2` is only used
in `os_v1_replacement`, etc...

However there are multiple test suites which use `$1` in
`os_v1_replacement` (macos, win, Box.Windows), and macOS also uses
`$2` and `$3` in shifted position.

The reference implementation handle this by just making the OS fields
into "full" templates (that is all groups are available to all
replacement fields). The C# implementation instead tries to mess
around with different orders for v1 and v2 based on what it finds
there.

Obviously it makes sense to standardise the behaviour of the standard
implementation instead of the hack of the C# lib.

But for uniformity and to allow for less redundant explanations, I
think it also makes sense to make the `user_agent_parsers` fields into
"full" templates (that is, all groups are available) even if no parser
currently uses that. In fact that's how the Python and Go
implementations behave already.

Side-note: far from extending the spec, the reference implementation
doesn't even implement it in full as it only supports replacing `$1`
in `family_replacement`, it has no support for templating at all in
`v1_replacement`, `v2_replacement`, or `v3_replacement`.

`Device#model_replacement` can't be required
--------------------------------------------

Despite what the spec currently says, one of the user agents
(`Opera/9.80 (BlackBerry; Opera Mini/7.0.31437/28.3030; U; en)`) has
no capturing group and no `model_replacement`, so it's not possible to
parse it per-expectation if `model_replacement` is required.

As such, only `Device#device_replacement` should be required.

Add `patch_minor` to user agent
-------------------------------

`patch_minor` was added to the test file and some regexes starting at
 ua-parser#322 but it was never added to the spec, so do that at least. It
should also be added to the reference implementation eventually,
probably.
@masklinn masklinn marked this pull request as ready for review April 20, 2024 19:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant