Set of UA strings needs more explanation #21

Closed
domenic opened this issue Jan 14, 2020 · 14 comments · Fixed by #46 or #95

Comments

@domenic
Collaborator

domenic commented Jan 14, 2020

https://github.com/WICG/ua-client-hints#should-the-ua-string-be-a-set and https://wicg.github.io/ua-client-hints/#abstract-opdef-set-the-sec-ch-ua-header-for-a-request step 5 indicate that instead of a single "Brand Version" string, some clients may sometimes send "Brand Version, OtherBrand OtherVersion".

The explainer section doesn't do a good job explaining why any user agent would possibly do this, or why it would work.

Reading between the lines, the idea might be that this is an escape valve so that e.g. Edge 79 could lie and say that it's Edge 79, Chrome 79 or Chrome 79, Edge 79? And then Edge would have to hope that Chrome does the same thing, cooperatively, otherwise sites would just always assume those two strings mean "Edge"? So this is kind of a way of allowing browsers to advertise that they're part of a (rendering-engine-based) equivalence class?

Is the intention that Edge would do this every time it sent Sec-CH-UA, or just sometimes? (GREASE seems to be "sometimes".)

Or are the README's more-random examples actually what is intended? Will there be a list of totally-fake browser names that people start using, like the literal example NotBrowser? Will this list of fake names be shared (standardized?), or will each browser vendor make up their own?

@scottlow

scottlow commented Jan 15, 2020

So this is kind of a way of allowing browsers to advertise that they're part of a (rendering-engine-based) equivalence class?

I came here to ask this as well.

If the intention is to allow browsers to indicate that they're part of the same rendering engine equivalence class, I think it would be more clear to have this represented in a Sec-CH-UA-Engine client hint (as mentioned in the explainer). Assuming Sec-CH-UA-Engine exposed some major version number (or perhaps this would come from Sec-CH-UA-Version per #7), this would give sites the ability to detect specific platform deltas using Sec-CH-UA-Engine and the ability to detect specific browser when necessary using the Sec-CH-UA hint.

Is the intention that Edge would do this every time it sent Sec-CH-UA, or just sometimes? (GREASE seems to be "sometimes".)

It'd also be great to understand what the spec would/would not mandate here. For example, I can imagine a browser wanting to lie about being another browser for compatibility reasons unless a developer explicitly asked for more information by using Accept-CH: UA. As I interpret the spec today, I'm not certain that's allowed.

@zclifford

zclifford commented Jan 15, 2020

For what it's worth, the set behavior may also be useful for bots / crawlers to identify themselves.

Currently bots typically stick their name somewhere in the middle of a legacy User-Agent string that's otherwise similar to the browser they want to emulate.
(For example Bingbot, Googlebot)

And this behavior of including the robot name in the UA is actually strongly suggested by the draft robots.txt standard:

The product token SHOULD be part of the identification string that the crawler sends to the service (for example, in the case of HTTP, the product name SHOULD be in the user-agent header).

With the new spec, if multiple brands & versions are allowed, then the header could convey both the browser the robot wants to emulate, and the name of the robot itself.
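As a rough illustration of that idea, here is a minimal sketch of how a crawler might build such a header value. It assumes the `"brand";v="version"` member syntax that appears later in this thread; the bot name `ExampleBot` and the helper `serialize_brand_set` are hypothetical, not something the spec defines.

```python
from typing import List, Tuple


def serialize_brand_set(brands: List[Tuple[str, str]]) -> str:
    """Join (brand, version) pairs as `"brand";v="version"` members."""

    def quote(value: str) -> str:
        # Escape backslashes and double quotes inside the quoted string.
        return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'

    return ", ".join(f"{quote(b)};v={quote(v)}" for b, v in brands)


# Hypothetical crawler: the browser it emulates plus its own name.
header = serialize_brand_set([("Chrome", "79"), ("ExampleBot", "2.1")])
print(header)
```

A server that only knows "Chrome" could still serve the crawler compatible content, while a server that cares about bots could look for the second member.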

@domenic
Collaborator Author

domenic commented Jan 23, 2020

I don't think this issue was solved. The examples still are only about the mechanics of this, not the why. And they give unrealistic examples. They don't answer the OP's question.

@domenic domenic reopened this Jan 23, 2020
@yoavweiss
Collaborator

@domenic - fair enough

Reading between the lines, the idea might be that this is an escape valve so that e.g. Edge 79 could lie and say that it's Edge 79, Chrome 79 or Chrome 79, Edge 79? And then Edge would have to hope that Chrome does the same thing, cooperatively, otherwise sites would just always assume those two strings mean "Edge"? So this is kind of a way of allowing browsers to advertise that they're part of a (rendering-engine-based) equivalence class?

Yes, the idea is to enable expression of equivalence sets, while trying to reduce the possibility of strict comparisons.

Is the intention that Edge would do this every time it sent Sec-CH-UA, or just sometimes? (GREASE seems to be "sometimes".)

The expectation is that browsers that don't typically find themselves on block-lists will be responsible to do the GREASEing, in order to discourage sites from blocking other browsers.

Or are the README's more-random examples actually what is intended? Will there be a list of totally-fake browser names that people start using, like the literal example NotBrowser? Will this list of fake names be shared (standardized?), or will each browser vendor make up their own?

I don't expect the random examples to be standardized. I can see such examples appearing in order to prevent "If there's a browser I don't know, I'll block it" type of logic on the server side.

@yoavweiss
Collaborator

@scottlow

If the intention is to allow browsers to indicate that they're part of the same rendering engine equivalence class, I think it would be more clear to have this represented in a Sec-CH-UA-Engine client hint (as mentioned in the explainer). Assuming Sec-CH-UA-Engine exposed some major version number (or perhaps this would come from Sec-CH-UA-Version per #7), this would give sites the ability to detect specific platform deltas using Sec-CH-UA-Engine and the ability to detect specific browser when necessary using the Sec-CH-UA hint.

Having an explicit "Engine" hint runs a risk of making it harder for different browsers that are running on the same engine to do different things (e.g. turn on different flags by default). Maybe the set for, e.g., Chromium-based browsers should include "Chromium" as well, but not having an explicit engine hint gives us flexibility to include or remove such hints as web compat evolves.

It'd also be great to understand what the spec would/would not mandate here. For example, I can imagine a browser wanting to lie about being another browser for compatibility reasons unless a developer explicitly asked for more information by using Accept-CH: UA. As I interpret the spec today, I'm not certain that's allowed.

That's interesting and not something I considered. A "real UA" hint? :)
Maybe we can open a separate issue to discuss that?

@amtunlimited
Contributor

I would assume that if a browser wanted to lie about which browser (i.e. Vivaldi not advertising as such) that it also wouldn't do so if the server asked nicely, but I could be wrong.

My hope is that if a browser knows it's generally compatible with Chromium-based browsers, then it would generally be compatible with browsers with "Chromium" in their set. Conversely, if it's seeing a lot of errors in its logs, it could take an intersection of the sets to find a connection if it's deeper than "Chromium" (i.e. if Opera ships with a feature turned off by default that Chrome usually has on)
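The intersection idea can be sketched as follows. This is only an illustration: the brand sets are made-up log data, and `common_brands` is a hypothetical helper name.

```python
from typing import Iterable, Set


def common_brands(brand_sets: Iterable[Set[str]]) -> Set[str]:
    """Return the brands present in every set (empty if there are no sets)."""
    sets = list(brand_sets)
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical brand sets logged alongside failing requests.
failing = [
    {"Chromium", "Opera", "NotBrowser"},
    {"Chromium", "Opera"},
    {"Opera", "Chromium", "Foo"},
]
print(sorted(common_brands(failing)))  # ['Chromium', 'Opera']
```

Here the intersection is narrower than "Chromium" alone, which would point at something specific to the "Opera" member rather than the whole equivalence class.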

@scottlow

scottlow commented Jan 25, 2020

Having an explicit "Engine" hint runs a risk of making it harder for different browsers that are running on the same engine to do different things (e.g. turn on different flags by default). Maybe the set for, e.g., Chromium-based browsers should include "Chromium" as well, but not having an explicit engine hint gives us flexibility to include or remove such hints as web compat evolves.

It feels like adding Chromium in the set is similar to adding cruft-for-compatibility like KHTML, like Gecko in our UAs today. For example, I could imagine a world where all browsers add Chromium to their set so they get code that's compatible with popular Chromium-based browsers.

I'm likely writing this from a biased perspective, since we've been working through (and are continuing to work through) numerous UA-related issues on sites, but the bugs we've seen so far have fallen largely into two categories:

  1. Bugs where our new UA token is not added to an allow list of tokens maintained by a site - These bugs are common and are cases where it seems that developers are actually trying to target equivalence classes of browsers by using individual browser names since no better mechanism currently exists. Once our token is added to a site's allow list, all functionality works as expected since we are extremely similar to all other Chromium-based browsers. It sounds like Vivaldi also ran into these types of issues, which is part of what drove their decision to remove their UA token.

    We're fortunate enough to have a dedicated outreach team that is able to connect with site maintainers and get these allow lists updated to include our UA token. I imagine, however, that not all browsers can afford this luxury. As a result, I'm interested in working together to find a way that makes browser equivalence class targeting more prevalent than individual browser targeting.

  2. Bugs where there is a need to detect individual browsers - While less common, these bugs result in cases such as inaccurate strings in security mails, which will read "Did you just sign in with Chrome?" instead of "Did you just sign in with Edge?" This is one legitimate use case of individual browser detection, but there are others (such as third party share tracking) as well.

As I touched on above, I believe that any solution we build here should aim to discourage individual browser detection unless absolutely necessary and move developers towards equivalence class detection/feature detection instead.

What I'm currently envisioning (which would be great to discuss either here or in a separate issue) is a sort of tiered approach which would likely need to be mandated by spec in order to be effective:

  • A Sec-CH-UA-Engine hint that would be encouraged as the main way for developers to detect browsers (if necessary)
    • I believe the spec would need to mandate that this CH MUST expose the engine that the browser is built on and not lie
    • Perhaps (as @arvind-m suggested in Why browser brand and not engine brand? #29) this even becomes the CH that we expose by default instead of Sec-CH-UA to encourage developers to move away from per-browser detection. I'd want us to solicit more developer feedback here before making any decisions though.
    • For engines that have different flags enabled by default, my hope is that such deltas could be feature detected without requiring additional client hints.
  • In the event that behavior enabled/disabled by flags is not feature detectable (such as SameSite cookie changes), then a site that requires a unique browser/version identifier could ask to receive the Sec-CH-UA CH, which could provide this level of granularity should the UA decide they want to expose it.

@yoavweiss, thoughts?

@yoavweiss
Collaborator

Created #4 to attempt a more realistic explanation.

It feels like adding Chromium in the set is similar to adding cruft-for-compatibility like KHTML, like Gecko in our UAs today. For example, I could imagine a world where all browsers add Chromium to their set so they get code that's compatible with popular Chromium-based browsers.

I agree it smells similar. One difference might arise if Chrome itself did not add it every single time.

I'm likely writing this from a biased perspective, since we've been (and are continuing to work through) numerous UA-related issues on sites

You spelled "experience" wrong :)

  1. Bugs where our new UA token is not added to an allow list of tokens maintained by a site

Having UA be a set, and having that set include words that servers necessarily do not understand, would (hopefully) result in servers not block-listing new UAs that also include their well-known "equivalence class".

2. Bugs where there is a need to detect individual browsers

Having UA also include their own string in the set, on top of their equivalence class would hopefully help servers implement that properly, even if on a browser-by-browser basis. (that is, they'd have to distinguish "Chrome";v="65", "Edge";v="65" from "Chrome";v="65", "Giberish"; v="33b")
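A minimal sketch of how a server might make that distinction, assuming the `"brand";v="version"` syntax used in the two example values above. The regex is a simplification and not a full Structured Headers parser; `parse_brand_set`, `identify`, and the `KNOWN` list are hypothetical names for illustration.

```python
import re
from typing import Dict, List, Optional

# Matches one `"brand";v="version"` member; whitespace around `;` and `=` is tolerated.
_MEMBER = re.compile(r'"((?:[^"\\]|\\.)*)"\s*;\s*v\s*=\s*"((?:[^"\\]|\\.)*)"')


def parse_brand_set(header: str) -> Dict[str, str]:
    """Return {brand: version} for each member found in the header value."""
    return {brand: version for brand, version in _MEMBER.findall(header)}


def identify(header: str, known_brands: List[str]) -> Optional[str]:
    """Return the first known brand present, ignoring members we don't recognize.

    `known_brands` is ordered most specific first (an assumed server-side config).
    """
    brands = parse_brand_set(header)
    for name in known_brands:
        if name in brands:
            return name
    return None


KNOWN = ["Edge", "Chrome"]
edge = '"Chrome";v="65", "Edge";v="65"'
greased = '"Chrome";v="65", "Giberish"; v="33b"'

print(identify(edge, KNOWN))     # Edge
print(identify(greased, KNOWN))  # Chrome
```

The point of the sketch is that the unknown member is skipped rather than treated as a reason to block: the second value still resolves to the "Chrome" equivalence class.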

I worry that a "give me your really real UA string, plz" API will end up being abused for block-listing, so I'd be hesitant to provide one...

  • A Sec-CH-UA-Engine hint that would be encouraged as the main way for developers to detect browsers

If we were to adopt such a scheme in 2013, Chromiums would now say "WebKit, Chromium". So, at the end, I'm not sure that compat forces operating on such a hint would be different from those operating on the browser name.

  • I believe the spec would need to mandate that this CH MUST expose the engine that the browser is built on and not lie

Prescriptive standards don't work when they run up against user pain. I suspect that minority browsers, when faced with that dilemma, would not choose losing users in order to be spec compliant :/

  • For engines that have different flags enabled by default, my hope is that such deltas could be feature detected without requiring additional client hints.

That can hopefully work for most features, but I'm not sure that is something that would always satisfy, e.g. services like polyfill.io.

@gdh1995

gdh1995 commented Jan 27, 2020

How can I get a robust & reliable & real major version?

Background: My web extension relies on the Chrome version (from the current user-agent string) to work around some Chromium bugs when the extension scripts load. Currently I can specify that my scripts run on document start and do (patch) whatever I want in a pure fresh environment.

Problem: If brand returns [{Edge, 79}, {Chrome, 78}], then which one should I trust? Now that a browser can choose whatever it wants, do I have to collect the full list of potential brands and fall back to checking a group of special browser actions and HTMLElement properties (hand-written and hard-coded, of course) just to choose the real one?

@yoavweiss
Collaborator

I'd expect the major versions to align, so e.g. don't expect Edge to lie about the relative Chrome version that it considers to be in its equivalence class.
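Under that expectation, an extension could read the version reported for the reference brand instead of guessing among all entries. A minimal sketch, assuming the brand list is available as (name, version) pairs like the `[{Edge, 79}, {Chrome, 78}]` example above; `equivalent_major` is a hypothetical helper, not part of any proposed API.

```python
from typing import List, Optional, Tuple


def equivalent_major(
    brands: List[Tuple[str, str]], reference: str = "Chrome"
) -> Optional[int]:
    """Return the major version reported for `reference`, or None if absent or non-numeric."""
    for name, version in brands:
        if name == reference:
            major = version.split(".", 1)[0]
            try:
                return int(major)
            except ValueError:
                # GREASE-style versions such as "33b" are not usable as a major version.
                return None
    return None


brands = [("Edge", "79"), ("Chrome", "78")]
print(equivalent_major(brands))  # 78
```

This only holds if browsers report the Chrome version they consider themselves equivalent to, which the thread describes as an expectation rather than a guarantee.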

@gdh1995

gdh1995 commented Jan 27, 2020

I'd expect the major versions to align, so e.g. don't expect Edge to lie about the relative Chrome version that it considers to be in its equivalence class.

So will all browsers with Chromium cores of the same major version report the same major version, and all Firefox-like browsers another? Then my extension will be somehow "possible to implement" again, though the related code will indeed be much uglier (than the current code that detects a real major version).

@yoavweiss
Collaborator

So will all browsers with Chromium cores of the same major version report the same major version

Not necessarily, but I wouldn't expect them to lie about the Chrome version that they are equivalent to.

@yoavweiss
Collaborator

@domenic - can you take a look at the examples added in #49 and let me know if they address your concerns here?

@domenic
Collaborator Author

domenic commented Mar 9, 2020

I'm happy with the examples. However, I'll note that it says NavigatorUAData.brand when I think it should say navigator.userAgentData.uaList.

6 participants