Questions about privacy-sensitive #1

Open
zcorpan opened this issue Oct 2, 2023 · 32 comments

Comments

@zcorpan

zcorpan commented Oct 2, 2023

How is it determined what is privacy-sensitive?

Is it a good idea to prompt users for exposing privacy-sensitive text fragments to the page? I think most users would not understand what is being asked, which makes me think it's a bad idea to prompt.

@eligrey
Owner

eligrey commented Oct 2, 2023

How is it determined what is privacy-sensitive?

The easiest solution is to simply maintain a well-known list of privacy-sensitive directives, pre-populated with the sole entry, text.

I think most users would not understand what is being asked, which makes me think it's a bad idea to prompt.

I don't think that users should be prompted either. I've left the option to prompt in this spec to better afford the needs of more privacy-conscious browser vendors.

A simpler option could be to not require a permission prompt but instead require secure origins for includeSensitive.

I understand that some PrivacyCG members would prefer if scroll-to-text text entries were protected. As it stands currently, there are no actual effective privacy boundaries in the scroll-to-text specification. If we can all agree that additional privacy boundaries are not necessary, I will change the includeSensitive part of the spec to require secure origins instead of a permission prompt.

@zcorpan
Author

zcorpan commented Oct 2, 2023

OK. I think a secure context is not sufficient to expose the text directive, since we don't trust the page that has the text directive to honor the user's privacy, regardless of whether it's a secure context or not. If text is always sensitive and we don't want to prompt, then this API just adds complexity with no benefit (until other directives are added, at least).

If anything, I think an opt in to reveal the text directive needs to be made by the page that created the link. (A possible workaround today would be to duplicate it in the query string, though may not be compatible for all sites.)

So then a search engine could use the text fragment feature without the "reveal" opt-in (as today), but other use cases (e.g. share link on social media) can use the opt in. But I'm not sure the extra complexity is worthwhile, and there's more risk that sites that really shouldn't opt in accidentally (or deliberately) do.
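
The query-string workaround mentioned above could look roughly like the following. This is a minimal sketch: the `highlight` parameter name and the `buildSharedLink` helper are hypothetical, and the destination site would have to agree to read that parameter.

```javascript
// Sketch only: `highlight` is a hypothetical query parameter that the link
// creator and the destination site would have to agree on.
function buildSharedLink(pageUrl, text, { reveal = false } = {}) {
  const url = new URL(pageUrl);
  if (reveal) {
    // Opt-in by the link creator: duplicate the text where the page can read it.
    url.searchParams.set("highlight", text);
  }
  // "-" is syntax inside a text directive, so escape it in addition to what
  // encodeURIComponent already handles ("," and "&" included).
  const encoded = encodeURIComponent(text).replace(/-/g, "%2D");
  url.hash = `:~:text=${encoded}`;
  return url.href;
}

console.log(buildSharedLink("https://example.com/post", "well-known list", { reveal: true }));
```

A search engine would call it without `reveal` and keep today's behavior; a share button could pass `reveal: true`.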

@eligrey
Owner

eligrey commented Oct 2, 2023

Note that the text directive is currently already exposed, available either via CSS tricks (e.g. temporarily use a huge font-size & measure scroll position) or through browser quirks with performance.getEntries().

A mechanism for the link creator to 'reveal' the search text sounds interesting, although I'm not sure how that would look in terms of API ergonomics.

@zcorpan
Author

zcorpan commented Oct 3, 2023

With scroll position you can get what the start of the match is but not the full text directive. Hopefully Chromium can fix the performance.getEntries() bug.
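
To make the `performance.getEntries()` exposure concrete, here is a hedged sketch of what a page could do with an unstripped URL string. The `splitFragmentDirective` helper is illustrative, and reading the URL from the `name` of the navigation timing entry is an assumption about how the quirk surfaces, not specified behavior.

```javascript
// Split a URL string into the ordinary fragment and the fragment directive
// (everything after the ":~:" delimiter inside the fragment).
function splitFragmentDirective(urlString) {
  const hashIndex = urlString.indexOf("#");
  if (hashIndex === -1) return { fragment: null, directive: null };
  const fragment = urlString.slice(hashIndex + 1);
  const delimiterIndex = fragment.indexOf(":~:");
  if (delimiterIndex === -1) return { fragment, directive: null };
  return {
    fragment: fragment.slice(0, delimiterIndex),
    directive: fragment.slice(delimiterIndex + 3),
  };
}

// Assumed browser usage while the quirk exists:
//   const [entry] = performance.getEntriesByType("navigation");
//   const { directive } = splitFragmentDirective(entry.name);
console.log(splitFragmentDirective("https://example.com/a#intro:~:text=foo"));
```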

@noamr

noamr commented Oct 3, 2023

It's a bit difficult to reason about privacy of text directives, since hash fragments are an old feature and text-fragment et al are new, which means that browsers that don't support them would default to exposing them to the page. This is what's happening now and people are starting to depend on this feature for all sorts of things.

Whether a directive is private or not is a great question, but perhaps it's not the browser's role to enforce it, but rather the referring site's ("the search engine" if we're talking about search terms)? To some extent, this is not very different from sending ?search_term= to the page. I would say the same about prompts etc, this should perhaps be part of the search engine's terms and conditions.

We could fix the performance.getEntries() quirk but I think the correct fix would be to expose the text directive to document.URL as we expose anything else after the '#'.

@zcorpan
Author

zcorpan commented Oct 3, 2023

It's a bit difficult to reason about privacy of text directives, since hash fragments are an old feature and text-fragment et al are new, which means that browsers that don't support them would default to exposing them to the page. This is what's happening now and people are starting to depend on this feature for all sorts of things.

Right, but a search engine wanting to hide the search terms from the final page can feature-check for text fragment support before using it (assuming the text fragment is not exposed). If it's always exposed, search engines need to choose between using text fragments and hiding search terms.

Whether a directive is private or not is a great question, but perhaps it's not the browser's role to enforce it, but rather the referring site's ("the search engine" if we're talking about search terms)? To some extent, this is not very different from sending ?search_term= to the page.

Yes, indeed.

I would say the same about prompts etc, this should perhaps be part of the search engine's terms and conditions.

We could fix the performance.getEntries() quirk but I think the correct fix would be to expose the text directive to document.URL as we expose anything else after the '#'.

IIRC there was also a web compat reason for not exposing the fragment directive in existing APIs (some sites use the hash for client-side routing or other).

@noamr

noamr commented Oct 3, 2023

It's a bit difficult to reason about privacy of text directives, since hash fragments are an old feature and text-fragment et al are new, which means that browsers that don't support them would default to exposing them to the page. This is what's happening now and people are starting to depend on this feature for all sorts of things.

Right, but a search engine wanting to hide the search terms from the final page can feature-check for text fragment support before using it (assuming the text fragment is not exposed). If it's always exposed, search engines need to choose between using text fragments and hiding search terms.

OK this is a good model. So I think my answer to your original post would be:

  • It should be clear using feature detection whether a certain directive is going to be consumed & hidden by the browser.
  • If a directive is hidden, it should be consistently hidden across all APIs.
  • Given that information, it's up to the page to decide whether to send that directive in links or not.
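
The three points above can be sketched from the referring page's side. This assumes that the presence of `document.fragmentDirective` signals text fragment support, and (per the second point) that a supporting browser hides the directive consistently; `supportsTextFragments` and `linkFor` are illustrative names.

```javascript
// `doc` is passed in so the logic can run outside a browser; in a page it
// would be the global `document`.
function supportsTextFragments(doc) {
  return typeof doc === "object" && doc !== null && "fragmentDirective" in doc;
}

function linkFor(targetUrl, snippet, doc) {
  if (!supportsTextFragments(doc)) {
    // A browser without support would hand the raw fragment to the page, so a
    // referrer that wants the snippet hidden omits the directive entirely.
    return targetUrl;
  }
  const encoded = encodeURIComponent(snippet).replace(/-/g, "%2D");
  return `${targetUrl}#:~:text=${encoded}`;
}

console.log(linkFor("https://example.com/", "foo bar", { fragmentDirective: {} }));
```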

@zcorpan
Author

zcorpan commented Oct 3, 2023

Alternatively the opt-in can be to hide the text fragment, e.g.

#:~:text=Something&hide-text-fragment-from-script

(naming TBD)
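
As a rough illustration of how such an opt-in flag might behave, the sketch below filters what script would be allowed to read. The flag name is the placeholder from the example above, and `scriptVisibleDirectives` is a hypothetical helper, not part of any spec.

```javascript
// Placeholder name taken from the example above; not a specified directive.
const HIDE_FLAG = "hide-text-fragment-from-script";

// Returns the directives a page script would be allowed to read.
function scriptVisibleDirectives(fragmentDirective) {
  const directives = fragmentDirective.split("&").filter(Boolean);
  const hideText = directives.includes(HIDE_FLAG);
  return directives.filter((directive) => {
    // Design choice in this sketch: the flag itself is not exposed either.
    if (directive === HIDE_FLAG) return false;
    return !(hideText && directive.startsWith("text="));
  });
}

console.log(scriptVisibleDirectives("text=Something&hide-text-fragment-from-script"));
console.log(scriptVisibleDirectives("text=Something"));
```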

@noamr

noamr commented Oct 3, 2023

Alternatively the opt-in can be to hide the text fragment, e.g.

#:~:text=Something&hide-text-fragment-from-script

(naming TBD)

Not sure about this, but either way feature-detection is the key here in terms of privacy (as it gives the power to the referring site, which is the one responsible), and if we add a new sub-directive, feature detection needs to handle it.

@zcorpan
Author

zcorpan commented Oct 3, 2023

How would a search engine use text fragments and also hide the search terms with feature detection (for browsers that support text fragments)?

@noamr

noamr commented Oct 3, 2023

How would a search engine use text fragments and also hide the search terms with feature detection (for browsers that support text fragments)?

Either we fix navigation-timing and we make it so that feature-detecting text-fragments also means that they're hidden,
or we opt-in for hiding them in addition to the text fragment itself like you suggested, and feature-detect that.

@zcorpan
Author

zcorpan commented Oct 3, 2023

Either we fix navigation-timing and we make it so that feature-detecting text-fragments also means that they're hidden,

Having feature detection affect later behavior seems surprising!

or we opt-in for hiding them in addition to the text fragment itself like you suggested, and feature-detect that.

👍

I'll file a spec issue. Edit: WICG/scroll-to-text-fragment#234

@eligrey
Owner

eligrey commented Oct 3, 2023

IIRC there was also a web compat reason for not exposing the fragment directive in existing APIs (some sites use the hash for client-side routing or other).

I want that compat behavior (hiding the fragment directives from existing APIs). Currently some people use this behavior to test third party widget configurations on their site without interfering with their site's routing logic.

@simon-friedberger

simon-friedberger commented Oct 4, 2023

I am confused by several things in this discussion:

  1. Feature detection is assuming that the client where the link is generated is also the client consuming the link. But links get copied & pasted. How does this work?
  2. If a search engine wants to send information to the target site they can just use the traditional URL fragment.
  3. If a search engine wants to send you to a site and scroll to a position in that site without the site being able to detect that, there will be a lot of side-channels to fix, because that site can just check its scroll position. I also don't really understand why the search engine would want that. Could somebody elaborate?
  4. If there are compat issues with adding additional content to URL fragments which some sites might not understand, the directives can simply be kept out of the URL fragment but still be accessible from script.
  5. Afaict there is no reasoning in https://wicg.github.io/scroll-to-text-fragment/ to justify this hiding: "This section describes the mechanism by which the fragment directive is hidden from script and how it fits into".

@noamr

noamr commented Oct 5, 2023

I am confused by several things in this discussion:

  1. Feature detection is assuming that the client where the link is generated is also the client consuming the link. But links get copied & pasted. How does this work?

Good point. OTOH the client can remove the directive when copying to the clipboard etc. It's not 100% hermetic but can cover major uses.

@zcorpan
Author

zcorpan commented Oct 5, 2023

  1. Indeed, it seems to me it's not fixable without disabling scrolling, which would regress the user experience of the feature. (Pages can opt themselves out of scrolling, but the referring page can't opt them out.) I have assumed that the difference in fidelity (i.e. being able to tell where the match starts vs being able to access the full text directive directly) is significant enough to keep hiding the text directive (without some opt-in).

@eligrey
Owner

eligrey commented Jun 21, 2024

In order to reduce user confusion, we can have a spec-suggested prompt description for browser vendors that choose to prompt for includeSensitive: true.

Something like "[website] wants to access in-page search terms"

@noamr

noamr commented Jul 1, 2024

Permission prompts are usually a last resort, and there's already prompt fatigue. I don't think it would solve anything TBH. Given the previous discussion I'm back to my stance here. I don't think we should hide the text fragment directive. It doesn't do anything for privacy or for preventing website breakage.

@bokand

bokand commented Jul 3, 2024

This behavior was introduced to prevent site-compat issues (see WICG/scroll-to-text-fragment#15) due to colliding usage of the hash. We were less worried about the privacy aspects at the time since the text snippet already appears on the incoming page and the page can infer what's highlighted via scroll position. However, since then, some security-minded folks I've discussed this with have noted that the increased fidelity of the actual text is significant (e.g. a page could infer the user's search term on the search engine if they have the exact snippet). So I think we should keep the text fragment entirely hidden.

I'm wary of re-introducing the exact scenario we invented the fragment directive for, i.e. different parties start using the fragment directive for their own purposes and then break if an unexpected thing appears there.

@eligrey - IIUC your use case is that the link can include some extra data that a third-party (w.r.t. the site itself) component (widget, extension, etc) can make use of - is that right? Rather than changing the behavior in URL parsing/stripping, what about introducing a new directive meant to carry third-party but non-UA data? e.g.

https://example.com#:~:text=foo&external(vendor,property=value)

This could be parsed and stripped from the URL but could be exposed via an API like the one proposed here. This has the advantage that it's structured and could be vendor scoped so would be less brittle. e.g.

document.fragmentDirective.items[0] //opaque text directive with value hidden to script
document.fragmentDirective.items[1] //"external" directive with readable values
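
A hedged sketch of how that structured view might be produced from the raw directive string. The `external(...)` syntax and the item shape are only the proposal from this comment, and `parseDirectives` is an illustrative name.

```javascript
// Hypothetical parser: only `text=` exists today; `external(...)` is proposed.
function parseDirectives(fragmentDirective) {
  return fragmentDirective
    .split("&")
    .filter(Boolean)
    .map((raw) => {
      const external = /^external\(([^,()]+),([^=()]+)=([^()]*)\)$/.exec(raw);
      if (external) {
        const [, vendor, property, value] = external;
        // Readable by script, scoped to a vendor.
        return { type: "external", vendor, property, value };
      }
      if (raw.startsWith("text=")) {
        // Opaque: the item is listed, but its value is withheld from script.
        return { type: "text" };
      }
      return { type: "unknown" };
    });
}

console.log(parseDirectives("text=foo&external(vendor,property=value)"));
```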

@eligrey
Owner

eligrey commented Jul 3, 2024

@bokand Your suggestions work for the 'custom directives' use case, but I also want to support 'custom scroll-to-text behavior' e.g. to enable deep links in unconventional webapps such as 2D/3D/XR experiences.

@bokand

bokand commented Jul 3, 2024

sorry if I missed it - do you have more details on that use case?

Guessing that you want to implement a :~:text search that works on non-DOM text - that would require the webapp to cooperate (by implementing the actual search) so in that case I'm not sure why you'd need a text fragment/fragmentDirective at all...why not just pack the search term into a query parameter or ordinary fragment? Text directives were added to make this kind of use case work without the cooperation of the destination page.

@simon-friedberger

simon-friedberger commented Jul 4, 2024

This behavior was introduced to prevent site-compat issues (see WICG/scroll-to-text-fragment#15) due to colliding usage of the hash. We were less worried about the privacy aspects at the time since the text snippet already appears on the incoming page and the page can infer what's highlighted via scroll position. However, since then, some security-minded folks I've discussed this with have noted that the increased fidelity of the actual text is significant (e.g. a page could infer the user's search term on the search engine if they have the exact snippet). So I think we should keep the text fragment entirely hidden.

@bokand Do you have an example for this increased fidelity? I understand the theoretical problem but I struggle to think of a practical use-case and I think it would be good for the discussion (and the spec) to have one.

That being said, this proposal includes the "fragment directive API", so we need to clarify whether we think it is acceptable to expose the text fragment. Currently, it is supposed to be hidden. The proposal also has "custom directives", for which the situation seems more confusing.

Since they are custom, the UA has no options for determining if they are privacy sensitive. They might contain search terms, they might contain the user's address (to show local results) or the user's religion.

On the other hand (and I think that is what @bokand is also saying), I do not understand the benefit of standardizing custom directives: the link source and link target have to agree on their meaning, so that meaning would have to be standardized and not just "custom".

@eligrey
Owner

eligrey commented Jul 4, 2024

Re:

Guessing that you want to implement a :~:text search that works on non-DOM text - that would require the webapp to cooperate (by implementing the actual search) so in that case I'm not sure why you'd need a text fragment/fragmentDirective at all...why not just pack the search term into a query parameter or ordinary fragment?

and

Since they are custom, the UA has no options for determining if they are privacy sensitive. They might contain search terms, they might contain the user's address (to show local results) or the user's religion.

Custom handling for the text directive should ideally re-use the same syntax. The point is to allow websites that have non-DOM text to be able to provide deep links to users while providing some privacy protections in browsers that choose to gate includeSensitive: true with a prompt. If I navigate to a link with a text directive on a device that can only handle basic HTML, the site can provide HTML, and if the device supports XR, the site should be able to render in XR. If the text directive is limited to DOM text, then custom search functionality cannot be implemented without encoding both standard text and vendor-specific directives in unison on every shared link, just in case the user visits from an XR device.

Exposing the text directive in the manner described in this spec provides better compatibility and consistency for sites that choose to support non-DOM browsing mechanisms (e.g. canvas) and vary their content based on device capabilities.
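
As an illustration of the non-DOM case, the sketch below parses a text directive value (`[prefix-,]start[,end][,-suffix]`) and searches labels that live outside the DOM. How the page obtains the directive value is exactly what this issue debates, so it is passed in as a parameter; `findLabel` and the label shape are hypothetical.

```javascript
// Parse the value of a text directive: [prefix-,]start[,end][,-suffix]
// Note: decodeURIComponent throws on malformed percent-encoding.
function parseTextDirective(value) {
  const parts = value.split(",");
  const result = {};
  if (parts.length > 1 && parts[0].endsWith("-")) {
    result.prefix = decodeURIComponent(parts.shift().slice(0, -1));
  }
  if (parts.length > 1 && parts[parts.length - 1].startsWith("-")) {
    result.suffix = decodeURIComponent(parts.pop().slice(1));
  }
  result.start = decodeURIComponent(parts[0]);
  if (parts.length > 1) result.end = decodeURIComponent(parts[1]);
  return result;
}

// Hypothetical app-side search over text drawn on a canvas or placed in a
// 3D scene, where the browser's own scroll-to-text cannot find anything.
function findLabel(labels, directiveValue) {
  const { start } = parseTextDirective(directiveValue);
  const needle = start.toLowerCase();
  return labels.find((label) => label.text.toLowerCase().includes(needle)) ?? null;
}

const labels = [
  { id: 1, text: "Exit sign" },
  { id: 2, text: "Hello World kiosk" },
];
console.log(findLabel(labels, "hello%20world"));
```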

Custom directives should not be used for privacy-sensitive commands. I'm using 'custom directives' here to mean vendor-specific, and it does require cooperation. Vendor-specific fragment directives are used today to layer pseudo-UA command directives on top of existing websites without interfering with existing routing logic.

One example is Transcend Consent Management, which supports using vendor-specific fragment directives to provide configuration of select options and signals. This tool acts as a pseudo-user agent that is installed by website owners to control web traffic and trackers in accordance with user privacy preferences.

@simon-friedberger

I think I do not fully understand what you are trying to achieve here.

If the data is vendor specific anyway, and requires cooperation, why not just encode it in the hash whichever way you want?

@noamr

noamr commented Jul 4, 2024

Let's separate custom directives from the standard ones. Let's assume custom ones can be encoded by the cooperating origins (though also that might have the marginal benefit of the UA hiding them in document.URL and exposing them only in a separate API).

Re standard ones, I can see the benefit of accessing them in script, and specifically the text fragment, for allowing a web page to implement its own scroll-to-text. It's a valid use case but we don't have proper privacy mitigations for it. A prompt to the user is almost never a proper privacy mitigation and definitely not in this case IMO.

@eligrey
Owner

eligrey commented Jul 4, 2024

Note that the latest Safari now allows access to all these directives via the Navigation Timing API without a reload by the way. I am unsure if this change was done in relation to my Apple security report.

I have requested that all directives continue to be exposed in such a manner to help enable polyfilling a better API until it ships. Most major browser engines currently have access and we can experiment here.

@noamr

noamr commented Jul 4, 2024

Note that the latest Safari now allows access to all these directives via the Navigation Timing API without a reload by the way. I am unsure if this change was done in relation to my Apple security report.

I have requested that all directives continue to be exposed in such a manner to help enable polyfilling a better API until it ships. Most major browser engines currently have access and we can experiment here.

I spoke to the webkit folks about this a few days ago. From what I understand they coincidentally have the same bug as chromium. Since this is deemed to be a privacy issue, it's likely going to be fixed in both Chromium and WebKit.

@noamr

noamr commented Jul 4, 2024

btw I think what you're doing here is really innovative and it saddens me that the privacy tradeoffs might break it. Hopefully in the future the web platform can find a privacy-preserving way to enable custom UX on top of the text fragment.

@bokand

bokand commented Jul 4, 2024

@bokand Do you have an example for this increased fidelity?

Different text directive values can produce the same scroll offset so inferring the targeted text from scroll position is at least probabilistic. Exposing it directly means the destination page can more reliably infer something about the referring page. e.g. on a search engine, a specific search query might have a 1:1 mapping to a generated text directive - a page could then use the text directive to reveal the user's search terms. I'm neither a search nor security engineer so I can't say how serious the consequences are here but that's the kind of push-back I've heard to exposing text directives.
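
A toy model of that point, under the simplifying assumption that the page only observes the vertical offset of the block containing the match (real scroll behavior is more detailed). The `page` data and `scrollOffsetFor` are made up for illustration.

```javascript
// Toy model: each paragraph starts at a known vertical offset, and the
// observable signal is the offset of the paragraph containing the match.
const page = [
  { top: 0, text: "Fragment directives are stripped from the URL." },
  { top: 400, text: "Search engines may generate text directives from result snippets." },
];

function scrollOffsetFor(text) {
  const hit = page.find((paragraph) => paragraph.text.includes(text));
  return hit ? hit.top : null;
}

// Two different directives lead to one observable scroll position, so the
// page cannot tell from scrolling alone which text was targeted.
console.log(scrollOffsetFor("Search engines"), scrollOffsetFor("result snippets"));
```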

If the data is vendor specific anyway, and requires cooperation, why not just encode it in the hash whichever way you want?

I think the idea here is that the URL could be modified/used by a third-party library or extension in which case it might conflict with a site's usage of the hash for routing and other purposes. Using the fragment directive guarantees that it won't.

That said, I think usage of the hash like this is becoming fairly rare so I'm not sure how important this is to solve.

Custom handling for the text directive should ideally re-use the same syntax.

I can see the benefit of this. IMHO the privacy issue here is definitely use case specific - search engines being maybe the special case. In general, the page already knows its own text so it doesn't seem to me to be revealing anything.

Perhaps a second cleartext= directive which is identical to text except it is exposed to the page could be useful? e.g. search engines (and any other sensitive use case) could continue to use text which is completely hidden but the more general case of "look at this part of the page" could use cleartext which could be exposed to the page.

e.g. in the "Copy link to text" feature in Chrome, there's no reason I can think of that the text should be hidden from the page. It could use cleartext=

@eligrey
Owner

eligrey commented Jul 5, 2024

If cleartext was used for the same purpose (scroll-to-text/navigate-to-text), then wouldn't that also be privacy-sensitive as well? I think we just need to be okay with websites knowing explicit scroll-to-text queries so that they can self-handle the queries. A secure origin gate as suggested in this spec proposal is enough to prevent pervasive surveillance from network providers.

We should wait for Apple's public response to this issue for their latest take on the privacy implications of exposing the text fragment directive. They are still investigating my security report and I imagine that we will get a clearer picture soon.

@bokand

bokand commented Jul 8, 2024

If cleartext was used for the same purpose (scroll-to-text/navigate-to-text), then wouldn't that also be privacy-sensitive as well?

Yes - that's the idea here. In most cases I think the text directive isn't sensitive but in a few (search, maybe others?) it can be. So this will be application-specific and I think it would make sense to let applications decide whether it needs to be strictly hidden from the page.

Note: in both cases we'd still want to strip it from the URL to avoid compat issues on pages using the fragment for routing (WICG/scroll-to-text-fragment#15). In the cleartext case it'd be ok to expose it to the page via an alternate API though.
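
The split described in this note might be sketched as follows. `cleartext=` is only the directive name floated in this thread, and `processFragment` stands in for whatever the browser would do internally.

```javascript
// Hypothetical processing of a fragment that may carry `text=` (hidden) and
// `cleartext=` (exposed through a separate API) directives.
function processFragment(fragment) {
  const index = fragment.indexOf(":~:");
  if (index === -1) return { fragment, exposed: [] };
  const directives = fragment.slice(index + 3).split("&");
  return {
    // Both kinds are stripped, so hash-based routers only ever see the part
    // before the delimiter.
    fragment: fragment.slice(0, index),
    // Only `cleartext=` values would be readable by the page.
    exposed: directives
      .filter((directive) => directive.startsWith("cleartext="))
      .map((directive) => directive.slice("cleartext=".length)),
  };
}

console.log(processFragment("/route:~:text=secret&cleartext=shared"));
```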

@eligrey
Owner

eligrey commented Aug 20, 2024

I think that hiding parts of navigational directives from navigation targets simply pushes site owners to potentially collude on common readable keys that aren't hidden (e.g. #text=... instead of #:~:text=..., affecting routing interoperability) to implement this feature with custom handling, so I believe that fragment directives should be wholly unrestricted from intentional access, just like query parameters.


5 participants