Questions about privacy-sensitive #1

zcorpan · 2023-10-02T17:49:38Z

How is it determined what is privacy-sensitive?

Is it a good idea to prompt users for exposing privacy-sensitive text fragments to the page? I think most users would not understand what is being asked, which makes me think it's a bad idea to prompt.

eligrey · 2023-10-02T19:39:56Z

How is it determined what is privacy-sensitive?

The easiest solution is to simply maintain a well-known list of privacy-sensitive directives, pre-populated with the sole entry, text.

I think most users would not understand what is being asked, which makes me think it's a bad idea to prompt.

I don't think that users should be prompted either. I've left the option to prompt in this spec to better afford the needs of more privacy-conscious browser vendors.

A simpler option could be to not require a permission prompt but instead require secure origins for includeSensitive.

I understand that some PrivacyCG members would prefer if scroll-to-text text entries were protected. As it stands currently, there are no actual effective privacy boundaries in the scroll-to-text specification. If we can all agree that additional privacy boundaries are not necessary, I will change the includeSensitive part of the spec to require secure origins instead of a permission prompt.
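As a rough illustration of the two ideas above (a well-known list of sensitive directives, and gating includeSensitive on a secure origin rather than a prompt), here is a minimal sketch. The function name `visibleDirectives`, the shape of the directive objects, and the `isSecureContext` parameter are assumptions made for illustration; they are not taken from the spec text.

```javascript
// Hypothetical well-known list of privacy-sensitive directive names.
// Per the comment above, it starts with the single entry "text".
const SENSITIVE_DIRECTIVES = new Set(["text"]);

// Returns the directives a page would be allowed to read.
// `directives` is an array of { name, value } objects (assumed shape).
// `includeSensitive` mirrors the option discussed in this thread;
// `isSecureContext` stands in for the browser's secure-origin check.
function visibleDirectives(
  directives,
  { includeSensitive = false, isSecureContext = false } = {}
) {
  const allowSensitive = includeSensitive && isSecureContext;
  return directives.filter(
    (d) => allowSensitive || !SENSITIVE_DIRECTIVES.has(d.name)
  );
}
```

With this gate, asking for sensitive directives from an insecure origin silently yields only the non-sensitive ones; a real API might reject instead.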

zcorpan · 2023-10-02T20:11:06Z

OK. I think a secure context is not sufficient to expose the text directive, since we don't trust the page that has the text directive to honor the user's privacy, regardless of whether it's a secure context or not. If text is always sensitive and we don't want to prompt, then this API just adds complexity with no benefit (until other directives are added, at least).

If anything, I think an opt-in to reveal the text directive needs to be made by the page that created the link. (A possible workaround today would be to duplicate it in the query string, though that may not be compatible with all sites.)

So then a search engine could use the text fragment feature without the "reveal" opt-in (as today), but other use cases (e.g. share link on social media) can use the opt in. But I'm not sure the extra complexity is worthwhile, and there's more risk that sites that really shouldn't opt in accidentally (or deliberately) do.

eligrey · 2023-10-02T20:21:03Z

Note that the text directive is currently already exposed, available either via CSS tricks (e.g. temporarily use a huge font-size & measure scroll position) or through browser quirks with performance.getEntries().

A mechanism for the link creator to 'reveal' the search text sounds interesting, although I'm not sure how that would look in terms of API ergonomics.
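For context, the performance.getEntries() quirk mentioned above amounts to the navigation timing entry's `name` carrying the full navigation URL, including the fragment directive that `location.hash` omits. The sketch below is only an illustration of that idea: `directiveFromEntries` is a hypothetical helper, and it takes the entries array as a parameter so that it also runs outside a browser.

```javascript
// Delimiter that separates the ordinary fragment from the fragment directive.
const DIRECTIVE_DELIMITER = ":~:";

// Hypothetical helper: returns the raw fragment directive found in a
// navigation timing entry's name, or null if there is none.
function directiveFromEntries(entries) {
  const nav = entries.find((e) => e.entryType === "navigation");
  if (!nav) return null;
  const hashIndex = nav.name.indexOf("#");
  if (hashIndex === -1) return null;
  const fragment = nav.name.slice(hashIndex + 1);
  const delimiterIndex = fragment.indexOf(DIRECTIVE_DELIMITER);
  return delimiterIndex === -1
    ? null
    : fragment.slice(delimiterIndex + DIRECTIVE_DELIMITER.length);
}

// In an affected browser this might be called as:
//   directiveFromEntries(performance.getEntries())
```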

zcorpan · 2023-10-03T09:14:08Z

With scroll position you can get what the start of the match is but not the full text directive. Hopefully Chromium can fix the performance.getEntries() bug.

noamr · 2023-10-03T11:37:33Z

It's a bit difficult to reason about privacy of text directives, since hash fragments are an old feature and text-fragment et al are new, which means that browsers that don't support them would default to exposing them to the page. This is what's happening now and people are starting to depend on this feature for all sorts of things.

Whether a directive is private or not is a great question, but perhaps it's not the browser's role to enforce it, but rather the referring site's ("the search engine" if we're talking about search terms)? To some extent, this is not very different from sending ?search_term= to the page. I would say the same about prompts etc, this should perhaps be part of the search engine's terms and conditions.

We could fix the performance.getEntries() quirk but I think the correct fix would be to expose the text directive to document.URL as we expose anything else after the '#'.

zcorpan · 2023-10-03T11:55:16Z

It's a bit difficult to reason about privacy of text directives, since hash fragments are an old feature and text-fragment et al are new, which means that browsers that don't support them would default to exposing them to the page. This is what's happening now and people are starting to depend on this feature for all sorts of things.

Right, but a search engine wanting to hide the search terms from the final page can feature-check for text fragment support before using it (assuming the text fragment is not exposed). If it's always exposed, search engines need to choose between using text fragments and hiding search terms.

Whether a directive is private or not is a great question, but perhaps it's not the browser's role to enforce it, but rather the referring site's ("the search engine" if we're talking about search terms)? To some extent, this is not very different from sending ?search_term= to the page.

Yes, indeed.

I would say the same about prompts etc, this should perhaps be part of the search engine's terms and conditions.

We could fix the performance.getEntries() quirk but I think the correct fix would be to expose the text directive to document.URL as we expose anything else after the '#'.

IIRC there was also a web compat reason for not exposing the fragment directive in existing APIs (some sites use the hash for client-side routing or other).
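To make the compat concern concrete: a client-side router that treats the whole hash as a route would see a different string if the directive were left in place. A minimal sketch, assuming a hypothetical `splitFragment` helper that mimics how a UA separates the fragment from the fragment directive at the first `:~:`:

```javascript
// Splits a URL's fragment at the first ":~:" delimiter.
// Returns the part a page's routing logic would see (`fragment`)
// and the part the UA strips (`directive`, or null if absent).
function splitFragment(urlString) {
  const hash = new URL(urlString).hash.replace(/^#/, "");
  const i = hash.indexOf(":~:");
  if (i === -1) return { fragment: hash, directive: null };
  return { fragment: hash.slice(0, i), directive: hash.slice(i + 3) };
}

// Example: for "https://example.com/#/settings:~:text=foo" a router that
// matches on "/settings" works with the stripped `fragment`, but would not
// match the unstripped hash "#/settings:~:text=foo".
```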

noamr · 2023-10-03T12:00:21Z

It's a bit difficult to reason about privacy of text directives, since hash fragments are an old feature and text-fragment et al are new, which means that browsers that don't support them would default to exposing them to the page. This is what's happening now and people are starting to depend on this feature for all sorts of things.

Right, but a search engine wanting to hide the search terms from the final page can feature-check for text fragment support before using it (assuming the text fragment is not exposed). If it's always exposed, search engines need to choose between using text fragments and hiding search terms.

OK this is a good model. So I think my answer to your original post would be:

It should be clear using feature detection whether a certain directive is going to be consumed & hidden by the browser.
If a directive is hidden, it should be consistently hidden across all APIs.
Given that information, it's up to the page to decide whether to send that directive in links or not.
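The model above could look roughly like this from a search engine's side. This is a sketch under assumptions: `buildResultLink` and `encodeTextDirectiveValue` are hypothetical helpers, and `"fragmentDirective" in document` is used as the feature test for text-fragment support. Whether support also implies that the directive is hidden across all APIs is exactly what this thread is debating, so the caller passes that decision in explicitly.

```javascript
// True when the UA consumes text fragments (always false outside a browser).
function supportsTextFragments() {
  return typeof document !== "undefined" && "fragmentDirective" in document;
}

// Percent-encodes a snippet for use in a text directive. "-" is escaped as
// well because it is part of the text directive syntax (prefix-/-suffix).
function encodeTextDirectiveValue(text) {
  return encodeURIComponent(text).replace(/-/g, "%2D");
}

// Appends a text directive only when the caller has established that the
// UA will consume and hide it; otherwise returns the URL unchanged.
function buildResultLink(targetUrl, snippet, directiveIsHidden) {
  if (!directiveIsHidden) return targetUrl;
  const url = new URL(targetUrl);
  const fragment = url.hash.replace(/^#/, "");
  url.hash = "";
  return `${url.href}#${fragment}:~:text=${encodeTextDirectiveValue(snippet)}`;
}
```

A caller might write `buildResultLink(resultUrl, snippet, supportsTextFragments())`, with the caveat that the check runs on the client generating the link, which is not necessarily the client that later opens it.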

zcorpan · 2023-10-03T12:01:38Z

Alternatively the opt-in can be to hide the text fragment, e.g.

#:~:text=Something&hide-text-fragment-from-script

(naming TBD)
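A sketch of how a UA-side parser might treat such an opt-in. The flag name `hide-text-fragment-from-script` comes from the example above and is explicitly TBD; `parseDirectives` and `isTextHiddenFromScript` are hypothetical names.

```javascript
// Splits a raw fragment directive such as
// "text=Something&hide-text-fragment-from-script" into { name, value } pairs.
// Directives without "=" get a null value.
function parseDirectives(rawDirective) {
  return rawDirective
    .split("&")
    .filter((part) => part.length > 0)
    .map((part) => {
      const eq = part.indexOf("=");
      return eq === -1
        ? { name: part, value: null }
        : { name: part.slice(0, eq), value: part.slice(eq + 1) };
    });
}

// Under this opt-in model the text directive is exposed to script unless
// the link creator added the (hypothetical) hide flag.
function isTextHiddenFromScript(rawDirective) {
  return parseDirectives(rawDirective).some(
    (d) => d.name === "hide-text-fragment-from-script"
  );
}
```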

noamr · 2023-10-03T12:04:16Z

Alternatively the opt-in can be to hide the text fragment, e.g.
#:~:text=Something&hide-text-fragment-from-script
(naming TBD)

Not sure about this, but either way feature-detection is the key here in terms of privacy (as it gives the power to the referring site, which is the one responsible), and if we add a new sub-directive, feature detection needs to handle it.

zcorpan · 2023-10-03T12:24:07Z

How would a search engine use text fragments and also hide the search terms with feature detection (for browsers that support text fragments)?

noamr · 2023-10-03T12:43:14Z

How would a search engine use text fragments and also hide the search terms with feature detection (for browsers that support text fragments)?

Either we fix navigation-timing and we make it so that feature-detecting text-fragments also means that they're hidden,
or we opt-in for hiding them in addition to the text fragment itself like you suggested, and feature-detect that.

zcorpan · 2023-10-03T12:50:48Z

Either we fix navigation-timing and we make it so that feature-detecting text-fragments also means that they're hidden,

Having feature detection affect later behavior seems surprising!

or we opt-in for hiding them in addition to the text fragment itself like you suggested, and feature-detect that.

👍

I'll file a spec issue. Edit: WICG/scroll-to-text-fragment#234

eligrey · 2023-10-03T23:04:38Z

IIRC there was also a web compat reason for not exposing the fragment directive in existing APIs (some sites use the hash for client-side routing or other).

I want that compat behavior (hiding the fragment directives from existing APIs). Currently some people use this behavior to test third party widget configurations on their site without interfering with their site's routing logic.

simon-friedberger · 2023-10-04T19:12:05Z

I am confused by several things in this discussion:

Feature detection is assuming that the client where the link is generated is also the client consuming the link. But links get copied & pasted. How does this work?
If a search engine wants to send information to the target site they can just use the traditional URL fragment.
If a search engine wants to send you to a site and scroll to a position in that site without the site being able to detect that, there will be a lot of side-channels to fix, because that site can just check its scroll position. I also don't really understand why the search engine would want that. Could somebody elaborate?
If there are compat issues with adding additional content to URL fragments which some sites might not understand the directives can simply be not part of the URL fragment but still accessible from script.
Afaict there is no reasoning in https://wicg.github.io/scroll-to-text-fragment/ to justify this hiding: "This section describes the mechanism by which the fragment directive is hidden from script and how it fits into".

noamr · 2023-10-05T03:49:18Z

I am confused by several things in this discussion:

Feature detection is assuming that the client where the link is generated is also the client consuming the link. But links get copied & pasted. How does this work?

Good point. OTOH the client can remove the directive when copying to the clipboard etc. It's not 100% hermetic but can cover major uses.

zcorpan · 2023-10-05T10:08:44Z

Indeed, it seems to me it's not fixable without disabling scrolling, which would regress the user experience of the feature. (Pages can opt themselves out of scrolling, but the referring page can't opt them out.) I have assumed that the difference in fidelity (i.e. being able to tell where the match starts vs being able to access the full text directive directly) is significant enough to keep hiding the text directive (without some opt-in).

eligrey · 2024-06-21T04:04:47Z

In order to reduce user confusion, we can have a spec-suggested prompt description for browser vendors that choose to prompt for includeSensitive: true.

Something like "[website] wants to access in-page search terms"

noamr · 2024-07-01T14:48:02Z

Permission prompts are usually a last resort and there's already prompt fatigue. I don't think it would solve anything TBH. Given the previous discussion I'm back to my stance here. I don't think we should hide the text fragment directive. It doesn't do anything for privacy or for preventing website breakage.

bokand · 2024-07-03T18:41:58Z

This behavior was introduced to prevent site-compat issues (see WICG/scroll-to-text-fragment#15) due to colliding usage of the hash. We were less worried about the privacy aspects at the time since the text snippet already appears on the incoming page and the page can infer what's highlighted via scroll position. However, since then, some security-minded folks I've discussed this with have noted that the increased fidelity of the actual text is significant (e.g. a page could infer the user's search term on the search engine if they have the exact snippet). So I think we should keep the text fragment entirely hidden.

I'm wary of re-introducing the exact scenario we invented the fragment directive for, i.e. different parties start using the fragment directive for their own purposes and then break if an unexpected thing appears there.

@eligrey - IIUC your use case is that the link can include some extra data that a third-party (w.r.t. the site itself) component (widget, extension, etc) can make use of - is that right? Rather than changing the behavior in URL parsing/stripping, what about introducing a new directive meant to carry third-party but non-UA data? e.g.

https://example.com#:~:text=foo&external(vendor,property=value)

This could be parsed and stripped from the URL but could be exposed via an API like the one proposed here. This has the advantage that it's structured and could be vendor scoped so would be less brittle. e.g.

document.fragmentDirective.items[0] //opaque text directive with value hidden to script
document.fragmentDirective.items[1] //"external" directive with readable values
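To illustrate the shape being suggested, here is a minimal sketch that turns a raw fragment directive into script-visible items. The `external(vendor,property=value)` syntax and the `items` list come from the example above; the parsing rules, the `toScriptVisibleItems` name, and the item shapes are assumptions, not a proposed API.

```javascript
// Converts a raw fragment directive into the items a page could inspect.
// "external(...)" directives expose their vendor and properties; every
// other directive (e.g. "text=foo") is opaque: its type is listed but its
// value is withheld from script.
function toScriptVisibleItems(rawDirective) {
  return rawDirective
    .split("&")
    .filter((part) => part.length > 0)
    .map((part) => {
      const external = /^external\(([^,()]+),([^()]*)\)$/.exec(part);
      if (external) {
        const [, vendor, props] = external;
        const properties = Object.fromEntries(
          props
            .split(",")
            .filter(Boolean)
            .map((p) => {
              const eq = p.indexOf("=");
              return eq === -1 ? [p, ""] : [p.slice(0, eq), p.slice(eq + 1)];
            })
        );
        return { type: "external", vendor, properties };
      }
      const eq = part.indexOf("=");
      return { type: eq === -1 ? part : part.slice(0, eq) };
    });
}
```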

eligrey · 2024-07-03T19:18:01Z

@bokand Your suggestions work for the 'custom directives' use case, but I also want to support 'custom scroll-to-text behavior' e.g. to enable deep links in unconventional webapps such as 2D/3D/XR experiences.

bokand · 2024-07-03T19:58:37Z

sorry if I missed it - do you have more details on that use case?

Guessing that you want to implement a :~:text search that works on non-DOM text - that would require the webapp to cooperate (by implementing the actual search) so in that case I'm not sure why you'd need a text fragment/fragmentDirective at all...why not just pack the search term into a query parameter or ordinary fragment? Text directives were added to make this kind of use case work without the cooperation of the destination page.

simon-friedberger · 2024-07-04T05:28:58Z

This behavior was introduced to prevent site-compat issues (see WICG/scroll-to-text-fragment#15) due to colliding usage of the hash. We were less worried about the privacy aspects at the time since the text snippet already appears on the incoming page and the page can infer what's highlighted via scroll position. However, since then, some security-minded folks I've discussed this with have noted that the increased fidelity of the actual text is significant (e.g. a page could infer the user's search term on the search engine if they have the exact snippet). So I think we should keep the text fragment entirely hidden.

@bokand Do you have an example for this increased fidelity? I understand the theoretical problem but I struggle to think of a practical use-case and I think it would be good for the discussion (and the spec) to have one.

That being said, this proposal has the "fragment directive API", for which we need to clarify whether we think it is acceptable to expose the text fragment. Currently, it is supposed to be hidden. The proposal also has "custom directives", for which the situation seems more confusing.

Since they are custom, the UA has no options for determining if they are privacy sensitive. They might contain search terms, they might contain the user's address (to show local results) or the user's religion.

On the other hand (and I think that is what @bokand is also saying), I do not understand the benefit of standardizing custom directives: the link source and link target have to agree on their meaning, so that meaning would have to be standardized and not just "custom".

eligrey · 2024-07-04T07:09:32Z

Re:

Guessing that you want to implement a :~:text search that works on non-DOM text - that would require the webapp to cooperate (by implementing the actual search) so in that case I'm not sure why you'd need a text fragment/fragmentDirective at all...why not just pack the search term into a query parameter or ordinary fragment?

and

Since they are custom, the UA has no options for determining if they are privacy sensitive. They might contain search terms, they might contain the user's address (to show local results) or the user's religion.

Custom handling for the text directive should ideally re-use the same syntax. The point is to allow websites that have non-DOM text to be able to provide deep links to users while providing some privacy protections in browsers that choose to gate includeSensitive: true with a prompt. If I navigate to a link with a text directive on a device that can only handle basic HTML, the site can provide HTML, and if the device supports XR, the site should be able to render in XR. If the text directive is limited to DOM text, then custom search functionality cannot be implemented without encoding both standard text and vendor-specific directives in unison on every shared link, just in case the user visits from an XR device.

Exposing the text directive in the manner described in this spec provides better compatibility and consistency for sites that choose to support non-DOM browsing mechanisms (e.g. canvas) and vary their content based on device capabilities.

Custom directives should not be used for privacy-sensitive commands. I'm using 'custom directives' here to mean vendor-specific, and it does require cooperation. Vendor-specific fragment directives are used today to layer pseudo-UA command directives on top of existing websites without interfering with existing routing logic.

One example is Transcend Consent Management, which supports using vendor-specific fragment directives to provide configuration of select options and signals. This tool acts as a pseudo-user agent that is installed by website owners to control web traffic and trackers in accordance with user privacy preferences.

simon-friedberger · 2024-07-04T07:35:52Z

I think I do not fully understand what you are trying to achieve here.

If the data is vendor specific anyway, and requires cooperation, why not just encode it in the hash whichever way you want?

noamr · 2024-07-04T08:02:07Z

Let's separate custom directives from the standard ones. Let's assume custom ones can be encoded by the cooperating origins (though also that might have the marginal benefit of the UA hiding them in document.URL and exposing them only in a separate API).

Re standard ones, I can see the benefit of accessing them in script, and specifically the text fragment, for allowing a web page to implement its own scroll-to-text. It's a valid use case but we don't have proper privacy mitigations for it. A prompt to the user is almost never a proper privacy mitigation and definitely not in this case IMO.

eligrey · 2024-07-04T08:23:42Z

Note that the latest Safari now allows access to all these directives via the Navigation Timing API without a reload by the way. I am unsure if this change was done in relation to my Apple security report.

I have requested that all directives continue to be exposed in such a manner to help enable polyfilling a better API until it ships. Most major browser engines currently have access and we can experiment here.

noamr · 2024-07-04T08:35:49Z

Note that the latest Safari now allows access to all these directives via the Navigation Timing API without a reload by the way. I am unsure if this change was done in relation to my Apple security report.

I have requested that all directives continue to be exposed in such a manner to help enable polyfilling a better API until it ships. Most major browser engines currently have access and we can experiment here.

I spoke to the WebKit folks about this a few days ago. From what I understand they coincidentally have the same bug as Chromium. Since this is deemed to be a privacy issue, it's likely going to be fixed in both Chromium and WebKit.

noamr · 2024-07-04T10:28:37Z

btw I think what you're doing here is really innovative and it saddens me that the privacy tradeoffs might break it. Hopefully in the future the web platform can find a privacy-preserving way to enable custom UX on top of the text fragment.

bokand · 2024-07-04T15:31:59Z

@bokand Do you have an example for this increased fidelity?

Different text directive values can produce the same scroll offset so inferring the targeted text from scroll position is at least probabilistic. Exposing it directly means the destination page can more reliably infer something about the referring page. e.g. on a search engine, a specific search query might have a 1:1 mapping to a generated text directive - a page could then use the text directive to reveal the user's search terms. I'm neither a search nor security engineer so I can't say how serious the consequences are here but that's the kind of push-back I've heard to exposing text directives.
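A toy illustration of why scroll position alone is a lossy signal: several different directive values can resolve to the same match start, so the page cannot tell them apart from the offset. The page text and the `matchStart` helper are made up for this example, and real matching (word boundaries, prefix/suffix context, layout) is more involved.

```javascript
// Made-up page content for the example.
const pageText =
  "Fragment directives are stripped from the URL before scripts run.";

// Character offset where `term` first occurs in `text`, or -1 if absent.
// This stands in for "where the page ends up scrolled to".
function matchStart(text, term) {
  return text.indexOf(term);
}

// Three different text directive values a referrer could have generated.
const candidates = ["stripped", "stripped from", "stripped from the URL"];
const starts = candidates.map((term) => matchStart(pageText, term));

// All three candidates begin at the same character offset, so a page that
// only observes where the match starts cannot recover which one was used.
```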

If the data is vendor specific anyway, and requires cooperation, why not just encode it in the hash whichever way you want?

I think the idea here is that the URL could be modified/used by a third-party library or extension in which case it might conflict with a site's usage of the hash for routing and other purposes. Using the fragment directive guarantees that it won't.

That said, I think usage of the hash like this is becoming fairly rare so I'm not sure how important this is to solve.

Custom handling for the text directive should ideally re-use the same syntax.

I can see the benefit of this. IMHO the privacy issue here is definitely use case specific - search engines being maybe the special case. In general, the page already knows its own text so it doesn't seem to me to be revealing anything.

Perhaps a second cleartext= directive which is identical to text except it is exposed to the page could be useful? e.g. search engines (and any other sensitive use case) could continue to use text which is completely hidden but the more general case of "look at this part of the page" could use cleartext which could be exposed to the page.

e.g. in the "Copy link to text" feature in Chrome, there's no reason I can think of that the text should be hidden from the page. It could use cleartext=
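A sketch of what that split could mean for script access, assuming the hypothetical `cleartext=` directive from this comment and a made-up `readableTextTargets` helper: values from `cleartext` are returned to the page, while `text` values are dropped.

```javascript
// Returns the decoded values of all "cleartext=" directives in a raw
// fragment directive. "text=" directives are deliberately ignored, which
// models them staying hidden from the page.
// Note: decodeURIComponent throws on malformed percent-escapes.
function readableTextTargets(rawDirective) {
  const prefix = "cleartext=";
  return rawDirective
    .split("&")
    .filter((part) => part.startsWith(prefix))
    .map((part) => decodeURIComponent(part.slice(prefix.length)));
}
```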

eligrey · 2024-07-05T22:09:02Z

If cleartext was used for the same purpose (scroll-to-text/navigate-to-text), then wouldn't that also be privacy-sensitive as well? I think we just need to be okay with websites knowing explicit scroll-to-text queries so that they can self-handle the queries. A secure origin gate as suggested in this spec proposal is enough to prevent pervasive surveillance from network providers.

We should wait for Apple's public response to this issue for their latest take on the privacy implications of exposing the text fragment directive. They are still investigating my security report and I imagine that we will get a clearer picture soon.

bokand · 2024-07-08T14:42:49Z

If cleartext was used for the same purpose (scroll-to-text/navigate-to-text), then wouldn't that also be privacy-sensitive as well?

Yes - that's the idea here. In most cases I think the text directive isn't sensitive but in a few (search, maybe others?) it can be. So this will be application-specific and I think it would make sense to let applications decide whether it needs to be strictly hidden from the page.

Note: in both cases we'd still want to strip it from the URL to avoid compat issues on pages using the fragment for routing (WICG/scroll-to-text-fragment#15). In the cleartext case it'd be ok to expose it to the page via an alternate API though.

eligrey · 2024-08-20T23:12:48Z

I think that hiding parts of navigational directives from navigation targets simply pushes site owners to potentially collude on common readable keys that aren't hidden (e.g. #text=... instead of #:~:text=..., affecting routing interoperability) to implement this feature with custom handling, so I believe that fragment directives should be wholly unrestricted from intentional access, just like query parameters.

eligrey mentioned this issue Oct 2, 2023

Fragment Directives API privacycg/proposals#40

Closed

zcorpan mentioned this issue Oct 3, 2023

Opt-in to hide text fragment directive from scripts WICG/scroll-to-text-fragment#234

Open
