Use topics from a meta tag on Special Topics Provider Sites #206
Comments
This doesn't actually address the privacy concerns from #118. Further, it picks a single site (a rather arbitrary heuristic) as opposed to applying equally web wide, which doesn't seem particularly webby. Finally, due to filtering, there would be some benefit to all from this (global top topic selection being more refined) but one would still have to observe the user on some page with that topic in order to receive it.
I agree that it's suboptimal to treat a single site as a special case. But as long as there is no more general approach to the YouTube problem being pursued, this would be better than nothing. Possibly other very large sites that also cover all or most topics could be special cased as well.
I think this feature request should be interpreted as something like: "For some browser-chosen list of Special Topic Provider Sites, pages on those sites should be able to declare what Topics they are about, and those become available to everyone, as if every Topics caller had observed them. And also YouTube should be on that list." In this sense it's more like a restricted version of #1 than of #118. I don't know that I agree with this proposal! I have no idea whether YouTube would be interested in being a Special Topic Provider, no idea how we would determine what other sites should have the same special status, etc. But this version seems "tricky and subtle" rather than "impossible".
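To make that interpretation concrete, here is a minimal Python sketch of the lookup it implies, under stated assumptions. Nothing in it comes from the Topics API itself: the `STPS_HOSTS` list, the `topics` meta name, and the comma-separated `content` format are all hypothetical placeholders chosen for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Hypothetical list. The thread does not settle who maintains it or what is on it.
STPS_HOSTS = {"youtube.com"}


class _TopicsMetaParser(HTMLParser):
    """Collects topics from <meta name="topics" content="A, B"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.topics: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # "topics" is an assumed meta name, not one defined by any spec.
        if (attr.get("name") or "").lower() == "topics":
            content = attr.get("content") or ""
            self.topics += [t.strip() for t in content.split(",") if t.strip()]


def declared_topics(page_url: str, html: str) -> list[str]:
    """Return page-declared topics, but only for hosts on the STPS list."""
    host = (urlsplit(page_url).hostname or "").lower()
    on_list = any(host == h or host.endswith("." + h) for h in STPS_HOSTS)
    if not on_list:
        return []  # pages on ordinary sites cannot self-declare in this sketch
    parser = _TopicsMetaParser()
    parser.feed(html)
    return parser.topics
```

The only design point the sketch is meant to show is the gating: the declaration is honored solely because the host is on the list, which is why the question of who qualifies for the list matters so much.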
@michaelkleber That makes a lot of sense. The list doesn't have to be browser-chosen.
I have rewritten the text of this issue to cover Special Topics Provider Sites, as @michaelkleber suggested. This seems like a possible path forward considering that #118 was closed, and that there still appears to be interest in fairly classifying content from large, multi-topic sites. See p. 7 of the CMA update report on implementation of the Privacy Sandbox commitments, April 2023.
I think you can achieve the same effect with a default
Don, I see you're still hoping that the browser does the work of turning the "section or channel name" into topics, rather than letting the STPS just declare the page's topics directly. Is that distinction important to you? It seems to me that the way to turn a YouTube channel name into a Topic could be very different from how you turn a hostname into a Topic. So it feels like this version of the proposal implicitly asks browsers to build a specialized STPS-to-Topics model for each Provider Site. On the one hand, that seems like putting the work in the wrong place: surely the site is in a good position to do a better job! On the other hand, you might worry that an STPS would be able to abuse this by maliciously giving out the wrong topics. But if you're letting them control the "section or channel name" input and the model is public, then surely it would be easy for them to maliciously push false topics either way.
Hi @michaelkleber -- I don't know. On one hand, it seems like the choice of whether or not to allow sites or channels to choose their own topics should apply to both sites and channels or to neither. Some hostnames provide usable Topics API information to the classifier, and others don't. Some YouTube channel names provide usable information to the classifier, and others don't. (For example, Jalopnik dot com is about cars, but it's a made-up word so doesn't get classified, last I checked. And the YouTube channel "LazerPig" is not about lasers or pigs. Other site and channel names have better keywords in them.) You might be able to use the same classifier for hostnames and channels/sections if STPSs had to transform the channel name into something that would be a valid hostname ("My YouTube Channel" becomes "my-youtube-channel" or similar). On the other hand, there are relatively few STPSs and it would be fairly straightforward to spot-check how accurately they were assigning topics to each channel, so it might be fine to have STPSs pass topics directly. @jkarlin Yes, that seems to be another workable option.
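The channel-name transformation mentioned above ("My YouTube Channel" becomes "my-youtube-channel") could be sketched in Python roughly as follows. This is only an illustration: the function name is hypothetical, and the thread does not specify how non-ASCII names or overlong names should be handled, so those choices are assumptions.

```python
import re
import unicodedata


def channel_to_hostname_label(name: str) -> str:
    """Turn a channel or section name into a hostname-label-like string."""
    # Decompose accented characters, then drop anything outside ASCII (assumption).
    ascii_name = (
        unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    )
    # Lowercase, then collapse each run of non-alphanumerics into a single hyphen.
    label = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower()).strip("-")
    # DNS labels are limited to 63 characters; trim and avoid a trailing hyphen.
    return label[:63].rstrip("-")


print(channel_to_hostname_label("My YouTube Channel"))  # my-youtube-channel
```

Note that this only normalizes the form of the name. A made-up name such as "LazerPig" still becomes "lazerpig", which gives a hostname classifier no more to work with than before.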
Hmm, the two questions feel quite different to me. Changing a domain name is both much harder and much more user-visible than changing an invisible But a lot of this comes around to the question of what qualifications a site would need to have to be an STPS. Besides just being large and heterogeneous, if we think it would include a site being more "reputable" in some way, then perhaps that reputation would lead us to expect a lower chance of being pushed useless/fabricated topics. (OTOH would you let Reddit onto the list? Seems all-but-guaranteed that some subreddits would claim a random absurd topic for each pageview.)
@michaelkleber Yes, I agree about the Reddit problem (one of the current best international news subreddits has a deliberately embarrassing and NSFW name in an effort to avoid ads, and they would probably pass the most embarrassing possible topics too). But there are few enough STPSs that the browser (or other STPS list maintainer) could check the privacy policy for whether it covers passing best-effort accurate topics or something else, and spot-check what the site is actually passing. Some sites that are eligible to be STPSs will probably not see a reason to do it until some other party offers them an incentive to more accurately classify their audiences. In that case the other party will be in a position to require and check that the STPS is passing accurate topics, and the browser won't need to enforce.
This strikes me as very unappealing, and we should do whatever we can to avoid ending up in that position.
Yes, but it's less unappealing the fewer privacy policies you have to read. The number of pages and topics required for STPS status can be set high enough to keep the work on the browser (or independent evaluator) easily manageable, and not all sites eligible for STPS will apply.
If we were to go in the direction of allowing metadata, then it might make sense to do so in a page-level opt-in way to address privacy concerns. My primary concern there is that I imagine very few pages would opt in, as it's unclear what their incentive would be. And without a significant user base, it's hard to justify the costs of training the new model and having it sit on users' devices.
Hi @jkarlin, yes, that's a good point. There are at least two scenarios in which a large, multi-topic site will choose page-level opt-in or STPS.
The first scenario is the one that seems to be the immediate problem. I know that either opt-ins or STPS would represent additional development work, but realistically considering the time required for browser development tasks compared to the time required for regulator and lawyer meetings, it seems to me that it's worth the additional time to implement the Topics API in a way that takes some meaningful steps toward treating niche sites and YouTube channels in a comparable way.
Check to see if the page is from a Special Topics Provider Site (STPS), one that hosts content on many topics (such as youtube.com). If so:
Special Topics Provider Sites could enroll, using the existing enrollment process, specifying that they want to be part of the STPS program. The browser or an independent party could crawl the site and check that the site has at least "n" pages that are classified as at least "m" different topics before adding the site to the STPS list.
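A minimal Python sketch of that eligibility check, assuming the crawl has already produced a mapping from page URL to the set of topics assigned to that page. The function name and data shape are hypothetical, and the "n pages, m topics" rule is read here as: at least `min_pages` pages received some topic, and at least `min_topics` distinct topics appear across them. Other readings of the rule are possible.

```python
def is_stps_eligible(
    page_topics: dict[str, set[str]], min_pages: int, min_topics: int
) -> bool:
    """Check the 'at least n pages, at least m topics' rule on crawl results."""
    # Only pages that received at least one topic count toward the page total.
    classified = {url: topics for url, topics in page_topics.items() if topics}
    # Union of every topic seen on any classified page.
    distinct_topics = set().union(*classified.values())
    return len(classified) >= min_pages and len(distinct_topics) >= min_topics
```

Setting `min_pages` and `min_topics` high keeps the list short, which is the lever discussed in the comments for keeping spot-checks manageable.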
(simpler solution to achieve a large fraction of the benefits of #118 with less complexity and risk)