Should sites be able to set their own topics via response headers? #1
Would be awesome if, if/when this happens, we could replace “response headers” with Schema.org metadata. |
Hi, all! As proposed, this would bring back the old fight between subdomains and directories, this time over advertising rather than SEO. And many publishers already use directories under a single domain. |
Sites that are misclassified because they have some pages with a different or atypical topic could label those pages as a separate section, allowing for the top-level section to be more representative of the general topics on the site. Breaking pages out into a section would be less risky than manual topics, because the classifier is still in the loop. See #17 |
Seems acceptable that they might be able to set their own Topics, or at least suggest one. Not sure what the benefit to site owners would be though unless the Topics classification is repurposed (unless I'm missing something). I'd suggest websites should have the option of opting-out of Topics too (or ideally, having to opt-in). Again, not sure of the benefit to the site owners in all but extreme cases, where customers are blindly loyal and are marketed to by competitors for the first time, but it should still be possible. There's nothing stopping classification of websites by means of text processing so it's a circular argument. I'm sure site owners would appreciate the mechanism though. |
One of the risks of allowing sites to set their own topics is that colluding groups of deceptive or low-engagement sites will claim topics that are associated with high ad revenue. A site would be able to artificially get more lucrative ads by running some user workflows through a page on a different domain that claimed a better set of topics than the user originally had. Requiring a minimum number of visits to pages with a given topic is another way to address this risk. See #19 |
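The minimum-visits idea in the comment above can be sketched roughly as follows. This is only an illustration: the threshold value, the `eligible_topics` helper, and the shape of the visit records are assumptions, not anything proposed in the thread or implemented by a browser.

```python
from collections import Counter

MIN_VISITS = 3  # assumed threshold; the thread does not propose a value


def eligible_topics(visits, min_visits=MIN_VISITS):
    """Return the topics observed on at least `min_visits` page visits.

    `visits` is a list of (hostname, topic) pairs, one pair per page visit.
    """
    counts = Counter(topic for _host, topic in visits)
    return {topic for topic, n in counts.items() if n >= min_visits}


visits = [
    ("news.example", "News"),
    ("news.example", "News"),
    ("news.example", "News"),
    # A single pass-through visit to a page claiming a lucrative topic.
    ("redirect.example", "Auto Insurance"),
]
print(eligible_topics(visits))
```

Under this sketch, one redirect through a page that claims a high-value topic is not enough for that topic to stick, although a colluding group could still route users through such a page repeatedly.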
In the same vein as the over-generalization, misclassification, and misleading self-classification risks above, all of which can reduce marketer effectiveness and, in turn, publisher revenue, this seems to bring up the unsettled question of determining "quality." Marketers are trying to match their content to the "right" audience, which is not adequately defined by the sector of goods/services they compete within. According to the IAB Content Taxonomy, the following URL (https://www.edmunds.com/tesla/sedan) could reasonably be classified with 6 IDs, each of which might appeal to a different characteristic of a prospective buyer:
Which is the "right" topic to assign to this page, or the "right" interest to assign to someone who interacts with content like this "enough", to best match a given marketer's ad? |
Is there not a risk of colluding groups of high-engagement sites playing the same game? It does seem possible to prevent a site from directly gaining from the topics it suggests by not allowing the topics the site suggests to be returned in calls to the API on that site. But the colluding sites issue still remains. |
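A minimal sketch of the mitigation described in the comment above, and of why it does not stop collusion. The `topics_for_caller` function and the `suggestions` mapping are hypothetical names used only for illustration.

```python
def topics_for_caller(user_topics, suggestions, calling_site):
    """Drop any topic that `calling_site` itself suggested.

    `suggestions` maps a hostname to the set of topics that site suggested.
    """
    own_suggestions = suggestions.get(calling_site, set())
    return [t for t in user_topics if t not in own_suggestions]


suggestions = {"a.example": {"Luxury Travel"}, "b.example": set()}
user_topics = ["Luxury Travel", "News"]

# The suggesting site does not get its own suggestion back...
print(topics_for_caller(user_topics, suggestions, "a.example"))
# ...but a colluding partner site still receives it.
print(topics_for_caller(user_topics, suggestions, "b.example"))
```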
I agree. I don't see how it would be practical to let sites assign their own topics. Too many opportunities for topic manipulation by colluding sites. (It does make sense for users to be able to install extensions that would zap topics they have a problem with and/or add topics they are actively interested in getting ads about: #25) |
There's definitely a risk associated with that. Maybe the solution is that a site-'suggested' Topic (or Topics) isn't guaranteed to be adopted? I'm not sure of the exact mechanics, but maybe if there's enough of a semantic link between the site/page content and the 'suggested' Topic, then it's adopted; otherwise it's ignored. Or, in cases where the signals for the inferred Topic are weak, there's a higher likelihood of the 'suggested' Topic being adopted.
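One possible reading of this idea, as a rough sketch: the suggested topic is adopted only when its semantic similarity to the page content clears a threshold, and that threshold is relaxed when the classifier's own inference is weak. The `choose_topic` function, the score ranges, and all threshold values are assumptions made for illustration; how the similarity score would be computed is left open.

```python
def choose_topic(inferred, inferred_confidence, suggested, similarity,
                 base_threshold=0.7, weak_confidence=0.4,
                 relaxed_threshold=0.5):
    """Pick between the classifier's topic and the site's suggested topic.

    `inferred_confidence` and `similarity` are assumed to be scores in [0, 1].
    A weak inference lowers the bar the suggestion has to clear.
    """
    if inferred_confidence < weak_confidence:
        threshold = relaxed_threshold
    else:
        threshold = base_threshold
    return suggested if similarity >= threshold else inferred


# Strong inference, unrelated suggestion: the suggestion is ignored.
print(choose_topic("Autos", 0.9, "Finance", 0.2))
# Weak inference, moderately related suggestion: the suggestion is adopted.
print(choose_topic("Autos", 0.3, "Electric Vehicles", 0.55))
```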
|
Would love to have a well-known mechanism for sites to "suggest" a set of topics. If and how the browser factors them into the algorithm can be left as an intentional black box, to allow for anti-collusion / spam, etc., but ideally, it would serve as an input into the decision process. In particular, might be useful for sites with non-descriptive or non-obvious hostnames, etc. In terms of the signaling method, ideally, there should be a response header and an equivalent |
It is safe to assume that a meaningful subset of folks will do anything they can to make their pages as valuable as possible, and that most folks who enable the API will look at ways to "optimize" its impact: the incentive is to be valuable, not accurate. The result will presumably be that self-definitions fall somewhere between very accurate and very inaccurate, and they would likely be deemed too unreliable to trust without some sort of validation and quality rating. It is analogous to the difficulty with publisher-supplied page signals like meta-tags and descriptions, which run the gamut from very trustworthy to totally unreliable. However, with publisher-supplied signals a buyer can check pages, develop quality scores for domains, and ignore page signals from unreliable sources. With Topics, consumers of the signal aren't allowed to know which domains a given browser based the Topic assignment on, and so have no means of gauging the trustworthiness of the Topics signal for that browser. |
The classifier is likely to be wrong from time to time, and sites might wish to adjust the topics returned for their site. One way to accomplish that is to allow sites to set their own topics via response headers.
The concern is that sites may decide that some topics are more valuable than others and list only the valuable ones, polluting the input to the API. How real is this risk?
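To make the response-header idea concrete, here is a hedged sketch of how a browser might read such a header. No header name or format is defined in this issue: the `Suggested-Topics` name, the comma-separated integer ID format, the `known_ids` set, and the cap of three topics are all assumptions made for illustration.

```python
def parse_suggested_topics(header_value, known_ids, max_topics=3):
    """Parse a hypothetical `Suggested-Topics: 1, 57` header value.

    Keeps only well-formed IDs that exist in `known_ids`, removes
    duplicates, and caps the number of topics a site can suggest.
    """
    ids = []
    for part in header_value.split(","):
        part = part.strip()
        # Accept plain ASCII digits only; skip anything malformed.
        if part.isascii() and part.isdigit():
            topic_id = int(part)
            if topic_id in known_ids and topic_id not in ids:
                ids.append(topic_id)
    return ids[:max_topics]


known_ids = {1, 57, 100}  # stand-in for the real taxonomy
print(parse_suggested_topics("1, 57, abc, 9999, 57", known_ids))
```

Parsing and validating the header only guarantees that the suggested topics are well-formed; it does nothing about the pollution risk raised above, which would need a separate check on whether the suggestion is plausible for the site.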