Skip to content

Commit

Permalink
[chore] Update our robots.txt (#3033)
Browse files Browse the repository at this point in the history
This syncs our copy with the current state of the ai.robots.txt
repository. Upstream has tightened their scope to be AI-only, whereas
before it included a bunch of SEO and "web intelligence" marketing
stuff. I've kept those but moved them into their own section.
  • Loading branch information
daenney authored Jun 23, 2024
1 parent c273847 commit 4604224
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions internal/web/robots.go
Original file line number Diff line number Diff line change
Expand Up @@ -34,32 +34,38 @@ const (
User-agent: AdsBot-Google
User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: Applebot
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: DataForSeoBot
User-agent: Diffbot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: magpie-crawler
User-agent: Meltwater
User-agent: img2dataset
User-agent: omgili
User-agent: omgilibot
User-agent: peer39_crawler
User-agent: peer39_crawler/1.0
User-agent: PerplexityBot
User-agent: YouBot
Disallow: /
# Marketing/SEO "intelligence" data scrapers
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: DataForSeoBot
User-agent: ImagesiftBot
User-agent: magpie-crawler
User-agent: Meltwater
User-agent: PiplBot
User-agent: scoop.it
User-agent: Seekr
User-agent: YouBot
Disallow: /
# Well-known.dev crawler. Indexes stuff under /.well-known.
Expand Down

0 comments on commit 4604224

Please sign in to comment.