[feature] Allow customizable request filtering by user-agent
#1296
Labels
config
Something needs to be made configurable, or there's a config issue
enhancement
New feature or request
security
Right now we have an HTTP middleware that aborts incoming HTTP requests which don't have any user-agent set on them, returning code 418 I'm A Teapot. The aim is to force HTTP callers to provide at least some kind of identification in order to use the API (though this identification is of course not reliable, since it can be trivially spoofed).
However, since user-agent is not actually a required header, and only a 'should' (https://www.rfc-editor.org/rfc/rfc7231#section-5.5.3), we ought to make this behavior configurable by the instance admin (see #1292) to let them choose whether empty user agents get the teapot treatment.
Connected to this, we should expand the user-agent middleware to also allow admins to provide a list of regular expressions that will be evaluated against the incoming user-agent header string. This will be useful in filtering out unwanted scraping from bots with a predictable user-agent, which do not respect robots.txt or robots meta tags.
It could also be used by admins who want to completely limit interaction with other fediverse software that uses predictable user-agent strings.
Config key could be something like advanced-user-agent-filters, with the value as an array/slice of regex strings. The default value would, I guess, replicate the existing behavior (so just one entry, which matches empty strings).