Allow for negative lists #2089

Closed
Popolechien opened this issue Sep 24, 2024 · 1 comment
@Popolechien

Popolechien commented Sep 24, 2024

Currently WP1 allows us to build a ZIM file from a selection of articles. But we have an increasing number of deployments where would-be institutional users back out because some content is deemed inappropriate:

  • prisons, where apparently inmates will look up their fellow prisoners and shiv whoever has an entry to their name for sexual crimes;
  • schools, where kids' first instinct is to look up 69, panda porn and so on.

Ideally we should be able to run a bespoke recipe and attach a .tsv list of articles that would be skipped entirely during the scraping process. Having looked at the prison case, I would actually recommend omitting every article that contains the given strings, so that even where no dedicated article exists, a cursory mention could not be searched for.
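
To make the request concrete, here is a minimal Python sketch of the two filters described above: skipping titles listed in a .tsv file, and skipping any article whose text contains one of the given strings. This is illustrative only and is not how the scraper implements it; the function names `load_excluded_titles`, `should_skip` and `filter_articles` are hypothetical, and it assumes the .tsv holds one title per row in its first column.

```python
import csv
import io


def load_excluded_titles(tsv_text):
    """Read the first column of a TSV as a set of article titles to skip.

    Assumption: one title per row, first column; other columns are ignored.
    """
    reader = csv.reader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return {row[0].strip() for row in reader if row and row[0].strip()}


def should_skip(title, body, excluded_titles, banned_strings):
    """Return True if the title is listed or the text mentions a banned string."""
    if title in excluded_titles:
        return True
    # Case-insensitive substring match over the title and the body.
    haystack = f"{title}\n{body}".casefold()
    return any(s.casefold() in haystack for s in banned_strings)


def filter_articles(articles, excluded_titles, banned_strings):
    """Keep only the (title, body) pairs that pass both checks."""
    return [
        (title, body)
        for title, body in articles
        if not should_skip(title, body, excluded_titles, banned_strings)
    ]


if __name__ == "__main__":
    excluded = load_excluded_titles("Example A\nExample B\tnote\n")
    articles = [
        ("Example A", "listed by title"),
        ("Example C", "mentions a BANNED term in passing"),
        ("Example D", "nothing objectionable"),
    ]
    for title, _ in filter_articles(articles, excluded, ["banned"]):
        print(title)
```

A plain substring match is the bluntest option: it also drops articles that only mention a string in passing, which is the behaviour asked for here, at the cost of false positives.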

@kelson42
Collaborator

kelson42 commented Sep 24, 2024

This is called "Article List to ignore" and it is already implemented.

Duplicate of #1706

@kelson42 kelson42 self-assigned this Sep 24, 2024
@kelson42 kelson42 added this to the 1.14.0 milestone Sep 24, 2024