Add capability to blacklist some websites and redirect them to the library / GitHub issue #124
base: main
Codecov Report
Attention: Patch coverage is …
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #124      +/-   ##
==========================================
- Coverage   56.66%   56.58%   -0.08%
==========================================
  Files          12       12
  Lines         533      539       +6
  Branches       77       78       +1
==========================================
+ Hits          302      305       +3
- Misses        229      232       +3
  Partials        2        2
@Popolechien @kelson42 feedback is of course welcome, since this PR also contains quite significant "UX" parts.
I disagree with the copyright stuff. zimit is for individual copies; we are not publishing the ZIMs. We should not go down this road IMO. If wikihow can't be zimed because of technical protections, then it should be in the "impossible" category.
I agree that indicating that …
Not my call, feel free to suggest proper wording / configuration. I'm not particularly attached to this copyright thing at all.
Ok, I don't really know how to add my changes to an existing commit, but I'd change …
I will take care of this. Regarding contact, are we sure we want to publish the contact email in plain text? (This is usually a good way to attract spam.) If so, it needs to be published everywhere on the website (we have many places where we just say "contact us").
Fix #28
Fix #33
Changes
- The blacklist is configured in a blacklist.json file in the repo, so that it is both simpler and subject to code review to catch misconfigurations (as requested by @kelson42).
- Matching is based on host (if host is present in the URL, then it's a match) and leads to 5 distinct situations, detailed below. The case-insensitive matching might lead to a few false positives, but this is deemed acceptable since these are very rare edge cases which are probably not worth any effort.
Details
already_zimed
We already have a ZIM for the URL (e.g. devdocs, freecodecamp, libretexts, wikipedia) and we want to redirect the user to the library.
Note that for websites covered by WP1, it is possible to add the WP1 hint (it is not shown by default). The link goes to https://wp1.openzim.org/#/selections/simple.
The library link is configured in the blacklist.json file.
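As a minimal sketch of how an already_zimed entry and its opt-in WP1 hint could fit together, assuming a hypothetical entry shape (the field names `libraryUrl` and `showWp1Hint`, the helper `alreadyZimedLinks` and the placeholder library URL are all illustrative, not the PR's actual schema or code):

```typescript
// Hypothetical shape of an already_zimed entry in blacklist.json;
// the field names are assumptions, not the PR's actual schema.
interface AlreadyZimedEntry {
  host: string;
  status: "already_zimed";
  libraryUrl: string; // where the user is redirected in the library
  showWp1Hint?: boolean; // WP1 hint stays hidden unless explicitly enabled
}

const WP1_URL = "https://wp1.openzim.org/#/selections/simple";

// Hypothetical helper: builds the links to display for an already_zimed match.
function alreadyZimedLinks(entry: AlreadyZimedEntry): string[] {
  const links = [entry.libraryUrl];
  // The WP1 hint is opt-in, meant only for websites covered by WP1.
  if (entry.showWp1Hint === true) {
    links.push(WP1_URL);
  }
  return links;
}

// Hypothetical entry; the library URL is a placeholder.
const wikipedia: AlreadyZimedEntry = {
  host: "wikipedia.org",
  status: "already_zimed",
  libraryUrl: "https://library.example/wikipedia",
  showWp1Hint: true,
};

console.log(alreadyZimedLinks(wikipedia));
```

Making the hint an optional boolean keeps the default (hint hidden) implicit, so most entries only need a host, a status and a library link.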
forbid_or_copyrighted_by_website_owner
We know there is a copyright or similar issue with this website (e.g. wikihow).
too_big_partially_already_zimed
We cannot make a ZIM of such a big site; we have a dedicated scraper and already publish a few ZIMs (e.g. youtube).
Note that the scraper URL is optional (if it is not configured, the last sentence is not shown).
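The optional scraper URL can be sketched as a conditional last sentence. This assumes a hypothetical `scraperUrl` field and a hypothetical `tooBigMessage` helper; the message wording and the placeholder URL in the example are illustrative only:

```typescript
// Hypothetical shape of a too_big_partially_already_zimed entry;
// field names are assumptions, not the PR's actual schema.
interface TooBigEntry {
  host: string;
  status: "too_big_partially_already_zimed";
  scraperUrl?: string; // optional: when absent, the last sentence is omitted
}

// Hypothetical helper: builds the message shown to the user.
function tooBigMessage(entry: TooBigEntry): string {
  const sentences = [
    "We cannot make a ZIM of such a big site.",
    "We already publish a few ZIMs of it.",
  ];
  // Only mention the dedicated scraper when its URL is configured.
  if (entry.scraperUrl !== undefined && entry.scraperUrl !== "") {
    sentences.push(`Dedicated scraper: ${entry.scraperUrl}`);
  }
  return sentences.join(" ");
}

// Prints: We cannot make a ZIM of such a big site. We already publish a few ZIMs of it.
console.log(tooBigMessage({ host: "youtube.com", status: "too_big_partially_already_zimed" }));
```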
scraper_needed
This website cannot be zimmed with zimit, and we have a pending scraper request.
not_possible_with_zimit
This website is known to be impossible to ZIM with zimit.
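The matching rule described in the Changes list (host present anywhere in the URL, compared case-insensitively) and the five situations above can be sketched as follows. The entry shape and the `findBlacklistEntry` function are hypothetical; only the status names come from this PR:

```typescript
// The five situations described above.
type BlacklistStatus =
  | "already_zimed"
  | "forbid_or_copyrighted_by_website_owner"
  | "too_big_partially_already_zimed"
  | "scraper_needed"
  | "not_possible_with_zimit";

// Hypothetical minimal entry shape; the real blacklist.json may differ.
interface BlacklistEntry {
  host: string;
  status: BlacklistStatus;
}

// Hypothetical lookup: an entry matches when its host appears anywhere in
// the URL, ignoring case. This is deliberately loose and can produce rare
// false positives (e.g. the host appearing in a query string).
function findBlacklistEntry(
  url: string,
  entries: readonly BlacklistEntry[],
): BlacklistEntry | undefined {
  const haystack = url.toLowerCase();
  return entries.find((entry) => haystack.includes(entry.host.toLowerCase()));
}

const entries: BlacklistEntry[] = [
  { host: "wikihow.com", status: "forbid_or_copyrighted_by_website_owner" },
  { host: "youtube.com", status: "too_big_partially_already_zimed" },
];

const match = findBlacklistEntry("https://www.YouTube.com/watch?v=abc", entries);
console.log(match?.status); // → too_big_partially_already_zimed
```

A stricter variant could parse the URL with `new URL(url).hostname` and compare domain suffixes, at the cost of more code; the simple containment check trades that precision for the rare false positives accepted above.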

Remarks:
- … Blacklistxxx.vue components, but it was deemed simpler to maintain than complex if/then/else conditions.
Flow
[Screen recording: Screen.Recording.2025-02-28.at.09.32.44.mov]