ZIM entry title should not have control characters #128

Closed
kelson42 opened this issue Nov 16, 2023 · 2 comments · Fixed by openzim/python-scraperlib#179
Labels
enhancement New feature or request upstream

@kelson42
Contributor

Under certain circumstances, zimit/warc2zim can create "broken" ZIM entry titles, see:
kiwix/libkiwix#1020

See the ZIM specification (which has been updated):
https://wiki.openzim.org/w/index.php?title=ZIM_file_format&type=revision&diff=31304&oldid=31263

warc2zim should not allow that. How best to avoid creating a broken ZIM file is unclear. For the moment, removing the control characters seems to me to be the best approach.
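
For illustration, removing control characters from a title could look roughly like the sketch below. This is not the code from warc2zim or python-scraperlib: the function name `strip_control_chars` is hypothetical, and it assumes "control characters" means the Unicode general category `Cc` (C0/C1 controls such as NUL, tab, newline and DEL), which may be narrower or broader than what the ZIM specification intends.

```python
import unicodedata


def strip_control_chars(title: str) -> str:
    """Return `title` with every Unicode control character removed.

    Hypothetical helper: "control character" is assumed to mean any
    code point whose general category is "Cc".
    """
    return "".join(ch for ch in title if unicodedata.category(ch) != "Cc")


if __name__ == "__main__":
    # A title polluted with a NUL byte and a trailing newline.
    print(repr(strip_control_chars("Hello\x00 World\n")))
```

Dropping the characters, rather than rejecting the entry, keeps the ZIM creation going; the trade-off is that tabs and newlines inside a title disappear without being replaced by a space.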

@kelson42 kelson42 added the enhancement New feature or request label Nov 16, 2023
@benoit74 benoit74 added this to the 2.1.0 milestone May 17, 2024
@benoit74
Collaborator

benoit74 commented Jun 18, 2024

Will be done in openzim/python-scraperlib#159 (hopefully 3.4.0)

@benoit74 benoit74 self-assigned this Jun 25, 2024
@benoit74
Collaborator

The upstream change is pending review. It will drop control characters automatically, so no change is expected in warc2zim; updating to the latest scraperlib once it is released should be sufficient.
