Add utility to get PDF info for proper titles on PDF entries #168

benoit74 · 2024-06-18T07:45:43Z

The content of PDF documents is not indexed for suggestions, even though in some ZIMs it is the "core" of the ZIM.

Having a utility in scraperlib to extract PDF info and get the document title would probably help.

See openzim/warc2zim#290 for one use-case.
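A rough sketch of what such a utility could do, using only the standard library. It assumes an uncompressed, unencrypted PDF. It follows the trailer's `/Info N G R` reference to the document information dictionary and reads its `/Title` entry, which may be a literal or a hex string. Following the reference matters because outline (bookmark) items also use a `/Title` key. The name `naive_pdf_title` and the Latin-1 fallback decoding are hypothetical choices for illustration, not scraperlib API:

```python
import re
from typing import Optional

# Trailer entry pointing at the document information dictionary, e.g. "/Info 3 0 R".
_INFO_REF = re.compile(rb"/Info\s+(\d+)\s+(\d+)\s+R")
# /Title key followed by a literal string "(" or a hex string "<" (not a "<<" dict).
_TITLE_KEY = re.compile(rb"/Title\s*(\(|<(?!<))")
_ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}
_OCTAL = b"01234567"


def _read_literal(data: bytes, i: int) -> bytes:
    """Read a PDF literal string starting just after its opening '('."""
    out = bytearray()
    depth = 1  # unescaped parentheses must be balanced inside the string
    while i < len(data):
        c = data[i : i + 1]
        if c == b"\\":
            nxt = data[i + 1 : i + 2]
            if nxt in _ESCAPES:
                out += _ESCAPES[nxt]
                i += 2
            elif nxt and nxt in _OCTAL:  # \ddd octal escape, 1 to 3 digits
                j = i + 1
                while j < min(i + 4, len(data)) and data[j : j + 1] in _OCTAL:
                    j += 1
                out.append(int(data[i + 1 : j].decode("ascii"), 8) & 0xFF)
                i = j
            elif nxt in (b"\r", b"\n"):  # backslash-newline: line continuation
                i += 2
                if nxt == b"\r" and data[i : i + 1] == b"\n":
                    i += 1
            else:  # \( \) \\ and unknown escapes: keep the char, drop the backslash
                out += nxt
                i += 2
        elif c == b"(":
            depth += 1
            out += c
            i += 1
        elif c == b")":
            depth -= 1
            if depth == 0:
                return bytes(out)
            out += c
            i += 1
        else:
            out += c
            i += 1
    return bytes(out)  # unterminated string: return what was read


def _decode_text_string(raw: bytes) -> str:
    if raw.startswith(b"\xfe\xff"):  # UTF-16BE with byte order mark
        return raw[2:].decode("utf-16-be", errors="replace")
    if raw.startswith(b"\xef\xbb\xbf"):  # UTF-8 with BOM (allowed since PDF 2.0)
        return raw[3:].decode("utf-8", errors="replace")
    # Assumption: Latin-1 approximates PDFDocEncoding (they differ on a few code points).
    return raw.decode("latin-1")


def naive_pdf_title(data: bytes) -> Optional[str]:
    """Hypothetical helper: return the /Title of the PDF info dictionary, or None."""
    refs = _INFO_REF.findall(data)
    if not refs:
        return None
    num, gen = refs[-1]  # with incremental updates, the last trailer wins
    obj_re = re.compile(
        rb"(?<!\d)" + num + rb"\s+" + gen + rb"\s+obj(.*?)endobj", re.DOTALL
    )
    bodies = obj_re.findall(data)
    if not bodies:
        return None  # e.g. the object lives in a compressed object stream
    body = bodies[-1]  # a later redefinition of the object overrides earlier ones
    match = _TITLE_KEY.search(body)
    if match is None:
        return None
    start = match.end()
    if match.group(1) == b"(":
        raw = _read_literal(body, start)
    else:
        end = body.find(b">", start)
        if end == -1:
            return None
        digits = re.sub(rb"\s", b"", body[start:end])
        if len(digits) % 2:
            digits += b"0"  # odd number of hex digits: a final 0 is assumed
        try:
            raw = bytes.fromhex(digits.decode("ascii"))
        except ValueError:
            return None
    title = _decode_text_string(raw).strip()
    return title or None
```

Returning `None` for a missing or empty title leaves room for a fallback, such as the file name. This sketch does not handle PDFs that keep objects in compressed object streams (common since PDF 1.5), encrypted files, or titles stored only in XMP metadata. A real implementation would probably be more robust if built on a PDF library such as pypdf (`PdfReader(...).metadata.title`) or PyMuPDF (`Document.metadata["title"]`).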

benoit74 added the enhancement label (New feature or request) Jun 18, 2024

benoit74 added this to the 3.5.0 milestone Jun 20, 2024

benoit74 modified the milestones: 3.5.0, 4.0.0 Jul 10, 2024

benoit74 self-assigned this Jul 11, 2024

benoit74 mentioned this issue Jul 11, 2024 in Add utility to index PDF documents content #167 (Closed)

benoit74 modified the milestones: 4.0.0, 3.5.0 Jul 15, 2024

benoit74 mentioned this issue Jul 15, 2024 in Add indexdata + automatic indexing of PDF items #182 (Merged)

benoit74 closed this as completed in #182 Jul 30, 2024

benoit74 modified the milestones: 3.5.0, 4.0.0 Jul 30, 2024
