Inquiry for using the PDFScraper #7

Closed Answered by erikkastelec
summywong-developer asked this question in Q&A
To extract text data from the PDF I used pdfminer.six. If you need to extract data from tables, then camelot would be a better choice.

Both libraries are well documented. The documentation for my own library is only available in Slovene, but you can still take a look at how I used them in its code.
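
For orientation, here is a minimal sketch of how the two libraries are commonly called. It is not taken from the PDFScraper code. The function names and the `normalize_whitespace` helper are made up for illustration, and it assumes `pdfminer.six` and `camelot-py` are installed.

```python
def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace and drop blank lines from extracted text."""
    lines = (" ".join(line.split()) for line in text.splitlines())
    return "\n".join(line for line in lines if line)


def extract_pdf_text(pdf_path: str) -> str:
    """Return the text layer of a PDF using pdfminer.six (hypothetical wrapper)."""
    # Imported lazily so normalize_whitespace works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    return normalize_whitespace(extract_text(pdf_path))


def extract_pdf_tables(pdf_path: str, pages: str = "1"):
    """Return the tables camelot detects as a list of pandas DataFrames."""
    # Imported lazily; assumes camelot-py and its dependencies are installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages)
    return [table.df for table in tables]
```

Calling `extract_pdf_text("report.pdf")` would return the cleaned text of a hypothetical `report.pdf`, and `extract_pdf_tables("report.pdf", pages="1-3")` would return one DataFrame per table found on those pages. Both only work on PDFs that already contain a text layer.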

If your PDF documents are in "image" form (you can't copy and paste text from them), then I suggest you use Tesseract to convert them into an editable PDF format.
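
As a rough sketch of that conversion step, the snippet below shells out to the Tesseract command-line tool to turn a page image into a searchable PDF. The function names are hypothetical. It assumes the `tesseract` binary is on your `PATH` and that each scanned page has already been exported as an image, since Tesseract reads images rather than PDF files.

```python
import subprocess


def build_tesseract_command(image_path: str, output_base: str, lang: str = "eng") -> list:
    """Build the CLI call that renders an image as a searchable PDF."""
    # Tesseract appends ".pdf" to output_base itself, so pass it without an extension.
    return ["tesseract", image_path, output_base, "-l", lang, "pdf"]


def image_to_searchable_pdf(image_path: str, output_base: str, lang: str = "eng") -> str:
    """Run Tesseract on one page image and return the path of the PDF it writes."""
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)
    return output_base + ".pdf"
```

For example, `image_to_searchable_pdf("page1.png", "page1")` would write `page1.pdf`, which can then be passed to a text extraction library. The `lang` argument must match a language pack that is installed for Tesseract.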

Answer selected by erikkastelec