DOC: Add comparison with pdfplumber #1837

Merged: 6 commits, May 20, 2023
10 changes: 6 additions & 4 deletions docs/meta/comparisons.md
@@ -50,13 +50,17 @@ than PyPDF2. See [history of pypdf](history.md).
[QPDF]: https://github.com/qpdf/qpdf


## pdfminer
## pdfminer.six and pdfplumber

[`pdfminer.six`](https://pypi.org/project/pdfminer.six/) is capable of
extracting the [font size](https://stackoverflow.com/a/69962459/562769)
/ font weight (bold-ness). It has no capabilities for writing PDF files.
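
As a rough illustration of the font size / weight extraction mentioned above, here is a minimal sketch using `pdfminer.six`. It assumes `pdfminer.six` is installed; the helper names `looks_bold` and `iter_char_fonts` are hypothetical and not part of any library. The bold check is only a heuristic based on the font name, and `LTChar.size` is the size reported by the layout analysis, which may differ slightly from the nominal font size.

```python
from typing import Iterator, Tuple


def looks_bold(fontname: str) -> bool:
    """Heuristic (an assumption, not a pdfminer.six feature):
    treat a font as bold if its name contains 'bold'."""
    return "bold" in fontname.lower()


def iter_char_fonts(pdf_path: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (character, font name, size) for every character in the PDF."""
    # Imported lazily so the pure helper above works without pdfminer.six.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTChar, LTTextContainer, LTTextLine

    for page_layout in extract_pages(pdf_path):
        for element in page_layout:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if not isinstance(line, LTTextLine):
                    continue
                for char in line:
                    # Lines also contain LTAnno objects (e.g. inserted spaces),
                    # which carry no font information.
                    if isinstance(char, LTChar):
                        yield char.get_text(), char.fontname, char.size
```

Usage would look like `for text, font, size in iter_char_fonts("example.pdf"): print(text, font, size, looks_bold(font))`, where `example.pdf` is a placeholder path.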

## pdfrw / pdfminer / pdfplumber
[`pdfplumber`](https://pypi.org/project/pdfplumber/) is a library focused on extracting data from PDF documents. Since `pdfplumber` is built on top of `pdfminer.six`, it has **no capabilities for exporting or modifying a PDF file** (see [#440 (discussions)](https://github.com/jsvine/pdfplumber/discussions/440#discussioncomment-803880)). However, `pdfplumber` is capable of converting a PDF file into an image, [drawing lines and rectangles on the image](https://github.com/jsvine/pdfplumber#drawing-methods), and saving it as an image file.
@mara004 mara004 May 21, 2023

> is capable of converting a PDF file into an image

From skimming the Readme, it looks like pdfplumber calls Wand for pdf rendering, which is a binding to ImageMagick, which in turn uses ghostscript, IIRC.
So this phrase is kinda misleading as pdfplumber is not an actual pdf rendering library (as opposed to mupdf/poppler/pdfium), but merely a rendering "wrapper-wrapper-wrapper".

Contributor Author

Yes, I agree! It is not a PDF rendering library, there's just one function to convert the PDF into an image with the tools you mentioned. I'm not experienced with Wand, ImageMagick, and ghostscript, so if you're an expert there, feel free to elaborate more on my changes.

Member

@RitchieP You could rephrase

> However, pdfplumber is capable of converting a PDF file into an image

to

> However, pdfplumber is capable of converting a PDF file into an image via ImageMagick

Contributor Author

Definitely! I'll make a PR in a bit.

Thanks!


The community over at `pdfplumber` is also active in answering questions and the library is actively maintained as of now.
MartinThoma marked this conversation as resolved.

## pdfrw / pdfminer
MartinThoma marked this conversation as resolved.

I don't have experience with any of those libraries. Please add a
comparison if you know pypdf and [`pdfrw`](https://pypi.org/project/pdfrw/)!
@@ -66,8 +70,6 @@ Please be aware that there is also
Then there is [`pdfrw2`](https://pypi.org/project/pdfrw2/) which doesn't have
a large community behind it.

And there is also [`pdfplumber`](https://pypi.org/project/pdfplumber/)

## Document Generation

There are (Python) [tools to generate PDF documents](https://github.com/py-pdf/awesome-pdf#generators).