[community poll] who is using PyPDF2? #659

MartinThoma · 2022-04-05T22:09:40Z

I'll try to revive PyPDF2 over the next few weeks. One thing that is important to me is not breaking existing, working features. Hence I'll introduce CI and check the unit tests we have.

To test properly and to think about a deprecation strategy, I need a grasp of who is using PyPDF2.

Please share:

operating system
python version
what you use PyPDF2 for (reading/writing/manipulating PDFs? Maybe with a few details? What are the core features for you?)

Joshua-IRT · 2022-04-06T00:06:06Z

OS is RHEL 7 and 8
Python 2.7.5 (blame RHEL7) and 3.6+
I am using PyPDF2 (with a bunch of my own hacks) to populate PDF forms for both internal and client-facing purposes. Key features for us are being able to set things like form field read/write status and text colour (some of the fields are colour-coded based on a priority ranking). I think I had to extend PyPDF2 to make this work, so I may be able to contribute these changes as a PR in the future.

hvbtup · 2022-04-06T15:26:54Z

I'm using it on Windows and Linux, with Python 2.7, 3.6, and 3.10.
I made some small changes in my fork to support an additional "background" argument in the various mergePage methods. With it, mergePage can use the new page as a watermark ("under the existing content").

I noticed a bug in my fork just a few minutes ago.

Feel free to include my changes here under the existing license if you like (but please wait until I've fixed that bug, which will happen next week).

I'm using PyPDF for two purposes:
a) concatenating PDFs
b) merging pages
We are also using commercial software for these purposes, but for "smaller" installations PyPDF2 suffices.

hvbtup · 2022-04-06T15:43:27Z

> I noticed a bug in my fork just a few minutes ago.

Fixed.

pubpub-zz · 2022-04-06T17:09:26Z

I've used PyPDF4 with some fixes and enhancements.
I also used it to extract/merge comments.

MartinThoma · 2022-04-06T17:32:30Z

I've just added a test for extracting comments from PDFs. Thank you @pubpub-zz :-)

MartinThoma · 2022-04-06T17:33:52Z

> Python 2.7.5 (blame RHEL7) and 3.6

@Joshua-IRT Uh, I was hoping that 3.6 and older isn't used any longer 🙈 Is pyenv an option for you?

MasterOdin · 2022-04-06T18:22:31Z

Does the code in master still work on 2.7 and 3.3+, as documented in tox? I think it's fine to ignore 2.6 at least, as it has been dead far longer than 2.7. Dropping those versions warrants a major version bump, which is fine, but it might be good to make a 1.x release first, if possible, that includes the code merged since the last release.

vteran93 · 2022-04-06T18:36:40Z

Hi,

AWS Lambda with the Serverless Framework
Python 3.8.10, using Poetry
reading/writing/manipulating PDFs

MartinThoma · 2022-04-06T18:38:58Z

I would drop official support for Python 3.5 and older. Most major projects did that a while ago (Django, Flask, pandas, ...).

My plan was to make one or two minor releases with many of the current PRs, then deprecate 3.6 and older. If there are still many 3.6 users who cannot switch, maybe we can keep supporting it, but I would like to avoid that.

What do you think, @MasterOdin?
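One common way to carry out a "release, then deprecate" plan is to warn on old interpreters while the last compatible releases remain installable. A minimal sketch of that pattern, not what PyPDF2 actually shipped; `MIN_SUPPORTED` and `warn_if_unsupported_python` are hypothetical names:

```python
import sys
import warnings

# Hypothetical cut-off: interpreters older than this get a warning.
MIN_SUPPORTED = (3, 7)


def warn_if_unsupported_python(version_info=sys.version_info, minimum=MIN_SUPPORTED):
    """Hypothetical helper: warn (and return True) when running on a Python older than `minimum`."""
    current = tuple(version_info[:2])
    if current < tuple(minimum):
        warnings.warn(
            "Support for Python %d.%d is deprecated and will be removed in a future release." % current,
            DeprecationWarning,
            stacklevel=2,
        )
        return True
    return False
```

A package would call this once at import time. Note that `DeprecationWarning` is hidden by default unless triggered from code in `__main__` (Python 3.7+, PEP 565), so some projects use `FutureWarning` for notices aimed at end users. The complementary step is declaring `python_requires` in the package metadata: pip then skips releases whose Requires-Python excludes the running interpreter, so users stuck on 3.6 keep getting the last release that supports it.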

LightningMan711 · 2022-04-06T21:04:52Z

Windows, whatever the current version is (10, I think)
Python 3.8.2 (mostly; I still have code that needs 2.7)
Reading pages, manipulating bookmarks.

LightningMan711 · 2022-04-06T21:05:51Z

I also must say that I am glad this project is being revived and updated.

Joshua-IRT · 2022-04-06T23:22:50Z

> > Python 2.7.5 (blame RHEL7) and 3.6
>
> @Joshua-IRT Uh, I was hoping that 3.6 and older isn't used any longer 🙈 Is pyenv an option for you?

Sorry, Python 3.6.8 is the main version on RHEL8. I have not tried pyenv on our systems yet, but can certainly look into it.

Since Python 3.6 is end-of-life, though, I can certainly understand (and agree with) the desire not to support it. I guess the approach would be: implement what is easy to do in 3.6, make a final release that supports it, then deprecate it. At least until I get around to setting up pyenv, I can continue to use my hacked-together version.

TZanke · 2022-04-07T07:30:19Z

openSUSE Leap 15.x
Python 3.6.x

reading
writing
merging
editing & creating metadata with PyPDF2.generic.X

Nice to see progress again. We started using PyMuPDF for new code (where possible) because of PyPDF2's lack of bug fixes.

johns1c · 2022-04-07T10:47:55Z

Windows 11
Python 3.8

extracting text from received documents
extracting JPEG images and then running OCR on them to obtain text from scanned documents (HP all-in-one)
viewing PDFs using a (much modified) version of wx.lib.pdfviewer

I am happy to contribute to the project if you want help, for example:

I have a collection of PDFs covering a variety of object types, fonts, encoding schemes, etc., and am willing to carry out testing.
I have made some amendments to my own fork of PyPDF2 which I would like to have reviewed and perhaps merged back into master, for example one which allows text extraction using the to_unicode objects (see Johns1c/PyPDF2).

Chris
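Some background on the to_unicode approach: a font's /ToUnicode entry can point to a CMap whose bfchar and bfrange sections map the character codes used in a content stream to Unicode text, with destinations written as UTF-16BE hex strings. The sketch below is a simplified stand-alone parser for just those two sections, not code from the fork or from PyPDF2; `parse_to_unicode` and `decode` are hypothetical names, and it assumes fixed 2-byte codes and hex strings without embedded whitespace.

```python
import re

# Hypothetical stand-alone helpers; not PyPDF2's (or the fork's) implementation.
BFCHAR = re.compile(r"beginbfchar(.*?)endbfchar", re.S)
BFRANGE = re.compile(r"beginbfrange(.*?)endbfrange", re.S)
HEX = re.compile(r"<([0-9A-Fa-f]+)>")
RANGE_ENTRY = re.compile(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>\s*(<[0-9A-Fa-f]+>|\[[^\]]*\])")


def utf16(hex_digits):
    """Destination strings in a ToUnicode CMap are UTF-16BE."""
    return bytes.fromhex(hex_digits).decode("utf-16-be")


def parse_to_unicode(cmap_text):
    """Build {character code: text} from the bfchar/bfrange sections of a ToUnicode CMap."""
    mapping = {}
    for block in BFCHAR.findall(cmap_text):
        hexes = HEX.findall(block)
        # bfchar entries are pairs: <source code> <destination string>
        for src, dst in zip(hexes[0::2], hexes[1::2]):
            mapping[int(src, 16)] = utf16(dst)
    for block in BFRANGE.findall(cmap_text):
        for lo, hi, dst in RANGE_ENTRY.findall(block):
            start, end = int(lo, 16), int(hi, 16)
            if dst.startswith("["):
                # Array form: one destination string per code in the range
                for code, item in zip(range(start, end + 1), HEX.findall(dst)):
                    mapping[code] = utf16(item)
            else:
                # Single destination: successive codes map to successive values
                first = int(dst[1:-1], 16)
                width = (len(dst) - 2) // 2
                for offset in range(end - start + 1):
                    value = (first + offset).to_bytes(width, "big")
                    mapping[start + offset] = value.decode("utf-16-be")
    return mapping


def decode(raw, mapping, code_bytes=2):
    """Turn raw string bytes into text, assuming fixed-width codes; unmapped codes become U+FFFD."""
    codes = (int.from_bytes(raw[i:i + code_bytes], "big") for i in range(0, len(raw), code_bytes))
    return "".join(mapping.get(code, "\ufffd") for code in codes)
```

Real CMaps can declare variable-length codes in their codespacerange section, which a full implementation has to honour when splitting a string into codes.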

nicksofn · 2022-04-07T20:06:07Z

I use PyPDF3, but it is almost the same as PyPDF2, and if the community is moving back to PyPDF2, it seems logical to migrate there too.

operating system:
• IBM i 7.3
• Windows Server 2012 R2
python version:
• IBM i - 3.9.11-1
• Windows Server 2012 R2 – 3.10.4

what you use PyPDF2 for (reading/writing/manipulating PDFs? Maybe with a few details? What are core features to you? ):

Current (and possibly future) projects read PDF contents, split, and merge.
Daily: take several PDFs and merge them for printing
Other: Split large PDFs and read contents of the resulting PDFs for proper categorization.

This was referenced Apr 5, 2022

pypdf is back! sfneal/PyPDF3#18 (Open)

PyPDF4 is dead claird/PyPDF4#103 (Open)

MartinThoma mentioned this issue Apr 7, 2022

Test against Python 3.6, PyPy and PyPy3 (stop testing against Python 3.3) #458 (Merged)

MasterOdin pinned this issue Apr 7, 2022

MasterOdin mentioned this issue Apr 7, 2022

PyPDF2 cleanup #658 (Closed)

MartinThoma added the Meta label Apr 9, 2022

py-pdf locked and limited conversation to collaborators Apr 9, 2022

MartinThoma converted this issue into discussion #683 Apr 9, 2022

MartinThoma unpinned this issue Apr 16, 2022

MartinThoma added the is-question label ("Rather a question than an issue. Should usually be a Discussion instead") Mar 25, 2023


This issue was moved to a discussion.
