This issue was moved to a discussion.

[community poll] who is using PyPDF2? #659

Closed
MartinThoma opened this issue Apr 5, 2022 · 15 comments
Labels
is-question Rather a question than an issue. Should usually be a Discussion instead Meta

Comments

@MartinThoma
Member

I'll try to revive PyPDF2 in the next weeks. One part that is important to me is not to break existing / working features. Hence I'll introduce CI and will check the unit tests we have.

In order to test properly / think about a deprecation strategy, I need to have a grasp of who is using PyPDF2.

Please share:

  1. operating system
  2. python version
  3. what you use PyPDF2 for (reading/writing/manipulating PDFs? Maybe with a few details? What are core features to you?)

@Joshua-IRT

  1. OS is RHEL 7 and 8

  2. Python 2.7.5 (blame RHEL7) and 3.6+

  3. I am using PyPDF2 (with a bunch of my own hacks) for populating PDF forms for both internal and client-facing purposes. Being able to set things like form field read/write, text colour (some of the fields are colour-coded based on a priority ranking), etc. are key things we use. I think I had to extend PyPDF2 to make this work, so I may be able to add these as a PR in future.

@hvbtup

hvbtup commented Apr 6, 2022

I'm using it on Windows and Linux, with Python 2.7 and with Python 3.6 and 3.10.
I made some small changes in my fork to support an additional "background" argument in the various mergePage methods. The effect is that in mergePage, the new page can be used as a watermark ("under the existing content").

I noticed a bug in my fork just a few minutes ago.

Feel free to include my changes here under the existing license if you like (but please wait until I've fixed that bug, which will happen next week).

I'm using PyPDF for two purposes:
a) concatenating PDFs
b) merging pages
We are also using commercial software for these purposes, but for "smaller" installations PyPDF2 suffices.
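
To illustrate the "background" idea described above: PDF page content is painted in order, so whatever is drawn first ends up underneath. The following is a toy model only, not the fork's code and not PyPDF2's API; `merge_content` and the operation strings are hypothetical names chosen for illustration.

```python
def merge_content(existing_ops, new_ops, background=False):
    """Toy model of merging two pages' drawing operations.

    PDF content is painted in order, so later operations cover earlier ones.
    Hypothetical helper for illustration; not part of PyPDF2.
    """
    if background:
        # The new page is drawn first, so it ends up under the existing content.
        return list(new_ops) + list(existing_ops)
    # Default: the new page is drawn last, on top of the existing content.
    return list(existing_ops) + list(new_ops)


page = ["draw invoice text"]
watermark = ["draw DRAFT stamp"]

print(merge_content(page, watermark))                   # stamp painted over the text
print(merge_content(page, watermark, background=True))  # stamp painted under the text
```

In this model the only difference between an overlay and a watermark is which list of operations comes first.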

@hvbtup

hvbtup commented Apr 6, 2022

> I noticed a bug in my fork just a few minutes ago.

Fixed.

@pubpub-zz
Collaborator

I've used PyPDF4 with some fixes and enhancements.
I also used it to extract/merge comments.

@MartinThoma
Member Author

I've just added a test for extracting comments from PDFs. Thank you @pubpub-zz :-)

@MartinThoma
Member Author

> Python 2.7.5 (blame RHEL7) and 3.6

@Joshua-IRT Uh, I was hoping that 3.6 and older isn't used any longer 🙈 Is pyenv an option for you?

@MasterOdin
Member

Does the code in master still work on 2.7 and 3.3+ per what's documented in tox? I think it's fine to ignore 2.6 at least, as it's definitely long dead compared to 2.7. Dropping those versions is a major version increase, which is fine, but it might be good to make a 1.x release that is compatible with the code that has been merged since the last release, if possible.

@vteran93

vteran93 commented Apr 6, 2022

Hi,

AWS Lambda with Serverless framework
python 3.8.10 using poetry
reading/writing/manipulating PDFs

@MartinThoma
Member Author

I would drop official support for Python 3.5 and older. Most major projects did that a while ago (Django, Flask, Pandas, ...)

My plan was to make one or two minor releases with many of the current PRs. Then deprecate 3.6 and older. If there are still many 3.6 users who cannot switch, maybe we can keep the support there. But I would like to avoid that.
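
A minimal sketch of how such a deprecation could be signalled at import time, using only the standard library. The cutoff `MIN_SUPPORTED = (3, 7)` and the function name `check_python_version` are assumptions for illustration (derived from "deprecate 3.6 and older"), not something PyPDF2 is known to ship.

```python
import sys
import warnings

# Assumption for illustration: 3.7 is the oldest version still supported
# once "3.6 and older" is deprecated.
MIN_SUPPORTED = (3, 7)


def check_python_version(version_info=None):
    """Warn if the interpreter is older than MIN_SUPPORTED.

    Returns True when the version is supported, False otherwise.
    Hypothetical helper; not part of PyPDF2.
    """
    if version_info is None:
        version_info = sys.version_info
    current = tuple(version_info[:2])
    if current < MIN_SUPPORTED:
        warnings.warn(
            "Support for Python %d.%d is deprecated and may be removed "
            "in a future release." % current,
            DeprecationWarning,
            stacklevel=2,
        )
        return False
    return True


check_python_version()
```

A warning rather than a hard error would let users on RHEL-style platforms keep running while they plan a migration; note that `DeprecationWarning` is hidden by default in many contexts, so a real rollout might choose a more visible category.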

What do you think @MasterOdin ?

@LightningMan711

Windows, whatever the current version is (10, I think)
Python 3.8.2 (mostly; I still have code that needs 2.7)
Reading pages, manipulating bookmarks.

@LightningMan711

I also must say that I am glad this project is being revived and updated.

@Joshua-IRT

> > Python 2.7.5 (blame RHEL7) and 3.6
>
> @Joshua-IRT Uh, I was hoping that 3.6 and older isn't used any longer 🙈 Is pyenv an option for you?

Sorry, Python 3.6.8 is the main version on RHEL8. I have not tried pyenv on our systems yet, but can certainly look into it.

Since Python 3.6 is end-of-life, though, I can certainly understand the desire not to support it (and agree with it). I guess implement what is easy to do in 3.6, make a final release that supports it, then deprecate. I can continue to use my hacked-together version until I get around to setting up pyenv, at least.

@TZanke

TZanke commented Apr 7, 2022

openSUSE Leap 15.x
python 3.6.x

  • reading
  • writing
  • merging
  • edit&create metadata with PyPDF2.generic.X

Nice to see progress again. We started using PyMuPDF for new code (where possible) because of PyPDF2's unfixed bugs.

@johns1c

johns1c commented Apr 7, 2022

Windows 11
Python 3.8

  1. extracting text from received documents
  2. extracting jpeg images and then running OCR to obtain text from scanned documents (HP all in one)
  3. viewing PDFs using a (much modified) version of wx.lib.pdfviewer

I am happy to contribute to the project if you want help - for example

  1. I have a collection of PDFs which cover a variety of object types, fonts, encoding schemes, etc., and am willing to carry out testing

  2. I have made some amendments to my own fork of PyPDF2 which I would like to be reviewed and perhaps merged back into master, for example one which allows text extraction using the to_unicode objects (see Johns1c/PyPDF2)

Chris

@MasterOdin pinned this issue Apr 7, 2022

@nicksofn

nicksofn commented Apr 7, 2022

I use PyPDF3 but I'll say it is almost the same as PyPDF2 and if the community is moving back to PyPDF2 then it seems logical to migrate there too.

operating system:
• IBM i 7.3
• Windows Server 2012 R2
python version:
• IBM i - 3.9.11-1
• Windows Server 2012 R2 – 3.10.4

what you use PyPDF2 for (reading/writing/manipulating PDFs? Maybe with a few details? What are core features to you? ):

Projects now and possibly in the future read PDF contents, split and merge.
Daily: take several PDFs and merge them for printing
Other: Split large PDFs and read contents of the resulting PDFs for proper categorization.

@py-pdf locked and limited conversation to collaborators Apr 9, 2022
@MartinThoma converted this issue into discussion #683 Apr 9, 2022
@MartinThoma unpinned this issue Apr 16, 2022
@MartinThoma added the is-question label Mar 25, 2023

10 participants