#3 Using PdfReader causes a crash #2836
Labels
is-robustness-issue
From a user's perspective, this is about robustness
workflow-text-extraction
From a user's perspective, text extraction is the affected feature/workflow
Comments
stefan6419846 added the workflow-text-extraction, is-uncaught-exception, and is-robustness-issue labels and removed the is-uncaught-exception label (reserved for issues caused by broken PDF documents that cannot be recovered) on Sep 6, 2024.
@macdeport
fp='/Users/alain/Documents/Perso/Alain/SDC35rM/sdc35-24-4!4-240905.pdf'
#--------------------------
def pdf_text_test(pdf_path):
    """
    (06/09/24 13:18:36)
    """
    #https://pypdf.readthedocs.io/en/stable/
    #https://pypdf.readthedocs.io/en/stable/user/metadata.html
    from pypdf import PdfReader
    reader = PdfReader(pdf_path)
    #txt=''
    #for page in reader.pages:
    #    txt += page.extract_text() # <= problem: crash
    print(reader.pages[0])
    (reader.pages[0]).remove_text()
    return() # pdf_text()
#--------------------------
pdf_text_test(fp)
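To narrow down which page triggers the failure in the commented-out loop above, the extraction can be run page by page with each error recorded instead of aborting. This is only a sketch: the helper name extract_text_per_page is hypothetical, and it assumes the crash surfaces as a regular Python exception rather than an interpreter-level crash.

```python
from typing import Iterable, List, Tuple


def extract_text_per_page(pages: Iterable) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Extract text page by page, recording failures instead of stopping.

    `pages` is any iterable of objects with an `extract_text()` method,
    for example `PdfReader(path).pages`.
    """
    texts: List[str] = []
    failures: List[Tuple[int, str]] = []
    for index, page in enumerate(pages):
        try:
            texts.append(page.extract_text() or "")
        except Exception as exc:  # broad on purpose: the goal is to find the failing page
            texts.append("")  # keep list positions aligned with page indices
            failures.append((index, f"{type(exc).__name__}: {exc}"))
    return texts, failures
```

With pypdf installed, calling extract_text_per_page(PdfReader(fp).pages) should return the text of the pages that work plus a list of (page index, error) pairs, which makes it easier to report the failing page without sharing the whole document.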
Oops: remove_text() applies to the whole PDF, so the code should look something like this (off the top of my head):
Check the file: no sensitive data should be in
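A minimal sketch of the idea described in this comment, assuming a reasonably recent pypdf where PdfWriter accepts clone_from and exposes remove_text(). The function names and the ".notext.pdf" suffix are hypothetical choices made for illustration, not something taken from the thread.

```python
from pathlib import Path


def stripped_copy_path(pdf_path: str) -> Path:
    """Return a sibling path such as 'doc.notext.pdf' (hypothetical naming)."""
    source = Path(pdf_path)
    return source.with_name(source.stem + ".notext.pdf")


def write_copy_without_text(pdf_path: str) -> Path:
    """Write a copy of the PDF with its text removed and return the new path."""
    # Imported here so the path helper above works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(pdf_path)
    writer = PdfWriter(clone_from=reader)  # copy the whole document into a writer
    writer.remove_text()  # called on the writer, so it affects every page
    target = stripped_copy_path(pdf_path)
    writer.write(str(target))
    return target
```

The original file is left untouched. The copy should still be opened and inspected by hand before sharing it, since this sketch only asks pypdf to remove text and makes no attempt to clean images or metadata.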
Two pieces of good news:
Environment
Which environment were you using when you encountered the problem?
Code + PDF
This is a minimal, complete example that shows the issue:
Sorry I can't share this PDF with private information.
Traceback
This is the complete traceback I see: