fix: PDFMinerToDocument convert function - adding double new lines between each container_text so that passages can be detected.
#8729
LGTM
@anakin87 @julian-risch do you also want to have a quick look at this one?
Looks good.
I would take the opportunity to also change the following line:
pdf_reader = extract_pages(io.BytesIO(bytestream.data), laparams=self.layout_params)
pdf_reader is misleading here: these are pages.
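For readers following the thread, here is a minimal sketch of the behavior this PR describes. It is not the actual diff. It joins the text of each container on a page with a double newline so that passage boundaries survive, and it uses the name `pages` as the review suggests. `FakeTextContainer` and `join_container_texts` are hypothetical names used only for illustration, and the `hasattr` check is a simplified stand-in for a type check against pdfminer's `LTTextContainer`.

```python
from typing import Iterable, List


class FakeTextContainer:
    """Hypothetical stand-in for a pdfminer text container (illustration only)."""

    def __init__(self, text: str) -> None:
        self._text = text

    def get_text(self) -> str:
        return self._text


def join_container_texts(pages: Iterable[Iterable[object]]) -> List[str]:
    """Return one string per page, with container texts separated by a blank line."""
    page_texts: List[str] = []
    for page in pages:
        # Keep only objects that expose text. This duck-typed check is an
        # assumption made to keep the sketch free of a pdfminer dependency.
        container_texts = [c.get_text() for c in page if hasattr(c, "get_text")]
        # The double newline marks a passage boundary between containers.
        page_texts.append("\n\n".join(container_texts))
    return page_texts


if __name__ == "__main__":
    pages = [
        [FakeTextContainer("First passage."), FakeTextContainer("Second passage.")],
        [FakeTextContainer("Text on page two.")],
    ]
    for page_text in join_container_texts(pages):
        print(repr(page_text))
```

Without the separator, consecutive containers run together and a splitter that looks for blank lines has nothing to split on. How the per-page strings are combined into a single document is left out of this sketch.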
Related Issues
Proposed Changes:
How did you test it?
Checklist
fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.