feat: Add page metadata on PDFMinerLoader #12277

Merged
merged 8 commits into from
Nov 1, 2023

blue-hope
Contributor

@dosubot dosubot bot added Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases labels Oct 25, 2023
@blue-hope blue-hope closed this Oct 25, 2023
@blue-hope blue-hope reopened this Oct 25, 2023
Contributor

@hwchase17 hwchase17 left a comment

Can we add this as a toggle someone can set, defaulting to the old behavior? That way it won't suddenly change behavior on folks.

@eyurtsev
Collaborator

@blue-hope this looks good, but we'd want to add a user-controlled parameter to the init so that this parser stays backwards compatible.

@blue-hope
Contributor Author

blue-hope commented Oct 26, 2023

@hwchase17 @eyurtsev I added the feature flag load_per_pages for backwards compatibility (we can rename this parameter).
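For readers following the thread, here is a minimal sketch of what such an opt-in flag could look like. It is not the code merged in this PR: `ToyPDFLoader`, its `Document` class, the pre-extracted `pages` list, the 0-based `page` key, and the way pages are concatenated are all assumptions made for illustration. Only the flag name `load_per_pages` comes from the comment above, and that name was itself open to renaming.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Document:
    """Hypothetical stand-in for a loader's document type."""

    page_content: str
    metadata: Dict[str, object] = field(default_factory=dict)


class ToyPDFLoader:
    """Illustrative loader only; not the PDFMinerLoader implementation."""

    def __init__(
        self, source: str, pages: List[str], load_per_pages: bool = False
    ) -> None:
        self.source = source
        # Pre-extracted page texts stand in for a real PDF parser here.
        self.pages = pages
        # Defaults to False so existing callers keep the old behavior.
        self.load_per_pages = load_per_pages

    def lazy_load(self) -> Iterator[Document]:
        if not self.load_per_pages:
            # Old behavior (assumed): one document for the whole file.
            yield Document("".join(self.pages), {"source": self.source})
            return
        # New behavior: one document per page, with page metadata.
        # The 0-based page index is an assumption of this sketch.
        for page_number, text in enumerate(self.pages):
            yield Document(text, {"source": self.source, "page": page_number})

    def load(self) -> List[Document]:
        return list(self.lazy_load())
```

Because the flag defaults to `False`, a caller who constructs the loader without it still gets a single document with only `source` metadata, which is the backwards-compatibility property the reviewers asked for.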

@blue-hope blue-hope requested a review from hwchase17 October 26, 2023 12:18
@blue-hope blue-hope requested a review from eyurtsev October 31, 2023 08:46
Collaborator

@eyurtsev eyurtsev left a comment

Should be good to merge after tests pass

@eyurtsev eyurtsev added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Nov 1, 2023
@eyurtsev eyurtsev merged commit b1954aa into langchain-ai:master Nov 1, 2023
19 checks passed
xieqihui pushed a commit to xieqihui/langchain that referenced this pull request Nov 21, 2023
  - **Description:** PR implementing the suggestion in langchain-ai#12273. Like the other PDF loaders, this loads the PDF page by page and attaches page metadata to each page.
  - **Issue:** langchain-ai#12273 
  - **Twitter handle:** @blue0_0hope

---------

Co-authored-by: Eugene Yurtsev <eyurtsev@gmail.com>
Labels
Ɑ: doc loader Related to document loader module (not documentation) 🤖:improvement Medium size change to existing code to handle new use-cases lgtm PR looks good. Use to confirm that a PR is ready for merging.
3 participants