
Add option for pdf loader to create one document per page #361

Merged: 3 commits merged on Mar 28, 2023


nfcampos
Collaborator

@nfcampos nfcampos commented Mar 17, 2023

Closes #298

Thanks to JohnWick on Discord for this contribution!

This newer version of pdfjs does not work on Node 16, so to release this we should really drop support for Node 16.
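The option in the title (one document per page, with the page recorded in metadata, as requested in #298) can be sketched in plain TypeScript. This is a hypothetical illustration, not the code merged in this PR: `SimpleDocument`, `pagesToDocuments`, and the `pageNumber` metadata key are assumed names, page texts are taken as already extracted (the real loader obtains them through pdfjs), and defaulting to per-page splitting is an assumption of the sketch.

```typescript
// Hypothetical document shape; the real loader's Document class may differ.
interface SimpleDocument {
  pageContent: string;
  metadata: { source: string; pageNumber?: number };
}

// Turn already-extracted page texts into documents.
// splitPages = true  -> one document per page, with its 1-based page number in metadata
// splitPages = false -> a single document with all pages joined by blank lines
function pagesToDocuments(
  source: string,
  pageTexts: string[],
  splitPages = true
): SimpleDocument[] {
  if (splitPages) {
    return pageTexts.map((text, i) => ({
      pageContent: text,
      metadata: { source, pageNumber: i + 1 },
    }));
  }
  return [
    {
      pageContent: pageTexts.join("\n\n"),
      metadata: { source },
    },
  ];
}

const pages = ["Page one text", "Page two text", "Page three text"];
console.log(pagesToDocuments("example.pdf", pages).length); // 3
console.log(pagesToDocuments("example.pdf", pages, false).length); // 1
```

Keeping the page number in each document's metadata means that, after retrieval, a caller can point back to the exact page a chunk came from, which is lost when all pages are concatenated.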

@nfcampos nfcampos force-pushed the nc/pdf-loader-split-pages branch 2 times, most recently from c30c2c2 to 7fdff5a on March 17, 2023 13:02
@nfcampos nfcampos requested a review from hwchase17 March 17, 2023 13:15
@mayooear
Contributor

I tested the loader on a 50-page PDF and it works as expected, splitting the document into pages.

@nfcampos nfcampos self-assigned this Mar 27, 2023
@nfcampos nfcampos force-pushed the nc/pdf-loader-split-pages branch from 7fdff5a to 4c71c14 on March 28, 2023 08:50
@vercel

vercel bot commented Mar 28, 2023

The latest updates on your projects.

langchainjs-docs: ✅ Ready (updated Mar 28, 2023 at 8:55AM UTC)

@titocosta

I receive this error:

Error: The browser/environment lacks native support for critical functionality used by the PDF.js library (e.g. `Path2D` and/or `ReadableStream`); please use a `legacy`-build instead.

I have already added path2d-polyfill.
Could this be fixed by changing the import to "pdfjs-dist/legacy/build/pdf.js"?

@nfcampos
Collaborator Author

nfcampos commented Apr 5, 2023

@titocosta I've opened a PR that lets you pass pdfjs as an argument, see #622
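Letting the caller pass pdfjs in, as #622 proposes, is a form of dependency injection: the loader stops importing one fixed build and instead calls whatever factory it is given. The sketch below shows only that pattern; `PdfLibLike`, `InjectableLoader`, and the fake library are hypothetical stand-ins, not the langchainjs or pdfjs API. Under this pattern, a Node user hitting the error above could supply a factory that loads the legacy build (for example the "pdfjs-dist/legacy/build/pdf.js" path suggested above, wrapped in an adapter) instead of the default build.

```typescript
// Hypothetical minimal surface of a PDF library: just enough to get page texts.
interface PdfLibLike {
  getPageTexts(data: Uint8Array): Promise<string[]>;
}

// The loader receives a factory instead of importing a fixed build itself,
// so callers decide which build (e.g. a legacy one) gets loaded.
class InjectableLoader {
  private readonly loadPdfLib: () => Promise<PdfLibLike>;

  constructor(loadPdfLib: () => Promise<PdfLibLike>) {
    this.loadPdfLib = loadPdfLib;
  }

  async load(data: Uint8Array): Promise<string[]> {
    // The factory runs only when load() is called, not at construction time.
    const lib = await this.loadPdfLib();
    return lib.getPageTexts(data);
  }
}

// A fake library standing in for a real pdfjs build behind an adapter.
const fakeLib: PdfLibLike = {
  async getPageTexts(data) {
    return [`${data.length} bytes`];
  },
};

const loader = new InjectableLoader(async () => fakeLib);
loader.load(new Uint8Array([1, 2, 3])).then((texts) => console.log(texts)); // [ '3 bytes' ]
```

Taking a factory rather than a module instance keeps the import lazy, so a build that fails to load in a given environment is only touched if the loader is actually used.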

jacoblee93 pushed a commit that referenced this pull request Oct 9, 2024
Development

Successfully merging this pull request may close these issues.

Add splitPages parameter to PDFLoader, add page to metadata
3 participants