
recipe.run("tests/fixtures/papermage.pdf") process keep being killed. OOM? #79

Open
josephj1o4e1 opened this issue Apr 11, 2024 · 3 comments


josephj1o4e1 commented Apr 11, 2024

On my setup (Windows 10 + WSL), I can't run:

```python
doc = recipe.run("tests/fixtures/papermage.pdf")
```

It seems to download all 13 pages normally (reaching 100%), but then it is either killed when run as a Python script or, when run in quick_start.ipynb, it crashes VS Code.
It works for test-uu.pdf since that file has only one page, but it crashes every time on the 13-page papermage.pdf. Could this be an out-of-memory issue?

I'm not sure whether this can be resolved.
Is there a memory requirement for this package that I should be aware of?
I don't think 13 pages is much for most PDFs.
Is there a workaround, such as processing the PDF one page at a time?
I'm worried that wouldn't be a good approach, though, since I'd lose features such as doc.pages.

@josephj1o4e1 changed the title recipe.run("tests/fixtures/papermage.pdf") process keep being killed to recipe.run("tests/fixtures/papermage.pdf") process keep being killed. OOM? Apr 11, 2024
kyleclo (Collaborator) commented Apr 11, 2024

Sorry, I'm not familiar with what's required to run this on Windows 10 + WSL. Do you want to give this a try:

```python
from papermage.recipes import MinimalTextOnlyRecipe

recipe = MinimalTextOnlyRecipe()
doc = recipe.run("tests/fixtures/1903.10676.pdf")
```

This is a single-page PDF, and the recipe is very minimal.

josephj1o4e1 (Author) commented

Yes, I had already tested a single-page case with "test-uu.pdf", and it worked.
The PDF you suggested works as well.

Is there a typical setup (RAM/OS) for using this package?
I'm a Windows user, and I ran into the OSError that was reported in previous issues.
I switched to WSL, which works fine, but only for PDFs with fewer pages.

Thanks.

kyleclo (Collaborator) commented Apr 12, 2024

Ahh, unfortunately I can only test this on macOS Ventura (M1 MacBook); I don't have the ability to test it on Windows, so this may be something you'll have to work out.

As for memory, profiling on my M1, I'm seeing that CoreRecipe requires 2.2 GB for a single-page PDF and 2.4 GB for a 12-page PDF, while MinimalTextOnlyRecipe requires 290 MB for a single-page PDF and 400 MB for a 12-page PDF.
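A rough reading of these figures, assuming memory grows roughly linearly with page count (there are only two data points per recipe, so this is an assumption), puts the per-page increment far below the fixed cost of loading each recipe:

$$
\Delta_{\text{CoreRecipe}} \approx \frac{2.4\,\text{GB} - 2.2\,\text{GB}}{12 - 1\ \text{pages}} \approx 18\ \text{MB/page},
\qquad
\Delta_{\text{MinimalTextOnlyRecipe}} \approx \frac{400\,\text{MB} - 290\,\text{MB}}{12 - 1\ \text{pages}} = 10\ \text{MB/page}
$$

By that estimate, a 13-page PDF would need only slightly more memory than the 12-page figures quoted above.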

Hope that helps?
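To get comparable numbers on your own machine, one option is to record the process's peak resident set size around the call. This is a minimal sketch under stated assumptions, not how the numbers above were measured: the helper names `peak_rss_mb` and `run_with_peak_rss` are hypothetical, it relies on the standard-library `resource` module (Unix-only, so it works under WSL or macOS but not native Windows), and a ~50 MB allocation stands in for `recipe.run(...)`.

```python
import resource
import sys
from typing import Any, Callable, Tuple


def peak_rss_mb() -> float:
    """Peak resident set size of the current process so far, in MB.

    ru_maxrss is reported in kilobytes on Linux (including WSL)
    but in bytes on macOS, so normalise before converting.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def run_with_peak_rss(fn: Callable[[], Any]) -> Tuple[Any, float, float]:
    """Call fn() and return (result, peak MB before, peak MB after).

    ru_maxrss is a high-water mark for the whole process, so the
    'after' value only rises if fn() pushed memory above the
    previous peak.
    """
    before = peak_rss_mb()
    result = fn()
    after = peak_rss_mb()
    return result, before, after


if __name__ == "__main__":
    # Stand-in workload (hypothetical): allocate and touch ~50 MB.
    # In practice you might pass something like
    # lambda: recipe.run("tests/fixtures/papermage.pdf").
    _, before, after = run_with_peak_rss(lambda: len(b"\x01" * (50 * 1024 * 1024)))
    print(f"peak RSS before: {before:.0f} MB, after: {after:.0f} MB")
```

Since `ru_maxrss` is a process-wide high-water mark, measuring each recipe in a separate Python process gives cleaner numbers than running several measurements back to back.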
