Cannot generate csv files larger than ~1 GB #1687

Open
1 of 5 tasks
BeritJanssen opened this issue Nov 6, 2024 · 2 comments
Labels
bug something isn't working right

Comments

@BeritJanssen
Contributor

BeritJanssen commented Nov 6, 2024

What went wrong?

When trying to download a larger dataset from the People & Parliament production server, I noticed that it's not possible to generate .csv files larger than ca. 1 GB. The downloads overview shows "error" after a while, and the celery logs show a SIGKILL (signal 9). This is probably caused by excessive memory usage, although I did not see memory or CPU usage rise above 30%.

What did you expect to happen?

I thought I could give researchers download rights beyond 1 million documents, so that they would be able to harvest large amounts of data in one go.

Screenshot

No response

Where did you find the bug?

Version

No response

Steps to reproduce

Attempt a download of more than 1 million documents from parliament-netherlands with a user who has the rights to do so.

BeritJanssen added the "bug" (something isn't working right) label Nov 6, 2024
@lukavdplas
Contributor

If a download of this size were generated without issue, the user would (probably) be unable to download it due to an Apache timeout. See #1474 - our decision for now was that we don't support downloads of this size.

That said, this describes an error in generating the file, which is a separate issue.

@BeritJanssen
Contributor Author

It is probably a memory issue - we would need to generate large files in batches, as well as serve them in batches (see #1474).
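
To make the batching idea concrete, here is a minimal sketch in plain Python. It is not taken from the project's code: `fake_batches` is a hypothetical stand-in for whatever paginated search the backend actually runs, and the function names, field names and chunk size are all assumptions. It only shows that rows can be appended to the file one batch at a time, and that the finished file can be read back in fixed-size chunks, so that neither step needs the whole result set in memory.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def write_csv_in_batches(
    batches: Iterable[List[Dict[str, str]]],
    path: str,
    fieldnames: List[str],
) -> int:
    """Write each batch of rows to the CSV file as it arrives.

    Only one batch is held in memory at a time. Returns the number of rows written.
    """
    total = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for batch in batches:
            writer.writerows(batch)
            total += len(batch)
    return total


def read_file_in_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield the file's contents in fixed-size chunks instead of reading it whole."""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def fake_batches(n_docs: int, batch_size: int) -> Iterator[List[Dict[str, str]]]:
    """Hypothetical stand-in for a paginated search: yields lists of documents."""
    for start in range(0, n_docs, batch_size):
        stop = min(start + batch_size, n_docs)
        yield [{"id": str(i), "text": f"document {i}"} for i in range(start, stop)]
```

A call such as `write_csv_in_batches(fake_batches(2500, 1000), "out.csv", ["id", "text"])` writes three batches in sequence. The chunk iterator is the kind of object a streaming HTTP response could consume, though whether that resolves the Apache timeout discussed in #1474 is a separate question.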
