Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive: Generate crc32 over 16MiB chunks #6274

Conversation

nilsmeyer
Copy link
Contributor

SUMMARY

Running crc32 over the whole content of the compressed file potentially requires a lot of RAM. The crc32 function in zlib allows for calculating the checksum in chunks. This changes the code to calculate the checksum over 16 MiB chunks instead. 16 MiB is the value also used by shutil.copyfileobj().

Fixes: #6272

ISSUE TYPE
  • Bugfix Pull Request
COMPONENT NAME

archive

ADDITIONAL INFORMATION

Running crc32 over the whole content of the compressed file potentially
requires a lot of RAM. The crc32 function in zlib allows for calculating
the checksum in chunks. This changes the code to calculate the checksum
over 16 MiB chunks instead. 16 MiB is the value also used by
shutil.copyfileobj().
@ansibullbot
Copy link
Collaborator

cc @bendoh
click here for bot help

@ansibullbot ansibullbot added bug This issue/PR relates to a bug files module module new_contributor Help guide this first time contributor plugins plugin (any type) labels Apr 2, 2023
Copy link
Collaborator

@felixfontein felixfontein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution! The implementation looks good to me, I've added a comment on the changelog below.

@felixfontein felixfontein added check-before-release PR will be looked at again shortly before release and merged if possible. backport-5 labels Apr 2, 2023
nilsmeyer and others added 2 commits April 2, 2023 16:48
Change the type of change to bugfix

Co-authored-by: Felix Fontein <felix@fontein.de>
Co-authored-by: Felix Fontein <felix@fontein.de>
@github-actions
Copy link

github-actions bot commented Apr 4, 2023

Docs Build 📝

Thank you for contribution!✨

This PR has been merged and your docs changes will be incorporated when they are next published.

Copy link
Collaborator

@felixfontein felixfontein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! If nobody objects, I'll merge in ~a week.

Copy link
Collaborator

@russoz russoz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@felixfontein felixfontein removed the check-before-release PR will be looked at again shortly before release and merged if possible. label Apr 13, 2023
@felixfontein felixfontein merged commit 14b19af into ansible-collections:main Apr 13, 2023
@patchback
Copy link

patchback bot commented Apr 13, 2023

Backport to stable-5: 💚 backport PR created

✅ Backport PR branch: patchback/backports/stable-5/14b19afc9ae8dfd2e320d484229dc4c58a4b4e46/pr-6274

Backported as #6325

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot pushed a commit that referenced this pull request Apr 13, 2023
* archive: Generate crc32 over 16MiB chunks

Running crc32 over the whole content of the compressed file potentially
requires a lot of RAM. The crc32 function in zlib allows for calculating
the checksum in chunks. This changes the code to calculate the checksum
over 16 MiB chunks instead. 16 MiB is the value also used by
shutil.copyfileobj().

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Change the type of change to bugfix

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

---------

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit 14b19af)
@patchback
Copy link

patchback bot commented Apr 13, 2023

Backport to stable-6: 💚 backport PR created

✅ Backport PR branch: patchback/backports/stable-6/14b19afc9ae8dfd2e320d484229dc4c58a4b4e46/pr-6274

Backported as #6326

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

@felixfontein
Copy link
Collaborator

@nilsmeyer thanks for your contribution!
@russoz thanks for reviewing!

patchback bot pushed a commit that referenced this pull request Apr 13, 2023
* archive: Generate crc32 over 16MiB chunks

Running crc32 over the whole content of the compressed file potentially
requires a lot of RAM. The crc32 function in zlib allows for calculating
the checksum in chunks. This changes the code to calculate the checksum
over 16 MiB chunks instead. 16 MiB is the value also used by
shutil.copyfileobj().

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Change the type of change to bugfix

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

---------

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit 14b19af)
@nilsmeyer
Copy link
Contributor Author

nilsmeyer commented Apr 13, 2023

@felixfontein Thanks for your valuable input and review. 10/10, would pull request again ;)

felixfontein pushed a commit that referenced this pull request Apr 13, 2023
…6MiB chunks (#6325)

archive: Generate crc32 over 16MiB chunks (#6274)

* archive: Generate crc32 over 16MiB chunks

Running crc32 over the whole content of the compressed file potentially
requires a lot of RAM. The crc32 function in zlib allows for calculating
the checksum in chunks. This changes the code to calculate the checksum
over 16 MiB chunks instead. 16 MiB is the value also used by
shutil.copyfileobj().

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Change the type of change to bugfix

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

---------

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit 14b19af)

Co-authored-by: Nils Meyer <nils@nm.cx>
felixfontein pushed a commit that referenced this pull request Apr 13, 2023
…6MiB chunks (#6326)

archive: Generate crc32 over 16MiB chunks (#6274)

* archive: Generate crc32 over 16MiB chunks

Running crc32 over the whole content of the compressed file potentially
requires a lot of RAM. The crc32 function in zlib allows for calculating
the checksum in chunks. This changes the code to calculate the checksum
over 16 MiB chunks instead. 16 MiB is the value also used by
shutil.copyfileobj().

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Change the type of change to bugfix

Co-authored-by: Felix Fontein <felix@fontein.de>

* Update changelogs/fragments/6199-archive-generate-checksum-in-chunks.yml

Co-authored-by: Felix Fontein <felix@fontein.de>

---------

Co-authored-by: Felix Fontein <felix@fontein.de>
(cherry picked from commit 14b19af)

Co-authored-by: Nils Meyer <nils@nm.cx>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug This issue/PR relates to a bug files module module new_contributor Help guide this first time contributor plugins plugin (any type)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

archive: compressing single file uses a lot of RAM in calculating the checksum
4 participants