Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

archive.extracted downloads the file even if checksums are the same #55443

Closed
Oloremo opened this issue Nov 26, 2019 · 3 comments
Closed

archive.extracted downloads the file even if checksums are the same #55443

Oloremo opened this issue Nov 26, 2019 · 3 comments
Labels
Feature new functionality including changes to functionality and code refactors, etc. ZRelease-Sodium retired label
Milestone

Comments

@Oloremo
Copy link
Contributor

Oloremo commented Nov 26, 2019

Description of Issue

Right now archive.extracted looks for cached copy of the file on a minion and cache it(download file) if it's not present in the minion local cache to do an archive list and check the archive files against the destination directory. I think it's overkill in some cases.

Example:

extract:
  archive.extracted:
    - name: /tmp/archive
    - source: http://path/to/archive.gz
    - source_hash: http://path/to/archive.gz.md5
    - source_hash_update: True
    - clean: True
    - keep_source: False

What is going to happen in the example above first time you run it:

  1. Salt will search for the http://path/to/archive.gz.hash in local minion cache. There is none.
  2. Salt will download http://path/to/archive.gz into minion cache and will create a http://path/to/archive.gz.hash file with checksum of the file.
  3. Salt will call a archive.list module to compare the output of the archive contents with the destination directory and if there is a difference(there is in our case) it will extract it.
  4. Since keep_source set to False local cached file http://path/to/archive.gz will be removed, yet http://path/to/archive.gz.hash won't be.

So far so good. Now let's see what is going to happen if we run this state once again assuming nor archive.gz nor archive.gz.md5 changed:

  1. Salt will search for the http://path/to/archive.gz.hash in local minion cache. There is such file.
  2. Salt will compare it with the hash from the http://path/to/archive.gz.md5. They will match.
  3. Salt will download http://path/to/archive.gz into minion cache since there is no such file due to keep_source: False
  4. Salt will call a archive.list module to compare the output of the archive contents with the destination directory and if there is a difference(there is none in our case).
  5. Since keep_source set to False local cached file http://path/to/archive.gz will be removed, yet http://path/to/archive.gz.hash won't be.

And this re-download of the archive.gz will be done over and over again. And now imagine it's 1000 nodes and archives are 1Gb+.

What is the alternative? keep_source: True. It solves the issue by the cost of having file cache on all minions which will never be cleaned due to #34369 not implemented. And now imagine it's 1000 nodes and archives are 1Gb+.

So right now it's a choice between re-downloading it every time OR keeping all archives in the cache without any ability to purge it efficiently.

My proposal is to add an optional argument like trust_source_hash: BOOL which will change the behavior to this one:

  1. If source_hash is provided - download it and check it against the local copy of .hash file if it exists.
  2. If they missmatch - do all the stuff as always.
  3. If they match - do nothing and exit without any change.

This will significantly improve time and traffic efficiency for someone willing to commit to relying on source_hash.

@Oloremo Oloremo changed the title archive.extracted downloads the file file even if checksums are the same archive.extracted downloads the file even if checksums are the same Nov 26, 2019
@Oloremo
Copy link
Contributor Author

Oloremo commented Nov 27, 2019

I think I can implement this in case Salt core team is ok with the idea.

@Akm0d
Copy link
Contributor

Akm0d commented Nov 28, 2019

That sounds like a reasonable approach 👍

@Akm0d Akm0d added the Feature new functionality including changes to functionality and code refactors, etc. label Nov 28, 2019
@Akm0d Akm0d added this to the Approved milestone Nov 28, 2019
@Oloremo
Copy link
Contributor Author

Oloremo commented Dec 23, 2019

Feature merged!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature new functionality including changes to functionality and code refactors, etc. ZRelease-Sodium retired label
Projects
None yet
Development

No branches or pull requests

3 participants