Preserving timestamps in conan cache #2729

Closed
progician opened this issue Apr 9, 2018 · 12 comments

@progician
Contributor

Using conan 1.2.1 at the moment

There are some other issues related to the broken/interrupted installation process. A way to get around this would be for us to delete the conan cache entirely and re-install it in every build on the CI. It's an unfortunate extra 3 minutes for us, but until something comes along for the other issues, it looks like the best solution.

But then I've run into another problem: timestamps. If I wipe the cache and reinstall all the dependencies into it, the timestamps of all the binaries and headers of the dependencies change. We have a couple of dependencies that are so widely used that this basically means an almost complete rebuild of the system, which takes a considerable amount of time.

I understand that some regard clean builds for all CI builds as a best practice, but when every commit takes an hour to build, verification of bugs and features becomes really cumbersome.

The tar command does have a way to deal with this, at least on Linux:

--atime-preserve[=METHOD]
preserve access times on dumped files, either by restoring the times after reading (METHOD='replace'; default) or by not setting the times in the first place (METHOD='system')

So I wonder if this could be done (perhaps optionally, due to possible problems with timestamp preservation on different systems) in conan as well. We are using ninja as a build tool and it relies exclusively on timestamps, which makes it really fast, but then it fails to realise that nothing has really changed by just wiping and reinstalling the conan cache.
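To make the rebuild problem concrete, here is a minimal sketch of a timestamp-only staleness check. It is a simplified model, not Ninja's actual algorithm, and `needs_rebuild` is a hypothetical name used only for illustration.

```python
import os


def needs_rebuild(output, inputs):
    """Simplified mtime rule: rebuild if the output is missing or any input is newer."""
    if not os.path.exists(output):
        return True
    out_mtime = os.stat(output).st_mtime_ns
    # File contents are never compared, only modification times.
    return any(os.stat(path).st_mtime_ns > out_mtime for path in inputs)
```

Under this rule, reinstalling a header with identical contents but a fresh mtime is enough to mark every object file that depends on it as stale.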

@lasote
Contributor

lasote commented Apr 10, 2018

First of all, do you have more information about the broken/interrupted installation process? Maybe it is something we can improve. Are you killing processes or is the user canceling (ctrl+c) it?

EDIT: Now I see your comment here: #1003

About the timestamps, I understand your concern. I have to talk with @memsharded in case there could be an issue with preserving the timestamps, but it sounds good to me.

@lasote
Contributor

lasote commented Apr 10, 2018

I just talked with @memsharded and remembered why the timestamp is changing.
We clean the timestamp of the files in the .tgz, the files inside the tgz have no timestamp. The reason is to generate the same compressed artifacts for the same origin artifacts.
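The idea of getting identical archives from identical inputs can be sketched as follows. This is not Conan's actual packaging code: `make_tgz` and its `clean_mtime` flag are hypothetical names, and the sketch assumes a flat directory and Python's standard `tarfile` and `gzip` modules. Besides zeroing each member's mtime, it pins the timestamp in the gzip header, which would otherwise record the compression time.

```python
import gzip
import io
import os
import tarfile


def make_tgz(src_dir, clean_mtime=True):
    """Return the bytes of a .tgz of src_dir (illustrative helper, not Conan's code)."""
    buf = io.BytesIO()
    # mtime=0 fixes the timestamp stored in the gzip header itself.
    with gzip.GzipFile(fileobj=buf, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w") as tar:
            # Sorted order keeps the member order stable between runs.
            for name in sorted(os.listdir(src_dir)):
                path = os.path.join(src_dir, name)
                info = tar.gettarinfo(path, arcname=name)
                if clean_mtime:
                    info.mtime = 0  # drop the per-file timestamp
                if info.isreg():
                    with open(path, "rb") as f:
                        tar.addfile(info, f)
                else:
                    tar.addfile(info)
    return buf.getvalue()
```

With `clean_mtime=True`, the output stays byte-identical even after the source files have been touched. With `clean_mtime=False`, it stays identical only as long as the source mtimes are identical too.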

When extracting the tgz, we observed a bug in some build systems related to files having a 0 timestamp, so we touch the files. That is why the timestamps of the libraries change every time they are installed.

But typically, the tgz of a binary package will be generated only once (in a CI), maybe we could opt-out the timestamp (only for package tgz, not the recipe sources) cleaning before creating the tgz files. Could it work for you?

@progician
Contributor Author

Sounds good to me.

As for the interrupted build issue, it's pretty much what is described in the #1003 ticket.

@memsharded
Member

But typically, the tgz of a binary package will be generated only once (in a CI), maybe we could opt-out the timestamp (only for package tgz, not the recipe sources) cleaning before creating the tgz files.

Yes, might be doable as opt-in.

@db4
Contributor

db4 commented Oct 11, 2022

Hi @lasote @memsharded! I'm bringing up this old issue because, like the original reporter, I would like to save timestamps in conan_package.tgz and restore them when files are decompressed into the local cache. I repackage Conan-built artifacts into OS-level installers (.msi or .deb), so having correct timestamps is a must for me. Can you explain this:

We clean the timestamp of the files in the .tgz, the files inside the tgz have no timestamp. The reason is to generate the same compressed artifacts for the same origin artifacts.

Why would this be a problem if we store (the same) modification times in the .tgz? I ask because I've made some local modifications to preserve timestamps and they seem not to break anything. If you are interested, I can submit a PR.

@memsharded memsharded added this to the 2.0.0-beta5 milestone Oct 11, 2022
@db4
Contributor

db4 commented Oct 14, 2022

@memsharded Thanks for considering this for Conan 2.0, but why not include it in 1.x? My implementation is quite simple but still fully compatible with older packages without timestamps. If you are still afraid of breaking something, this can be turned off by default.

@jcar87
Contributor

jcar87 commented Nov 8, 2022

Just doing a quick comparison of how this is handled by other package managers:

Debian:

root@27e3d03e0baf:/# date
Tue Nov  8 15:59:50 UTC 2022
root@27e3d03e0baf:/# stat /usr/lib/aarch64-linux-gnu/libprotobuf.so.17.0.0
  File: /usr/lib/aarch64-linux-gnu/libprotobuf.so.17.0.0
Access: 2022-11-08 15:59:43.291704007 +0000
Modify: 2020-02-26 17:10:58.000000000 +0000
Change: 2022-11-08 15:59:43.261704007 +0000
 Birth: -

The mtime matches the last time the file contents were modified, while atime and ctime match the time the package was installed.

pip

(venv) root@27e3d03e0baf:~/venv# date
Tue Nov  8 16:09:04 UTC 2022
(venv) root@27e3d03e0baf:~/venv# stat ./lib/python3.8/site-packages/numpy/core/include/numpy/utils.h
  File: ./lib/python3.8/site-packages/numpy/core/include/numpy/utils.h
Access: 2022-11-08 16:08:41.916526079 +0000
Modify: 2022-11-08 16:08:40.962526082 +0000
Change: 2022-11-08 16:08:41.916526006 +0000

Timestamps in all cases seem to match the package installation time, and not when the file contents were last modified.

Homebrew

% stat ./Cellar/openssl@1.1/1.1.1s/include/openssl/crypto.h
16777231 13571955 -rw-r--r-- 1 xxxx yyyy 0 17239 "Nov  8 16:12:17 2022" "Nov  1 12:36:10 2022" "Nov  8 16:12:18 2022" "Nov  1 12:36:10 2022" 4096 40 0 ./Cellar/openssl@1.1/1.1.1s/include/openssl/crypto.h

With the dates in the following order: atime, mtime, ctime, birthtime - with mtime being prior to the installation date.

Headers inside Xcode.app

stat -f %Sm  /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/CoreFoundation.framework/Headers 

 "Sep 30 10:44:55 2022" "Sep 30 10:44:55 2022" "Nov  7 14:09:42 2022" "Sep 30 10:44:55 2022" 4096 16 0x20 CFArray.h

In this case all values are prior to the installation date, except for ctime (which is for modification of file metadata, but not file contents).

Other than pip being the odd one out, all appear to have an mtime that is prior to the package installation date. Whether this is the date when the file contents were last modified, versus when they were originally packaged, I'm not sure.
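For anyone wanting to repeat this comparison on other installed files, a small sketch using Python's `os.stat` reports the same fields in a uniform format across platforms. `file_times` is a hypothetical helper name; `st_birthtime` is only available on some platforms, so it is read defensively.

```python
import os
from datetime import datetime, timezone


def file_times(path):
    """Return atime/mtime/ctime (and birthtime where available) as UTC ISO strings."""
    st = os.stat(path)
    times = {
        "atime": st.st_atime,  # last access
        "mtime": st.st_mtime,  # last content modification
        "ctime": st.st_ctime,  # metadata change on Unix; creation time on Windows
    }
    birth = getattr(st, "st_birthtime", None)  # not exposed on every platform
    if birth is not None:
        times["birthtime"] = birth
    return {
        key: datetime.fromtimestamp(value, tz=timezone.utc).isoformat()
        for key, value in times.items()
    }
```

Calling `file_times` on a library installed by a package manager shows at a glance whether its mtime predates the installation.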

@memsharded
Member

Thanks for the thorough analysis @jcar87 !

Then it would seem that we should be relatively safe with #12378, which does exactly that: it packages files with their original times, while the usual decompression on a new machine sets the creation and access times to the current time and preserves the original modified time.

@jcar87
Contributor

jcar87 commented Nov 8, 2022

As a side note, my understanding is that the timestamp we should preserve is mtime, correct?

The original issue by @progician mentions that when the Conan cache is wiped (and restored), the timestamps change in a way that causes files to be rebuilt. I'm assuming that this causes rebuilds of a project that is external to the Conan cache and is already configured and already has some compiled translation units.

However, I'm not sure why the suggestion of applying the equivalent of --atime-preserve would matter here - atime is for access time and it also includes read access. This particular attribute should not cause rebuilds, it's probably mtime that's at play here. If tools such as Ninja or GNU make also pay attention to the other values - that's probably correct.

I believe:

  • mtime should be preserved, if it means "the last time the contents were modified"
  • ctime should be determined at unpacking / install time: as it involves modification of file metadata
  • atime is file access and should be irrelevant to build workflows
  • birthtime - should also not be relevant in build workflows (and I'm not sure is generally used on Linux, but I could be wrong)
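The behaviour described in the list above can be sketched with Python's standard `tarfile` module, which stores each member's mtime when packing and restores it on extraction, while ctime is set by the operating system at unpack time. This is an illustration under those assumptions, not the change made in Conan; `pack` and `unpack` are hypothetical helper names and the sketch handles a flat directory only.

```python
import os
import tarfile


def pack(src_dir, tgz_path):
    """Archive the files of src_dir, keeping each file's mtime in the tar headers."""
    with tarfile.open(tgz_path, "w:gz") as tar:
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)


def unpack(tgz_path, dest_dir):
    """Extract into dest_dir; tarfile restores the stored mtime of each member."""
    # Newer Python versions offer extraction filters; use the safe one if present.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(tgz_path, "r:gz") as tar:
        tar.extractall(dest_dir, **extra)
```

After `unpack`, the extracted files keep the mtime they had when packed, so a timestamp-based build tool sees them as unchanged, while their ctime reflects the moment of extraction.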

@db4
Contributor

db4 commented Nov 9, 2022

As for me, preserving just mtime is quite enough. Is there any chance to have it in 1.x as well?

@memsharded
Member

As for me, preserving just mtime is quite enough. Is there any chance to have it in 1.x as well?

I am afraid it will not be possible: we are already in the last sprint for 2.0, and this is not an obvious change to make. Even as a configurable opt-in, it seems too much.

@memsharded
Member

Closed by #12378 for next beta.5
