Add caching for submodules and CMake to Windows CI. #11906
Also use `$(which ccache)` to make sure that the latest install of ccache is used.
Using two caches: one for git submodules and one for ccache. The caches are always read/restored, and they are only written/saved when `needs.setup.outputs.write-caches` is true (postsubmit workflows).
Given you were having issues with GitHub's cache action compression, you might want to fiddle with https://ccache.dev/manual/4.7.4.html#config_compression
I tried 5 (see commits), but observed no change in cache size. Could try higher levels or recompression.
Ah, I thought GitHub was doing its own compression? I was suggesting that perhaps it would be better to turn off ccache compression entirely, since GitHub is also compressing it.
GitHub does its own compression, yeah. I think ccache compresses as it writes, which might be faster than compressing after... I can experiment a bit more.
Ah, but we do need to be careful about disk size during the build if not compressing: https://github.com/iree-org/iree/actions/runs/3963413155/jobs/6791208036 (note the default max cache size on disk and the original size)
Force-pushed from b5add67 to d48685b, then from d48685b to d475871.
key: ccache_all_windows_${{ github.sha }}
restore-keys: ccache_all_windows
# Fetch dependencies.
# TODO(scotttodd): Move these into a Docker image.
- name: "Updating git submodules"
  run: git submodule update --init --jobs 8 --depth 1
It's possible that the magic 8 here should be different on Windows. It's just a value I came up with empirically.
Hmm... even with a very recent cache hit for `.git/modules/`, https://github.com/iree-org/iree/actions/runs/3971047586/jobs/6807491522 took 9 minutes ... most of that to fetch llvm-project :/
.github/workflows/ci.yml (outdated)
IREE_WRITE_LOCAL_CCACHE: 1
# IREE_WRITE_LOCAL_CCACHE: ${{ needs.setup.outputs.write-caches }}
Presumably you're going to flip this back? Reminder to do so
Yeah. I'm troubleshooting the cache right now. A build earlier got a cache hit on the 2GB data but then had 100% cache misses... creating a 4GB cache entry. Assuming something like cache compression changes invalidated the cache (and not cosmic rays / other compiler settings), it might still make sense to just set the max cache size in ccache to 2-2.5GB.
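A minimal sketch of the size cap suggested above, using ccache's environment-variable configuration. `CCACHE_MAXSIZE` and `CCACHE_COMPRESSLEVEL` are ccache's own settings (`max_size` and `compression_level`). The values come from this thread (2.5GB written as `2500M`, and the level 5 tried earlier). Setting them in a shell step rather than in the workflow's `env:` block is an assumption, not what this PR does.

```shell
# Cap the on-disk cache near the ~2.4GB entry size observed in this thread.
export CCACHE_MAXSIZE="2500M"
# Compression level tried earlier in this PR (no change in cache size was observed).
export CCACHE_COMPRESSLEVEL="5"

# Only query ccache when it is installed, so the snippet is safe to run anywhere.
if command -v ccache >/dev/null 2>&1; then
  # Print the effective values; '|| true' keeps the step green if nothing matches.
  ccache --show-config | grep -E 'max_size|compression_level' || true
fi
```

With a cap below the size of a full build's output, ccache will run cleanups during the build, so the hit rate on the next run may drop; the stats should be checked after changing it.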
This is weirddd...
I changed a few comments and turned off `IREE_WRITE_LOCAL_CCACHE` and got
+ ccache --show-stats
Cacheable calls: 4007 / 4053 (98.87%)
Hits: 1030 / 4007 (25.71%)
Direct: 1025 / 1030 (99.51%)
Preprocessed: 5 / 1030 ( 0.49%)
Misses: 2977 / 4007 (74.29%)
Uncacheable calls: 46 / 4053 ( 1.13%)
Local storage:
Cache size (GB): 2.42 / 3.00 (80.69%)
(logs: https://github.com/iree-org/iree/actions/runs/3972205901/jobs/6809871352)
I'll try with `IREE_WRITE_LOCAL_CCACHE` on now... this is tricky to debug, but it will be even harder after it is merged.
There are some MSVC flags suggested at https://github.com/ccache/ccache/wiki/MS-Visual-Studio that might help too.
Maybe a significant number of cache hits are from the build compiling the same thing multiple times?
Same behavior with writes enabled
(logs: https://github.com/iree-org/iree/actions/runs/3972319654/jobs/6810116754)
+ ccache --show-stats
Cacheable calls: 3996 / 4053 (98.59%)
Hits: 1030 / 3996 (25.78%)
Direct: 1024 / 1030 (99.42%)
Preprocessed: 6 / 1030 ( 0.58%)
Misses: 2966 / 3996 (74.22%)
Uncacheable calls: 57 / 4053 ( 1.41%)
Local storage:
Cache size (GB): 2.75 / 3.00 (91.75%)
Cleanups: 41
I'm tempted to land this and watch the postsubmit stats. Presubmit will probably be too limited by cache misses given the size of the cache and low hit rates even with no changes...
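For comparing runs like the two above, the hit rate can be recomputed from the raw counts in the `ccache --show-stats` output. This is a small sketch; the helper name `hit_rate` is hypothetical, and the numbers are copied from the two logs quoted in this thread.

```shell
# hit_rate HITS CACHEABLE: print hits as a percentage of cacheable calls.
hit_rate() {
  awk -v hits="$1" -v total="$2" 'BEGIN { printf "%.2f\n", 100 * hits / total }'
}

writes_off_rate=$(hit_rate 1030 4007)  # run with IREE_WRITE_LOCAL_CCACHE turned off
writes_on_rate=$(hit_rate 1030 3996)   # run with writes enabled

echo "writes off: ${writes_off_rate}%  writes on: ${writes_on_rate}%"
```

Both runs land at roughly a quarter of cacheable calls being hits, which is the "same behavior" noted above regardless of whether writes are enabled.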
This reverts commit f1626af.
Missed this on #11906 Some outputs are using true/false, while others are using 0/1.
This uses GitHub's [actions/cache](https://github.com/actions/cache) together with [ccache](https://ccache.dev/) to speed up our `build_test_all_windows` GitHub Actions CI job. I also tested caching with the `build_test_runtime_windows` job, but benefits were negligible there.

We use ccache for our CMake Linux jobs, but those jobs are running on self-hosted runners and not GitHub-managed runners. The self-hosted runners have write access to the GCS bucket we store our remote cache in, while the GitHub-managed runners do not. The [actions/cache](https://github.com/actions/cache) action can be used to similarly store a remote cache, though with more steps in the job definitions.

Git submodules have been taking much longer to update on Windows than on Linux (6-10 minutes vs 1-2 minutes). We can similarly use the cache action to start the `.git/modules/` directory with an initial set of files, at least until git or [actions/checkout](https://github.com/actions/checkout) improves behavior on Windows.

This uses two caches: one for git submodules and one for ccache. The caches are always read/restored, and they are only written/saved when `needs.setup.outputs.write-caches` is true (currently only when running workflows on postsubmit).

Note: we have 10GB of cache per repository, which is space for about 4 commits worth of cache entries at current sizes (2.4GB). I'm using `ccache_all_windows_${{ github.sha }}` as the primary key for immutable cache entries, then `ccache_all_windows` as the "restore" key pattern, which will match the most recently added cache entry. Cache entries can be managed at https://github.com/iree-org/iree/actions/caches.

Progress on iree-org#11009. Once this lands we can probably move the `build_test_all_windows` job to run on presubmit.

## Experimental results:

Note: these are best-case results. I've also observed many cache misses where hits would be expected, so more analysis will be needed.

### `build_test_runtime_windows`

cache size: 27MB (git) + 59MB (ccache) = 86MB (total)

Configuration | Logs | total time | submodule checkout timing | build timing
------------- | ---- | ---------- | ------------------------- | ------------
baseline | [logs](https://github.com/iree-org/iree/actions/runs/3956450018/jobs/6775683130) | 4m 20s | 35s | 1m 13s
new (cache miss) | [logs](https://github.com/iree-org/iree/actions/runs/3963023407/jobs/6790395857) | 5m 26s | 39s | 1m 50s
new (cache hit) | [logs](https://github.com/iree-org/iree/actions/runs/3963233498/jobs/6790837992) | 4m 5s | 20s | 42s

### `build_test_all_windows`

cache size: 230MB (git) + 2167MB (ccache) = 2397MB (total)

Configuration | Logs | total time | submodule checkout timing | build timing
------------- | ---- | ---------- | ------------------------- | ------------
baseline | [logs](https://github.com/iree-org/iree/actions/runs/3956450018/jobs/6775681967) | 31m 16s | 5m 58s | 22m 46s
new (cache miss) | [logs](https://github.com/iree-org/iree/actions/runs/3963023407/jobs/6790395752) | 30m 8s | 6m 30s | 14m 55s
new (cache hit) | [logs](https://github.com/iree-org/iree/actions/runs/3963233498/jobs/6790837849) | 14m 9s | 1m 15s | 4m 34s

Note: 5 minutes of the total time is spent uploading cache data, which will only happen on postsubmit.
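The "about 4 commits worth" estimate in the description follows from the per-repository limit and the measured `build_test_all_windows` entry size. A quick sketch of that arithmetic; treating 10GB as 10 × 1024MB is an assumption, and the variable names are illustrative only.

```shell
# Figures quoted in the PR description, in MB.
repo_cache_limit_mb=$((10 * 1024))  # 10GB per-repository cache limit (assumed binary GB)
entry_size_mb=$((230 + 2167))       # git submodules + ccache for build_test_all_windows

# Integer division: how many full entries fit within the repository limit.
entries_that_fit=$((repo_cache_limit_mb / entry_size_mb))

echo "entry size: ${entry_size_mb}MB, entries that fit: ${entries_that_fit}"
```

Because each postsubmit commit writes a new immutable entry keyed on `github.sha`, only the few most recent commits can be expected to have a restorable cache at these sizes.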