Add caching for submodules and CMake to Windows CI. #11906

Merged 24 commits on Jan 23, 2023

@ScottTodd ScottTodd commented Jan 19, 2023

This uses GitHub's [actions/cache](https://github.com/actions/cache) together with [ccache](https://ccache.dev/) to speed up our `build_test_all_windows` GitHub Actions CI job. I also tested caching with the `build_test_runtime_windows` job, but benefits were negligible there.

We use ccache for our CMake Linux jobs, but those jobs are running on self-hosted runners and not GitHub-managed runners. The self-hosted runners have write access to the GCS bucket we store our remote cache in, while the GitHub-managed runners do not. The [actions/cache](https://github.com/actions/cache) action can be used to similarly store a remote cache, though with more steps in the job definitions.

Git submodules have been taking much longer to update on Windows than on Linux (6-10 minutes vs 1-2 minutes). We can similarly use the cache action to start the `.git/modules/` directory with an initial set of files, at least until git or [actions/checkout](https://github.com/actions/checkout) improves behavior on Windows.

This uses two caches: one for git submodules and one for ccache. The caches are always read/restored, and they are only written/saved when `needs.setup.outputs.write-caches` is true (currently only when running workflows on postsubmit).
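A minimal sketch of how a build script might honor the read-always, write-conditionally policy, assuming the flag reaches the script as the `IREE_WRITE_LOCAL_CCACHE` environment variable (it appears later in this review) with `1` meaning "write". The helper name `configure_ccache_writes` is hypothetical, and relying on ccache's `CCACHE_READONLY` setting for the read-only case is an assumption, not necessarily what the scripts in this PR do.

```shell
#!/bin/sh
# Hypothetical helper: allow ccache to store new results only when the caller opts in.
configure_ccache_writes() {
  if [ "${IREE_WRITE_LOCAL_CCACHE:-0}" = "1" ]; then
    # Writes allowed: make sure read-only mode is not inherited from the environment.
    unset CCACHE_READONLY
  else
    # Read-only: existing entries can still produce hits, but nothing new is stored.
    export CCACHE_READONLY=1
  fi
}

configure_ccache_writes
echo "CCACHE_READONLY=${CCACHE_READONLY:-unset}"
```

Keeping the decision in one function means presubmit and postsubmit jobs can share the same script and differ only in the value of the flag.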

Note: we have 10GB of cache per repository, which is space for about 4 commits worth of cache entries at current sizes (2.4GB). I'm using `ccache_all_windows_${{ github.sha }}` as the primary key for immutable cache entries, then `ccache_all_windows` as the "restore" key pattern, which will match the most recently added cache entry. Cache entries can be managed at https://github.com/iree-org/iree/actions/caches.
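To make the key scheme concrete, here is a small simulation of the lookup described above: an exact match on the primary key wins, otherwise the most recently added entry whose key starts with the restore prefix is used. This is an illustration only, not the cache action's implementation. The function name `pick_cache_entry` and the `1111`/`2222`/`3333` suffixes (standing in for commit SHAs) are hypothetical, and 10GB is taken as 10240MB for the budget estimate.

```shell
#!/bin/sh
# Illustration only: choose a cache entry from a list given oldest first, one key per line.
pick_cache_entry() {
  primary="$1"; prefix="$2"; entries="$3"
  newest=""
  while IFS= read -r key; do
    # An exact match on the primary key always wins.
    if [ "$key" = "$primary" ]; then
      printf '%s\n' "$key"
      return 0
    fi
    # Otherwise remember the newest entry that starts with the restore prefix.
    case "$key" in
      "$prefix"*) newest="$key" ;;
    esac
  done <<EOF
$entries
EOF
  # Non-zero exit status signals a cache miss.
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

entries="ccache_all_windows_1111
ccache_all_windows_2222"
# A new commit has no exact entry yet, so the newest prefixed entry is restored.
pick_cache_entry "ccache_all_windows_3333" "ccache_all_windows" "$entries"

# Rough budget: how many 2397MB entries fit in the repository's cache quota?
echo "$(( 10240 / 2397 )) entries fit"
```

Because each commit writes a new immutable entry, the quota fills after roughly four postsubmit runs at the current size, and older entries then have to be evicted.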

Progress on #11009. Once this lands we can probably move the `build_test_all_windows` job to run on presubmit.

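## Experimental results:

Note: these are best-case results. I've also observed many cache misses where hits would be expected, so more analysis will be needed.

### `build_test_runtime_windows`

cache size: 27MB (git) + 59MB (ccache) = 86MB (total)

Configuration | Logs | total time | submodule checkout timing | build timing
------------- | ---- | ---------- | ------------------------- | ------------
baseline | [logs](https://github.com/iree-org/iree/actions/runs/3956450018/jobs/6775683130) | 4m 20s | 35s | 1m 13s
new (cache miss) | [logs](https://github.com/iree-org/iree/actions/runs/3963023407/jobs/6790395857) | 5m 26s | 39s | 1m 50s
new (cache hit) | [logs](https://github.com/iree-org/iree/actions/runs/3963233498/jobs/6790837992) | 4m 5s | 20s | 42s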

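### `build_test_all_windows`

cache size: 230MB (git) + 2167MB (ccache) = 2397MB (total)

Configuration | Logs | total time | submodule checkout timing | build timing
------------- | ---- | ---------- | ------------------------- | ------------
baseline | [logs](https://github.com/iree-org/iree/actions/runs/3956450018/jobs/6775681967) | 31m 16s | 5m 58s | 22m 46s
new (cache miss) | [logs](https://github.com/iree-org/iree/actions/runs/3963023407/jobs/6790395752) | 30m 8s | 6m 30s | 14m 55s
new (cache hit) | [logs](https://github.com/iree-org/iree/actions/runs/3963233498/jobs/6790837849) | 14m 9s | 1m 15s | 4m 34s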

Note: 5 minutes of the total time is spent uploading cache data, which will only happen on postsubmit.
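As a quick check on the best-case `build_test_all_windows` numbers, the totals can be converted to seconds and compared. The helper name `to_seconds` is hypothetical. The inputs are the baseline (31m 16s) and cache-hit (14m 9s) total times reported above.

```shell
#!/bin/sh
# Convert a duration given as minutes and seconds into whole seconds.
to_seconds() { echo $(( $1 * 60 + $2 )); }

baseline=$(to_seconds 31 16)
cache_hit=$(to_seconds 14 9)
saved=$(( baseline - cache_hit ))

# Integer arithmetic, so the percentage is rounded down.
echo "saved ${saved}s ($(( saved * 100 / baseline ))% of baseline)"
```

The same calculation on the cache-miss row shows only a small change in total time, which is why the low hit rates discussed later in this review matter so much.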

Also use `$(which ccache)` to make sure that the latest install of ccache is used.
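One way to apply this idea is to resolve ccache from `PATH` at setup time and hand the absolute path to CMake. The sketch below is an assumption about how that could look, not a copy of `setup_ccache.sh`: the helper name `use_ccache_launcher` is hypothetical, it uses `command -v` as a portable stand-in for `which`, and it relies on CMake (3.17 or newer) reading the `CMAKE_C_COMPILER_LAUNCHER` and `CMAKE_CXX_COMPILER_LAUNCHER` environment variables as defaults.

```shell
#!/bin/sh
# Hypothetical helper: point CMake's compiler launchers at the first ccache on PATH,
# so a freshly installed copy wins as long as its directory is listed first.
use_ccache_launcher() {
  ccache_path="$(command -v ccache 2>/dev/null || true)"
  if [ -z "$ccache_path" ]; then
    echo "ccache not found on PATH; building without it" >&2
    return 1
  fi
  export CMAKE_C_COMPILER_LAUNCHER="$ccache_path"
  export CMAKE_CXX_COMPILER_LAUNCHER="$ccache_path"
}

use_ccache_launcher || true
```

Resolving the path once and exporting it avoids a situation where different build steps pick up different ccache versions.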
Using two caches: one for git submodules and one for ccache. The caches are always read/restored, and they are only written/saved when `needs.setup.outputs.write-caches` is true (postsubmit workflows).
@ScottTodd ScottTodd added infrastructure Relating to build systems, CI, or testing platform/windows 🚪 Windows-specific build, execution, benchmarking, and deployment labels Jan 19, 2023
@ScottTodd ScottTodd marked this pull request as ready for review January 20, 2023 00:18
@GMNGeoffrey GMNGeoffrey left a comment

Given you were having issues with GitHub's cache action compression, you might want to fiddle with https://ccache.dev/manual/4.7.4.html#config_compression

@ScottTodd

> Given you were having issues with GitHub's cache action compression, you might want to fiddle with https://ccache.dev/manual/4.7.4.html#config_compression

I tried 5 (see commits), but observed no change in cache size. Could try higher levels or recompression.

> As a rule of thumb, use level 5 or lower since higher levels may slow down compilations noticeably. Higher levels are however useful when recompressing the cache with command line option -X/--recompress.

@GMNGeoffrey

Ah I thought GitHub was doing its own compression? I was suggesting that perhaps it would be better to turn off ccache compression entirely, since GitHub is also compressing it

@ScottTodd

GitHub does its own compression yeah. I think ccache compresses as it writes, which might be faster than compressing after... I can experiment a bit more.

@ScottTodd

Ah, but we do need to be careful about disk size during the build if not compressing: https://github.com/iree-org/iree/actions/runs/3963413155/jobs/6791208036

Local storage:
  Cache size (GB):  2.27 /  5.00 (45.49%)
+ ccache --show-compression
Total data:            2.3 GB (2.3 GB disk blocks)
Compressed data:       2.3 GB (18.4% of original size)
  Original size:      12.3 GB
  Compression ratio: 5.428 x  (81.6% space savings)

(note the default max cache size on disk and the original size)
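The two options discussed in this thread (let ccache compress at level 5, or leave compression to the cache upload) can be sketched as a single switch. The variable `SKIP_CCACHE_COMPRESSION` and the function name are hypothetical. `CCACHE_COMPRESS`, `CCACHE_NOCOMPRESS`, and `CCACHE_COMPRESSLEVEL` are the environment forms of ccache's `compression` and `compression_level` settings. Which option is faster overall was not settled in this review.

```shell
#!/bin/sh
# Hypothetical switch: SKIP_CCACHE_COMPRESSION=1 leaves compression to the upload step.
configure_ccache_compression() {
  if [ "${SKIP_CCACHE_COMPRESSION:-0}" = "1" ]; then
    # Uncompressed data was about 12.3GB in the run above, which is larger than
    # the 5GB default limit, so the max cache size would need raising as well.
    unset CCACHE_COMPRESS CCACHE_COMPRESSLEVEL
    export CCACHE_NOCOMPRESS=1
  else
    # Compress as entries are written, at the level tried in this thread.
    unset CCACHE_NOCOMPRESS
    export CCACHE_COMPRESS=1
    export CCACHE_COMPRESSLEVEL=5
  fi
}

configure_ccache_compression
echo "compress=${CCACHE_COMPRESS:-off} level=${CCACHE_COMPRESSLEVEL:-n/a}"
```

With compression on, the stats above show about 2.3GB on disk for 12.3GB of original data, so the disk-space argument currently favors keeping it enabled.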

key: ccache_all_windows_${{ github.sha }}
restore-keys: ccache_all_windows
# Fetch dependencies.
# TODO(scotttodd): Move these into a Docker image.
- name: "Updating git submodules"
run: git submodule update --init --jobs 8 --depth 1
@GMNGeoffrey

It's possible that the magic 8 here might be different on windows. It's just something I came up with empirically

@ScottTodd

Hmm... even with a very recent cache hit for .git/modules/, https://github.com/iree-org/iree/actions/runs/3971047586/jobs/6807491522 took 9 minutes ... most of that to fetch llvm-project :/

Comment on lines 141 to 142
IREE_WRITE_LOCAL_CCACHE: 1
# IREE_WRITE_LOCAL_CCACHE: ${{ needs.setup.outputs.write-caches }}
@GMNGeoffrey

Presumably you're going to flip this back? Reminder to do so

@ScottTodd

Yeah. I'm troubleshooting the cache right now. A build earlier got a cache hit on the 2GB data but then had 100% cache misses... creating a 4GB cache entry. Assuming something like cache compression changes invalidated the cache (and not cosmic rays / other compiler settings), it might still make sense to just set the max cache size in ccache to 2-2.5GB.
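A minimal sketch of capping the local cache as suggested here, assuming the limit is set through ccache's `CCACHE_MAXSIZE` environment variable (the environment form of `max_size`). The value `2500M` is one point in the 2-2.5GB range mentioned above, not necessarily the value that was merged.

```shell
#!/bin/sh
# Cap the local cache near the size of one known-good entry (about 2.4GB),
# so a run with many misses cannot grow the uploaded entry to 4GB.
export CCACHE_MAXSIZE=2500M

# Only query ccache when it is actually installed on this machine.
if command -v ccache >/dev/null 2>&1; then
  ccache --show-stats || true
fi
```

The trade-off is that a tight limit triggers cleanups during the build, which can evict entries that later compilations in the same run would have hit.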

@ScottTodd ScottTodd Jan 21, 2023

This is weirddd...

I changed a few comments and turned off IREE_WRITE_LOCAL_CCACHE and got

+ ccache --show-stats
Cacheable calls:   4007 / 4053 (98.87%)
  Hits:            1030 / 4007 (25.71%)
    Direct:        1025 / 1030 (99.51%)
    Preprocessed:     5 / 1030 ( 0.49%)
  Misses:          2977 / 4007 (74.29%)
Uncacheable calls:   46 / 4053 ( 1.13%)
Local storage:
  Cache size (GB): 2.42 / 3.00 (80.69%)

(logs: https://github.com/iree-org/iree/actions/runs/3972205901/jobs/6809871352)

I'll try with IREE_WRITE_LOCAL_CCACHE on now... this is tricky to debug, but it will be even harder after it is merged.

There are some MSVC flags suggested at https://github.com/ccache/ccache/wiki/MS-Visual-Studio that might help too.

@GMNGeoffrey

Maybe a significant number of cache hits are from the build compiling the same thing multiple times?

@ScottTodd ScottTodd Jan 23, 2023

Same behavior with writes enabled

(logs: https://github.com/iree-org/iree/actions/runs/3972319654/jobs/6810116754)

+ ccache --show-stats
Cacheable calls:   3996 / 4053 (98.59%)
  Hits:            1030 / 3996 (25.78%)
    Direct:        1024 / 1030 (99.42%)
    Preprocessed:     6 / 1030 ( 0.58%)
  Misses:          2966 / 3996 (74.22%)
Uncacheable calls:   57 / 4053 ( 1.41%)
Local storage:
  Cache size (GB): 2.75 / 3.00 (91.75%)
  Cleanups:          41

I'm tempted to land this and watch the postsubmit stats. Presubmit will probably be too limited by cache misses given the size of the cache and low hit rates even with no changes...
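For watching postsubmit stats over time, the hit rate can be pulled out of `ccache --show-stats` output with a short filter. This sketch assumes the output layout shown in the two dumps in this thread (a `Hits:` line of the form `hits / cacheable`). Other ccache versions may print a different layout, and the function name `hit_rate` is hypothetical.

```shell
#!/bin/sh
# Print the hit rate as a percentage (two decimals) from stats text on stdin.
hit_rate() {
  awk '$1 == "Hits:" { printf "%.2f\n", $2 * 100 / $4 }'
}

# Sample input copied from the first stats dump in this thread.
hit_rate <<'EOF'
Cacheable calls:   4007 / 4053 (98.87%)
  Hits:            1030 / 4007 (25.71%)
  Misses:          2977 / 4007 (74.29%)
EOF
```

In a CI job the input would come from `ccache --show-stats | hit_rate`, which makes it easy to log one number per run and spot regressions.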

@ScottTodd ScottTodd merged commit fb16d60 into iree-org:main Jan 23, 2023
@ScottTodd ScottTodd deleted the ccache-windows branch January 23, 2023 19:48
ScottTodd added a commit that referenced this pull request Jan 23, 2023
Missed this on #11906

Some outputs are using true/false, while others are using 0/1.
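The mismatch described in this follow-up (some outputs using `true`/`false`, others `0`/`1`) can be guarded against by normalizing the value before comparing it. This is a sketch of that idea, not the fix that was committed: the function name `normalize_flag` and the `WRITE_CACHES` variable, standing in for the `needs.setup.outputs.write-caches` output, are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: map true/false or 1/0 onto a single 1/0 convention.
normalize_flag() {
  case "$1" in
    1|true|TRUE|True) echo 1 ;;
    0|false|FALSE|False|"") echo 0 ;;
    *) echo "unrecognized flag value: $1" >&2; return 1 ;;
  esac
}

# WRITE_CACHES is a stand-in for the workflow output passed into the job.
IREE_WRITE_LOCAL_CCACHE="$(normalize_flag "${WRITE_CACHES:-false}")"
export IREE_WRITE_LOCAL_CCACHE
echo "IREE_WRITE_LOCAL_CCACHE=${IREE_WRITE_LOCAL_CCACHE}"
```

Rejecting unknown values instead of treating them as false makes a misconfigured workflow fail loudly rather than silently skipping cache writes.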
qedawkins pushed a commit to qedawkins/iree that referenced this pull request Apr 2, 2023
qedawkins pushed a commit to qedawkins/iree that referenced this pull request Apr 2, 2023
jpienaar pushed a commit that referenced this pull request May 1, 2023
jpienaar pushed a commit that referenced this pull request May 1, 2023
rengolin pushed a commit to plaidml/iree that referenced this pull request May 2, 2023
rengolin pushed a commit to plaidml/iree that referenced this pull request May 2, 2023
2 participants