Add build/test Windows CI #11009
Progress on #11009. Tested here: https://github.com/ScottTodd/iree/actions/runs/3380820232/jobs/5614038219

Post-submit only for now, so we can monitor it. I saw one build hang in the "Configuring MSVC" step: https://github.com/ScottTodd/iree/actions/runs/3380762369/jobs/5613916852
…1032) Progress on #11009, depends on #11048.

Changes:
* `build_runtime` + `test_runtime` -> `build_test_runtime` (overhead from repository cloning, artifact upload, and artifact download was taking longer than just running the tests from the same job)
* `build_runtime_windows` -> `build_test_runtime_windows`
* Runs on `managed-windows-cpu` (larger build machine)
* Runs tests, instead of just builds (now that all runtime tests pass on Windows)
* Runs on presubmit now too, instead of just postsubmit (the build appears to be stable)

Sample run: https://github.com/iree-org/iree/actions/runs/3412369869/jobs/5677798847
Some tests are failing on Windows. These should either be fixed or disabled prior to adding CI: https://github.com/iree-org/iree/actions/runs/3414442364/jobs/5682427788
Progress on #11009 This fixes these two tests on Windows: * `iree/compiler/Dialect/HAL/Target/LLVM/test/smoketest_system.mlir.test` * `iree/tests/e2e/regression/libm_linking.mlir.test` Developers should run vcvarsall before ctest, or configure their IDE as needed. For VSCode, I have this set: ```json "cmakeExplorer.extraCtestEnvVars": { "VCTOOLSINSTALLDIR": "C:\\Program Files (x86)\\Microsoft Visual Studio\\2022\\Preview\\VC\\Tools\\MSVC\\14.31.31103\\", "UNIVERSALCRTSDKDIR": "C:\\Program Files (x86)\\Windows Kits\\10\\", "UCRTVersion": "10.0.19041.0", }, ```
These tests (also mentioned above, with links to issues) are still failing:
I see a few ways to skip those tests / mark them as XFAIL, but I'm not sure which to use.
Explicit filters in the script seem the easiest, but then developers running ctest manually will still see failures. That would make it easier to work on fixing the errors, though.
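One way to implement the explicit-filter option is an exclusion regex passed to ctest's `-E` flag; a minimal sketch (the test names below are placeholders, not the actual failing tests):

```shell
# Sketch: build a ctest exclusion regex from a list of known-failing tests.
# The names below are placeholders for the tests that still fail on Windows.
EXCLUDED_TESTS=(
  "some_failing_test_one"
  "some_failing_test_two"
)
# Join the names into an alternation regex for ctest's -E (exclude) flag.
EXCLUDE_REGEX=$(IFS="|"; echo "${EXCLUDED_TESTS[*]}")
echo "Excluding: ${EXCLUDE_REGEX}"
# The CI script would then run:
#   ctest --output-on-failure -E "${EXCLUDE_REGEX}"
```

The downside noted above remains: developers running `ctest` directly, without the script, will still see the failures.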
Progress on #11009 (see other implementation ideas for filtering at #11009 (comment))
Progress on #11009. Sample successful run: https://github.com/iree-org/iree/actions/runs/3448018286/jobs/5754640528

At first this will only run on postsubmit, then we can move it to presubmit if it is stable and we have enough resources. We should also keep an eye on how long this takes to run (checkout seems to take substantially longer than on Linux), and consider using Docker to manage the dependencies and environment.

Notes:
* This does not test Vulkan or CUDA - we'd want a runner with a physical GPU (or SwiftShader) for that
* This downloads CUDA deps on demand via CMake (so there's some additional network activity)
We have two CI jobs now. Timing varies from run to run, but we're looking at roughly:
I'd like to aim for 15 minutes total, but could tolerate 25 minutes. We have longer builds (especially multistage builds like those involved in benchmarking), but spending so much time downloading and rebuilding is wasteful.

To speed up the build, I just evaluated ccache on Windows with MSVC and was able to get it working on my local machine. We normally use GCS to host our remote cache, though, which GitHub-hosted runners don't have write access to: `build_tools/cmake/setup_ccache.sh` (line 19 at c1c7b48; see also lines 305 to 306 at c1c7b48).

To speed up the submodule checkout, I've tried a few different variations on https://github.com/actions/checkout settings and explicit commands.
One step towards enabling [ccache](https://ccache.dev/) on our Windows CI, but there are a few details still to work through: #11009 (comment).

On my machine, I see these results (sample size 1):

> clean build (no cache): 528 seconds
>
> ```
> λ ccache --show-stats
> Cacheable calls:    3942 / 3943 (99.97%)
>   Hits:                2 / 3942 ( 0.05%)
>     Direct:            0 /    2 ( 0.00%)
>     Preprocessed:      2 /    2 (100.0%)
>   Misses:           3940 / 3942 (99.95%)
> Uncacheable calls:     1 / 3943 ( 0.03%)
> Local storage:
>   Cache size (GB):  2.21 / 5.00 (44.21%)
>   Cleanups:           16
> ```
>
> clean build (with cache): 96 seconds
>
> ```
> λ ccache --show-stats
> Cacheable calls:    3942 / 3943 (99.97%)
>   Hits:             3939 / 3942 (99.92%)
>     Direct:         3939 / 3939 (100.0%)
>     Preprocessed:      0 / 3939 ( 0.00%)
>   Misses:              3 / 3942 ( 0.08%)
> Uncacheable calls:     1 / 3943 ( 0.03%)
> Local storage:
>   Cache size (GB):  2.21 / 5.00 (44.23%)
> ```

My only changes to enable ccache were:

* Download ccache.exe and put it on my `PATH`
* Configure CMake with `-DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache` (added to `"cmake.configureArgs": [ ... ]` in VSCode settings)
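For reference, the two enablement steps above amount to a short shell fragment; a sketch (the ccache install path and the Ninja generator are illustrative, not the required setup):

```shell
# Assumes ccache.exe has already been downloaded to this (placeholder) path.
export PATH="$HOME/tools/ccache:$PATH"

# Route C/C++ compiler invocations through ccache via CMake launcher variables.
cmake -G Ninja -B ../iree-build \
  -DCMAKE_C_COMPILER_LAUNCHER=ccache \
  -DCMAKE_CXX_COMPILER_LAUNCHER=ccache
```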
This uses GitHub's [actions/cache](https://github.com/actions/cache) together with [ccache](https://ccache.dev/) to speed up our `build_test_all_windows` GitHub Actions CI job. I also tested caching with the `build_test_runtime_windows` job, but benefits were negligible there.

We use ccache for our CMake Linux jobs, but those jobs run on self-hosted runners, not GitHub-managed runners. The self-hosted runners have write access to the GCS bucket we store our remote cache in, while the GitHub-managed runners do not. The [actions/cache](https://github.com/actions/cache) action can similarly store a remote cache, though with more steps in the job definitions.

Git submodules have been taking much longer to update on Windows than on Linux (6-10 minutes vs 1-2 minutes). We can similarly use the cache action to seed the `.git/modules/` directory with an initial set of files, at least until git or [actions/checkout](https://github.com/actions/checkout) improves behavior on Windows.

This uses two caches: one for git submodules and one for ccache. The caches are always read/restored, and they are only written/saved when `needs.setup.outputs.write-caches` is true (currently only when running workflows on postsubmit). Note: we have 10GB of cache per repository, which is space for about 4 commits' worth of cache entries at current sizes (2.4GB). I'm using `ccache_all_windows_${{ github.sha }}` as the primary key for immutable cache entries, then `ccache_all_windows` as the "restore" key pattern, which will match the most recently added cache entry. Cache entries can be managed at https://github.com/iree-org/iree/actions/caches.

Progress on #11009. Once this lands we can probably move the `build_test_all_windows` job to run on presubmit.

## Experimental results

Note: these are best-case results. I've also observed many cache misses where hits would be expected, so more analysis will be needed.
### `build_test_runtime_windows`

cache size: 27MB (git) + 59MB (ccache) = 86MB (total)

Configuration | Logs | total time | submodule checkout timing | build timing
------------- | ---- | ---------- | ------------------------- | ------------
baseline | [logs](https://github.com/iree-org/iree/actions/runs/3956450018/jobs/6775683130) | 4m 20s | 35s | 1m 13s
new (cache miss) | [logs](https://github.com/iree-org/iree/actions/runs/3963023407/jobs/6790395857) | 5m 26s | 39s | 1m 50s
new (cache hit) | [logs](https://github.com/iree-org/iree/actions/runs/3963233498/jobs/6790837992) | 4m 5s | 20s | 42s

### `build_test_all_windows`

cache size: 230MB (git) + 2167MB (ccache) = 2397MB (total)

Configuration | Logs | total time | submodule checkout timing | build timing
------------- | ---- | ---------- | ------------------------- | ------------
baseline | [logs](https://github.com/iree-org/iree/actions/runs/3956450018/jobs/6775681967) | 31m 16s | 5m 58s | 22m 46s
new (cache miss) | [logs](https://github.com/iree-org/iree/actions/runs/3963023407/jobs/6790395752) | 30m 8s | 6m 30s | 14m 55s
new (cache hit) | [logs](https://github.com/iree-org/iree/actions/runs/3963233498/jobs/6790837849) | 14m 9s | 1m 15s | 4m 34s

Note: 5 minutes of the total time is spent uploading cache data, which will only happen on postsubmit.
As we're using shallow clones, this is providing dubious value. 1.7GB of cache data is also cutting into our repo limit of 10GB, which may be better used by ccache (recent runs are reliably getting only 16% cache hits... need to investigate that too). Test run: https://github.com/openxla/iree/actions/runs/4326504405/jobs/7554009641

Progress on #11009
We've been going over the max cache size (see "cleanups" in [these logs](https://github.com/openxla/iree/actions/runs/4326924360/jobs/7554950036#step:9:8614)):

```
+ ccache --show-stats
Cacheable calls:    4826 / 4885 (98.79%)
  Hits:              594 / 4826 (12.31%)
    Direct:           83 /  594 (13.97%)
    Preprocessed:    511 /  594 (86.03%)
  Misses:           4232 / 4826 (87.69%)
Uncacheable calls:    59 / 4885 ( 1.21%)
Local storage:
  Cache size (GB):  2.65 / 3.00 (88.43%)
  Cleanups:           85
```

[ccache](https://ccache.dev/) by default compresses with zstd level 1, but we can increase that: https://ccache.dev/manual/4.2.1.html#config_compression_level

> As a rule of thumb, use level 5 or lower since higher levels may slow down compilations noticeably.

On my dev machine, level 5 saves 600MB:

```
λ ccache --recompress 5
Recompressing... 100.0%
Original data:         24.6 GB
Old compressed data:    4.4 GB (18.1% of original size)
  - Compression ratio: 5.537 x  (81.9% space savings)
New compressed data:    3.8 GB (15.5% of original size)
  - Compression ratio: 6.439 x  (84.5% space savings)
Size change:         -623.5 MB
```

Sample run: https://github.com/openxla/iree/actions/runs/4327549301/jobs/7556311688#step:9:8617 (starting from existing compression level cache)

---

Progress on #11009
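Raising the level can be done via config (for newly stored objects) or by recompressing the existing cache in place; a sketch, assuming a reasonably recent ccache that supports these flags:

```shell
# Store new cache objects at zstd level 5 instead of the default level 1.
ccache --set-config compression_level=5

# Recompress everything already in the cache at the new level.
ccache --recompress 5

# Check the resulting cache size.
ccache --show-stats
```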
We're at around 17-20 minutes now:
The submodule update taking so long is unfortunate (it's only slower on the larger GitHub-managed runners, but we can't build in any reasonable time on the standard-sized runners). We asked GitHub support if they could help investigate that, but we haven't heard back from their engineering team yet.

Beyond that, I'd like to tweak the ccache setup slightly before turning this build on for presubmits. The current behavior uses the IREE commit as the cache key, which results in every commit creating a new cache entry (evicting previous cache entries, since each entry is ~3.4GB and we have 10GB total). If there is no exact match, the cache fetch falls back to the latest cache entry. This is reasonable for postsubmit (where code is always moving forward) but could be improved for presubmit (where base commits can vary). We could change the cache key to use the LLVM commit instead, which would increase the likelihood that a PR would be able to fetch a relevant cache entry.

On our Linux runners, we use a remote ccache in one of our cloud buckets (without the 10GB limit). We can't currently do that for Windows since the GitHub-managed runners don't have write access to those buckets.
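A sketch of deriving an LLVM-based key in the workflow, assuming the LLVM submodule lives at `third_party/llvm-project` (the variable names here are illustrative, not the actual workflow outputs):

```shell
# Derive the ccache cache key from the pinned LLVM submodule commit instead of
# the IREE commit, so PRs on different base commits can share cache entries.
LLVM_SHA="$(git rev-parse HEAD:third_party/llvm-project)"
CACHE_KEY="ccache_all_windows_${LLVM_SHA}"
RESTORE_KEY="ccache_all_windows"  # prefix fallback to the newest entry
echo "key=${CACHE_KEY} restore-key=${RESTORE_KEY}"
```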
Windows CI cache behavior regressed around 2eb6450 (from #12562). Before:
After:
We can check which calls weren't able to be cached by inspecting the ccache log file. Ideally the cache size would be smaller too... looks like we're hitting the limit we set again (4GB this time).
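One way to do that inspection: have ccache write a detailed log during a build, then summarize the reasons it reports. A sketch (the grep pattern is a guess at the relevant log wording, which varies across ccache versions):

```shell
# Point ccache at a log file, rebuild, then look for why calls missed or were
# uncacheable.
export CCACHE_LOGFILE="${TMPDIR:-/tmp}/ccache.log"
cmake --build ../iree-build
grep -iE "unsupported|disabled|could not" "${CCACHE_LOGFILE}" \
  | sort | uniq -c | sort -rn | head
```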
actions/checkout@ef43818 may improve checkout time. Time to run some more experiments :)
Nope, still catastrophically slow :(
The GitHub-provided `actions/checkout` action is for some reason unusably slow on the large managed Windows runners. We assumed this was because of some tricky IO issue or something, but I decided to just try using `git` commands directly, and lo, the checkout time goes from 10 minutes to 1.5 🚀

With the caching improvements from #13183, this gets the Windows build down under 10 minutes, which means we can run it on presubmit (left for a future PR). Part of #11009

Tested: Enabled this workflow on push to my branch: https://github.com/openxla/iree/actions/runs/4750681034/jobs/8439091687

skip-ci: this only affects the Windows job, which isn't run on presubmit
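A minimal sketch of the direct-git replacement for `actions/checkout` (the shallow depth, job count, and use of `GITHUB_SHA` are illustrative, not the exact workflow steps):

```shell
# Replace actions/checkout with explicit git commands: shallow-fetch only the
# commit under test, then shallow-initialize submodules in parallel.
git init iree && cd iree
git remote add origin https://github.com/iree-org/iree.git
git fetch --depth=1 origin "${GITHUB_SHA}"
git checkout FETCH_HEAD
git submodule update --init --depth=1 --jobs=8
```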
Going to call this fixed, though the compiler build is still postsubmit only.
GitHub Actions managed runner images look pretty well set up for development already: https://github.com/actions/runner-images/blob/main/images/win/Windows2022-Readme.md
We could use Docker to manage other dependencies as needed (CUDA/Vulkan SDK, Python packages, etc.).
I'd probably start with `build_all` and `test_all` from https://github.com/iree-org/iree/blob/main/.github/workflows/ci.yml, which make use of these scripts:

After getting something working, we could try larger runners (self-hosted or managed), depending on how slow the builds are.
I don't expect much will be shared between pre/post-submit CI and release builds.