Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-36154: [JS][CI] Use jest cache in CI #36373

Merged
merged 4 commits into from
Jul 11, 2023
Merged

Conversation

abetomo
Copy link
Contributor

@abetomo abetomo commented Jun 29, 2023

Rationale for this change

Use the cache of test to reduce test time.

What changes are included in this PR?

Use jest cache with actions/cache@v3.

Are these changes tested?

In GitHub Actions, test time was reduced.

Are there any user-facing changes?

Signed-off-by: abetomo <abe@enzou.tokyo>
uses: actions/cache@v3
with:
path: js/.jest-cache
key: js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-${{ hashFiles('js/yarn.lock') }}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
key: js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-${{ hashFiles('js/yarn.lock') }}
key: js-jest-cache-${{ runner.os }}-${{ matrix.node }}-${{ hashFiles('js/yarn.lock') }}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thanks. Fixed.

path: js/.jest-cache
key: js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-${{ hashFiles('js/yarn.lock') }}
restore-keys: |
js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-
js-jest-cache-${{ runner.os }}-${{ matrix.node }}-

key: js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-${{ hashFiles('js/yarn.lock') }}
restore-keys: |
js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-
js-jest-cache-${{ runner.os }}-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this?
(Can we reuse cache for different Node.js version?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is reused even if different Node.js versions are used.
I have confirmed this by switching Node.js versions locally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then, can we remove -${{ matrix.node }} from key and restore-keys?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will you be testing on more than one node in the future?
(like node: [18, 20].)

I just tested a few times locally and it seemed a little faster to use the cache of the same Node.js version.
It is certainly unnecessary if you don't test with multiple node.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll not use multiple Node.js versions unless we reduce CI time. (40+ minutes is too long to run multiple jobs. If we want to test against multiple Node.js versions, we'll use nightly CI that are defined in dev/tasks/tasks.yml.)

If the difference between with/without -${{ matrix.node }}, we want to unify it to reduce cache size. apache/arrow has multiple language implementations. So cache limit provided by GitHub Actions is small for us. So we want to reduce cache size as much as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it.
I removed matrix.node.

Comment on lines 129 to 131
key: js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-${{ hashFiles('js/yarn.lock') }}
restore-keys: |
js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
key: js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-${{ hashFiles('js/yarn.lock') }}
restore-keys: |
js-jest-cache-${{ runner.os }}-${{ matrix.node-version }}-
key: js-jest-cache-${{ runner.os }}-${{ matrix.node }}-${{ hashFiles('js/yarn.lock') }}
restore-keys: |
js-jest-cache-${{ runner.os }}-${{ matrix.node }}-

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Jun 29, 2023
@domoritz
Copy link
Member

Great. Do you have some before and after measurements?

Signed-off-by: abetomo <abe@enzou.tokyo>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 29, 2023
@abetomo
Copy link
Contributor Author

abetomo commented Jun 29, 2023

@domoritz Thanks for the review.
Here is the link to the measurement results on GitHub Actions.
#36154 (comment)

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Jun 30, 2023
Signed-off-by: abetomo <abe@enzou.tokyo>
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Jun 30, 2023
@domoritz
Copy link
Member

Great. The times seem faster although not by as much as I expected. Are we sure that caching doesn’t introduce issues where tests are stale? What is in the cache?

@abetomo
Copy link
Contributor Author

abetomo commented Jun 30, 2023

https://jestjs.io/docs/code-transformation

Jest will cache the result of a transformation and attempt to invalidate that result based on a number of factors, such as the source of the file being transformed and changing configuration.

Jest caches transformed codes.
In most cases, the cache should not cause problems.

The documentation notes the following.
https://jestjs.io/docs/cli#--cache

The cache should only be disabled if you are experiencing caching related problems.

So, it is not always possible to say that problems will not occur.

@domoritz
Copy link
Member

So should we use the cache across lock files as well then?

@abetomo
Copy link
Contributor Author

abetomo commented Jun 30, 2023

Do you mean to cache node_modules as well?
That seems better.

@domoritz
Copy link
Member

No, caching node modules is a separate issue and I don't think it'll help much. What I mean is that in https://github.com/apache/arrow/pull/36373/files#diff-9992b6f0e329a6dc49913c62f6efe2102f7a64f55d4bdab3d47df86292db5e87R97 we key by the lockfile but if the jest cache is automatically only used when its applicable, we can remove the restriction to specific node and lockfile versions.

@abetomo
Copy link
Contributor Author

abetomo commented Jun 30, 2023

At first, I too thought that the key was js-jest-cache-${{ runner.os }} would be enough.
However, if the key is the same, the cache stored on CI is not updated and the same cache is restored all the time.
So I thought it would be better to update the jest cache on CI when the lock file is updated, so I included the lock file in the key.

@domoritz
Copy link
Member

domoritz commented Jul 1, 2023

Why would it be an issue if the cache is not updated? It feels odd to update based on the lock file rather than some other test related property.

@abetomo
Copy link
Contributor Author

abetomo commented Jul 1, 2023

I agree that using a lock file is not the best way to do this.
I would like to be able to key in changes to files in src and test, but which is best?
Is it better to use github.sha?

If the cache is not updated, the cache will not be usable and the speed improvement will be less effective.

@kou
Copy link
Member

kou commented Jul 1, 2023

What are the inputs for jest's cache? We should use a hash value for them in key.

@domoritz
Copy link
Member

domoritz commented Jul 1, 2023

What we really want is a way to use a cache and still update it. Anything else will not catch many cases and the CI will still time out.

@abetomo
Copy link
Contributor Author

abetomo commented Jul 1, 2023

Since the cache for jest is a transformed code, it is created from the code in src and test.
I could not find a way to use those hash values for the key in a good way.
I thought github.sha seemed to be the most like it.

@kou
Copy link
Member

kou commented Jul 2, 2023

Then, we can use js-jest-cache-${{ runner.os }}-${{ hashFiles('js/src/**/*.ts', ''js/test/**/*.ts') }} for key like other our workflow files. If we can't reuse most cache when we update dependencies, we can also add js/yarn.lock like js-jest-cache-${{ runner.os }}-${{ hashFiles('js/src/**/*.ts', ''js/test/**/*.ts', 'js/yarn.lock') }}.

We should not use github.sha. If we use it, our cache is always updated even when we don't change js/{src,test}/**/*.ts. It will waste our available cache limit.

@abetomo
Copy link
Contributor Author

abetomo commented Jul 3, 2023

@kou Thanks for the information.
I did not know that you could use * in hashFiles or that you could specify more than one.
js-jest-cache-${{ runner.os }}-${{ hashFiles('js/src/**/*.ts', ''js/test/**/*.ts') }} looks good.

@domoritz What do you think?

@assignUser
Copy link
Member

What we really want is a way to use a cache and still update it.

This is not possible due to the way actions/cache works once a cache with a certain key is created it is immutable. But using the ts file hashes in the key should work as you intend imo. (unless dependencies also come into play than you might want to ad the lock file as kou said)

@domoritz
Copy link
Member

domoritz commented Jul 5, 2023

Makes sense that caches are immutable. Do we have a way to check whether the cache is used? If it’s used even if dependencies change, I think we should not include the lock file in the hash.

@assignUser
Copy link
Member

I am not sure what you mean with if the cache is used? Used by jest? I don't know @abetomo? Though actions/cache sets an output value if there was a cache hit or not, if that is what you mean.

@kou
Copy link
Member

kou commented Jul 5, 2023

An existing cache is used even when dependencies are only changed because we have restore-keys: js-jest-cache-${{ runner.os }}-.
But the cache isn't updated if we don't have 'js/yarn.lock' in hashFiles(). Because the cache key isn't changed.
If jest's cache depends on dependencies too, we should add js/yarn.lock to hashFiles() too.

@domoritz
Copy link
Member

domoritz commented Jul 5, 2023

What I mean is that I’d like to make sure we know when the jest cache is actually used by jest or whether jest ignored it because the cache wasn’t applicable to the code.

@abetomo
Copy link
Contributor Author

abetomo commented Jul 5, 2023

The cache is generated in cacheDirectory.
If there is no cache (the cache is not used), a new cache file will be generated.
If the cache is used, no new file is generated.
This will help us determine if you used the cache or not.

There does not seem to be any option in jest to more easily know if the cache is used or not.

@abetomo
Copy link
Contributor Author

abetomo commented Jul 6, 2023

I tried it and when the package related to the code transform changes (when the lock file is updated), jest cache is also updated.

What about the key js-jest-cache-${{ runner.os }}-${{ hashFiles('js/src/**/*.ts', ''js/test/**/*.ts', 'js/yarn.lock') }}?

Signed-off-by: abetomo <abe@enzou.tokyo>
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

I'll merge in a few days unless nobody objects this.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Jul 9, 2023
@kou kou merged commit b07e3f1 into apache:main Jul 11, 2023
@kou kou removed the awaiting merge Awaiting merge label Jul 11, 2023
@abetomo abetomo deleted the ci-jest-cache branch July 11, 2023 05:46
@abetomo
Copy link
Contributor Author

abetomo commented Jul 11, 2023

Thanks for the merge.

@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit b07e3f1.There were 3 benchmark results indicating a performance regression:

The full Conbench report has more details.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[JS][CI] Timeout on macos in GitHub Actions
5 participants