Allow to cache dependencies #147
See tektoncd/pipeline#3097 as it contains some background on current limitations and potential other approaches to the caching issue.
@michaelsauter very interesting discussion! Unfortunately, it doesn't mention why a PVC seems not to be the preferred option, though it works.
Do you see any blockers regarding this approach? Have you explored using a PVC?
The main argument, I think, is that PVCs are local to nodes, or at least to clusters. If your builders run anywhere, then you need a place to upload caches to and download caches from. You see this pattern with e.g. GitHub Actions as well. That said, this is not the case for us (all our builders are in the same cluster), so I think using a PVC is a viable option. But maybe it is not the only option for us, therefore I linked this discussion :)
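For illustration, a minimal sketch of binding a PVC-backed workspace to a pipeline run in Tekton; the workspace and claim names here are assumptions, not actual ods-pipeline names:

```yaml
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: example-run
spec:
  pipelineRef:
    name: example-pipeline
  workspaces:
    # The same claim is reused across runs, so files written to it
    # survive between builds as long as all builders share the cluster.
    - name: source
      persistentVolumeClaim:
        claimName: ods-pipeline  # hypothetical claim name
```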
I see some interesting advantages in using a PVC: it can be mounted in a task step, e.g. in the cluster task.
Having https://tekton.dev/vault/pipelines-v0.24.3/workspaces/ and docs/design/relationship-shared-library.adoc in mind, I believe the following is the situation:
Is it correct that the working directory is actually the root directory of the ods-pipeline PVC? One way to enable caching would be to have ods-start no longer wipe the entire PVC. For example, it could spare a dedicated cache directory. Instead of hard-coding this location, an environment variable could point to it.
When would the pipeline cache directory be cleaned up? Another way could be to provide a PVC per pipeline. @michaelsauter @stitakis What do you think?
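As a rough sketch of the "spare a cache directory" idea, a cleaning step in ods-start could look like the following; the step, the .ods-cache name and the workspace name are all assumptions:

```yaml
# Hypothetical step inside the ods-start task definition.
steps:
  - name: clean-workspace
    image: alpine
    script: |
      #!/bin/sh
      cd "$(workspaces.source.path)"
      # Delete everything in the workspace except the spared cache directory.
      find . -mindepth 1 -maxdepth 1 ! -name '.ods-cache' -exec rm -rf {} +
```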
@henrjk Yes, how you describe the current situation is spot on.
Caching underneath the pipeline name would cause the cache to be used only in rare cases, I believe. Given the current architecture of one branch = one pipeline, the first push in any branch would run without cache. Assuming most work happens in branches, we wouldn't see a big speed-up. Maybe caching per repository would work better? (As a side note: I am not fully convinced one branch = one pipeline is really what we want.)
Right now, webhook events of type …
Maybe cleanup could be quite simple (at least to begin with)? The strategy could be to attempt to keep disk usage under e.g. 80%. Assuming a cache location of …, the oldest cache directories could be removed until usage drops below that threshold.
However, if we had one PVC per repo (which is something we likely want to have, I'd say), then the outlined strategy would clean up either all the time (because the PVC is "too small") or almost never, only protecting the cache from growing forever.
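To make the 80% idea concrete, here is a rough sketch; the cache location and step name are assumptions, and real cleanup would need more care around concurrent pipeline runs:

```yaml
# Hypothetical cleanup step that could run before a build.
steps:
  - name: prune-cache
    image: alpine
    script: |
      #!/bin/sh
      cache_dir="$(workspaces.source.path)/.ods-cache"  # assumed location
      # Current disk usage of the volume backing the cache, in percent.
      usage() { df -P "$cache_dir" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'; }
      # Remove least recently used cache entries until usage drops below 80%.
      while [ "$(usage)" -gt 80 ]; do
        oldest=$(ls -t "$cache_dir" | tail -n 1)
        [ -n "$oldest" ] || break
        rm -rf "$cache_dir/$oldest"
      done
```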
@gerardcl this may interest you as well!
@henrjk I like the idea of one PVC per repo. It could work, even if the builds would not be able to run in parallel. I think that is a limitation we can accept. My initial approach was to mount a PVC per cluster task, e.g. the …
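For illustration, mounting a dedicated cache PVC into a task could be expressed as a second workspace; all names here are made up, and note that a ReadWriteOnce claim shared across runs is what prevents parallel builds:

```yaml
# Hypothetical task excerpt with a cache-only workspace next to the
# existing shared one.
spec:
  workspaces:
    - name: source          # the existing shared workspace
    - name: cache           # dedicated dependency cache
      optional: true        # tasks could still run without a cache
---
# Corresponding binding in the PipelineRun:
workspaces:
  - name: cache
    persistentVolumeClaim:
      claimName: ods-cache-myrepo  # hypothetical per-repo claim
```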
@stitakis and @michaelsauter I thought that one PVC per pipeline (= branch) makes sense for the following reasons: …
Of course this would not help with develop pipelines. For branch pipelines, it appears that most people create a branch via Jira, and the build triggered by that branch would then take the hit for initial caching.
I thought a bit more about this and propose the following: instead of implementing a certain caching strategy, we could add a cache key option to the build tasks, so that pipeline authors can decide what to cache under (e.g. the branch, the repository, or nothing at all).
I am not sure what the default should be (but lean towards "no caching"). Regarding cleanup, I would still start with my proposal above to reduce cache dirs until disk usage drops below e.g. 80%.
I believe that, based on our above discussion regarding different preferences, and taking into account that different build tools also take different approaches to dependency management, it is best not to make this decision on the level of ODS pipeline as a whole, but to delegate it to the pipeline authors. The implementation in ODS pipeline is then to simply configure the build tool to use the given cache key (which either may be empty and will be filled, or is already filled and will be modified).
Note that this "cache key" approach works within the current "one PVC across all pipelines" situation, but will also work with "one PVC per repo". Once we switch to "one PVC per repo" though, the ability to cache across repos would not work anymore (I think this is acceptable). However, we could also think about using a separate PVC just for caching, as @stitakis suggested. But I think we need to be very careful what effect this has on parallel execution (knowing …).
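A sketch of what such an opt-in parameter could look like on a build task; the parameter name, wording and default here are assumptions:

```yaml
# Hypothetical excerpt from a build task definition.
spec:
  params:
    - name: cache-key
      type: string
      description: >-
        Key under which dependencies are cached. An empty value disables
        caching; pipeline authors may set e.g. the branch or repository name.
      default: ""  # leaning towards "no caching" by default
```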
hi! right, trying to reuse dependencies/envs is not a direct thing: …
I see the cache-key options proposed above, per repo, as a good starting point. In any case, keep in mind that PVC mounting and unmounting also consume time, sometimes as much as not caching at all. The same applies to mv, cp or rsync commands. (Side comment: this might already be a known issue, but if I push to a branch and then create a PR, I have two pipelines for the same branch... fail!)
Note that for promotion you'd typically not do any build, so I think caching wouldn't play a role there.
Therefore I would start by using the same PVC that is mounted anyway for the workspace.
Please open a separate issue. This needs more thought. I think we do want …
I agree with: …
How would one decouple caching from the build scripts' implementation?
Why would you like to decouple this? I think we could start with this coupled. A separate cache task like the one GitHub Actions offers would have the disadvantage that it requires launching a new pod (until Tekton supports grouping multiple tasks in one pod).
I meant decoupled in the sense that the build script does not itself impose the caching strategy.
Oh ok, now I get it. My proposal would be to delegate the decision how to deal with the cache key to the build tasks. I think what to do with it will depend on the technology used. I do not know how Python should handle it, and maybe there isn't a one-size-fits-all approach for Python. In that case we just have to make a call what we support, I guess?
I was approaching this with Go caching in mind. The task should simply cache the Go module cache. A short description of how that works is at https://go.dev/ref/mod#module-cache. The implementation in the build task would simply set GOMODCACHE accordingly.
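For Go, a minimal sketch of this; the step name and cache layout are made up, while GOMODCACHE itself is the variable documented at the link above for controlling the module cache location:

```yaml
# Hypothetical excerpt from the ods-build-go task.
steps:
  - name: build
    image: golang:1.17
    env:
      # Point the Go module cache into the per-key cache directory,
      # so downloaded modules survive between builds.
      - name: GOMODCACHE
        value: $(workspaces.source.path)/.ods-cache/$(params.cache-key)/gomod
    script: |
      #!/bin/sh
      cd "$(workspaces.source.path)"
      go build ./...
```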
@michaelsauter your proposal sounds good to me |
It would be great if ods-build-go and friends would be able to cache third-party dependencies between builds to speed up build/test execution time. Given we already use a PVC as a workspace, it should be possible to cache. One issue is that the mount point currently gets wiped completely in ods-start.