-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v0.11 refactoring #2734
Comments
One thing I was thinking about during merge+diff is that it would be nice if laziness was implemented entirely on the solver level rather than in the cache package. Basic idea being:
The advantages of this approach would be
Hard parts:
This is obviously a vague idea still, but let me know if there's any thoughts on it. |
@sipsma I'm not sure I fully get it. Do you mean it should be handled in
We already have the
I'm not that happy with this part as well. The fact that we store cache records for items that we can't load without external things in context does not feel very solid. Recently this came up on discussing stargz cache import from local/github storage. This doesn't work atm because (iiuc) stargz is externally connected to registry reference. The way it should work is that we should provide stargz a |
Another refactoring issue. Some files are very big. |
I'm not sure if it's in the scope of the refactoring but related to #163, I think it would be nice to also consider moving the dockerfile frontend to another repo as at this time it's quite painful to track changes, specially for the dockerfile parser or the frontend itself but also find relative docs about the syntax. Pros:
Cons:
I have already made an initial implem in https://github.com/crazy-max/dockerfile. Here is the current status: crazy-max/dockerfile#8 and here is the repo of buildkit without the frontend https://github.com/crazy-max/buildkit-nofrontend. See also crazy-max/buildkit-nofrontend#1 for changes. |
@tonistiigi The thought process behind my suggestion is basically:
Here's a sketch of one possible approach (obviously open to suggestions though):
One part that's missing is how to deal with the fact that when MergeOp has a DiffOp input, it actually doesn't need to unlazy the DiffOp result (it can just look at the lower and upper of the DiffOp and apply it directly to the merge rather than create a snapshot for the diff and then apply it to the merged snapshot). But I think that might be fine, multiple potential ways of dealing with it that I can sketch out if we end up going this route.
Yeah I guess this approach would eliminate the need for
Yeah what I'm proposing wouldn't take care of all the problems that need solving to make that happen, but I think it would at least simplify a lot of things for it. We might need to extend |
Been thinking about how to clean up lazy refs more. I agree that the fundamental problem is that we store refs that can't be loaded without external context. The general high-level approaches I'm thinking of are:
@tonistiigi Do you have any thoughts on these options or any totally separate ideas? Right now option 1 seems the most practical to me and I believe it would allow us to get rid of all the ugly |
You will always need a session. And I think the request side would somewhat need to validate that it requested a blob though certain image. Eg. it shouldn't be that I'm building with image A and that hits a lazy record that starts pulling previous image B. Not only for security but image B might already be deleted etc. I also want these changes to solve the stargz cache that doesn't pull from registry case. That is similar but I think the way additional info is required to use the ref would need to be more generic. I think the main problem in here might not be that we store cache for incomplete refs that are not self-loadable, but the way we load cache for such refs is quite fragile. We expect that loader has added all extra requirements already in the context although there isn't much guarantee that it has happened because it comes from a different component. If we instead make it clear that some refs have additional requirements, to access them you need to provide the extra requirements and if you don't it behaves like the record does not exist then it could be more stable. The main difference would be that this would involve changes to the cache loading that would ignore the cases where result is incomplete and caller doesn't have enough info to load it. Maybe the data about cache only being available/loadable when additional data is provided needs to be stored in the instruction cache database. |
Proposed client integration split:
Maybe prefix all with |
Dockerfile tests
|
Is there a timeline for getting v0.11 out? CC @AhmedMozaly |
A lot of new code has been added in the last release to ship new features as soon as possible. Now that the v0.10 release is out we should take a step back and solidify the codebase to make sure we have a good foundation for future developments.
Rather than some local cleanups and renames, I'm more interested if the scope and dependencies for packages make sense. Should we have a new package or interface to have a clear separation of responsibilities between components?
Some problem areas that have grown a lot:
Other approaches:
cache
package should not depend onsolver
like it does now.Action plan:
We need to avoid big PRs and prefer many smaller PRs with well-defined small scope, so they don't get stuck in review. There are many ways to write the same functionality, and we should avoid changes where the improvement is not clear.
Before working on something, make a special issue for it or comment here. In some cases, you need to try an improvement before you can see if it really helps but at least proposed package/interface level changes should have a design proposal first.
Please post other problem areas or ideas in the comments.
@sipsma @ktock @crazy-max
The text was updated successfully, but these errors were encountered: