In-process exclusion of readers and writers #1346
Conversation
Force-pushed from 971d467 to 9bd681b
Force-pushed from 83527ff to 41d56e2
External callers of … But that would not protect internal calls within … Also, the “Token” objects, unlike interface pointers, could in principle carry state to allow diagnosing their misuse (they don’t currently; they are zero-sized structs). Interface pointers are ~stateless.
For the record, the empty …Token structs are optimized out and cause no extra machine code to be generated, at least in toy examples.
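For illustration, a minimal sketch of the token idea (hypothetical names, not the PR’s actual code): a zero-sized struct is handed out only by the locking helper, so methods that need the lock must be given a value that can only be obtained by taking it.

```go
package main

import "sync"

// readToken is a zero-sized marker: holding one is (informal) proof that the
// caller entered the store through withReadLock and still holds the read lock.
type readToken struct{}

type store struct {
	mu    sync.RWMutex
	items map[string]string // protected by mu
}

// withReadLock takes the read lock and hands the closure a token.
func (s *store) withReadLock(fn func(readToken) error) error {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return fn(readToken{})
}

// lookup demands a token, so it cannot (easily) be called without the lock.
func (s *store) lookup(_ readToken, key string) (string, bool) {
	v, ok := s.items[key]
	return v, ok
}

func main() {
	s := &store{items: map[string]string{"a": "1"}}
	_ = s.withReadLock(func(t readToken) error {
		_, _ = s.lookup(t, "a")
		return nil
	})
}
```

Because the token struct has no fields, it adds nothing to the call in practice, which matches the observation above that it compiles away in toy examples.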
Force-pushed from f18a6d6 to b618ff3
Force-pushed from a5e7ef9 to 988704d
Per the #1332 discussion, to see whether this locking has acceptable performance, essentially all of it has to be built first. So, as much as I like the tokens and the extra bit of safety, that’s a bit too much churn. Instead, I’ll be trying: […]
Force-pushed from 3a9c14a to 309fcfa
I think this is now broadly representative of where I want to get — but I realize the size is completely unmanageable. Basically the only part worth looking at right now is 309fcfa (“PROTOTYPE: Turn loadMut into inProcessLock”) — that is a very rough sketch of the RFC part; the rest is just infrastructure.
Next steps are to merge #1345; then I have a set of thematic branches lined up for PRs.
Force-pushed from 744008a to 2025168
This is on my todo list, but it’s a massive PR that requires some uninterrupted time to review, which is rare at the moment.
I’m afraid this PR grew as I was learning things… but basically all of it is trivial infrastructure to just centralize the locking decisions (and I’m filing discovered parts that aren’t infrastructure as separate PRs). I’m also filing that infrastructure as a set of smaller PRs. The thing I’d like most feedback on is #1346 (comment) — both WRT correctness and performance. (And that’s also the thing that is least finished.)
@mtrmac needs a rebase.
Force-pushed from e19014a to e4ea7dd
Yes, I think this is ready for review (or possibly CRI-O benchmarking).
@mrunalp @haircommander PTAL
LGTM
Impressive work. LGTM
inProcessLock sync.RWMutex // Can _only_ be obtained with lockfile held.
// The following fields can only be read/written with read/write ownership of inProcessLock, respectively.
// Almost all users should use startReading() or startWriting().
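For context, a rough self-contained sketch of the two-level locking shape this snippet describes — take the inter-process lock first, then the in-process sync.RWMutex — with heavily simplified, hypothetical details (a plain mutex stands in for the file-based lock, and the real startReading()/startWriting() also deal with reloading changed on-disk state):

```go
package main

import (
	"fmt"
	"sync"
)

// containerStore is a toy stand-in for the real store types in this PR.
type containerStore struct {
	lockfile      sync.Mutex   // stand-in for the file-based inter-process lock
	inProcessLock sync.RWMutex // only taken while lockfile is held
	containers    []string     // protected by inProcessLock
}

// startWriting/stopWriting bracket exclusive access to the in-memory state.
func (s *containerStore) startWriting() {
	s.lockfile.Lock()      // exclude other processes (simplified here)
	s.inProcessLock.Lock() // exclude other goroutines in this process
}

func (s *containerStore) stopWriting() {
	s.inProcessLock.Unlock()
	s.lockfile.Unlock()
}

// startReading/stopReading bracket shared read access.
func (s *containerStore) startReading() {
	s.lockfile.Lock() // the real code takes a shared (read) file lock here
	s.inProcessLock.RLock()
}

func (s *containerStore) stopReading() {
	s.inProcessLock.RUnlock()
	s.lockfile.Unlock()
}

func main() {
	s := &containerStore{}
	s.startWriting()
	s.containers = append(s.containers, "c1")
	s.stopWriting()

	s.startReading()
	fmt.Println(len(s.containers))
	s.stopReading()
}
```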
Absolutely non-blocking and just an idea: the Image struct in libimage has some fields that may be cached. I think you (@mtrmac) suggested creating an embedded struct to highlight the caching nature of these fields (see https://github.com/containers/common/blob/main/libimage/image.go#L38). Cached fields are always accessed via (*Image).cached.field, which IMO makes the code much more expressive and comprehensible.
Would it help to add the fields that require locking to a similar embedded struct, e.g. (*containerStore).requiresLock.containers?
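For illustration, a minimal sketch of what such an embedded struct could look like (field names are hypothetical, not a patch against c/storage):

```go
package main

import (
	"fmt"
	"sync"
)

type containerStore struct {
	inProcessLock sync.RWMutex

	// requiresLock groups the fields that may only be touched while
	// inProcessLock is held, mirroring the libimage "cached" pattern.
	requiresLock struct {
		containers []string
		byID       map[string]int
	}
}

func (s *containerStore) add(name string) {
	s.inProcessLock.Lock()
	defer s.inProcessLock.Unlock()
	if s.requiresLock.byID == nil {
		s.requiresLock.byID = map[string]int{}
	}
	s.requiresLock.byID[name] = len(s.requiresLock.containers)
	s.requiresLock.containers = append(s.requiresLock.containers, name)
}

func main() {
	s := &containerStore{}
	s.add("c1")
	fmt.Println(s.requiresLock.containers) // every access spells out requiresLock
}
```

Every read or write of the protected data then has requiresLock in it, which makes a missing lock easier to spot in review.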
Ooh, that did not occur to me at all, and it’s definitely worth doing.
I have prototyped this, and it does seem valuable — but it’s a lot of churn.
E.g. on containers.go, the current PR is +80/-14 non-whitespace, non-comment changes (including some changes with comments), and moving the fields around would be about +51/-45. Also, in addition to the C/I/L stores, the top-level store would benefit from the same updates, but store.go is mostly out of scope of this PR.
Overall, I’d prefer to do it in a separate PR a bit later: it would help maintenance but it should not have immediately user-visible impact, which makes it harder for me to prioritize right now — and the churn so far has already been disrupting Nalin’s work, so coordinating a bit better seems warranted.
I have added the suggestion to #1389; it might well be worth tracking as a higher-priority issue separate from the somewhat desperate #1389.
Yes, let's do that in a separate PR. Thank you!
layers.go (outdated)
// calls to c/storage API.
//
// Users that need to handle concurrent mount/unmount attempts should not access this
// field at all, and should only user the path returned by .Mount() (and that’s only
s/only users/only use/
Fixed, thanks!
LGTM (after the comments already pointed out)
I have tasked @wgahnagl with running some performance tests to see how this works. I can follow up here when we have results.
I’ve run the kubeBurner high-density stress test against this PR, and it’s maxing out at 500 pods, which is normal! 👍
@wgahnagl Thank you so much.
@mtrmac Could you fix the comments, and then we can merge and release a new containers/storage?
Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead of basing this on the exclusivity of loading via loadMut (which is AFAICS correct), use a very traditional read-write lock to protect the fields of containerStore. This is primarily so that all three stores use the same design. Also explicitly document how concurrent access to fields of containerStore is managed. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead of basing this on the exclusivity of loading via loadMut (which is AFAICS correct), use a very traditional read-write lock to protect the fields of imageStore. This is primarily so that all three stores use the same design. Also explicitly document how concurrent access to fields of imageStore is managed. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead of reading that value, releasing the mount lock, and then unmounting, provide a "conditional" unmount mode. And use that in the existing "loop unmounting" code. That's at least safer against concurrent processes unmounting the same layer. But the callers that try to "really unmount" the layer in a loop are still possibly racing against other processes trying to mount the layer in the meantime. I'm not quite sure that we need the "conditional" parameter as an explicit choice; it seems fairly likely that Umount() should just fail with ErrLayerNotMounted for all !force callers. I chose to use the flag to be conservative WRT possible unknown constraints. Similarly, it's not very clear to me that the unmount loops need to exist; maybe they should just be unmount(force=true, conditional=true). Signed-off-by: Miloslav Trmač <mitr@redhat.com>
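A minimal sketch of the race this commit message is avoiding, miniaturized to a single process (hypothetical names; a plain mutex stands in for the mount lock and a map for the mount state):

```go
package main

import (
	"errors"
	"sync"
)

var ErrLayerNotMounted = errors.New("layer is not mounted")

type mounts struct {
	mu      sync.Mutex
	mounted map[string]bool
}

// Racy shape: read the state under the lock, release it, then act on the
// (possibly stale) answer; someone else can unmount in between.
func (m *mounts) unmountRacy(id string) error {
	m.mu.Lock()
	isMounted := m.mounted[id]
	m.mu.Unlock()
	if !isMounted {
		return ErrLayerNotMounted
	}
	// ...by now another caller may already have unmounted id...
	m.mu.Lock()
	delete(m.mounted, id)
	m.mu.Unlock()
	return nil
}

// "Conditional" shape: the check and the unmount happen in one critical
// section, so an already-unmounted layer is reported reliably.
func (m *mounts) unmountConditional(id string) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.mounted[id] {
		return ErrLayerNotMounted
	}
	delete(m.mounted, id)
	return nil
}

func main() {
	m := &mounts{mounted: map[string]bool{"layer1": true}}
	_ = m.unmountConditional("layer1") // unmounts
	_ = m.unmountRacy("layer1")        // now reliably reports ErrLayerNotMounted
}
```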
We can't safely do that because the read-only callers don't allow us to write to layerStore state. Luckily, with the recent changes to Mounted, we don't really need to reload in those places. Also, fairly extensively document the locking design or implications for users. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This should be fixed; it just seems too hard to do without breaking API (and performance). So, just be clear about that to warn future readers. It’s tracked in containers#1379. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Instead of basing this on the exclusivity of loading via loadMut (which was incorrect, because contrary to the original design, the r.layerspathModified check in r.Modified() could trigger during the lifetime of a read lock), use a very traditional read-write lock to protect the fields of layerStore. Also explicitly document how concurrent access to fields of layerStore is managed. Note that for the btrfs and zfs graph drivers, Diff() can trigger Mount() and unmount() in a way that violates the locking design. That's not fixed in this PR. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
In that case, we can just get read locks, confirm that nothing has changed, and continue; no need for any serialization on exclusively holding loadMut / inProcessLock. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
In that case, we can just get read locks, confirm that nothing has changed, and continue; no need for any serialization on exclusively holding loadMut / inProcessLock. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
In that case, we can just get read locks, confirm that nothing has changed, and continue; no need for any serialization on exclusively holding loadMut / inProcessLock. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
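A rough sketch of that fast path, under simplifying assumptions (hypothetical names; an integer version stands in for the on-disk modification check): readers stay on the read lock when nothing has changed, and only fall back to exclusive locking when a reload is actually needed.

```go
package main

import (
	"fmt"
	"sync"
)

// diskVersion stands in for "has another process modified the data files?".
var diskVersion int64 = 1

type imageStore struct {
	inProcessLock sync.RWMutex
	lastWrite     int64    // version of the on-disk data we last loaded
	images        []string // protected by inProcessLock
}

// listImages serves reads under the read lock when the in-memory state is
// current, and escalates to the write lock only to reload.
func (s *imageStore) listImages() []string {
	s.inProcessLock.RLock()
	if s.lastWrite == diskVersion {
		defer s.inProcessLock.RUnlock()
		return append([]string(nil), s.images...) // fast path: nothing changed
	}
	s.inProcessLock.RUnlock()

	s.inProcessLock.Lock()
	defer s.inProcessLock.Unlock()
	if s.lastWrite != diskVersion { // re-check: someone may have reloaded already
		s.images = []string{"reloaded"} // ...reload from disk here...
		s.lastWrite = diskVersion
	}
	return append([]string(nil), s.images...)
}

func main() {
	s := &imageStore{lastWrite: 1, images: []string{"img1"}}
	fmt.Println(s.listImages())
}
```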
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
LGTM
LGTM
EDIT: See the description in #1346 (comment) instead.
To see what fixing #1332 would be like. This is on top of, and includes, #1345.
The goal is to protect in-process readers against concurrent writes to memory by Load. Let’s discuss the theory of that, and the performance, in #1332; this is an attempt to draft what that could look like. (Most of this PR could be merged, and might be useful, without actually changing the locking scheme — but without changing the locking scheme I don’t think we have a reason to do and review all of those code changes.)
Right now, locking is open-coded all over the >3000 lines of storage.store, and adding extra code to each of these sites to deal with the in-process lock would be a tedious nightmare with too many opportunities to overlook something.
So, this PR proposes consolidating almost all of these repeated snippets into helpers that call closures: layerReadAccess and layerWriteAccess at the level of a layerStore, and, even higher-level (especially for reads), store.writeToLayerStore and readAllLayerStores[T](*store). The same function can now be: […], which is < 2/3 of the original size.
At least for the read side, it seems to me that any variant without Go 1.18 generics would need 2 or 3 more tedious lines at every call site. That’s not a show-stopper, but let’s discuss whether we can do it this way. (A notable downside is that the Go 1.18 requirement might not allow us to backport the locking changes to any earlier version … but then with the size of the locking changes, more than almost all of store.go, we might not want to do that anyway?)
After having the closures, we can now quite cheaply add layerReadToken and layerWriteToken types that serve as markers that the layer store’s methods are called with the correct lock held. That’s not nearly as good as safe language-level support (making the lock-protected fields outright unreachable without holding a lock), but it’s something.
This does not actually show handling of the harder cases, like CreateContainer, where we might need to lock several stores, and the like. I’m confident that can be done — we can always do locking manually instead of through the closures; but I didn’t want to spend time on this without having basic consensus on the approach at all.
Also, there seem to be quite a few opportunities for small-scale optimization, typically to check for ErrLayerUnknown from a call instead of calling Exists before/after the main query. After a few false starts, this PR tries very hard not to touch that logic, because that would balloon the scope of this work to a completely unmanageable and impossible-to-review size.
See the individual commit messages for details. I’d very much appreciate […] const […].)
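To make the closure-plus-generics idea concrete, a self-contained sketch in the spirit of readAllLayerStores[T] (hypothetical types and signatures, not the code in this PR):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errLayerUnknown = errors.New("layer unknown")

type layerStore struct {
	inProcessLock sync.RWMutex
	layers        map[string]int // protected by inProcessLock
}

type store struct {
	layerStores []*layerStore
}

// readLayerStore runs fn with read ownership of one layer store.
func readLayerStore[T any](ls *layerStore, fn func(*layerStore) (T, bool, error)) (T, bool, error) {
	ls.inProcessLock.RLock()
	defer ls.inProcessLock.RUnlock()
	return fn(ls)
}

// readAllLayerStores queries each layer store in turn until one answers, so
// callers do not repeat the locking and iteration boilerplate.
func readAllLayerStores[T any](s *store, fn func(*layerStore) (T, bool, error)) (T, bool, error) {
	var zero T
	for _, ls := range s.layerStores {
		if res, done, err := readLayerStore(ls, fn); done {
			return res, true, err
		}
	}
	return zero, false, errLayerUnknown
}

func main() {
	s := &store{layerStores: []*layerStore{{layers: map[string]int{"abc": 3}}}}
	size, ok, err := readAllLayerStores(s, func(ls *layerStore) (int, bool, error) {
		n, found := ls.layers["abc"]
		return n, found, nil
	})
	fmt.Println(size, ok, err)
}
```

Without type parameters, each call site would have to declare its own typed result variable and assign to it inside the closure, which is the “2 or 3 more tedious lines” per call site mentioned above.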