Document and improve store locking #1438

mtrmac · 2022-11-22T19:20:53Z

Motivated by #1433 seeing unexpected layer store refreshes / existence of parallel layer stores for the same directory in a single process, this:

- Attempts to document the concurrency design and rules for accessing store fields and store.graphLock. I’m not at all sure it’s correct, but at least I want to establish the precedent that the design is expected to be documented.
- Consolidates the uses of store.graphLock that guard against Shutdown calls into the store.startUsingGraphDriver / store.stopUsingGraphDriver helpers, instead of copy&pasted code that slowly deviates over time. This is similar to the CILStore.startWriting approach.
- Fixes users that specifically rely on store.graphLock and can trigger a graph driver reload, so that they actually use the reloaded graph driver for their layerStore operations, instead of using an obsolete layerStore (and graph driver) instance immediately after the reload.
- Introduces a larger set of helpers to obtain the layerStore instances for the primary and additional stores, so that every operation only locks graphLock once instead of up to three times.
- OTOH, removes helpers to obtain imageStore and containerStore instances; the users can just refer to a constant field, so they are modified to do that, removing many error checks and local variables.
- Fixes various individual instances of missing locking in store.
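To make the "lock and reload" helper idea above concrete, here is a minimal sketch in Go. It is not the containers/storage code: the `store` and `driver` types, the `onDiskVersion` / `loadedVersion` counters, and `driverGeneration` are hypothetical stand-ins, and the real helpers may differ in signature and behavior. Only the names `graphLock`, `startUsingGraphDriver` and `stopUsingGraphDriver` come from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// driver stands in for a graph driver instance (hypothetical type).
type driver struct{ generation int }

// store is a simplified model, not the real containers/storage type.
type store struct {
	graphLock     sync.Mutex // stands in for the cross-process lock
	onDiskVersion int        // simulates "someone else recorded a write"
	loadedVersion int        // version seen at the last successful (re)load
	graphDriver   *driver    // only safe to access with graphLock held
}

// startUsingGraphDriver locks graphLock and reloads the driver if it is
// missing or stale. On success the caller must call stopUsingGraphDriver.
func (s *store) startUsingGraphDriver() error {
	s.graphLock.Lock()
	if s.graphDriver == nil || s.onDiskVersion != s.loadedVersion {
		s.graphDriver = &driver{generation: s.onDiskVersion}
		s.loadedVersion = s.onDiskVersion
	}
	return nil
}

// stopUsingGraphDriver releases the lock taken by startUsingGraphDriver.
func (s *store) stopUsingGraphDriver() { s.graphLock.Unlock() }

// driverGeneration shows the caller side: one helper pair per operation,
// and the driver is only read while the lock is held.
func (s *store) driverGeneration() (int, error) {
	if err := s.startUsingGraphDriver(); err != nil {
		return 0, err
	}
	defer s.stopUsingGraphDriver()
	return s.graphDriver.generation, nil
}

func main() {
	s := &store{}
	first, _ := s.driverGeneration()

	// Simulate another user of the store invalidating the driver.
	s.graphLock.Lock()
	s.onDiskVersion++
	s.graphLock.Unlock()

	second, _ := s.driverGeneration()
	fmt.Println(first, second)
}
```

Because every caller goes through the same pair of helpers, the staleness check lives in one place and a caller cannot keep working with a driver from before the reload.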

~~This is on top of unmerged #1436.~~ See individual commit messages for details.

@nalind @giuseppe PTAL, it’s quite likely I’m missing something about the current locking implementation

Cc: @alexlarsson : this might significantly reduce the number of locking operations on storage.lock. I have no idea if it is noticeably faster, but it’s probably a noticeable change to the strace at least, so it might be necessary to track this separately from other changes if you are quantifying other work.

This might not be entirely correct, but _some_ attempt to set rules must be better than nothing. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Don't modify store.graphDriverName on reloads, so that it is constant throughout the life of the store, and it can be accessed without locking, as it has actually been done (to this point incorrectly). That requires special-casing the initial load(); so, split the actual driver creation into a createGraphDriverLocked() method. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

At least containers#926 suggests that using timestamps "seems to fail", without elaborating further. At the very least, ModifiedSince is the more general check, because it can work across shared filesystems or on filesystems with bad timestamp granularity, it's about as costly (a single syscall, pread vs. fstat), and we can now completely eliminate tracking store.lastLoaded. The more common code shape will also help factor the common code out further. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

That way we don't lock store.graphLock twice, and we will have only one location that doesn't use the shared "lock and reload" utility for using store.graphLock. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... to avoid the repetitive checks for store.graphLock.ModifiedSince. store.getROLayerStores now checks for graphLock.ModifiedSince, we'll optimize that away again. store.Shutdown now checks for graphLock.ModifiedSince, that seems like a good idea anyway. Also attempt to document the purpose and rules of using graphLock; it's quite likely incomplete but at least it's a starting point. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

It's now the only caller, so: - inline it, eliminating an always-false graphDriver != nil check - only update graphLockLastWrite after we successfully reload. s.graphDriver is now never nil during the lifetime of store (but it is only safe to access with graphLock held.) Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... and use it in callers that already obtain the graphLock. This is both an optimization, and a correctness fix: if we need to reload the graph driver, we now don't do the operation on an obsolete layerStore instance. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac · 2022-12-01T15:25:46Z

Now ready for review.

Downstream tests:

mtrmac · 2022-12-02T14:36:25Z

Added one more commit to document the locking hierarchy within store.

rhatdan · 2022-12-02T18:45:24Z

@alexlarsson @nalind @vrothberg @giuseppe @flouthoc @umohnani8 PTAL

rhatdan · 2022-12-03T11:03:13Z

store.go

@@ -982,11 +982,8 @@ func (s *store) getLayerStore() (rwLayerStore, error) {

 // getROLayerStores obtains additional read/only layer store objects used by the


This comment is repeated I believe.

Thanks, fixed.

rhatdan

LGTM. One potential issue with duplicated comments.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... and use it to only lock graphLock once in store.allLayerStores. More improvements to come. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... and use it in store.Diff to only lock graphLock once, and to use a fresh layerStore object instead of an obsolete one. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

This allows getting all the layer stores with a single s.graphLock access. We do it even in store.CreateContainer, where the read-only layers are only necessary conditionally, because getting the read-only layers is almost always a simple field access, so not having to lock graphLock again is almost certainly an improvement. Stand-alone store.getROLayerStores is now unused and has been removed. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
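The "lock once, then use ...Locked helpers" shape described in these commits can be sketched as follows. The types are simplified stand-ins, `lockCount` exists only to make the number of acquisitions visible, and `primaryAndROLayerStores` is a hypothetical name; the real helpers also handle reloading, which is omitted here.

```go
package main

import (
	"fmt"
	"sync"
)

// layerStore is a placeholder for a layer store object.
type layerStore struct{ name string }

// store is a simplified model, not the real containers/storage type.
type store struct {
	graphLock     sync.Mutex
	lockCount     int // counts acquisitions, for illustration only
	layerStore    *layerStore
	roLayerStores []*layerStore
}

func (s *store) lock() {
	s.graphLock.Lock()
	s.lockCount++
}

// getLayerStoreLocked must be called with graphLock held.
func (s *store) getLayerStoreLocked() *layerStore { return s.layerStore }

// getROLayerStoresLocked must be called with graphLock held.
func (s *store) getROLayerStoresLocked() []*layerStore { return s.roLayerStores }

// primaryAndROLayerStores (hypothetical name) returns both kinds of layer
// stores while taking graphLock exactly once.
func (s *store) primaryAndROLayerStores() (*layerStore, []*layerStore) {
	s.lock()
	defer s.graphLock.Unlock()
	return s.getLayerStoreLocked(), s.getROLayerStoresLocked()
}

func main() {
	s := &store{
		layerStore:    &layerStore{name: "primary"},
		roLayerStores: []*layerStore{{name: "additional-0"}},
	}
	rw, ro := s.primaryAndROLayerStores()
	fmt.Println(rw.name, len(ro), "lock acquisitions:", s.lockCount)
}
```

Splitting each getter into a locking wrapper and a `...Locked` body is what lets a combined helper reuse the bodies without locking again.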

This was using the graphDriver field without locks, and the graph driver itself, while the implementation assumed exclusivity. Luckily all callers are actually holding the layer store lock for writing, so use that for exclusion. (layerStore already seems to extensively assume that locking the layer store for writing guarantees exclusive access to the graph driver, and because we always recreate a layer store after recreating the graph driver, that is true in practice.) Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... to add an ...UseGetters suffix, to warn against direct use. This is not currently necessary, but we will encourage direct use of the container and image store fields, so the asymmetry vs. layer store objects needs to be warned against. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

store.containerStore is always initialized, and never nil, after the store is initialized, so store.getContainerStore() amounts to a single field access. So just do that, and eliminate the error path and the local variables all over; just use s.containerStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
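A before/after sketch of that simplification, under the assumption that the removed getter looked roughly like the one below (its exact body is not shown in this conversation). `containersBefore`, `containersAfter` and the `ids` field are hypothetical names used only for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// containerStore is a placeholder for the real container store.
type containerStore struct{ ids []string }

// store is a simplified model, not the real containers/storage type.
type store struct {
	// containerStore is set once during initialization and never nil
	// afterwards, so it can be read without an error path.
	containerStore *containerStore
}

// getContainerStore models the kind of getter being removed: its error
// path cannot trigger once the store is initialized.
func (s *store) getContainerStore() (*containerStore, error) {
	if s.containerStore != nil {
		return s.containerStore, nil
	}
	return nil, errors.New("container store not loaded")
}

// containersBefore shows the caller shape with the getter.
func (s *store) containersBefore() ([]string, error) {
	cs, err := s.getContainerStore()
	if err != nil {
		return nil, err
	}
	return cs.ids, nil
}

// containersAfter shows the caller shape with direct field access.
func (s *store) containersAfter() []string {
	return s.containerStore.ids
}

func main() {
	s := &store{containerStore: &containerStore{ids: []string{"c1", "c2"}}}
	before, err := s.containersBefore()
	fmt.Println(before, err)
	fmt.Println(s.containersAfter())
}
```

Both callers return the same data; the second one simply has no local variable and no unreachable error branch.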

The lambda can access s.containerStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>
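For readers unfamiliar with the helper, the resulting callback shape might look like this sketch. It assumes that writeToContainerStore takes the container store's write lock around the callback; here that is modeled with a plain mutex, and `addContainer` and the `names` field are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// containerStore is a placeholder with its own lock.
type containerStore struct {
	mu    sync.Mutex
	names []string
}

// store is a simplified model, not the real containers/storage type.
type store struct {
	containerStore *containerStore // constant after initialization
}

// writeToContainerStore runs fn with the container store locked for
// writing. fn takes no parameter: it can use s.containerStore directly.
func (s *store) writeToContainerStore(fn func() error) error {
	s.containerStore.mu.Lock()
	defer s.containerStore.mu.Unlock()
	return fn()
}

// addContainer is a hypothetical caller of the helper.
func (s *store) addContainer(name string) error {
	return s.writeToContainerStore(func() error {
		s.containerStore.names = append(s.containerStore.names, name)
		return nil
	})
}

func main() {
	s := &store{containerStore: &containerStore{}}
	if err := s.addContainer("demo"); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s.containerStore.names)
}
```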

store.imageStore and store.roImageStores are always initialized, and never nil, after the store is initialized, so store.getROImageStores() amounts to a single field access. So just do that, and eliminate the error path and the local variables all over; just use s.roImageStores directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

store.imageStore is always initialized, and never nil, after the store is initialized, so store.getImageStore() amounts to a single field access. So just do that, and eliminate the error path and the local variables all over; just use s.imageStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

The lambda can access s.imageStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

It is now always nil, so simplify the callers. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Just access s.imageStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

rhatdan · 2022-12-05T18:24:28Z

@alexlarsson @nalind @vrothberg @giuseppe @flouthoc @umohnani8 PTAL

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

vrothberg

LGTM
Nice work!

giuseppe

LGTM

mtrmac mentioned this pull request Nov 22, 2022

In-process exclusion of readers and writers #1346

Merged

mtrmac changed the title ~~WIP: document and improve store locking~~ Document and improve store locking Nov 25, 2022

mtrmac added 8 commits December 1, 2022 16:00

Annotate store fields with the assumed concurrency rules

8a7f427

This might not be entirely correct, but _some_ attempt to set rules must be better than nothing. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Move initialization of store.graphLockLastWrite to store.load

d80b028

That way we don't lock store.graphLock twice, and we will have only one location that doesn't use the shared "lock and reload" utility for using store.graphLock. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Fix unlocked access to s.graphDriver in LookupAdditionalLayer

1c6ed02

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac force-pushed the graphLock-research branch from 22bd607 to 26d820c Compare December 1, 2022 15:01

mtrmac added a commit to mtrmac/buildah that referenced this pull request Dec 1, 2022

DO NOT MERGE: Test containers/storage#1438

057c049

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac added a commit to mtrmac/libpod that referenced this pull request Dec 1, 2022

DO NOT MERGE: Test containers/storage#1438

f905549

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac mentioned this pull request Dec 2, 2022

Podman hangs, possible deadlock ? containers/podman#16062

Closed

rhatdan marked this pull request as ready for review December 2, 2022 18:44

rhatdan reviewed Dec 3, 2022

rhatdan approved these changes Dec 3, 2022

mtrmac added a commit to mtrmac/buildah that referenced this pull request Dec 5, 2022

DO NOT MERGE: Test containers/storage#1438

a76032b

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac added 9 commits December 5, 2022 18:50

Split store.getROLayerStoresLocked from store.getROLayerStores

f68d450

... and use it to only lock graphLock once in store.allLayerStores. More improvements to come. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Split store.allLayerStoresLocked from store.allLayerStores

489efa7

... and use it in store.Diff to only lock graphLock once, and to use a fresh layerStore object instead of an obsolete one. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Eliminate the parameter to the lambda passed to writeToContainerStore

5f54ee0

The lambda can access s.containerStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Eliminate the parameter to the lambda passed to s.writeToAllStores

e9d2b13

The lambda can access s.containerStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac added 6 commits December 5, 2022 18:50

Eliminate the parameter to the lambda passed to s.writeToImageStore

6aad828

The lambda can access s.imageStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Eliminate the parameter to the lambda passed to s.writeToAllStores

4e9fffb

The lambda can access s.imageStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Remove the error return value from store.allImageStores

365d848

It is now always nil, so simplify the callers. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Remove the primaryImageStore parameter of imageTopLayerForMapping

bf13cef

Just access s.imageStore directly. Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Document the locking hierarchy of the various store locks

273e81c

Should not change behavior. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac force-pushed the graphLock-research branch from 7071fdc to 273e81c Compare December 5, 2022 17:51

mtrmac added a commit to mtrmac/libpod that referenced this pull request Dec 5, 2022

DO NOT MERGE: Test containers/storage#1438

f3c89fe

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

vrothberg approved these changes Dec 6, 2022

giuseppe approved these changes Dec 6, 2022

rhatdan merged commit bfa6cfa into containers:main Dec 6, 2022

mtrmac deleted the graphLock-research branch December 6, 2022 19:01
