Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document and improve store locking #1438

Merged
merged 23 commits into from
Dec 6, 2022
Merged

Conversation

mtrmac
Copy link
Collaborator

@mtrmac mtrmac commented Nov 22, 2022

Motivated by #1433 seeing unexpected layer store refreshes / existence of parallel layer stores for the same directory in a single process, this:

  • Attempts to document the concurrency design and rules for accessing store fields and store.graphLock. I’m not at all sure it’s correct, at least I want to establish the precedent that the design is expected to be documented
  • Consolidates the uses of store.graphLock to guard against Shutdown calls to store.startUsingGraphDriver / store.stopUsingGraphDriver helpers, instead of copy&pasted and code slowly deviating over time. This is similar to the CILStore.startWriting approach.
  • Fixes users that specifically rely on store.graphLock and can trigger a graph driver reload to actually use the reloaded graph driver for their layerStore operations, instead of using an obsolete layerStore (and graph driver) instance immediately after the reload.
  • Introduces a larger set of helpers to obtain the layerStore instances for the primary and additional stores, so that every operation only locks graphLock once instead of up to three times.
  • OTOH, removes helpers to obtain imageStore and containerStore instances; the users can just refer to a constant field, so they are modified to do that, removing many error checks and local variables
  • Fixes various individual instances of missing locking in store.

This is on top of unmerged #1436. See individual commit messages for details.

@nalind @giuseppe PTAL, it’s quite likely I’m missing something about the current locking implementation

Cc: @alexlarsson : this might significantly reduce the number of locking operations on storage.lock. I have no idea if it is noticeably faster, but it’s probably a noticeable change to the strace at least, so it might be necessary to track this separately from other changes if you are quantifying other work.

@mtrmac mtrmac changed the title WIP: document and improve store locking Document and improve store locking Nov 25, 2022
This might not be entirely correct, but _some_ attempt to set rules
must be better than nothing.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Don't modify store.graphDriverName on reloads, so that it is constant
throughout the life of the store, and it can be accessed without locking,
as it has actually been done (to this point incorrectly).

That requires special-casing the initial load(); so, split
the actual driver creation into a createGraphDriverLocked() method.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
At least containers#926 suggests that
using timestamps "seems to fail", without elaborating further.

At the very least, ModifiedSince is the more general check, because it
can work across shared filesystems or on filesystems with bad timestamp
granularity, it's about as costly (a single syscall, pread vs. fstat),
and we can now completely eliminate tracking store.lastLoaded.

The more common code shape will also help factor the common code out further.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
That way we don't lock store.graphLock twice, and we will have
only one location that doesn't use the shared "lock and reload"
utility for using store.graphLock.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... to avoid the repetitive checks for store.graphLock.ModifiedSince.

store.getROLayerStores now checks for graphLock.ModifiedSince, we'll optimize
that away again.

store.Shutdown now checks for graphLock.ModifiedSince, that seems like
a good idea anyway.

Also attempt to document the purpose and rules of using graphLock;
it's quite likely incomplete but at least it's a starting point.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
It's now the only caller, so:
- inline it, eliminating an always-false graphDriver != nil check
- only update graphLockLastWrite after we successfully reload.
  s.graphDriver is now never nil during the lifetime of store
  (but it is only safe to access with graphLock held.)

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... and use it in callers that already obtain the graphLock.

This is both an optimization, and a correctness fix: if we need to
reload the graph driver, we now don't do the operation on an obsolete
layerStore instance.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
mtrmac added a commit to mtrmac/buildah that referenced this pull request Dec 1, 2022
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
mtrmac added a commit to mtrmac/libpod that referenced this pull request Dec 1, 2022
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@mtrmac
Copy link
Collaborator Author

mtrmac commented Dec 1, 2022

@mtrmac
Copy link
Collaborator Author

mtrmac commented Dec 2, 2022

Added one more commit to document the locking hierarchy within store.

@rhatdan rhatdan marked this pull request as ready for review December 2, 2022 18:44
@rhatdan
Copy link
Member

rhatdan commented Dec 2, 2022

store.go Outdated
@@ -982,11 +982,8 @@ func (s *store) getLayerStore() (rwLayerStore, error) {

// getROLayerStores obtains additional read/only layer store objects used by the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is repeated I believe.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed.

Copy link
Member

@rhatdan rhatdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, One potential issue with duplicated comments.

mtrmac added a commit to mtrmac/buildah that referenced this pull request Dec 5, 2022
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... and use it to only lock graphLock once in store.allLayerStores.

More improvements to come.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... and use it in store.Diff to only lock graphLock once,
and to use a fresh layerStore object instead of an obsolete one.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This allows to get all the layer stores with a single s.graphLock access.

We do it even in store.CreateContainer, where the read-only layers
are only necessarily conditionally, because getting the read-only
layers is almost always a simple field access, so not having to lock
graphLock again is almost certainly the improvement.

Stand-alone store.getROLayerStores is now unused and has been removed.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
This was using the graphDriver field without locks, and the graph driver itself,
while the implementation assumed exclusivity.

Luckily all callers are actually holding the layer store lock for writing, so
use that for exclusion. (layerStore already seems to extensively assume
that locking the layer store for writing guarantees exclusive access to the graph driver,
and because we always recreate a layer store after recreating the graph driver,
that is true in practice.)

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
... to add an ...UseGetters suffix, to warn against direct use.

This is not currently necessary, but we will encourage direct
use of the container and image store fields, so the asymmetry vs.
layer store objects needs to be warned against.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
store.containerStore is always initilialized, and never nil,
after the store is initialized, so store.getContainerStore()
amounts to a single filed access.

So just do that, and eliminate the error path and the local variables
all over; just use s.containerStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The lambda can access s.containerStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The lambda can access s.containerStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
store.imageStore and store.roImageStores are always initialized, and never nil,
after the store is initialized, so store.getROImageStores()
amounts to a single field access.

So just do that, and eliminate the error path and the local variables
all over; just use s.roImageStores directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
store.imageStore is always initialized, and never nil,
after the store is initialized, so store.getImageStore()
amounts to a single field access.

So just do that, and eliminate the error path and the local variables
all over; just use s.imageStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The lambda can access s.imageStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
The lambda can access s.imageStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
It is now always nil, so simplify the callers.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Just access s.imageStore directly.

Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Should not change behavior.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
@rhatdan
Copy link
Member

rhatdan commented Dec 5, 2022

mtrmac added a commit to mtrmac/libpod that referenced this pull request Dec 5, 2022
Signed-off-by: Miloslav Trmač <mitr@redhat.com>
Copy link
Member

@vrothberg vrothberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Nice work!

Copy link
Member

@giuseppe giuseppe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@rhatdan rhatdan merged commit bfa6cfa into containers:main Dec 6, 2022
@mtrmac mtrmac deleted the graphLock-research branch December 6, 2022 19:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants