-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ostree repo commit: Speed up composing trees with --tree=ref
#1643
Conversation
Can one of the admins verify this patch? |
src/libostree/ostree-mutable-tree.c
Outdated
} | ||
|
||
/** | ||
* ostree_mutable_tree_fill_empty_from_dirtree: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Possibly a better name would be ostree_mutable_tree_fast_merge
? It's more intention revealing. Or perhaps this should be moved into a -private.h
header and not be made part of the public API?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would agree with that (keeping it private for now).
bot, add author to whitelist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! Let's start by splitting out the first three commits in a prep PR? I did a quick review for those already.
tests/basic-test.sh
Outdated
$OSTREE commit ${COMMIT_ARGS} -b test3-ref-ref -s "A" --tree=ref=test3-C1 --tree=ref=test3-D | ||
$OSTREE commit ${COMMIT_ARGS} -b test3-dir-ref -s "A" --tree=dir=tree-C --tree=ref=test3-D | ||
$OSTREE commit ${COMMIT_ARGS} -b test3-ref-dir -s "A" --tree=ref=test3-C1 --tree=dir=tree-D | ||
$OSTREE commit ${COMMIT_ARGS} -b test3-dir-dir -s "A" --tree=dir=tree-C --tree=dir=tree-D |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor/optional: the -s
switch is optional nowadays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed in #1645
tests/basic-test.sh
Outdated
@@ -450,16 +450,20 @@ echo sudoers2 >tree-C/etc/sudoers | |||
$OSTREE commit ${COMMIT_ARGS} -b test3-C2 -s "C" --tree=dir=tree-C |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this whole hunk be part of the previous commit where we added this test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, fixed in #1645
src/libostree/ostree-mutable-tree.c
Outdated
/* This is the checksum of the Dirtree object that corresponds to the current | ||
* contents of this directory. contents_checksum can be NULL if the SHA was | ||
* never calculated or contents of the mtree has been modified. Even if | ||
* contents_checksum is not NULL it may be out of date. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even if contents_checksum is not NULL it may be out of date.
Is that true though? We invalidate it before it gets out of date, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, see the comment in ostree_mutable_tree_get_contents_checksum
:
ostree/src/libostree/ostree-mutable-tree.c
Lines 120 to 135 in e120a6b
/* Ensure the cache is valid; this implementation is a bit | |
* lame in that we walk the whole tree every time this | |
* getter is called; a better approach would be to invalidate | |
* all of the parents whenever a child is modified. | |
* | |
* However, we only call this function once right now. | |
*/ | |
GLNX_HASH_TABLE_FOREACH_V (self->subdirs, OstreeMutableTree*, subdir) | |
{ | |
if (!ostree_mutable_tree_get_contents_checksum (subdir)) | |
{ | |
g_free (self->contents_checksum); | |
self->contents_checksum = NULL; | |
return NULL; | |
} | |
} |
We do invalidate it if this mtree is modified, but not if subdirectories of this mtree are modified.
In state == LAZY
it is always accurate.
I've been doing some more fiddling with this and I think there is a bug in here still. I'm trying to create a minimal testcase that reproduces the symptom. |
See #1569 - I just did a bit of digging and added a commit link there. |
I've figured out what the bug is: |
☔ The latest upstream changes (presumably 5d031ae) made this pull request unmergeable. Please resolve the merge conflicts. |
src/libostree/ostree-mutable-tree.c
Outdated
self->state = MTREE_STATE_LAZY; | ||
g_set_object (&self->repo, repo); | ||
set_string (&self->contents_checksum, contents_checksum); | ||
set_string (&self->metadata_checksum, metadata_checksum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I need to invalidate the parent checksum here. Or have a LAZY tree that (publically) has a NULL checksum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - this was the issue. I've fixed it now.
61380e2
to
a8f24d0
Compare
Ok, tests are passing now, including the new test I wrote for the bug I found in my previous implementation. The diff is a little more complex now that I've had to add the parent pointer, but I think it all makes sense. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's do the same thing again and split the first two commits (1c1be48
and 3c091e4
) into a separate PR? Those are 👍 to me (with a few comments).
The lazy stuff... it looks sane at a high-level, though I'd like to play around some more to fully contextualize it.
src/libostree/ostree-mutable-tree.c
Outdated
* a child from within this file (see insert_child_mtree). We ensure that the | ||
* parent pointer is either valid or NULL because when the parent is destroyed | ||
* it sets parent = NULL on all its children (see remove_child_mtree) */ | ||
OstreeMutableTree *parent; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds like it might be a good fit for weak refs: https://developer.gnome.org/gobject/stable/gobject-The-Base-Object-Type.html#g-object-add-weak-pointer. So then your hash table value free function can just be g_object_unref
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did consider this. I figured it's not worth the overhead both in the source and at runtime. The difference would be:
- Atomic instructions would be used for the weak refcnt. It isn't threadsafe to use different
OstreeMutableTree
instances that are part of the same hierarchy from different threads anyway as we have no synchronisation aroundcontents_checksum
. We could add more synchronisation here but it hardly seems worth it. - The (immediate) parents would not be freed until the children had been destroyed: the weak pointer still needs to point to somewhere to check the weak refcnt (assuming it's similar to C++'s
shared_ptr
withmake_shared
)
They're minor points but I think the current implementation is nicer and we avoid a bunch of error handling for codepaths that will never be taken. I'm not wed to the current implementation so I can change it to use weak_refs if you'd like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just a quick comment, I haven't fully reviewed this patch yet.
Yeah, GLib's weak refs use a global mutex. I don't know if that really matters here (probably not) - bigger perf wins would probably be having a more efficient string-tree structure.
Having non-owning pointers for trees is pretty standard I believe.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just looked at Rust's BTree for example, and there the parent pointer is a raw pointer. It kind of has to be in Rust of course, otherwise it'd be cyclic.
src/libostree/ostree-mutable-tree.c
Outdated
break; | ||
|
||
g_free (self->contents_checksum); | ||
self->contents_checksum = NULL; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two can be g_clear_pointer (&self->contents_checksum, g_free)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed (on #1655).
@@ -303,9 +343,10 @@ ostree_mutable_tree_ensure_parent_dirs (OstreeMutableTree *self, | |||
next = g_hash_table_lookup (subdir->subdirs, name); | |||
if (!next) | |||
{ | |||
invalidate_contents_checksum (subdir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be cleaner to just put this call in insert_child_mtree
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't do that because we call insert_child_mtree
in make_whole
where we're not invalidating any checksums.
src/libostree/ostree-mutable-tree.c
Outdated
if (g_strcmp0 (checksum, self->metadata_checksum) == 0) | ||
return; | ||
|
||
invalidate_contents_checksum (self->parent); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice fix!
src/libostree/ostree-mutable-tree.c
Outdated
/* We're not empty - can't convert to a LAZY tree */ | ||
return FALSE; | ||
default: | ||
g_return_val_if_reached (FALSE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually just do g_assert_not_reached ()
for these.
src/libostree/ostree-mutable-tree.c
Outdated
} | ||
|
||
/** | ||
* ostree_mutable_tree_fill_empty_from_dirtree: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I would agree with that (keeping it private for now).
☔ The latest upstream changes (presumably 488365f) made this pull request unmergeable. Please resolve the merge conflicts. |
Running `ostree commit --tree=ref=a --tree=dir=b` involves reading the whole of a into an `OstreeMutableTree` before composing `b` on top. This is inefficient if a is a complete rootfs and b is just touching one file. We process O(size of a + size of b) directories rather than O(number of touched dirs). This commit makes `ostree commit` more efficient at composing multiple directories together. With `ostree_mutable_tree_fill_empty_from_dirtree` we create a lazy `OstreeMutableTree` which only reads the underlying information from disk when needed. We don't need to read all the subdirectories just to get the checksum of a tree already checked into the repo. This provides great speedups when composing a rootfs out of multiple other rootfs as we do in our build system. We compose multiple containers together with: ostree commit --tree=ref=base-rootfs --tree=ref=container1 --tree=ref=container2 and it is much faster now. As a test I ran time ostree --repo=... commit --orphan --tree=ref=big-rootfs --tree=dir=modified_etc Where modified_etc contained a modified sudoers file under /etc. I used `strace` to count syscalls and I seperatly took timing measurements. To test with a cold cache I ran sync && echo 3 | sudo tee /proc/sys/vm/drop_caches Results: | | Before | After | | -------------------- | ------ | ----- | | Time (cold cache) | 8.1s | 0.12s | | Time (warm cache) | 3.7s | 0.08s | | `openat` calls | 53589 | 246 | | `fgetxattr` calls | 78916 | 0 | I'm not sure if this will have some negative interaction with the `_ostree_repo_commit_modifier_apply` which is short-circuited here. I think it was disabled for `--tree=ref=` anyway, but I'm not certain. All the tests pass anyway. I originally implemented this in terms of the `OstreeRepoFile` APIs, but it was *way* less efficient, opening and reading many files unnecessarily.
I've rebased (and squashed) this since #1655 was merged. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave this a quick scan. The high level idea makes sense.
Can you add some unit tests for this patch too? Or is that too hard to do without lowering the higher level usage too? If that's the case I'd probably prefer having the changes as a single PR.
/* ======== Valid for state LAZY: =========== */ | ||
|
||
/* The repo so we can look up the checksums. */ | ||
OstreeRepo *repo; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's going to result in a lot of refs on the repo. 🤔(In Rust we'd just have the mutable tree have a lifetime bounded by that of the repo)
But eh, it's fine for now, if we ever profile and find this is an issue we can perhaps just have the ref be maintained by the toplevel mutable tree, and children chase up to the root.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's going to result in a lot of refs on the repo.
Not too many I think. There's only one per LAZY OstreeMutableTree
. This means that it's only leaf directories that have refs, which initially is only 1 directory (the root).
I agree that it's a shame to churn the repo refcount, but it's unlikely to be worth fixing.
|
||
/* _ostree_mutable_tree_make_whole can fail if state == MTREE_STATE_LAZY, but | ||
* we have getters that preceed the existence of MTREE_STATE_LAZY which can't | ||
* return errors. So instead this function will fail and print a warning. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. One approach I've taken in other places for things like this is to cache the error, changing the object into an "error state". Have all API functions be "noops", and return the error at the next opportunity.
We might also be able to add a new API to check for and consume that error?
Not a hard requirement right now though.
What did you have in mind? I added some unit tests for this functionality in 36f7a6d which was already merged as part of #1645. I suppose I could write some tests in C which explicitly call |
Let's back up a second here; in this scenario you mention in the commit message:
Are the trees disjoint? For example If I'm understanding the optimization here, it's really predicated on that. Now...I'm staring at that test case code and only sort of understanding it. It looks like it's more testing combinations of |
(Also I pushed a fixup for the error handling, let me know what you think) |
To rephrase some of the musings from #1569 - in your use case (as well as the promotion case), it feels like we basically want to make it a fatal error if we're not hitting the fast path right? I completely get that we need to support mixing How about something like this:
The args to
One (IMO) nicer thing about this is that the path prefix is specified at commit time, so one can have the And further it would be a fatal error if the refs overlapped at all.
And if you didn't want to change the "schema" for your existing containers refs, that'd just be a matter of using |
That's right, this is what our use-case looks like.
The optimisation is more general than this and not limited to this use-case. For example: lets say you have a rootfs and want to add dash to it. Maybe the dash package looks like:
so then we do `ostree commit --tree=ref=rootfs --tree=ref=dash With the new implementation we only need to read the following dirtrees:
to produce a complete merged rootfs. That's 14 dirtree files. This is optimal taking advantage of the fact that ostree is implemented in terms of trees. On my laptop I have 48417 directories under
There is another orthogonal way in which this implementation is faster too. I discovered using
The intention was to test this code by comparing the results of using
I don't think so. I consider this optimisation more general than that.
This is similar to my implementation in #1651, but with a different API. With #1651 this would look like:
I had considered the The other issue with the The
When building our trees we have a lot of intermediate trees. So out build process looks like
With all my branches merged this would look like:
I've considered enhancing our buildsystem to do the merging and moving in one step, but it's not important if ostree is blazing fast at this anyway.
I don't see the advantage of the fatal error as compared to just being as fast as we can be in any given situation. |
(That's a bug/missed-optimization in
That makes sense to me! OK, I'm convinced by that. That said, I think the points I raised about not having the commit modifiers work are still valid; I have a vague worry that someone is relying on e.g. Thanks again for this really nice patch! |
☀️ Test successful - status-atomicjenkins |
I want to add some more testing of OS updates to https://github.com/openshift/machine-config-operator/ and it's not really necessary to do a full e2e build - just adding a dummy file to the oscontainer is going to be sufficient to start. Related to this, requiring privileges (like `compose tree`) does would be a huge ergonomic hit. This script aims to improve the code that landed for overriding the kubelet in Origin: https://github.com/smarterclayton/origin/blob/4de957b019aee56931b1a29af148cf64865a969b/images/os/Dockerfile Once this lands I'll do a PR to origin to change it to use coreos-assembler. (though that image is enormous, maybe we want explicit layering for things that don't need image building, also we really need to cut out Anaconda) Side note: while writing this I ended up rediscovering ostreedev/ostree#1643 It was less than a year ago but that already feels like forever...
I want to add some more testing of OS updates to https://github.com/openshift/machine-config-operator/ and it's not really necessary to do a full e2e build - just adding a dummy file to the oscontainer is going to be sufficient to start. Related to this, requiring privileges (like `compose tree`) does would be a huge ergonomic hit. This script aims to improve the code that landed for overriding the kubelet in Origin: https://github.com/smarterclayton/origin/blob/4de957b019aee56931b1a29af148cf64865a969b/images/os/Dockerfile Once this lands I'll do a PR to origin to change it to use coreos-assembler. (though that image is enormous, maybe we want explicit layering for things that don't need image building, also we really need to cut out Anaconda) Side note: while writing this I ended up rediscovering ostreedev/ostree#1643 It was less than a year ago but that already feels like forever...
I want to add some more testing of OS updates to https://github.com/openshift/machine-config-operator/ and it's not really necessary to do a full e2e build - just adding a dummy file to the oscontainer is going to be sufficient to start. Related to this, requiring privileges (like `compose tree`) does would be a huge ergonomic hit. This script aims to improve the code that landed for overriding the kubelet in Origin: https://github.com/smarterclayton/origin/blob/4de957b019aee56931b1a29af148cf64865a969b/images/os/Dockerfile Once this lands I'll do a PR to origin to change it to use coreos-assembler. (though that image is enormous, maybe we want explicit layering for things that don't need image building, also we really need to cut out Anaconda) Side note: while writing this I ended up rediscovering ostreedev/ostree#1643 It was less than a year ago but that already feels like forever...
I want to add some more testing of OS updates to https://github.com/openshift/machine-config-operator/ and it's not really necessary to do a full e2e build - just adding a dummy file to the oscontainer is going to be sufficient to start. Related to this, requiring privileges (like `compose tree`) does would be a huge ergonomic hit. This script aims to improve the code that landed for overriding the kubelet in Origin: https://github.com/smarterclayton/origin/blob/4de957b019aee56931b1a29af148cf64865a969b/images/os/Dockerfile Once this lands I'll do a PR to origin to change it to use coreos-assembler. (though that image is enormous, maybe we want explicit layering for things that don't need image building, also we really need to cut out Anaconda) Side note: while writing this I ended up rediscovering ostreedev/ostree#1643 It was less than a year ago but that already feels like forever...
I was trying to followup the `--selinux-policy-from-base` work to add a `cosa build --fast=overlay` for coreos-assembler, but hit on the fact that using e.g. `--owner-uid` disables commit optimizations. A while ago, ostreedev#1643 landed which optimized this for the case where no modifications are provided. But, we really need the SELinux policy bits, and it's super convenient to run `ostree commit` as non-root. It's fairly surprising actually that it's taken us so long to iterate on a good interface for this "commit changes on top of a base" model. In practice, many nontrivial cases really end up needing to do a (hardlink) checkout, and that case is optimized. But for this coreos-assembler work I want to directly overlay onto a commit object another commit object. That previous PR above added exactly the API we need, so let's expose it in the CLI. What you can see happening in the test is that we provide `--owner-uid 42`, but that only applies to directories/files that were added in the commit. And now that I look at this, I think what we really want here is to avoid changing directories that exist in the base, but eh; in practice the main use here is for `--owner-uid 0` while committing as non-root; and that works fine with this since the baseline uid will be zero as well.
I was trying to followup the `--selinux-policy-from-base` work to add a `cosa build --fast=overlay` for coreos-assembler, but hit on the fact that using e.g. `--owner-uid` disables commit optimizations. A while ago, ostreedev#1643 landed which optimized this for the case where no modifications are provided. But, we really need the SELinux policy bits, and it's super convenient to run `ostree commit` as non-root. It's fairly surprising actually that it's taken us so long to iterate on a good interface for this "commit changes on top of a base" model. In practice, many nontrivial cases really end up needing to do a (hardlink) checkout, and that case is optimized. But for this coreos-assembler work I want to directly overlay onto a commit object another commit object. That previous PR above added exactly the API we need, so let's expose it in the CLI. What you can see happening in the test is that we provide `--owner-uid 42`, but that only applies to directories/files that were added in the commit. And now that I look at this, I think what we really want here is to avoid changing directories that exist in the base, but eh; in practice the main use here is for `--owner-uid 0` while committing as non-root; and that works fine with this since the baseline uid will be zero as well.
I was trying to followup the `--selinux-policy-from-base` work to add a `cosa build --fast=overlay` for coreos-assembler, but hit on the fact that using e.g. `--owner-uid` disables commit optimizations. A while ago, ostreedev#1643 landed which optimized this for the case where no modifications are provided. But, we really need the SELinux policy bits, and it's super convenient to run `ostree commit` as non-root. It's fairly surprising actually that it's taken us so long to iterate on a good interface for this "commit changes on top of a base" model. In practice, many nontrivial cases really end up needing to do a (hardlink) checkout, and that case is optimized. But for this coreos-assembler work I want to directly overlay onto a commit object another commit object. That previous PR above added exactly the API we need, so let's expose it in the CLI. What you can see happening in the test is that we provide `--owner-uid 42`, but that only applies to directories/files that were added in the commit. And now that I look at this, I think what we really want here is to avoid changing directories that exist in the base, but eh; in practice the main use here is for `--owner-uid 0` while committing as non-root; and that works fine with this since the baseline uid will be zero as well.
I was trying to followup the `--selinux-policy-from-base` work to add a `cosa build --fast=overlay` for coreos-assembler, but hit on the fact that using e.g. `--owner-uid` disables commit optimizations. A while ago, ostreedev#1643 landed which optimized this for the case where no modifications are provided. But, we really need the SELinux policy bits, and it's super convenient to run `ostree commit` as non-root. It's fairly surprising actually that it's taken us so long to iterate on a good interface for this "commit changes on top of a base" model. In practice, many nontrivial cases really end up needing to do a (hardlink) checkout, and that case is optimized. But for this coreos-assembler work I want to directly overlay onto a commit object another commit object. That previous PR above added exactly the API we need, so let's expose it in the CLI. What you can see happening in the test is that we provide `--owner-uid 42`, but that only applies to directories/files that were added in the commit. And now that I look at this, I think what we really want here is to avoid changing directories that exist in the base, but eh; in practice the main use here is for `--owner-uid 0` while committing as non-root; and that works fine with this since the baseline uid will be zero as well.
TODO
Description
Running
ostree commit --tree=ref=a --tree=dir=b
involves reading thewhole of a into an
OstreeMutableTree
before composingb
on top. Thisis inefficient if a is a complete rootfs and b is just touching one file.
We process O(size of a + size of b) directories rather than
O(number of touched dirs).
This commit makes
ostree commit
more efficient at composing multipledirectories together. With
ostree_mutable_tree_fill_empty_from_dirtree
we create a lazy
OstreeMutableTree
which only reads the underlyinginformation from disk when needed. We don't need to read all the
subdirectories just to get the checksum of a tree already checked into the
repo.
This provides great speedups when composing a rootfs out of multiple other
rootfs as we do in our build system. We compose multiple containers
together with:
and it is much faster now.
As a test I ran
Where modified_etc contained a modified sudoers file under /etc. I used
strace
to count syscalls and I seperatly took timing measurements. Totest with a cold cache I ran
Results:
openat
callsfgetxattr
callsI'm not sure if this will have some negative interaction with the
_ostree_repo_commit_modifier_apply
which is short-circuited here. Ithink it was disabled for
--tree=ref=
anyway, but I'm not certain. Allthe tests pass anyway.
I originally implemented this in terms of the
OstreeRepoFile
APIs, butit was way less efficient, opening and reading many files unnecessarily.