Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support the extension for worktree-specific config #6202

Closed

Conversation

vermiculus
Copy link
Contributor

@vermiculus vermiculus commented Feb 6, 2022

Fix #6044.

Tasks:

Open questions:

  • How does this interact (if at all) with git.git:enum config_scope?
  • Are there any places in the code that are assuming these values don't change?
  • What other areas of the code need to be updated to adjust?
  • Does this break ABI? Yes, but evidently there's no way to avoid this.
  • (meta) What subsystem even is this? (for the commit subject prefix)

@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch 4 times, most recently from 7d84792 to a20800c Compare February 6, 2022 15:36
src/repository.c Outdated Show resolved Hide resolved
@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from 39d036e to f4cc52c Compare February 6, 2022 16:06
@vermiculus vermiculus changed the title WIP: Support the extension for worktree-specific config Support the extension for worktree-specific config Feb 6, 2022
@vermiculus
Copy link
Contributor Author

I've realized while writing up a corresponding PR to rust-lang/git2-rs that inserting a value here could have a detrimental impact on compatibility with existing bindings.

Does this need to wait for a version break of some sort? Or do I need to come up with an alternate implementation of git_config_get_* to be able to use some sort of explicit 'this level overrides this other level' definition?

If we want to keep the existing int-comparison based overrides, we should probably take the opportunity to insert some wiggle room by making these values multiples of ten instead.

@vermiculus
Copy link
Contributor Author

Pinging on this. What's the next step here? Does this patch break ABI? Is there anything we can actually do about that?

@ethomson
Copy link
Member

Thanks for the ping @vermiculus - I'm happy to review this, I should be able to take some time today.

You're correct that it does break ABI, and no, I don't think that there's anything that we can do about this. If this is the correct ordering, then we'll give people a heads up in the next release but I'm afraid that there's little else that we can do here.

@vermiculus
Copy link
Contributor Author

Thanks for taking the time 😃

If there's no way to avoid breaking ABI, will it be just as much work for consumers if we change all the enum values to allow for future inserts?

GIT_CONFIG_LEVEL_PROGRAMDATA = 10,
GIT_CONFIG_LEVEL_SYSTEM = 20,
// etc.

I can't see why or how we'd ever have another value (even at a conceptual level), but I imagine the same was said before git-worktree came along.

@vermiculus
Copy link
Contributor Author

Coming back around to this. What's the next step here?

Expecting I'll need to rebase to resolve the new conflicts, that I don't expect those conflicts to be terribly difficult to resolve.

@ethomson
Copy link
Member

ethomson commented Mar 1, 2022

Yep, sorry for the conflicts. 😢 If you rebase, I think that it's ready to go.

@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from f4cc52c to 7809dbb Compare March 3, 2022 10:46
@vermiculus
Copy link
Contributor Author

vermiculus commented Mar 3, 2022

No worries! Conflicts are a sign of active development 😉 (edit: also, holy cow is #6133 exciting! Definitely agree dogfooding is going to be a boon here -- especially for applications that might want to link to a porcelain like https://github.com/magit/libegit2.)

Rebased; looks like the conflicts were resolved automatically.

This is ready to roll from my end assuming you're good with #6202 (comment) and the remaining open questions in the top post.

@vermiculus
Copy link
Contributor Author

Pinging on this again 😄

@bierbaum
Copy link
Contributor

While testing this, I found the following bug: #6246

After that patch is merged (or integrated here) libgit2 is able to open repos with extensions.worktreeconfig set.

@vermiculus
Copy link
Contributor Author

vermiculus commented Mar 18, 2022

Thanks for testing and finding that issue, @bierbaum 😄 I'll happily rebase once your PR is merged.

Otherwise I could rebase on it now and both can merge at the same time -- but it does feel like your fix can stand alone. Depends on what @ethomson would like to do here.

@ethomson
Copy link
Member

Sorry for the delay, @vermiculus. And thanks, again.

@ethomson
Copy link
Member

Hmm. Not sure what’s going on with the tests. It’s possible that this was a problem caused by #6252 but I’ll have to take a closer look tomorrow.

@ethomson
Copy link
Member

ethomson commented Apr 4, 2022

🤔 Doesn't look like it's related to #6252; these test failures look legitimate. I'm afraid that I will need to refamiliarize myself with the config code to offer any suggestions for how to proceed.

@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from 7809dbb to 8bda4a6 Compare April 9, 2022 09:56
@vermiculus
Copy link
Contributor Author

a5a6eee includes some logging output for investigation. It appears the search space is null for these two calls:

log: new call to git_config_entries_get_unique
log: needle: branch.local-track.remote
log: haystack:
log: end
log: 'branch.local-track.remote' NOT FOUND

log: new call to git_config_entries_get_unique
log: needle: branch.track-local.remote
log: haystack:
log:   branch.local-track.remote => .
log: end
log: 'branch.track-local.remote' NOT FOUND

@vermiculus
Copy link
Contributor Author

vermiculus commented Apr 9, 2022

I'm investigating config::rename::can_rename as a test that's failing -- I think all the other failures are coming from the git_config_delete_entry in rename_config_entries_cb.

I think I've learned that at git_config_delete_entry, the git_config struct has all the expected 'entries' (in quotes because I'm using git_config_foreach to print everything and I'm not 100% sure what it does).

Once we get into config_file_delete, though, backend->entries does not have all the entries that were printed before. It's worth noting that backend->level:=6 here -- 6 is the new 'worktree' value this patch introduced, whereas I'm willing to bet the value that needs to be deleted is on level 5. I believe config.c:get_backend_for_use is returning the wrong backend.

Continuing investigation on that assumption.

@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch 2 times, most recently from ec1a85d to b1ca747 Compare April 9, 2022 15:11
@vermiculus
Copy link
Contributor Author

vermiculus commented Apr 9, 2022

I believe the above patch is in the right direction (based on ~three hours' learning about config backends...) though some tests are still failing. I believe this is because 'backends' is not sorted by config-level. I'll try to come back to this in a few hours.

return 0;
case BACKEND_USE_DELETE:
/* return the first backend that contains the given config-name */
backend->backend->get(backend->backend, name, &entry);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I need to lock/unlock the backend here? Is there a risk of data getting written out from underneath me?

@vermiculus
Copy link
Contributor Author

@ethomson Just saw that https://github.com/libgit2/libgit2/labels/v1.8 was added. Cool! As it happens, I'm also looking at the possibility of spending some $DAY_JOB hours on this in the near future. Any chance I could get permission to run the tests again?

@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from ded2432 to e1b0bcb Compare February 10, 2024 07:21
@vermiculus
Copy link
Contributor Author

vermiculus commented Feb 16, 2024

@ethomson We're hoping to look at this again next week; any chance I can get approval to run the workflow tests?

Well it's not exactly kosher, but I was able to find a way to at least run some of the tests on my fork by merging vermiculus/libgit2:sallred/* into vermiclus/libgit2:main, so we might be good on this front for now.

Introduce the logical concept of a worktree-level config.  The new
value sits between _LOCAL and _APP to allow `git_config_get_*` to
'just work'.

The assumption of how `git_config_get_*` works was tested
experimentally by setting _WORKTREE to some nonsense value (like -3)
and watching the new test fail.
Now that GIT_CONFIG_LEVEL_WORKTREE exists logically, define and load
$GIT_DIR/config.worktree into that level.
@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from c513014 to 11a859e Compare February 20, 2024 17:30
Now that worktree-level configuration can be read from disk and
manipulated in memory, we should be able to say we support
'extensions.worktreeConfig'.
@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from 432510f to c34c6ed Compare February 21, 2024 04:03
This structure provides for cleaner diffs in upcoming commits.
@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from c34c6ed to 71bb851 Compare February 21, 2024 04:55
@vermiculus
Copy link
Contributor Author

@ethomson I believe this is ready to go – or at least ready for review. The only failures I see in my CI runs are of the form:

4: online::customcertFF
4: 
4:   1) Failure:
4: online::customcert::file [/home/libgit2/source/tests/libgit2/online/customcert.c:68]
4:   Function call failed: (git_clone(&g_repo, "https://test.libgit2.org:1443/anonymous/test.git", "./cloned", ((void *)0)))
4:   error -1 - failed to resolve address for test.libgit2.org: Temporary failure in name resolution
4: 
4:   2) Failure:
4: online::customcert::path [/home/libgit2/source/tests/libgit2/online/customcert.c:76]
4:   Function call failed: (git_clone(&g_repo, "https://test.libgit2.org:2443/anonymous/test.git", "./cloned", ((void *)0)))
4:   error -1 - failed to resolve address for test.libgit2.org: Temporary failure in name resolution

and this may even be expected in my somewhat-hacked-together setup (force-pushing to vermiculus:main to get CI to run).

Let me know if there's any way I can help expedite review.

Thanks!

@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch 3 times, most recently from 09d26d9 to daeef15 Compare February 21, 2024 19:52
@vermiculus
Copy link
Contributor Author

vermiculus commented Feb 21, 2024

@zivarah brought up during internal code review that this PR could end at f4b1113 and a separate PR be opened for ..daeef15. I'm happy to do so, but I thought I'd ask what your preference would be as a maintainer. Let me know.

vermiculus and others added 5 commits February 21, 2024 15:54
It would seem that `get_backend_for_use` is primarily used when
writing config data -- either to set keys or delete them (based on the
possible values of `backend_use`). When git-config(1) is used for
side-effects, it will modify only the local (repository-level)
configuration unless explicitly overridden.

From git-config(1):

    --local
        For writing options: write to the repository .git/config file.
	This is the default behavior.

`get_backend_for_use` does not have the ability to specify a config
level and typically is expected (it seems) to 'do the right thing'.
Taking its cue from git-config(1), don't update worktree-specific
config unless it's the only option.

If that functionality is needed by consumers, I assume they would find
the appropriate backend with `git_config_open_level` and feed that
`git_config` object through to the `git_config_set_*` functions (as
demonstrated in the provided test).
When deleting a key from a repository with multiple config
backends (like a --local config and a --worktree config), it's
important to return the correct backend to modify.

This patch ensures that we don't return a backend that is incapable of
deleting a given piece of configuration when that is the required use.
Before passing the config key name to backend->get(), it needs to be
normalized first. Not all current callers are performing this
normalization, so perform it centrally instead.

Co-authored-by: Brian Lyles <brianmlyles@gmail.com>
This looks like a simple copy/paste error.
Now that get_backend_for_use can return other error codes (by virtue
of key-name normalization), make sure to propagate the appropriate
error code when used.
@vermiculus vermiculus force-pushed the sallred/support-worktree-config branch from daeef15 to 4cd2d17 Compare February 21, 2024 21:54
@ethomson
Copy link
Member

@zivarah brought up during internal code review that this PR could end at f4b1113 and a separate PR be opened for ..daeef15. I'm happy to do so, but I thought I'd ask what your preference would be as a maintainer. Let me know.

I'm happy with two things intermingled in a single PR, no need to split it out. I'm sorry that I haven't had time to 👀 this yet, but I'm hopeful to do it this weekend. 🙏

@csware
Copy link
Contributor

csware commented Feb 24, 2024

fixes #6650

/* If we're trying to delete a piece of config, make
sure the backend we return actually defines it in
the first place. */
if (use == BACKEND_USE_DELETE) {
Copy link
Member

@ethomson ethomson Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to go through and refresh my understanding of our configuration logic. This change makes sense if the perspective of libgit2 is when we say delete a configuration key, then just delete it wherever it is. (And I think that's the semantics that our current config data sort of implies.)

This is different than git config, which takes a config level (eg git config --global ...) to operate on.

But I think that this change actually suggests a deeper problem with our API. IIUC, the goal of this change is to go find the config that we should delete a key in. What should we do if there's a multivar split over two configuration files? Instead of trying to find the backend in get_backend_for_use, should we iterate over all backends?

(Not asking you to make this change, asking for your opinion on the API.)

Copy link
Member

@ethomson ethomson Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To expand on that — I think that an API that lets you specify a key to delete and the config level to delete it from makes sense. I think that an API that lets you specify a key to delete and then deletes all occurrences across all config levels makes sense. I'm less convinced that an API that lets you specify a key to delete and then deletes the first occurrence makes sense.

I think that we have the latter option — where it deletes the first occurrence — and I wonder if that's really the right API? 🤔

Copy link
Contributor Author

@vermiculus vermiculus Feb 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What should we do if there's a multivar split over two configuration files?

Certainly this case would surprise folks; I would not expect to get different results with repeated calls:

git_config_delete_entry(cfg, "foo.bar"); // deletes --local
git_config_delete_entry(cfg, "foo.bar"); // deletes --global

That just feels 'wrong'. It would feel much more appropriate to say

git_config_delete_entry_at_level(cfg, GIT_CONFIG_LEVEL_LOCAL, "foo.bar"); // deletes --local
git_config_delete_entry_at_level(cfg, GIT_CONFIG_LEVEL_GLOBAL, "foo.bar"); // deletes --global
// with companion functions
git_config_set_string_at_level(cfg, GIT_CONFIG_LEVEL_LOCAL, "foo.bar", "baz");
git_config_get_entry_at_level(&entry, cfg, GIT_CONFIG_LEVEL_LOCAL, "foo.bar");
git_config_get_entry_with_inheritence(&entry, cfg, "foo.bar");
  // ^^ this may be git_config_get_entry unchanged?
// etc.

It was surprising to us to learn that the current config API did not work this way. In a vacuum, I would consider git_config_set_string_at_level to be the correct abstraction for a C library: explicit and unambiguous even if a little verbose.

I feel comfortable enough with the config code at this point to tackle such a thing in a separate PR if that's a direction you'd like to go. (In fact, I've rather been enjoying this codebase – it's a nice change of pace for me 😄.) It probably deserves a more robust design process, though.

It would also make sense to me to incorporate the worktree-config support only into that new API if that helps resolve concerns with this PR, but that will come with different considerations for doing due diligence to ensure consumer applications continue to 'work reasonably correctly'. (Although maybe that's not as high of a concern in this context as long as we make the appropriate deprecation warnings; I've not worked with this particular kind of project in a FOSS setting before.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not positive but my thinking here is that maybe we should keep things mostly as they are but, yes, avoid surprise:

I think that the "simple" git_config functions should probably mirror the git command line itself. Reads return the "effective" configuration (applying the config files in order of priority and giving a result.) But we change writes/deleted to always refer to the local.

If somebody wants a specific level then they can open that file and operate on it directly. I don't think we necessarily need APIs to read and write at a specific level directly, I think that people can probably go get the config file that they're interested in.

But this rather presupposes that git's model is right here — and I'm increasingly convinced that it actually is.

The rather tricky part is the user configuration(s). We may need to be more thoughtful than "always write to local".

Copy link
Contributor Author

@vermiculus vermiculus Mar 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was reviewing this comment once more before writing up what this would look like

I think that the "simple" git_config functions should probably mirror the git command line itself. Reads return the "effective" configuration (applying the config files in order of priority and giving a result.) But we change writes/deleted to always refer to the local.

I think that's a good instinct as it would follow some principle of least surprise.

What I'm not sure of is what is meant by this:

The rather tricky part is the user configuration(s). We may need to be more thoughtful than "always write to local".

By 'user configurations', do you mean things like --global and --system config?

Here's what I'm going to propose:

For reads: keep the current implementation and use the most specific level available to us.

For writes:

  1. If there is exactly one config backend, use it.
  2. If --local is available, use it.
  3. Signal an error (as git does).

The alternative to (3) (which might be the 'more thoughtful' approach, as best as I can imagine one) would be to set the most specific level of config available to us if --local is not available, but I rather think that might be too smart for this 'plumbing' function. I do think the need of 'set this at the system/global level' would be met with git_config_open_level followed by git_config_set_string following the logic proposed above. (Some wrapper 'porcelain' could be devised as above, but it does seem like that would be low-value API.)


I have some errands to run, but I hope to have a patch here in a few hours' time. I'll just plop one commit on top so there are two versions to easily compare.

Copy link
Contributor Author

@vermiculus vermiculus Mar 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have pushed c82bef3 which takes the approach outlined above. Before the final version, I would like to add a test that git_config_open_level(&out, cfg, GIT_CONFIG_LEVEL_LOCAL) only ever returns a config with one backend (even in degenerate cases) since I'm now relying on that behavior in the base case of get_backend_for_use.

Note that tests introduced in #4217 are non-trivially altered (@pks-t), but I believe the test is still effective in demonstrating that the readonly flag is respected.

There is another problem test_remote_httpproxy__env that I've not tracked down yet; some details are in the commit though. I'll add that I've only gotten test_remote_httpproxy__env to fail when running the full suite; I'm not able to get it to fail on its own or even just running the remote test suite. There might be something wrong with the test; it's hard to say.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent a little time staring at this over the weekend and there's much more complexity here than I expected. You may have identified some of these problems as well. In particular, backends are shared across multiple git_config instances. (The repository has a configuration with multiple backends with an ordering. When you create a snapshot, you get a new configuration with read only backends. And you can open_level, you get a new configuration object with only one of those backends.)

The other "gotcha" is that we support users of the library to create their own custom configuration, GIT_CONFIG_LEVEL_APP (or higher) which is higher-priority than the system/global/etc configuration. And we have no idea what they're doing. So trying to be clever about where writes go needs to take into account the APP level which may or may not exist and may or may not be writeable.

The current flow today is "write to the highest level backend that isn't readonly", which may be APP and if it's not it will be LOCAL.

So, because of this shared state, you can't simply make the worktree backend "not writable" for the repository configuration, since it will also be not writable when you open_level it, which you do want to write to. 🙃

And trying to hardcode any logic about defaulting to the local configuration is tricky, since there might be APP-levels that you need to contend with (and that are being written to today and we would like to preserve backcompat).

My initial thought was that we could re-use the readonly bit (or possibly make a new "writes need to be explicit" bit) and make the worktree config readonly when it's created via git_repository_open, but that doesn't work, because subsequent open_level will happily just give you the existing worktree config which is... readonly.

Instead, I'm coming back to the notion that configuration needs separate read and write ordering. Providing a set_writeorder API allows callers (in particular, git_repository) to set that they want LOCAL to be the only write target during configuration creation. Users can subsequently add an APP level, which will be be the highest-level writer (but they can change that themselves with set_writeorder).

Ultimately, set_writeorder only applies to the things that are in your config object. So if you want to override the default write order of the thing(s) you just added, you will need to call it again after adding them. (You can't reconfigure your write order.)

This required us to pull apart the refcounted backend object from the list of backends that are in a config so that they can be separate. Alas. In any case, I pushed up a branch ethomson/worktree-config (diff). I'd love your feedback.

I also found and fixed some interesting bugs, notably that you can't git_transaction_free an uncommitted transaction today. And that if you start a transaction and then mess around with backends, you're going to have a bad time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we were typing our comments at roughly the same time. 😅

Right, so, my concern here is definitely around the CONFIG_LEVEL_APP no longer being a write target. I think that's critical to backcompat... Our CLI uses it to map -c command-line arguments into a configuration backend (though this is read-only). I know that at least one hosting provider uses CONFIG_LEVEL_APP, and ... I have no idea if they're using it to write config data or only read?

I'm loathe to break it, though. 😬

if (backend->backend->readonly)
continue;

/* git-config doesn't update worktree-level config
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is interesting — isn't this true of all git config entry levels? IOW, doesn't git config always operate on the local configuration unless otherwise specified?

I actually think that this exposes an oddity in our whole ordering. Our configuration levels appear to be trying to support a scenario where we have a system of ordering so that higher priority backends can override the data in lower priority backends. (eg, if I have foo.bar in system, and foo.bar in local, then the foo.bar in local is used.)

But we also want a system that's relatively flexible so that library users can plug in their own backends. (In other words, we don't want to just always default to the local backend, which is what we would do in the absence of any pluggability.)

The wrinkle here is that you're introducing a new backend that should take priority over reads but should not take priority over writes. I think that we should consider doing two things:

  1. For the sort of generic, unopinionated configuration system, we should have two priority ordered lists — one for read priority, and one for write priority
  2. When a repository is created by default, it should set the read priority in the order that we've specified here. The write priority should only have a single option, which is the local configuration. (By default, calling set without a specific entry level should probably just always go there.)

This lets users have control over the ordering, but sets a default that aligns with git.

Does this make sense? Am I entirely off base? Am I overthinking things?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think you're over-thinking things. I agree that the different behavior for reading/writing is not something that's nicely explicit in the current API.

This feels very related to #6202 (comment). I will have to noodle on these points more (I've got some household chores to do in the meantime), but I suspect my other reply might be a start toward a resolution here.

In this version:

1. If there is exactly one config backend, use it.
2. If `--local` is available, use it.
3. Signal an error (as `git` does).

This closely simulates the behavior of git-config.

A test does fail, however, introduced in [1]:

    test_config_readonly__writing_to_cfg_with_ro_precedence_succeeds

Reading the message, I think the spirit of the readonly functionality
is still satisfied and the test can be adjusted.

Another test also fails:

    test_remote_httpproxy__env

This one eludes me -- because it only fails *sometimes*. It does not
fail if I just run that suite or surrounding suites. When it does
fail, it fails on the very first assertion:

    /* HTTP proxy is ignored for HTTPS */
    cl_setenv("HTTP_PROXY", "http://localhost:9/");
    assert_proxy_is(NULL);

I'm not sure what kind of test interaction could be going on here, but
I suspect (given my changes) that the problem is somewhere within
`http_proxy_config`.

[1]: 95f29fb
@ethomson
Copy link
Member

Fixed via #6756

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Feature] Support extensions.worktreeconfig for repositoryformatversion=1
5 participants