-
Notifications
You must be signed in to change notification settings - Fork 365
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix lakectl local commits remote changes outside synced prefix #7796
Fix lakectl local commits remote changes outside synced prefix #7796
Conversation
@eladlachmi I honestly think this issue was mislabled as bug. This is a feature request and regardless it constitutes as a breaking change |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, really like the approach.
Regarding the tests - I like the unit tests but I would honestly just replace them with a scenario in esti - as we already have tests for lakectl local + commit. I would just add the relevant scenarios there
DieOnErrorOrUnexpectedStatusCode(dirtyResp, err, http.StatusOK) | ||
|
||
// The above gives us SeekGT. Since we need SeekGE, we do another stat for exact match | ||
statResp, err := client.StatObjectWithResponse(ctx, remote.Repository, remote.Ref, &apigen.StatObjectParams{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to handle err
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very true. Fixed.
cmd/lakectl/cmd/local_commit.go
Outdated
@@ -24,6 +32,47 @@ func findConflicts(changes local.Changes) (conflicts []string) { | |||
return | |||
} | |||
|
|||
func hasUncommittedOutsideLocalPrefix(ctx context.Context, client *apigen.ClientWithResponses, remote *uri.URI, idx *local.Index) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A very elegant solution 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I share the credit with @arielshaqed 😁
cmd/lakectl/cmd/local_commit.go
Outdated
@@ -24,6 +32,47 @@ func findConflicts(changes local.Changes) (conflicts []string) { | |||
return | |||
} | |||
|
|||
func hasUncommittedOutsideLocalPrefix(ctx context.Context, client *apigen.ClientWithResponses, remote *uri.URI, idx *local.Index) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest something more concise, for example:
func hasUncommittedOutsideLocalPrefix(ctx context.Context, client *apigen.ClientWithResponses, remote *uri.URI, idx *local.Index) bool { | |
func hasExternalChange(ctx context.Context, client *apigen.ClientWithResponses, remote *uri.URI, idx *local.Index) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you know, concise is my middle name 😜
} | ||
|
||
// Get the first uncommitted change after the prefix. If it exists, we're also done | ||
nextPrefix := fmt.Sprintf("%s%s", filepath.Clean(*currentURI.Path), asciiCharAfterSlash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not clear what we are doing here. We should optimally get the next prefix in line and perform the diff on that.
I don't understand how the statObject will help, we're assuming some object with some path will exist?
Overall this is very confusing and feels over-complicated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, so I'll explain the approach. Let's say I synced locally with the prefix test-data/
.
Case 1: There are no uncommitted changes. The first check doesn't return any results. We're done—continue with the commit.
Case 2: We have uncommitted changes before test-data/
, e.g., abc/file.parquet
. The first check will return a result outside the prefix. We're done—exit with an error message.
Case 3: We have uncommitted changes after test-data/
, e.g., zzz/file.parquet
. The second check needs to fetch the first uncommitted change after test-data/
. In lexicographic order, this is test-data0
because /
is ordinal 47 and the next ordinal (48) is 0
. This fetch will return a result-exit with an error message.
These three cases seem to cover all the options. However, there's a subtle edge case. Our After
parameter acts like SeekGT
. This means that we've covered everything up to test-data0
(end not inclusive) and everything from test-data0
(start not inclusive). So if we happen to have an object specifically named test-data0
, we'll miss it. That's what the last StatObject
call is for.
Hope this helps clear things up 😊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it now 😅
Can we please add the explanation in the code so it will be clear for anyone looking at it?
After: apiutil.Ptr(apigen.PaginationAfter(nextPrefix)), | ||
}) | ||
DieOnErrorOrUnexpectedStatusCode(dirtyResp, err, http.StatusOK) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can stop here instead of creating an unnecessary API call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the explanation in the previous comment's reply and let me know what you think.
DieOnErrorOrUnexpectedStatusCode(dirtyResp, err, http.StatusOK) | ||
|
||
// The above gives us SeekGT. Since we need SeekGE, we do another stat for exact match | ||
statResp, err := client.StatObjectWithResponse(ctx, remote.Repository, remote.Ref, &apigen.StatObjectParams{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Feels like a shot in the dark
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the explanation in the previous comment's reply and let me know what you think 😅
cmd/lakectl/cmd/local_commit.go
Outdated
@@ -56,6 +106,11 @@ var localCommitCmd = &cobra.Command{ | |||
resp, err := client.GetBranchWithResponse(cmd.Context(), remote.Repository, remote.Ref) | |||
DieOnErrorOrUnexpectedStatusCode(resp, err, http.StatusOK) | |||
|
|||
hasChangesOutsideSyncedPrefix := hasUncommittedOutsideLocalPrefix(cmd.Context(), client, remote, idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hasChangesOutsideSyncedPrefix := hasUncommittedOutsideLocalPrefix(cmd.Context(), client, remote, idx) | |
hasExternalChange := hasUncommittedOutsideLocalPrefix(cmd.Context(), client, remote, idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lakectl/cmd/local_commit.go
Outdated
@@ -56,6 +106,11 @@ var localCommitCmd = &cobra.Command{ | |||
resp, err := client.GetBranchWithResponse(cmd.Context(), remote.Repository, remote.Ref) | |||
DieOnErrorOrUnexpectedStatusCode(resp, err, http.StatusOK) | |||
|
|||
hasChangesOutsideSyncedPrefix := hasUncommittedOutsideLocalPrefix(cmd.Context(), client, remote, idx) | |||
if hasChangesOutsideSyncedPrefix && !force { | |||
DieFmt("The branch you are trying to commit to contains uncommitted changes outside the lakeFS path your local directory '%s' is synced with.\nTo proceed, use the --force flag.", localPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You already know the name of the branch:
DieFmt("The branch you are trying to commit to contains uncommitted changes outside the lakeFS path your local directory '%s' is synced with.\nTo proceed, use the --force flag.", localPath) | |
DieFmt("Branch %s contains uncommitted changes outside of local path '%s'.\nTo proceed, use the --force flag.", remote.ref, localPath) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
cmd/lakectl/cmd/local_commit.go
Outdated
@@ -170,6 +225,7 @@ var localCommitCmd = &cobra.Command{ | |||
|
|||
//nolint:gochecknoinits | |||
func init() { | |||
withForceFlag(localCommitCmd, "Commit changes even if remote branch includes uncommitted changes outside the synced directory") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
withForceFlag(localCommitCmd, "Commit changes even if remote branch includes uncommitted changes outside the synced directory") | |
withForceFlag(localCommitCmd, "Commit changes even if remote branch includes uncommitted changes external to the synced path") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
@N-o-Z IMO, any surprising/unexpected behavior hidden from or not clearly communicated to the user is a bug. I also think we're using the term "breaking change" too loosely. This isn't the same thing as a breaking change in an API. For example, changing the default checked/unchecked state of a checkbox in a modal is undoubtedly a missable change, and you can and should draw attention clearly to it, at least for the next few versions. However, unlike an API, this is user-interactive, and the option to revert to the previous behavior is just a I agree that we need to clearly mark this in the release notes as a change Differing opinions aside, beyond clearly marking and communicating that this is a behavior change, is there anything else you think we need to do before, during, or after releasing this change? (p.s. to spare the need to add |
@N-o-Z I don't mind adding esti tests, but since the unit tests are already written, and they provide a localized and fast feedback loop (unlike esti tests), I don't see a reason to remove them. I think a multi-layer testing approach provides good coverage + a good developer experience. WDYT? |
Breaking change doesn't necessarily means we need to bump a major version and we shouldn't be afraid to use the term. Simply communicate it appropriately and verify this is understood by all stakeholders (@treeverse/product FYI). |
♻️ PR Preview a6e1af6 has been successfully destroyed since this PR has been closed. 🤖 By surge-preview |
IMO, mocking the server does not help us identify changes in behavior vs the actual server (extra important since lakectl and lakeFS are different binaries) so we have to have esti test. Once we have this covered in esti anyway, the unit tests become pretty redundant. I don't mind leaving them but I have to insist on having the integration tests |
@N-o-Z How confident are you about your coverage? 😅 Anyway... I think both the CLI and UI are user-interactive tools that users can choose to automate, but in doing so, they should be aware that interfaces (command line or otherwise) don't carry the same guarantees as APIs and SDKs. As long as we agree about what we intend to do, we're good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@eladlachmi Thanks for the great work.
Approved with a couple of comments.
It should be clear that this solution is not bulletproof and is susceptible to a race
} | ||
|
||
// add local files | ||
if len(tc.uncommittedLocal) > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure but do you need this check?
@@ -464,6 +464,145 @@ func TestLakectlLocal_commit(t *testing.T) { | |||
} | |||
} | |||
|
|||
func TestLakectlLocal_commit_remote_uncommitted(t *testing.T) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the tests! 💪🏽
} | ||
|
||
// Get the first uncommitted change after the prefix. If it exists, we're also done | ||
nextPrefix := fmt.Sprintf("%s%s", filepath.Clean(*currentURI.Path), asciiCharAfterSlash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get it now 😅
Can we please add the explanation in the code so it will be clear for anyone looking at it?
@@ -56,6 +109,11 @@ var localCommitCmd = &cobra.Command{ | |||
resp, err := client.GetBranchWithResponse(cmd.Context(), remote.Repository, remote.Ref) | |||
DieOnErrorOrUnexpectedStatusCode(resp, err, http.StatusOK) | |||
|
|||
hasChangesOutsideSyncedPrefix := hasExternalChange(cmd.Context(), client, remote, idx) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Optimally this should move to line 171 to reduce the chance of a race
Closes #7687
Change Description
Bug Fix
The Problem
Please see the associated issue for the full description of the problem.
Root Cause
lakeFS, unlike Git, doesn't support the concept of committing only staged changes. A commit operation will commit any uncommitted diffs between the refs or fail if conflicts occur (all or nothing).
lakectl local
isn't special in this regard. However, being able to sync with any arbitrary prefix creates the expectation that any operations will be scoped to that prefix. In practice, this isn't the case. Merging a ref inlakectl local
unexpectedly merges all changes in the remote ref, regardless of prefix.Solution
This PR adds to
lakectl local commit
a check for existing uncommitted changes on the remote ref, which are outside of the synced prefix. If such changes are found, the operation is aborted and an informative error message is shown. This behavior can be overridden by adding the the--force
flag to the commit command.Testing Details
These changes were tested manually and with unit tests included in this PR.
Note to reviewers: I believe the included tests cover the possible cases. I would like your feedback if you think that any additional tests are needed for better coverage.
Breaking Change?
This is a breaking change!
Although this behavior is unexpected and considered a bug, it will change following this PR.
lakectl local
users relying on this behavior will see an error message instead of the operation completed successfully and will need to add the--force
flag to get back the previous behavior.Special thanks to @arielshaqed for the refresher course on
char
ordinals and string sort order 😅👑