-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
lib conflicts: Materialize (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed #4088
base: main
Are you sure you want to change the base?
Conversation
5a67ac6
to
04083d4
Compare
1797b68
to
eb0f55d
Compare
|
||
// From now on, we assume some invariants guaranteed by the encoding function | ||
// above. See its docstring for details. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I said, I believe it's wrong to preprocess source texts to be merged. What we need here is to encode merge_result
, right? Why do we have to do source -> serialize1 -> merge -> serialize2
instead of source -> merge -> serialize
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought the main problem we were discussing in #3977 (comment) was that the transformation wasn't reversible (or, more precisely, that two different sides could be mapped to the same serialized version, causing a conflict to disappear). Now, the transformation is perfectly reversible.
The problem with serializing once is that there are many corner cases that I find it hard to reason about. I do not currently have a clear idea how to do it (so that it works in every case and is human-readable), and even if I did know a way, I wouldn't know how to do it elegantly without a gigantic test suite (and maintenance burden) for a feature that only comes in play rarely.
Some corner cases:
- Conflict markers inside a conflict vs conflict markers that are the same on all sides of the conflict.
- "No newline at the end of file" outside a conflict (this one is pretty easy) vs on all sides of a conflict vs (on some sides of a conflict that is on the end of file for all sides) vs (on no sides of a conflict at the end of file) vs (one of the last two cases for a conflict that's only at the end of file for some of the sides).
- Unresolvable conflict when merging with empty file #3223 (which this approach provides a natural solution for, as I described in the PR description)
By having an intermediate state, we don't have to worry about most corner cases, we only have to make sure the first serialization satisfies a few invariants and is reversible. This way, the implementation is much simpler and, more importantly, verifying that it works correctly in all cases is much simpler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but we have to instead design the serialize1
part robust against line-by-line merging logic, right? Does it handle merge of two "no-eol" hunks for example? I suspect matching "no-eol" markers would be rendered out of the conflict region.
I didn't think about corner cases thoroughly, but I think nested conflict markers can be represented as a hunk (i.e. padded with " ") instead of a "base" content.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, but we have to instead design the serialize1 part robust against line-by-line merging logic, right?
Yes, that's what I think I did. Or do you mean something else?
Does it handle merge of two "no-eol" hunks for example? I suspect matching "no-eol" markers would be rendered out of the conflict region.
Yes, and then jj
should import it correctly. I can add a test, it should look like:
<<<<<<<
aaa
...
>>>>>>>
\JJ: No newline at end of file
For comparison, if the last line is "bbb" and not conflicted, it'd look like:
<<<<<<<
aaa
...
>>>>>>>
bbb
\JJ: No newline at end of file
Finally, see the test in eb0f55d#diff-a63c4cb277c53acb282fa66016517078d2d8b452259d061ba2449a27cf9ef277R390 for a conflict in whether there is a newline at the end of a file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooh, so \JJ:
lines are file global, not hunk local?
I would say it's super easy for users to lose track of them. I wouldn't expect that I had to fix up lines out of the loud conflict markers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the file is in conflict state, maybe we'll need to represent verbatim <<<<<<< as a fake 1-way conflict (to escape)?
This is similar to a previous idea I had. I ultimately decided against it because I thought it would look super-messy and would be relatively complicated to implement.
The idea was to allow the opening conflict marker to have more than 7 <
s. If it has, say, 9 <
s, then all conflict markers in that conflict are required to be 9 characters long. So, a sequence of 7 <
s would be considered as just part of the file.
This solves the problem of conflict markers inside conflicts. Then, we'd need some syntax for one-sided conflicts outside conflicts, e.g.
<<<<<<<<<<
<<<<<<<
aaa
=======
bbb
>>>>>>>
>>>>>>>>>>
This has a two issues:
-
The parser for conflict markers becomes more complicated. We couldn't parse it with regular expressions, and seemingly not with
pest
as it's not a simple enough grammar. We could do it manually or with another parser combinator library, probably (I forget what the one I'm thinking of is called). -
With the simplest implementation of materialization, the conflict example above would look much more complicated (though this is equivalent):
<<<<<<<<<<
<<<<<<<
>>>>>>>>>>
aaa
<<<<<<<<<<
=======
>>>>>>>>>>
bbb
<<<<<<<<<<
>>>>>>>
>>>>>>>>>>
Or, we could do a simple greedy implementation that surrounds huge regions of the file with these markers, making the markers hard to find.
So, I ended up leaning away from the idea. OTOH, the second problem would only occur for a minority of files, so maybe it's not too terrible?
Just to keep count, I now have 3 ideas in mind: this PR's idea, this PR + explicit unsetting of the conflicted state of the file in some cases (to avoid jj
automatically removing \JJ
lines, have a command do it), or the long conflict markers. I think the current version is the simplest, but I'm not sure by how much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to allow the opening conflict marker to have more than 7
<
s. If it has, say, 9<
s, then all conflict markers in that conflict are required to be 9 characters long.
That seems not bad, and relatively simple to implement.
Let's say any of the base file contents have lines matching ^([<>]{7,})
, we'll use len(max(matches)) + N
<
s as conflict marker. When parsing, the (minimum?) number of the <
s can be determined from the base files, I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When parsing, the (minimum?) number of the <s can be determined from the base files, I think.
I initially forgot about this issue, and it's critical. Relying on the original conflict sides for this is a nice idea. I'm not entirely sure it will work without being too confusing, but it might.
To elaborate for others, for the "long conflict marker" approach to have any advantage over \JJ Verbatim Line:
, jj
must be able to tell the difference between:
- a file that contains a 7-
<
conflict and - a file that used to have 9-
<
conflicts and text with embedded 7-<
conflict markers (like one of the examples I gave just above in lib conflicts: Materialize (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed #4088 (comment)), but now the user removed the conflict markers, and the remaining 7-<
conflict markers should be part of the text.
The text of the file is identical in the two situations, but jj
has to know there are no conflicts in the latter file.
jj
already does something similar: before parsing the file as a conflict, it currently has to know the number of sides expected in the conflict. jj
gets it from the in-tree version of the conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I realized that if we get the required number of <
s from the original conflict, there is no longer a technical need to surround unconflicted conflict markers with artificial "one-sided conflicts". OTOH, I'm worried that with or without those one-sided conflicts, they'll be confusing to the user.
Currently, my best idea is to add something like this to the beginning of the file:
<<<<<<<<< Conflict 1 of 5 (Informational, not a real conflict)
\\\\\\\\\ JJ: JJ conflict markers in this file are 9 characters long, longer than the normal 7.
\\\\\\\\\ JJ: This is because this file contains text that looks like conflict markers.
\\\\\\\\\ JJ: This fake conflict can be safely deleted, and must be deleted for the file to be considered resolved.
>>>>>>>>> End of informational conflict 1 of 5
Blah, blah. This wouldn't be a conflict:
<<<<<<<
aaa
=======
bbb
>>>>>>>
bleh bleh
(4 other conflicts should exist in this file somewhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think, in any case, the user will need to resolve the last remaining conflict in a way that the file content is identical to the non-conflict materialization. Otherwise, the working-copy file would have diffs from the resolved version of the tree content.
I assumed the "minimum length marker" idea practically solves this problem as the user can inline verbatim <<<<<<<
line if the base file contained verbatim <<<<<<<
and therefore the marker line must be at least 8-char long.
pub fn parse_conflict(input: &[u8], num_sides: usize) -> Option<Vec<Merge<ContentHunk>>> { | ||
pub fn parse_merge_result(input: &[u8], num_sides: usize) -> Option<Merge<ContentHunk>> { | ||
let hunks = parse_conflict_into_list_of_hunks(input, num_sides)?; | ||
let mut result: Merge<Vec<u8>> = Merge::from_vec(vec![vec![]; num_sides * 2 - 1]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: you can build Merge<ContentHunk>
, or Vec<ContentHunk>
and apply Merge::from_vec()
at the end.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see you're trying to help out with this in #4106 . Thanks! I may clean it up in this PR, or we can do it separately; in any case we should figure out the bigger issues first.
ccbb160
to
5519ffd
Compare
18b3795
to
51d761e
Compare
`parse_merge_result` Now, it matches `materialize_merge_result`
There might still be rough spots I forgot about, but I think this is more or less ready to go.
User docs are TODO in this or a following PR.
Fixes #3968
Fixes #3975
The plan is to address #3223 in a follow-up PR, by having empty files show up as something along the lines of
\JJ There is no file at this path on this side of the conflict. It may have been deleted.
Checklist
If applicable:
CHANGELOG.md