Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lib conflicts: Materialize (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed #4088

Draft
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

ilyagr
Copy link
Contributor

@ilyagr ilyagr commented Jul 15, 2024

There might still be rough spots I forgot about, but I think this is more or less ready to go.

User docs are TODO in this or a following PR.


Fixes #3968

Fixes #3975

The plan is to address #3223 in a follow-up PR, by having empty files show up as something along the lines of \JJ There is no file at this path on this side of the conflict. It may have been deleted.

Checklist

If applicable:

  • I have updated CHANGELOG.md
  • I have updated the documentation (README.md, docs/, demos/)
  • n/a I have updated the config schema (cli/src/config-schema.json)
  • I have added tests to cover my changes

@ilyagr ilyagr changed the title Materialize files with conflict markers in a way that can be parsed Materialize files with conflict markers or no newline at EOF in a way that can be parsed Jul 15, 2024
@ilyagr ilyagr changed the title Materialize files with conflict markers or no newline at EOF in a way that can be parsed Materialize ((unconflicted files) with (conflict markers) or (no newline at EOF)) in a way that can be parsed Jul 15, 2024
@ilyagr ilyagr changed the title Materialize ((unconflicted files) with (conflict markers) or (no newline at EOF)) in a way that can be parsed Materialize conflicts containing (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed Jul 15, 2024
@ilyagr ilyagr changed the title Materialize conflicts containing (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed lib conflicts: Materialize (sides with (conflict markers) or (no newline at EOF)) in a way that can be parsed Jul 15, 2024
@ilyagr ilyagr force-pushed the unmat branch 11 times, most recently from 5a67ac6 to 04083d4 Compare July 15, 2024 18:44
@ilyagr ilyagr marked this pull request as ready for review July 15, 2024 18:44
@ilyagr ilyagr marked this pull request as draft July 15, 2024 18:44
@ilyagr ilyagr marked this pull request as ready for review July 15, 2024 18:48
@ilyagr ilyagr force-pushed the unmat branch 4 times, most recently from 1797b68 to eb0f55d Compare July 16, 2024 23:48

// From now on, we assume some invariants guaranteed by the encoding function
// above. See its docstring for details.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I said, I believe it's wrong to preprocess source texts to be merged. What we need here is to encode merge_result, right? Why do we have to do source -> serialize1 -> merge -> serialize2 instead of source -> merge -> serialize?

Copy link
Contributor Author

@ilyagr ilyagr Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the main problem we were discussing in #3977 (comment) was that the transformation wasn't reversible (or, more precisely, that two different sides could be mapped to the same serialized version, causing a conflict to disappear). Now, the transformation is perfectly reversible.

The problem with serializing once is that there are many corner cases that I find it hard to reason about. I do not currently have a clear idea how to do it (so that it works in every case and is human-readable), and even if I did know a way, I wouldn't know how to do it elegantly without a gigantic test suite (and maintenance burden) for a feature that only comes in play rarely.

Some corner cases:

  • Conflict markers inside a conflict vs conflict markers that are the same on all sides of the conflict.
  • "No newline at the end of file" outside a conflict (this one is pretty easy) vs on all sides of a conflict vs (on some sides of a conflict that is on the end of file for all sides) vs (on no sides of a conflict at the end of file) vs (one of the last two cases for a conflict that's only at the end of file for some of the sides).
  • Unresolvable conflict when merging with empty file #3223 (which this approach provides a natural solution for, as I described in the PR description)

By having an intermediate state, we don't have to worry about most corner cases, we only have to make sure the first serialization satisfies a few invariants and is reversible. This way, the implementation is much simpler and, more importantly, verifying that it works correctly in all cases is much simpler.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but we have to instead design the serialize1 part robust against line-by-line merging logic, right? Does it handle merge of two "no-eol" hunks for example? I suspect matching "no-eol" markers would be rendered out of the conflict region.

I didn't think about corner cases thoroughly, but I think nested conflict markers can be represented as a hunk (i.e. padded with " ") instead of a "base" content.

Copy link
Contributor Author

@ilyagr ilyagr Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but we have to instead design the serialize1 part robust against line-by-line merging logic, right?

Yes, that's what I think I did. Or do you mean something else?

Does it handle merge of two "no-eol" hunks for example? I suspect matching "no-eol" markers would be rendered out of the conflict region.

Yes, and then jj should import it correctly. I can add a test, it should look like:

<<<<<<<
aaa
...
>>>>>>>
\JJ: No newline at end of file

For comparison, if the last line is "bbb" and not conflicted, it'd look like:

<<<<<<<
aaa
...
>>>>>>>
bbb
\JJ: No newline at end of file

Finally, see the test in eb0f55d#diff-a63c4cb277c53acb282fa66016517078d2d8b452259d061ba2449a27cf9ef277R390 for a conflict in whether there is a newline at the end of a file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooh, so \JJ: lines are file global, not hunk local?

I would say it's super easy for users to lose track of them. I wouldn't expect that I had to fix up lines out of the loud conflict markers.

Copy link
Contributor Author

@ilyagr ilyagr Jul 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the file is in conflict state, maybe we'll need to represent verbatim <<<<<<< as a fake 1-way conflict (to escape)?

This is similar to a previous idea I had. I ultimately decided against it because I thought it would look super-messy and would be relatively complicated to implement.

The idea was to allow the opening conflict marker to have more than 7 <s. If it has, say, 9 <s, then all conflict markers in that conflict are required to be 9 characters long. So, a sequence of 7 <s would be considered as just part of the file.

This solves the problem of conflict markers inside conflicts. Then, we'd need some syntax for one-sided conflicts outside conflicts, e.g.

<<<<<<<<<<
<<<<<<<
aaa
=======
bbb
>>>>>>>
>>>>>>>>>>

This has a two issues:

  • The parser for conflict markers becomes more complicated. We couldn't parse it with regular expressions, and seemingly not with pest as it's not a simple enough grammar. We could do it manually or with another parser combinator library, probably (I forget what the one I'm thinking of is called).

  • With the simplest implementation of materialization, the conflict example above would look much more complicated (though this is equivalent):

<<<<<<<<<<
<<<<<<<
>>>>>>>>>>
aaa
<<<<<<<<<<
=======
>>>>>>>>>>
bbb
<<<<<<<<<<
>>>>>>>
>>>>>>>>>>

Or, we could do a simple greedy implementation that surrounds huge regions of the file with these markers, making the markers hard to find.

So, I ended up leaning away from the idea. OTOH, the second problem would only occur for a minority of files, so maybe it's not too terrible?


Just to keep count, I now have 3 ideas in mind: this PR's idea, this PR + explicit unsetting of the conflicted state of the file in some cases (to avoid jj automatically removing \JJ lines, have a command do it), or the long conflict markers. I think the current version is the simplest, but I'm not sure by how much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to allow the opening conflict marker to have more than 7 <s. If it has, say, 9 <s, then all conflict markers in that conflict are required to be 9 characters long.

That seems not bad, and relatively simple to implement.

Let's say any of the base file contents have lines matching ^([<>]{7,}), we'll use len(max(matches)) + N <s as conflict marker. When parsing, the (minimum?) number of the <s can be determined from the base files, I think.

Copy link
Contributor Author

@ilyagr ilyagr Jul 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When parsing, the (minimum?) number of the <s can be determined from the base files, I think.

I initially forgot about this issue, and it's critical. Relying on the original conflict sides for this is a nice idea. I'm not entirely sure it will work without being too confusing, but it might.

To elaborate for others, for the "long conflict marker" approach to have any advantage over \JJ Verbatim Line:, jj must be able to tell the difference between:

The text of the file is identical in the two situations, but jj has to know there are no conflicts in the latter file.

jj already does something similar: before parsing the file as a conflict, it currently has to know the number of sides expected in the conflict. jj gets it from the in-tree version of the conflict.

Copy link
Contributor Author

@ilyagr ilyagr Jul 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that if we get the required number of <s from the original conflict, there is no longer a technical need to surround unconflicted conflict markers with artificial "one-sided conflicts". OTOH, I'm worried that with or without those one-sided conflicts, they'll be confusing to the user.

Currently, my best idea is to add something like this to the beginning of the file:

<<<<<<<<< Conflict 1 of 5 (Informational, not a real conflict)
\\\\\\\\\ JJ: JJ conflict markers in this file are 9 characters long, longer than the normal 7.
\\\\\\\\\ JJ: This is because this file contains text that looks like conflict markers.
\\\\\\\\\ JJ: This fake conflict can be safely deleted, and must be deleted for the file to be considered resolved.
>>>>>>>>> End of informational conflict 1 of 5

Blah, blah. This wouldn't be a conflict:
<<<<<<<
aaa
=======
bbb
>>>>>>>
bleh bleh

(4 other conflicts should exist in this file somewhere)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, in any case, the user will need to resolve the last remaining conflict in a way that the file content is identical to the non-conflict materialization. Otherwise, the working-copy file would have diffs from the resolved version of the tree content.

I assumed the "minimum length marker" idea practically solves this problem as the user can inline verbatim <<<<<<< line if the base file contained verbatim <<<<<<< and therefore the marker line must be at least 8-char long.

lib/src/merge.rs Outdated Show resolved Hide resolved
pub fn parse_conflict(input: &[u8], num_sides: usize) -> Option<Vec<Merge<ContentHunk>>> {
pub fn parse_merge_result(input: &[u8], num_sides: usize) -> Option<Merge<ContentHunk>> {
let hunks = parse_conflict_into_list_of_hunks(input, num_sides)?;
let mut result: Merge<Vec<u8>> = Merge::from_vec(vec![vec![]; num_sides * 2 - 1]);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can build Merge<ContentHunk>, or Vec<ContentHunk> and apply Merge::from_vec() at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see you're trying to help out with this in #4106 . Thanks! I may clean it up in this PR, or we can do it separately; in any case we should figure out the bigger issues first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants