Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Formatter problem with multi-line strings #2270

Closed
jakepearson opened this issue Aug 1, 2023 · 26 comments
Closed

Formatter problem with multi-line strings #2270

jakepearson opened this issue Aug 1, 2023 · 26 comments
Labels
bug Something isn't working formatting

Comments

@jakepearson
Copy link
Contributor

I was writing a multi-line SQL as a def and found an interesting issue. I have turned off all extensions except for Calva and removed the contents of my cljfmt.edn to simplify and it still happens. I wasn't able to repro this by calling cljfmt from a prompt.

To repro, start with a file like

(ns a-project.repro)

(def a-def "
            
            ")

(defn a-function [])

Put your cursor on the end of the blank line after a-def. Type a letter and then format the file and it turns into

(ns little-differ.repro)

(def a-def "
a
")

(defn a-function [])"

Notice that an extra " is added to the end of the file after the last ). If I take out the newlines at the beginning and end of my string then no problems happen. Any ideas? Happy to work on a fix if people have ideas what I should do.

@bpringe
Copy link
Member

bpringe commented Aug 2, 2023

Notice that an extra " is added to the end of the file

I saw this at least once or twice recently and wasn't sure what caused it! Thanks for the repro.

Pinging @PEZ.

@bpringe bpringe added formatting bug Something isn't working labels Aug 2, 2023
@PEZ
Copy link
Collaborator

PEZ commented Aug 2, 2023

Thanks for the report!

I can reproduce it with this document:

 "
"

But only if I update the first line of the string. So it the above was the start document, and I update it such:

 "a
"

Then I get the extra quote, and also the content of the string gets updated, which might be a clue:

 " a
""

Then if I undo to get:

 "a
"

And format the document, nothing happens. Not even if I add something to get formatted do I get a repro of the bug:

 "
"
( a)

Becomes

 "
"
(a)

As it should. This is significant, because Calva checks the formatter result before applying any edits.

It all gets a bit extra crazy if we instead of Format Document, use Format Selection. If I have:

   "
"

Then select the string, and format selection, I get:

   "
   "

If I keep the selection and format again, I get:

   "
      "

So it seems it is the indent of the opening quote that gets inserted. It gets inserted on all lines in the string, following the first:

   "a
a
"

Format Selection a few times:

   "a
               a
               "

Updating the first line:

   "aa
               a
               "

and then selecting the string and format the selection:

   " aa
   a
   ""

Maybe there are two different bugs in play, even.

You are super welcome to have a look at this and what the potential fix might be, @jakepearson. Since we get different behaviour with the OP issue depending on wether the test is updated or not I suspect the bug is somewhere in the mutable TypeScript land. I also suspect it will be a bit of work to pinpoint the root cause.

The second issue/symptom is easier to reproduce without mutation, so maybe try to expose it with some tests in src/cljs-lib/test/calva/fmt/formatter_test.cljs, because this issue might be in the CLJS code.

Neither the TS, nor the CLJS code for all this is in very tidy shape, so if it all gets too messy to try to disentangle, just let me know and I could possibly shed some light on things. Or pick things up wherever you run out of stamina, should that happen. 😄

@PEZ
Copy link
Collaborator

PEZ commented Aug 2, 2023

As for further ideas of what to do, @jakepearson, I think it will be about using breakpoints and log-points in format.ts to try catch where the shenanigans starts. It's truly strange with this that it the same text gets differently treated depending on if it's recently updated or not. Strange to the point where it is slightly worrisome, I would say.

@jakepearson
Copy link
Contributor Author

Thanks for the starting ideas. I'll take some time next week (hackathon week).

@bpringe
Copy link
Member

bpringe commented Aug 3, 2023

It might also be a good idea to go back to commits of recent releases and see if you can reproduce the issue there. If you can't, then you have a smallish change set to provide clues on where the issue lies.

@bpringe
Copy link
Member

bpringe commented Aug 3, 2023

I say the above partly because I hadn't seen a double quote be added to a file until only recently.

@jakepearson
Copy link
Contributor Author

@bpringe Good call. It for sure seems like a recent issue.

@jakepearson
Copy link
Contributor Author

jakepearson commented Aug 7, 2023

Hi. I bisected and landed on this commit as the issue (d5f585a). I'll see if I can fiddle around with it a bit and get a fix. The lamest solution was a revert, but there was a conflict over the whole section.

@jakepearson
Copy link
Contributor Author

I might have hit where I can be useful with my very limited knowledge of how Calva works. Here is what I have learned:

it('is true in multiline string', () => {
  const contexts = context.determineContexts(
    docFromTextNotation('"•        abc|•        def•       •       "')
  );
  expect(contexts.includes('calva:cursorInString')).toBe(true);
});

Please let me know what the next steps should be for me or if you would like to take over.

Have a great day!

PEZ added a commit that referenced this issue Aug 10, 2023
@PEZ
Copy link
Collaborator

PEZ commented Aug 14, 2023

Sorry about the radio silence, @jakepearson! I had completely missed that you had come back with these results. Let me read what you have written and think a bit and I will try to figure what the next steps might be.

@PEZ
Copy link
Collaborator

PEZ commented Aug 14, 2023

One thing that this reminds me about is that we have a one or two tests involving strings that are disabled because they cause other tests to fail. I think that the lexical scanner might have a leak. No idea if that is related to this bug, but it could be...

@PEZ
Copy link
Collaborator

PEZ commented Aug 14, 2023

but makes a bunch of tests in cursor-token-test.ts fail

Do these tests involve withinString, or do they seem unrelated to that?

@jakepearson
Copy link
Contributor Author

It was last week (so my memory is mostly gone), but I think so.

@PEZ
Copy link
Collaborator

PEZ commented Aug 14, 2023

I'll make the same experiment and see what I get.

@jakepearson
Copy link
Contributor Author

@PEZ I was wondering if you had made any progress on this issue or if there was anything I could help with?

@PEZ
Copy link
Collaborator

PEZ commented Aug 23, 2023

Oh, I started from the end of cljfmt, trying to eliminate it as the source. Ended up upgrading it and wrecking havoc with peoples formatting configs... Then I forgot about why I had started it all. I did add a test or two trying to expose this, but also failed. And added a test file.

I'm a bit unsure what the next step is, tbh. Probably chasing the bug with breakpoints in format.ts and that test file I added: 061c1a0

@evaogbe
Copy link

evaogbe commented Jan 10, 2024

I'm also running into the same issue

Steps to Reproduce

  1. Create a file containing a multiline string literal
(def single-line-str
  "Ut labore et dolore")

(def multi-line-str
  "Lorem ipsum dolor sit amet
   consectetur adipiscing elit
   sed do eiusmod tempor incididunt")

(defn inc-fn
  [x]
  (
   inc x))
  1. Edit any line of the multiline string except the last one
    For example, add NEW TEXT to the second line of multi-line-str
  2. Run the "Format Document" command

Expected Outcome

The file is formatted (if formatting needed) and syntax remains valid

(def single-line-str
  "Ut labore et dolore")

(def multi-line-str
  "Lorem ipsum dolor sit amet
   consectetur adipiscing elit NEW TEXT
   sed do eiusmod tempor incididunt")

(defn inc-fn
  [x]
  (inc x))

Actual Outcome

The file is mangled:

  • Does not format the file
  • Adds a space to the front and end of every string literal in the file
  • Removes all whitespace from the beginning of every line of the multiline string
  • Adds a single quotation mark to the end of the file
  • The file is no longer valid Clojure syntax because of the unclosed ending quotation mark. (clj-kondo reports "Unexpected EOF while reading string.")
(def single-line-str
  " Ut labore et dolore ")

(def multi-line-str
  " Lorem ipsum dolor sit amet
consectetur adipiscing elit NEW TEXT
sed do eiusmod tempor incididunt ")

(defn inc-fn
  [x]
  (
   inc x))"

It mangles the file even if it's already formatted correctly and should not be changed at all. However, it doesn't mangle the file if the last change before formatting is not an edit to the multiline string.

I also tried adding a single unclosed quotation mark to the end of the file before formatting. It doesn't add a second quotation mark to the end of the file but it still does all the other mangling. It will add another unclosed quotation mark to the end of the file if I end the file with an empty string. So this bug will always make the file have invalid syntax.

Running the formatter on the command line (i.e. clojure-lsp format or clj -Tcljfmt fix) correctly produces the expected outcome.

@bpringe
Copy link
Member

bpringe commented Jan 17, 2024

Thanks for the additional info @eoogbe!

@bpringe
Copy link
Member

bpringe commented Jan 17, 2024

I did a little bit of looking into this so far. With the following text in a file:

(ns calva-issue-2270
  "To reproduce the issue, update the multi-line-str docstring then run the command Format Document.")

(def multi-line-str
  "Lorem ipsum dolor sit amet
   consectetur adipiscing elit
   sed do eiusmod tempor incididunt")

(defn inc-fn
  [x]
  (inc x))

If I put the cursor in the docstring of multi-line-str and format the document, at this line, missingTexts is:

{
  prepend: "\"",
  append: "\"",
}

which I find odd. Then healedText is:

'"(ns calva-issue-2270\n  "To reproduce the issue, update the multi-line-str docstring then run the command Format Document.")\n\n(def multi-line-str\n  " Lorem ipsum dolor sit amet\n   consectetur adipiscing elit\n   sed do eiusmod tempor incididunt")\n\n(defn inc-fn\n  [x]\n  (inc x))"'

Notice the original text of the document is now wrapped in double quotes. This then gets passed to formatCode, which returns a promise that resolves to the undesirable new document text:

'"(ns calva-issue-2270\n  " To reproduce the issue, update the multi-line-str docstring then run the command Format Document. ")\n\n(def multi-line-str\n  " Lorem ipsum dolor sit amet\nconsectetur adipiscing elit\nsed do eiusmod tempor incididunt ")\n\n(defn inc-fn\n  [x]\n  (inc x))"'

That text is later transformed in a way that the " at the beginning of the file is removed, but the one at the end remains (see the value of formattedText while debugging).

@bpringe
Copy link
Member

bpringe commented Jan 18, 2024

The output of getMissingBrackets here is different depending on whether the multi-line string was the last thing edited before formatting, even given the same exact string argument (the contents of the file), which means getMissingBrackets is stateful in a way that is causing undesirable returned data in the issue case. What it returns depends on whether the multi-line string was the last thing edited before it was run.

There may be other issues besides that, but I think starting with fixing the issue with getMissingBrackets would be wise.

@bpringe
Copy link
Member

bpringe commented Jan 18, 2024

@PEZ I don't know the history of the formatting code in Calva, but it seems that it does things in addition to what cljfmt does to format the text. Can you summarize what additional things Calva does and why? I'm wondering if it would make sense to get rid of the additional things (or some of them) and just use cljfmt as purely as possible.

Maybe some of those additional things were added to compensate for things cljfmt didn't do or wasn't good at in the past, but can handle now? I'm guessing also maybe some of the additional code is to handle VS Code details.

@PEZ
Copy link
Collaborator

PEZ commented Jan 18, 2024

@bpringe We do no additional formatting over what cljfmt does. There are some manipulation of the input we give cljfmt, which we do to:

  • prevent folding of brackets
  • prevent trimming of the empty line where the cursor is
  • get a new cursor position for the cursor on empty lines (we use this when the user hits enter, or presses tab on an empty line)

This is also partly achieved by relaxing some cljfmt settings when we perform on-type formatting (while editing, as opposed to direct formatting commands). We can get rid of some of this by abandoning the legacy indenter in favor of the “new” one. The problem with that is that the new one has some issues that people get around by switching to the old one. Fixing those issues would take us far. I have improved it quite a lot the past few months, but it is still misbehaving in some situations.

I'll have a look at your latest findings. Sounds a lot on the descriptions here like it has to do with the bug in the token scanner that has us disable some unit tests. Enabling these causes a lot of following unit tests to fail. Some statefulness there that we haven't figured out, but obviously need to figure out.

@PEZ
Copy link
Collaborator

PEZ commented Jan 18, 2024

Now had a bit of a look at your findings, @bpringe, especially this line:

const missingTexts = cursorDocUtils.getMissingBrackets(originalText);

With some luck the whole issue is less hairy than the ugly stateful problem we have in the token scanner. That line was added by me on June 25 as part of making unbalanced text format:

calva/CHANGELOG.md

Lines 171 to 174 in f3bca84

## [2.0.373] - 2023-06-27
- [Re-indent non-complete code, when pasting or formatting selection](https://github.com/BetterThanTomorrow/calva/issues/1709)
- Fix: [Selection, closing brackets, fails in mysterious ways](https://github.com/BetterThanTomorrow/calva/issues/2232)

Though, since you notice different behaviour of getMissingBrackets depending on some editor state, I fear we are actually running up against this unsolved problem with the scanner going crazy affecting future runs...

Seeing that this issue was filed a few days later, I now think it is a regression and some oversight on my part, maybe around how string quotes are also brackets. Maybe we can reproduce the error with some assertions here:

describe('addMissingBrackets', () => {
it('Adds missing brackets', () => {
const text = '(a b) (c {:d [1 2 3 "four"';
const append = ']})';
expect(utilities.getMissingBrackets(text)).toEqual({ append: append, prepend: '' });
});
it('Closes strings', () => {
const text = '(a b) (c {:d [1 2 3 "four';
const append = '"]})';
expect(utilities.getMissingBrackets(text)).toEqual({ append: append, prepend: '' });
});
it('Leaves non-broken structure alone', () => {
const text = '(a b) (c {:d [1 2 3 "four"] :e :f} g) h';
expect(utilities.getMissingBrackets(text)).toEqual({ append: '', prepend: '' });
});
it('Adds missing opening brackets', () => {
const text = '(a b) (c {:d [1 2 3 "four"]}) extra ] closing } brackets)';
const prepend = '({[';
expect(utilities.getMissingBrackets(text)).toEqual({ append: '', prepend: prepend });
});
it('Ignores brackets in strings', () => {
const text = '(a b) "{{" (c {:d [1 2 3 "four([{" [';
const append = ']]})';
expect(utilities.getMissingBrackets(text)).toEqual({ append: append, prepend: '' });
});
it('Ignores brackets in line comments', () => {
const text = '(a b) (c {:d \n; ( [ ( {\n[1 2 3 "four"';
const append = ']})';
expect(utilities.getMissingBrackets(text)).toEqual({ append: append, prepend: '' });
});
});

@bpringe
Copy link
Member

bpringe commented Jan 21, 2024

Thanks for the info @PEZ.

Maybe we can reproduce the error with some assertions here

I haven't done so yet, but I assume the tests will pass as expected since the issue seems to depend on an edit happening in a multi-line string prior to that function being run. Maybe we can reproduce that in a test, though. I'm not sure if that would involve an integration test, but maybe an integration test is a good idea.

Edit: I did add a unit test for it and it passed as expected.

@bpringe
Copy link
Member

bpringe commented Jan 21, 2024

Just to add some more info:

If I hardcode missingTexts here to be { prepend: '', append: '' }, then run the repro case, the issue does not occur.

That leads me to believe the issue may be fully or mostly isolated to getMissingBrackets.

@bpringe bpringe changed the title Formatter problem with multi-line strings if first and last line are blank Formatter problem with multi-line strings Jan 22, 2024
@jakepearson
Copy link
Contributor Author

Yay! Thanks so much.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working formatting
Projects
None yet
Development

No branches or pull requests

4 participants