[Bug] CPP raw string literals with quotes in delimiter breaks colorization #3128

Closed
1 of 2 tasks
nebularnoise opened this issue Jun 2, 2022 · 3 comments · Fixed by #4436
Labels: bug, help wanted, tokenization

@nebularnoise

Reproducible in vscode.dev or in VS Code Desktop?

  • Not reproducible in vscode.dev or VS Code Desktop

Reproducible in the monaco editor playground?

Monaco Editor Playground Code

#include <string>
#include <iostream>


int main(){
    
    auto s = R""""(
    Hello World
    )"""";

    std::cout << "hello";
    
}

Actual Behavior

Colorization in the code window is broken after the raw string literal.
This seems to be caused by the delimiter, which contains double quotes.

Expected Behavior

The following line

std::cout << "hello";

should not be colorized entirely in the string colour; only "hello" is a string literal.

Additional Context

This issue was first opened on godbolt, where I was told to report it upstream:
compiler-explorer/compiler-explorer#3684

Note: for the playground editor, I went to https://microsoft.github.io/monaco-editor/index.html
and set the language to CPP before pasting the code, since the playground seemed to be JS-only.

[Image: Screenshot_20220518_124917]

hediet added the bug, help wanted, and tokenization labels on Jul 19, 2022
hediet changed the title from "[Bug] CPP raw string literals with quotes in delimiter breaks semantic colorization" to "[Bug] CPP raw string literals with quotes in delimiter breaks colorization" on Jul 19, 2022
@hediet
Member

hediet commented Jul 19, 2022

I'm very sure this is a problem in the monarch grammar.

@jeremy-rifkin
Contributor

I think this might not be possible to implement in monarch, in the general case.

The rule for finding the end of a raw string in https://github.com/microsoft/monaco-editor/blob/main/src/basic-languages/cpp/cpp.ts is /(.*)(\))(?:([^ ()\\\t"]*))(\")/, so it only looks at what's between the ) and the next ", and that capture can never contain a quote. Then there's a $3==$S2 case to check whether that sequence matches the one at the start of the string.
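
To see why the end rule fails for the reported input, the regex quoted above can be run on the closing line outside Monarch. This is a minimal sketch in plain TypeScript, not Monarch code: `endRule` and `closingDelimiter` are hypothetical names, and the $3==$S2 check is imitated with an ordinary string comparison.

```typescript
// The end-of-raw-string regex quoted above, copied verbatim.
const endRule = /(.*)(\))(?:([^ ()\\\t"]*))(\")/;

// Hypothetical helper: returns what capture group 3 holds for a given line,
// or null when the rule does not match at all.
function closingDelimiter(line: string): string | null {
  const m = endRule.exec(line);
  return m ? m[3] : null;
}

// Delimiter without quotes: group 3 captures it, so the comparison can succeed.
const plainCapture = closingDelimiter('    )foo";');

// Closing line from the report, where the opening delimiter is """.
const quotedCapture = closingDelimiter('    )"""";');

console.log(JSON.stringify(plainCapture));
console.log(JSON.stringify(quotedCapture));
console.log(quotedCapture === '"""');
```

For the quote-free delimiter the capture is "foo", but for the reported closing line the capture is the empty string, so a comparison against the opening delimiter """ cannot succeed and the raw-string state is never left.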

Maybe there is some fancy state machine trick that can be applied. Otherwise one temporary solution could be to hardcode tests for sequences of <=10 characters, for example.

@jeremy-rifkin
Contributor

jeremy-rifkin commented May 25, 2023

I thought there might be some way to do it with dot-separated sub-states but now I don't think that's the case. It doesn't appear we can write $S2 within a regex string, but that would be nice.
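
What substituting the opening delimiter into the end regex would amount to can be sketched outside Monarch. The following is plain TypeScript and not a Monarch feature; `escapeRegExp` and `buildEndRegex` are hypothetical helpers introduced only for illustration.

```typescript
// Escape regex metacharacters so the delimiter is matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build an end-of-raw-string regex for one specific opening delimiter:
// a closing parenthesis, the delimiter itself, then the final double quote.
function buildEndRegex(delimiter: string): RegExp {
  return new RegExp('\\)' + escapeRegExp(delimiter) + '"');
}

// Delimiter from the report: R""""( ... )"""" uses """ as its delimiter.
const endForReport = buildEndRegex('"""');

console.log(endForReport.test('    )"""";')); // the real closing line
console.log(endForReport.test('    )"";'));   // too few quotes to close
```

Because the delimiter is inserted literally, no character class has to guess which characters may appear in it, which is the limitation of the fixed rule.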

Maybe there's a way with goBack but I'm not thinking of it.

If only multi-line regexes worked, it could just be /@encoding?R\"(?:([^ ()\\\t]*))\(.*\)\1\"/m.
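
The idea above can be tried on a whole source string in plain TypeScript, outside Monarch. This is a sketch under stated assumptions: @encoding is expanded by hand to the C++ prefixes u8, u, U and L, and `[\s\S]*?` is used for the body because in JavaScript regexes the `m` flag only changes `^` and `$`, so a plain `.` would still stop at line breaks. The names `rawString`, `cppSource` and `rawMatch` are hypothetical.

```typescript
// Whole-text variant of the regex above. The backreference \1 requires the
// closing delimiter to equal the opening one; the lazy body stops at the
// first position where that holds.
const rawString = /(?:u8|u|U|L)?R"([^ ()\\\t]*)\(([\s\S]*?)\)\1"/;

const cppSource = [
  'auto s = R""""(',
  '    Hello World',
  '    )"""";',
  'std::cout << "hello";',
].join('\n');

const rawMatch = rawString.exec(cppSource);
if (rawMatch) {
  console.log(JSON.stringify(rawMatch[1])); // the delimiter
  console.log(JSON.stringify(rawMatch[2])); // the raw string body
  // Everything after the literal is left for ordinary tokenization.
  console.log(JSON.stringify(cppSource.slice(rawMatch.index + rawMatch[0].length)));
}
```

On the snippet from the report, the match ends at the closing )"""" and the following std::cout line stays outside the raw string.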


Edit the next day: I've learned the standard does specify a maximum length of 16 characters for the delimiter sequence, so hard-coding would be possible, just horribly ugly. I've opened a PR at microsoft/vscode to expand the functionality of monarch, and hopefully that goes somewhere.

4 participants