Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved javascript regex highlight #1468

Closed
wants to merge 6 commits into from

Conversation

RexSkz
Copy link
Contributor

@RexSkz RexSkz commented Jul 9, 2018

Improve JS regex highlight according to references in https://regexr.com/.

Note: this PR can't recognize capture group because "matching bracket" is not a thing where a pure regex way can do.

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR. I had something similar in mind, so I will share my own ideas as well.

  1. I don't think that this alias-approach for styling is good. The fact that the alias escape doesn't do any coloring in the default themes, doesn't help.
    (I personally prefer the subtle color changes for regexes like in editors like VS Code, but this requires that you change all themes Prism offers, so that might be a little overkill...)

  2. Please add highlighting for escapes and character ranges inside of character sets.
    This is where I for example use escapes the most often.

Apart from these two things, I really like it. Keep up the good work ^^

components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
@RexSkz RexSkz force-pushed the rex/improved-js-regexp branch 2 times, most recently from 6e43fda to 5ddc3fd Compare July 12, 2018 09:49
@RunDevelopment
Copy link
Member

Regarding the quantifiers:
I think that it's a good idea to keep them separately, but *, + and ? (for optional) are also quantifiers, so we should include them as well.
And when we include those, then maybe we should include ? (for lazy) as well? I'm not really sure, what do you think?

Regarding the escapes:
I still think that we should put anchors and default character classes into their own category, as they are not really escape sequences.

@RexSkz
Copy link
Contributor Author

RexSkz commented Jul 12, 2018

I started to think that highlighting regex is not a simple job, maybe I should treat regex as a kind of "pseudo language" rather than just improve highlight for some situations. I'll keep trying.

@RexSkz
Copy link
Contributor Author

RexSkz commented Jul 15, 2018

@RunDevelopment

Hi, I have re-written my code according to the reference in https://regexr.com.

20180715135337

In this reference, ? for both optional and lazy are called "quantifier".

Now the code can recognize flag, character set, escape, reference, anchor, quantifier and alternation.

It's a pity that recognizing capture group is not a thing where pure regex way can do, because regex can't be used for matching bracket. VSCode can't do this either, it can be proved from here (the class .mtk6 means normal character):

20180715140234

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RexSkz

Nice work!
With this, you now support almost the full JS regex syntax!

I do have some suggestions:

  1. Run escape before any other pattern matching backslashes.
    This will ensure that you don't accidentally match something which is part of an escape sequence. Just add what you want to match later to the [^ ... ] at the end.
  2. Rename regex-charset1 to regex-charset and only let it capture stuff like [^...]. Also rename regex-charset2 to regex-charclass and let it capture ., \d can co.
    This will make it easier to correctly match \d (and co). Right now, \\d is matched as a charset1. Searching for it after the escapes ensures correctness. (This requires that the escapes don't match them.)

Apart from that and the minor things I commented on in the code, this looks pretty good.
Also, using VS Code and RegExr for reference was a good idea.

The only thing that is left to do now is to highlight the stuff that is inside of char sets and that's it.

components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
@RexSkz
Copy link
Contributor Author

RexSkz commented Jul 15, 2018

@RunDevelopment

Run escape first will seperate the regex (for example [a\\b\\c] will be seperated to [a, b and c]), so that \\b\\ won't be wrapped in charset.

Highlighting stuff inside the character set was not taken into consideration at first, but if it's necessary, using inside (and extra patterns) is a way. The problem is: what should I name the pattern like a-z or the ^ right after [?

Other suggestions seems great, I'll make changes to my code tomorrow, it's midnight here now :(

@RunDevelopment
Copy link
Member

@RexSkz

You are absolutely right. charset has to be run before escape. (I forgot to mention it...)

Don't worry, take your time ^^

@RexSkz
Copy link
Contributor Author

RexSkz commented Jul 16, 2018

@RunDevelopment

I have pushed another commit, which fixes the bugs and provides more test cases. The most important thing is: it can recognize stuff inside charset now!

Copy link
Member

@RunDevelopment RunDevelopment left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@RexSkz

Looks really good! Well done!

Now, apart from the comments, I have some other things:

  1. You used the pattern for escape twice.
    It's quite long. Maybe you could extract it into a variable? (Don't forget to wrap the whole thing with a function block. Just like e.g. Crystal.)
  2. Maybe we could highlight the [] of a char set as something like punctuation?
  3. Ranges are hard. Do we really need them?
    (Look at my comment for more info)

components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
components/prism-javascript.js Outdated Show resolved Hide resolved
@RexSkz
Copy link
Contributor Author

RexSkz commented Jul 17, 2018

@RunDevelopment

All issues are fixed, range is removed.

@RunDevelopment
Copy link
Member

@RexSkz

Looks good!

From a correctness point of view, this PR is ready for merge, but the maintainers might have a thing or two to say about it.

@RunDevelopment RunDevelopment mentioned this pull request Dec 30, 2018
@mAAdhaTTah
Copy link
Member

Maybe this should be combined w/ #1714 and put into a js-extras language?

@RunDevelopment
Copy link
Member

This is already captured by #1682.

@RexSkz
Thank you for your hard work!

@RexSkz RexSkz deleted the rex/improved-js-regexp branch March 11, 2019 03:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants