-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enh(cpp) Improve C++ class template headers #2728
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request for comment:
template <int a, int b, bool c = (a > b), class T> void f1(); | ||
template <int a, int b, bool c = (a >> b), class T> void f2(); | ||
|
||
template <int a, int b, bool c = (a < b), class T> void g1(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is failing, and I don’t know why: I thought my rules would handle the nested parenthesised expression — and this works for the corresponding test case above, i.e. for (a > b)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd guess the <
in a < b
is starting a new TEMPLATE, messing everything up. I'm not sure I see anywhere in the code where you handle this possibility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, of course! I actually don’t know how to handle this: I thought about adding )
to the end
of the TEMPLATE_USE
, mode; this works, but then the mode started by (
won’t terminate, even when I also set excludeEnd: true
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding the following to the TEMPLATE_USE
mode seems to work:
'on:begin': (matchData, response) => {
const rest = matchData.input.slice(matchData.index + 1);
let parens = 0;
let brackets = 1; // start with `<`
for (let i = 0; i < rest.length; i++) {
const c = rest.charAt(i);
switch (c) {
case '(': parens++; break;
case ')': parens--; break;
case '<': brackets++; break;
case '>': brackets--; break;
}
if (brackets == 0) return;
if (parens == -1) {
response.ignoreMatch();
return;
}
}
},
However, this feels like overkill (potentially also inefficient?); it also only works in cases where the comparison expression is nested inside parentheses, i.e. it works for this test case but not the next one.
If this approach is taken at all, we’d probably want to add some emergency stop characters (;
, {
) to the switch
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is too much... and definitely has performance implications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the best maybe we can do here is liberal use of endsWithParent to make sure we escape if we hit a {
}, line end, etc... and try to return to normal matching...
template <int a, int b, bool c = (a < b), class T> void g1(); | ||
class C; | ||
|
||
template <int a, int b, bool c = a < b, class T> void g1(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is also failing, and there’s probably no fix, since it’s context sensitive: a<b, class T>
could be a valid template parameter, if a
were a template template parameter (i.e. if we had template <typename> typename a
instead of int a
). Of course in that case we’d be missing an additional terminating >
, but I can’t see how to make the parser aware of this since it comes (potentially much) later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could do someone like this with a callback that looks ahead and does code analysis, but that's definitely crossing over into "parsing" territory I think so I don't think we'd want to do that.
contains: [ | ||
{ | ||
begin: /</, end: />/, | ||
endsWithParent: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so you know this endsWithParent
does not auto-recurse... it simply means that within the scope of the <...>
rule (line 230/231) that it will attempt to match not only the contains rules on line 235, but ALSO the end rule on line 228. But it doesn't go any deeper than that unless you added endsWithParent
at every level...
I'm not sure if this has implications for your runaway guard on line 227 or not...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question! So, if I understand you correctly, I could equivalently remove endsWithParent
, and change the end
rule to /[>;{]/
(i.e. the same as the parent end
)? In this case I should probably do that instead, and add the same end to the other contained rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well if the desire is that they end with the parent you'd probably want to just use that... that's what it does - pull the parent "end" down into the children... I was just pointing out that it only goes one layer deep... so if it got "stuck" in a child of a child then your "guard" would not save you... so that needs to be considered and might result in adding some more "false positive" edge case tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that works now. I needed to add a nested {begin: />/}
to prevent a >
inside a (…)
from terminating the template declaration mode prematurely. That said, the behaviour on my unit tests is actually the same as before, I need to think of some example which works now and would previously have failed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Damn, I think I’ve just kicked the can down the road now. I think this fundamentally doesn’t work because I would need to nest the parent mode inside the mode that handles (…)
. With my newest change, the the two first lines parse correctly, but the template declaration in the third line ends prematurely, and the “class title” starts at class T
rather than at class C
:
template <int A = (B<C>), class T> class C;
template <int A = B<(c > D<E>)>, class T> class C;
template <int A = B<(c > D<(e > f)>)>, class T> class C;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
var TEMPLATE_USE = {};
Object.assign(TEMPLATE_USE, {
// .. can now use TEMPLATE_USE recursively
All you need is a reference. :-) Though I'm not sure we do this anywhere else and not sure it's a pattern we should start using now though. ...
I'm guessing the nesting here could be unlimited but is there some practical limits for MOST C++ code as far as what you'd see in the real world?
Closing this in favor of #2752. Thanks for all the hard work on this though - it's been appreciated! Hopefully you've learned a bit about the grammars and parser that will help you if you continue to contribute. :-) If we find more failure cases we can always re-review, but this was getting way to complicated. |
Yes, I’m definitely in favour of closing this for now and re-submitting another PR later (in particular, after splitting the definitions for C and C++), I just need to find some time … |
This fixes #2716 and it fixes #21021. In order to parse class and function headers correctly, it is insufficient to just skip greedily (or, worse, non-greedily) over
/<.*>/
, since C++ template declarations as well as template instantiations can contain nested template parameter lists. As a further complication, they can even contain the operators<
,>
,<<
and>>
(as part of non-type template parameters or their defaults (in declarations)).This PR attempts to be a minimal change — a much more comprehensive refactor of the C and C++ language definitions is somewhat overdue (see #2173 and #2146). But this will take time, and in the meantime the improvement from this PR is probably “good enough”.
This PR is a WIP: there are some open issues that I’ve commented inside the changes.
1 The failing code from the second issue isn’t added as a test case because hljs still fails to highlight it as a function declaration, but that’s unrelated to templates and rather because hljs currently doesn’t recognise operator overload definitions at all.