Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve consistency in our parser code. #65508

Merged
merged 6 commits into from
Nov 29, 2022

Conversation

CyrusNajmabadi
Copy link
Member

@CyrusNajmabadi CyrusNajmabadi commented Nov 18, 2022

Followup to #65480.

Should be reviewed with whitespace off.

Our parser code is heavily inconsistent from one method to the next. This makes understanding and maintenance more difficult, and makes it easier for bugs to creep in. New code doesn't have clear patterns to follow to keep things consistent. Note that this is not stylistic, but rather patterns that help ensure clarity and correctness.

Comments in-line in the major consistency efforts made in teh code.

return _syntaxFactory.AttributeList(
openBracket,
attrLocation,
_pool.ToListAndFree(attributes),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

instead of try/finally that we use so that we can just return things directly without having an intermediary cleanup step, we instead have a clean pattern we use most of hte time to just ask the pool to free the list once we are done adding to it.

openBracket,
attrLocation,
_pool.ToListAndFree(attributes),
this.EatToken(SyntaxKind.CloseBracketToken));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as much as possible, parsing and creation of hte nodes tries to follow the design of a simple recursive descent parser. e.g. a construct like X -> A B C should be parsed as return new X(ParseA(), ParseB(), ParseC()). This makes the tree structure and recursive nature readily apparent, and gives no opportunity for confusing state tracking or the possibility for other complex code to run in between making htings more difficult to understand.

Note: there is more we can do here. For example, our list-parsing today always requires statement forms to get the list. This can be improved so even with complex syntactic constructs, we can model things as simple recursive descent for all portions of hte node. Leaving that for the future though as that will be a much larger change.

return _syntaxFactory.Attribute(name, argList);
return _syntaxFactory.Attribute(
this.ParseQualifiedName(),
this.ParseAttributeArgumentList());
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: while there are many fixups like this in teh code, looking through, the majority of the existing constructs already follow the new form, so this is just brining things in line with a proven approach that has the benefits listed in teh prior comments.

}

return _syntaxFactory.BaseList(colon, _pool.ToListAndFree(list));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

7 lines to 1 simple idiomatic line

this.CurrentToken.Kind == SyntaxKind.ImplicitKeyword ||
this.CurrentToken.Kind == SyntaxKind.ExplicitKeyword ||
this.CurrentToken.Kind == SyntaxKind.OperatorKeyword;
return this.CurrentToken.Kind is SyntaxKind.ImplicitKeyword or SyntaxKind.ExplicitKeyword or SyntaxKind.OperatorKeyword;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

patterns make kind checks much simpler (And faster to boot, though likely in a way that is utterly negligible). The readability here is much better though.

@@ -4024,42 +3960,39 @@ private AccessorListSyntax ParseAccessorList(bool isEvent)
{
// parse property accessors
var builder = _pool.Allocate<AccessorDeclarationSyntax>();
try
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

above we do the pattern of default(SyntaxList<AccessorDeclarationSyntax>) and then lazily creating the builder. i'm super sketical this is worth it. but i didn't want to change that as i didn't want to accidentally cause a regression. I'll do a perf analysis of such a change in a spearate change in the future.

@@ -4270,7 +4203,7 @@ private enum PostSkipAction
nodes.Add(token);
}

trailingTrivia = (nodes.Count > 0) ? nodes.ToListNode() : null;
trailingTrivia = nodes.ToListNode();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ToListNode() already does the Count check.

}
finally
{
_termState = saveTerm;
_pool.Free(variables);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the saving/restoring of _termState is something i'd like a helper for. But it may involve lambda callbacks, so i didn't want to make it without a plan to test perf.

Copy link
Member

@jjonescz jjonescz Nov 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why lambda callbacks? I imagine a disposable ref struct holding a ref to _termState.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the lambda callback would be for the code to execute inbetween the termState change/restoration.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't using block be used instead? Like the following in this particular case:

using (new Temp(ref _termState, _termState | TerminatorState.IsEndOfFieldDeclaration))
{
    var variables = _pool.AllocateSeparated<VariableDeclaratorSyntax>();
    this.ParseVariableDeclarators(type, VariableFlags.Fixed, variables, parentKind);

    return _syntaxFactory.FieldDeclaration(
        attributes, modifiers.ToList(),
        _syntaxFactory.VariableDeclaration(type, _pool.ToListAndFree(variables)),
        this.EatToken(SyntaxKind.SemicolonToken));
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, it could. but that means each case has to repeat teh same pattern. The idea would be to encapsulte the pattern in a helper that itself takes a callback for the work that is specific to that grammar production :)

return _syntaxFactory.TupleElement(type, name);
return _syntaxFactory.TupleElement(
ParseType(),
IsTrueIdentifier() ? this.ParseIdentifierToken() : null);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use of ternaries allows us to model constructs like X -> A B? C giving the same nice construction/recursive-descent parity/clarity. We use this in many other places, so this is just making things consistent here.

semi2,
_pool.ToListAndFree(incrementors),
this.EatToken(SyntaxKind.CloseParenToken),
ParseEmbeddedStatement());
}
finally
{
_termState = saveTerm;
this.Release(ref resetPoint);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i want to make it possible to use a simple using statement for reset-points in the future. Needing a try/finally to simulate this feels silly.

semicolon);
}
finally
{
_pool.Free(variables);
_pool.Free(mods);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, there's no helper for SyntaxTokenLists, but i want to add one to clean this up as well.

@@ -12611,11 +12477,11 @@ private AnonymousObjectCreationExpressionSyntax ParseAnonymousTypeExpression()
var openBrace = this.EatToken(SyntaxKind.OpenBraceToken);
var expressions = _pool.AllocateSeparated<AnonymousObjectMemberDeclaratorSyntax>();
this.ParseAnonymousTypeMemberInitializers(ref openBrace, ref expressions);
var closeBrace = this.EatToken(SyntaxKind.CloseBraceToken);
var result = _syntaxFactory.AnonymousObjectCreationExpression(@new, openBrace, expressions, closeBrace);
_pool.Free(expressions);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

an example of needing to break things up over multiple statements to free things. Both this pattern and thr try/finally pattern go away with the simple ToListAndFree pattern that already existed.

@CyrusNajmabadi CyrusNajmabadi marked this pull request as ready for review November 19, 2022 02:31
@CyrusNajmabadi CyrusNajmabadi requested a review from a team as a code owner November 19, 2022 02:31
@CyrusNajmabadi
Copy link
Member Author

@jcouv this is ready for review.

@jcouv jcouv self-assigned this Nov 28, 2022
Copy link
Member

@jcouv jcouv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM Thanks (iteration 6)

@CyrusNajmabadi CyrusNajmabadi merged commit d40add2 into dotnet:main Nov 29, 2022
@CyrusNajmabadi CyrusNajmabadi deleted the parsingConsistency branch November 29, 2022 01:26
@ghost ghost added this to the Next milestone Nov 29, 2022
@allisonchou allisonchou modified the milestones: Next, 17.5 P2 Nov 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants