Extract askama_parser from askama_derive #782

couchand · 2023-02-28T16:20:02Z

This builds on the great work by @Kijewski in #772 to make the parser code more maintainable, and replaces the draft PR on this matter at #748.

An attempt is made to keep the code unchanged where possible. Several minor changes were made to try to keep the dependencies and API surface minimal, including:

removing the span from CompileError (it's always Span::call_site anyway)
effectively reverting the revert at 747958b. I'm curious to know if any benchmarking has been done on the use of borrowed Strings in the Syntax struct -- in my back-of-the-napkin analysis I don't see how that one-time copy of six short strings would be performance-sensitive.

Next steps include bringing over the parser AST documentation that I had started on #748, and additional API cleanup as necessary.

c.f. #760

Kijewski · 2023-02-28T19:29:32Z

Could you please split up the commit a little? Your explanations sound sensible, but it would be a bit easier to review if every change was its own commit (removing Span, making Config 'static again, moving the code).

Cc @GuillaumeGomez

GuillaumeGomez · 2023-02-28T21:08:02Z

Seems like we can now create the parser outside of the proc macro so that looks good to me. 👍

couchand · 2023-02-28T22:26:21Z

@Kijewski I've split things up here into hopefully bite-sized pieces. Please let me know if you there's anything else you think could use improvement.

couchand · 2023-02-28T23:08:21Z

Upon further review, I'm not sure the cut is in exactly the right place here. Towit: Syntax should be in askama_parser but I now think Config should remain in askama_derive. I'm going to take a closer look at what it would take to split up the config module.

couchand · 2023-03-01T05:23:15Z

Thanks for your patience @Kijewski @djc @GuillaumeGomez. I've re-split the crates per my previous comment, and I think the commit sequence looks good.

Please take another look and let me know any feedback you have, thank you!

djc

I agree with most of the direction here, though I have bunch of (mostly stylistic) nits.

djc · 2023-03-01T10:50:27Z

askama_derive/src/parser/syntax.rs

@@ -0,0 +1,22 @@
+#[derive(Debug)]


I don't like tiny modules like this. Please move it to the bottom of parser/mod.rs instead.

djc · 2023-03-01T10:53:40Z

askama_derive/src/parser/error.rs

+use std::borrow::Cow;
+use std::fmt;
+
+#[derive(Debug, Clone)]


It makes no sense to stick CompileError in parser. We can just return Cow<'static, str> directly in the Result from parse(), instead.

Roger that. I had not grokked that it's not used internally within the parser, just at that top-level parse method. After further reflection, I'd prefer an error type with public accessors to get the line and column numbers and input snippet, and perhaps in the future additional details.

After looking at the three error cases returned by the parse method, I believe they can be coalesced into a single case (logically, if not at the type level) with the combinators nom::combinator::all_consuming and nom::compbinator::complete, so I'd propose doing that as well.

I'm fine with having a fully-formed Error type, if you want to wire something up -- it just shouldn't be the CompileError type that's used for all the other stuff in derive.

djc · 2023-03-01T10:55:54Z

askama_derive/src/generator.rs

@@ -1,7 +1,8 @@
 use crate::config::{get_template_source, read_config_file, Config, WhitespaceHandling};
 use crate::heritage::{Context, Heritage};
 use crate::input::{Print, Source, TemplateInput};
-use crate::parser::{parse, CompileError, Cond, CondTest, Expr, Loop, Node, Target, When, Whitespace, Ws};
+use crate::parser::node::{Cond, CondTest, Loop, Node, Target, When, Whitespace, Ws};


At the current number of items, re-exporting items from the nested modules in the parser module seems like a better solution to me than exposing parser's internal module structure.

I imported like this to make the change minimal, but if I had been writing the code from scratch all the references here would be in the style node::Cond, node::When, etc. But I hear you on the blanket imports here. If we're expecting them to be imported directly, would it be a good idea to make the names more verbose (e.g. LoopNode)?

I don't think so? Again, the number of types exposed here is pretty small, and pretty much all of them are nodes anyway. You added a bunch of documentation, I think that should be good enough.

djc · 2023-03-01T10:56:58Z

askama_parser/src/expr.rs

@@ -35,17 +35,17 @@ pub(crate) enum Expr<'a> {
 }

 impl Expr<'_> {
-    pub(super) fn parse(i: &str) -> IResult<&str, Expr<'_>> {
+    pub(crate) fn parse(i: &str) -> IResult<&str, Expr<'_>> {


These are actually identical in this case, just leave them as-is for now.

djc · 2023-03-01T10:57:44Z

askama_parser/src/lib.rs

-pub(crate) mod expr;
-pub(crate) mod node;
-pub(crate) mod syntax;
+pub mod error;


Should be kept pub(crate).

No longer applicable.

djc · 2023-03-01T11:09:52Z

askama_parser/src/lib.rs

+//! which represent either some literal text in the template or one of the
+//! three types of template tags: expressions, blocks, or comments.
+//!
+//! The main entry point to this crate is the [`parse()`](./fn.parse.html)


I don't think this is a very nice public API. Maybe change it to wrap a struct Ast<'a> { nodes: Vec<Node<'a>> } and change parse() to become Ast::from_str()? (In a separate commit.)

I'd agree, and propose that the type be named Template.

I don't like Template since that's already what the trait is called.

Initially I was thinking std::str::FromStr, but upon review I noticed this:

FromStr does not have a lifetime parameter, and so you can only parse types that do not contain a lifetime parameter themselves.

Perhaps you mean a non-trait from_str method? Since it also takes a Syntax parameter, I'm wondering if from_str fully captures the meaning.

As to the new struct's name, some things stood out. The parse_template helper is what produces this top-level result and the bodies of loops, if and else clauses, etc. and it makes sense to unify the concepts. Those elements are frequently referred to as blocks. Perhaps struct Block<'a> { nodes : Vec<Node<'a>> }?

djc · 2023-03-01T11:10:36Z

askama_parser/src/node.rs

 #[derive(Debug, PartialEq)]
 pub enum Node<'a> {
+    /// Literal text to output directly.
+    ///
+    /// The first and third tuple elements are the left- and right-side


If making this API public, consider changing these variants to name their fields instead of using tuple variants. (Separate commit.)

Following the API refactor, these tuple variants all seem self-explanatory, please advise if you disagree.

djc · 2023-03-01T11:12:27Z

askama_parser/src/node.rs

    Name(&'a str),
+    /// Destruture a tuple value.


Thanks! Done.

djc · 2023-03-01T11:13:17Z

askama_parser/src/node.rs

 ///
-/// Second field is "minus/plus sign was used on the right part of the item".
+/// First tuple value is the setting (`-`/`+`/`~`) for the left side of the block.


I would probably name the fields here.

djc · 2023-03-01T11:13:52Z

askama_parser/src/node.rs

    pub ws3: Ws,
 }

+/// A single branch of a match block.


I'm not very fond of type aliases these days. I think we should make When a proper struct with fields, and same for Cond.

Kijewski · 2023-03-01T11:26:42Z

askama_derive/src/generator.rs

+        Err(e) => syn::Error::new(Span::call_site(), e)
+            .to_compile_error()
+            .into(),


At this point I guess we could implement to_compile_error() manually (untested):

format!("::core::compile_error!({:?});", format_args!("{e}")).parse()

If we can omit the proc_macro2 dependency altogether in this case.

Done, although proc_macro2 is still required by a number of the integration crates.

couchand · 2023-03-03T23:13:37Z

For those following along at home, the requested changes not related to API improvements and documentation are up at main...couchand:askama:2023-03/extract-parser

couchand · 2023-03-06T20:18:37Z

I've rewritten the patch set with the idea that we want to keep everything within askama_derive until we've completely refactored the parser API to a public form. I've removed the net new stuff (capturing comment bodies and the interstitial between match and the first match arm), figuring that it's better to focus just on extracting the parser as it's used here.

There are a large number of commits that are each the implementation of some big picture idea for one of the AST node types, repeated for each node type. These goals are all in pursuit of making the AST well-structured, so that containment within sub-blocks is clearly maintained. The major sticking point to that was the treatment of Ws. Here are the various threads:

Redefine Ws semantics for nodes with nested blocks to keep the Ws for the "outside" of the node in one place, and associate the "inside" Ws with the sub-block that it logically contains.
Create the Tag struct to represent any Askama tag with its "outer" Ws, migrate non-Lit nodes to Tag.
Create the Block struct to represent a sub-block with its "inner" Ws, migrate all paired Vec<Node<'a>> and Ws to Block.
Extract structs for each non-trivial Tag.

I appreciate your patience reviewing this patch set, hopefully I have made each commit small enough that it is not too much of a bear to go through it. Happy to take any feedback or suggestions!

Kijewski · 2023-03-06T21:15:25Z

I like the PR very much! Every change is well thought out and well explained.

Actually the only thing I don't like is the name "Tag". I don't associate anything with the name, but maybe that's just me. I guess if we apply the names XML uses, then our "Tag" is an "Element", our "Lit" is "Data", and our "Node" is, well, a "Node".

djc · 2023-03-07T10:48:54Z

I appreciate your patience reviewing this patch set, hopefully I have made each commit small enough that it is not too much of a bear to go through it. Happy to take any feedback or suggestions!

I would prefer to merge something closer to the original scope of the PR, before going into increasing depth of changing the structure of the AST types to your preferences. In particular, it would be nice to get the documentation changes in separately, because one potentially useful input to judging AST type refactoring is to review the diff's line count changes.

In its current state (with a bunch of changes with naming choices I question and other choices that don't have obvious value to me), I don't think I'll be able to prioritize digesting this PR among the other ~70 GitHub notifications I should be attending to.

couchand · 2023-03-07T15:02:51Z

I like the PR very much! Every change is well thought out and well explained.

Thanks @Kijewski!

I would prefer to merge something closer to the original scope of the PR

I hear you @djc. If you'd prefer separate PRs for the refactorings rather than including it all here, I'm happy to split those commits out. Before doing so, I'd like to get general consensus on the overall direction so I'm not spinning my wheels. (It's a lot of commits because there are many variants but it's only a handful of "changes", and without the context of the end goal each can seem to lack purpose).

This PR works backwards from the last commit (which extracts the crate). Per previous discussion, it seems wise to figure out the public API before extracting the code rather than after. So based on your previous comments I worked to design that public API in a way that's internally consistent and exposes the structure a downstream consumer would need.

one potentially useful input to judging AST type refactoring is to review the diff's line count changes.

I'd suggest one of the best ways to judge this refactor is to look at askama_derive/src/generator.rs. That file lost 40 lines net, basically just by consolidating a bunch of handle_ws, prepare_ws, and flush_ws calls. These whitespace-handling calls are sprinkled about the generation code, somewhat inconsistently, and it's not clear what the principled place is for any one of them. The new AST structure pairs the whitespace handling to the template structure it corresponds to (rather than merely its location in the textual source), so it's clear where whitespace handling must be done.

with a bunch of changes with naming choices I question

Happy to take naming suggestions. The structs currently called Tag and Block are big and visible so I'm not surprised we'd want to bikeshed on them for a bit. Otherwise new names tend to follow directly from context, but please do leave comments on specific examples.

djc · 2023-03-08T10:53:12Z

I would prefer to merge something closer to the original scope of the PR

I hear you @djc. If you'd prefer separate PRs for the refactorings rather than including it all here, I'm happy to split those commits out. Before doing so, I'd like to get general consensus on the overall direction so I'm not spinning my wheels. (It's a lot of commits because there are many variants but it's only a handful of "changes", and without the context of the end goal each can seem to lack purpose).

I'm not sure what overall direction you mean. If by overall direction you mean a bunch of the refactoring of the Node type then there is no good way to get to consensus without me having to plow through all the changes to get a good sense of what they are, so that doesn't help me deal with reviewing this.

This PR works backwards from the last commit (which extracts the crate). Per previous discussion, it seems wise to figure out the public API before extracting the code rather than after. So based on your previous comments I worked to design that public API in a way that's internally consistent and exposes the structure a downstream consumer would need.

"internally consistent" is too abstract to be meaningful to me, and "the structure a downstream consumer" seems quite specific to your opinions -- I think the existing code generator is a downstream consumer and while it might not be the prettiest code, it does okay.

I'd suggest one of the best ways to judge this refactor is to look at askama_derive/src/generator.rs. That file lost 40 lines net, basically just by consolidating a bunch of handle_ws, prepare_ws, and flush_ws calls. These whitespace-handling calls are sprinkled about the generation code, somewhat inconsistently, and it's not clear what the principled place is for any one of them. The new AST structure pairs the whitespace handling to the template structure it corresponds to (rather than merely its location in the textual source), so it's clear where whitespace handling must be done.

I can get behind the idea that the whitespace handling could be more principled. Maybe that should be a separate first PR, with minimal changes to the rest of the structure and naming?

with a bunch of changes with naming choices I question

Happy to take naming suggestions. The structs currently called Tag and Block are big and visible so I'm not surprised we'd want to bikeshed on them for a bit. Otherwise new names tend to follow directly from context, but please do leave comments on specific examples.

I don't like Tag and I don't like Block in the way it currently seems to be used. In general, it's not obvious me to me that splitting the Node type in Tag and Block makes things clearer. In particular, there's the notion of a block in the semantics here which we would not want to confuse with the notion of a Block here, which is different. You kept the Node type but it actually no longer represents the highest-level type in the AST type hierarchy, which IMO is confusing. In the old structure, things with {% %} were nodes and things with {{ }} were expressions, which seems conceptually clear. The distinction between Node, Tag and Block seems pretty unclear to me (and comments saying that blocks come in three flavors certainly don't make it seem like the hierarchy you've picked is clearly represented in the type system).

The way to avoid bikeshedding is to avoid renaming things for now while we get the structure right, then renaming things can come last.

couchand · 2023-03-08T15:12:06Z

Ok, to be honest I'm struggling with what's actionable here. Do you want me to:

rename things before you review in detail?
submit a refactoring PR first?
- if so, do you have a suggestion about which commits should be included?
just be patient?

I'm not sure what overall direction you mean.

I mean the shape of the new public API. Here is one proposal, if it's not right let's discuss where it falls short so that my next attempt can meet your goals.

"the structure a downstream consumer" seems quite specific to your opinions -- I think the existing code generator is a downstream consumer and while it might not be the prettiest code, it does okay.

These changes are mostly driven by the generator code, the whitespace handling in particular. I've even removed the new things I'd added for other potential users. But, I'd point out, we also shouldn't totally ignore potential new users when designing a new public API.

I can get behind the idea that the whitespace handling could be more principled. Maybe that should be a separate first PR, with minimal changes to the rest of the structure and naming?

I hear what you're saying, but I'm not sure if there's a big win to be had there? I don't know if it really makes sense without the structural changes -- the structure is what makes it principled, just as with structured programming generally.

By my count, that PR would have 51 of the 64 commits from this one, everything from 1366d70 to 09a264a. Is there a smaller subset of these commits that you'd suggest go first?

In particular, there's the notion of a block in the semantics here which we would not want to confuse with the notion of a Block here, which is different.

Fair point, though OTOH, it's already a bit muddled given usage like:

askama/askama_derive/src/parser/node.rs

Line 66 in 077ba18

pub(crate) else_block: Vec<Node<'a>>,

askama/askama_derive/src/heritage.rs

Line 62 in 077ba18

Some(_) => return Err("multiple extend blocks found".into()),

askama/askama_derive/src/heritage.rs

Line 76 in 077ba18

"extends, macro or import blocks not allowed below top level".into(),

askama/askama_derive/src/generator.rs

Line 638 in 077ba18

Node::Loop(ref loop_block) => {

You kept the Node type but it actually no longer represents the highest-level type in the AST type hierarchy, which IMO is confusing.

FWIW, Node was not previously the highest-level in a template's AST either, that honor goes to Vec<Node>, which as you previously commented is a bit awkward.

In the old structure, things with {% %} were nodes and things with {{ }} were expressions

Well, not exactly. In the old structure, {{ }} became Node::Expr(Expr), {# #} became Node::Comment, and {% %} became Node::AnyOtherVariantExceptLit, none of which has changed much.

The distinction between Node, Tag and Block seems pretty unclear to me

If I may, a Node is either Literal text or an Askama Tag, and a Block is a sequence of nodes. All Tags have "outer" Ws that applies to the last whitespace before and the first whitespace after. All Blocks have "inner" Ws that applies to the first and last whitespace within.

comments saying that blocks come in three flavors certainly don't make it seem like the hierarchy you've picked is clearly represented in the type system

I played around with having an enum with variants for these (Jinja docs call them "delimiters"), but I didn't find it to be practically useful either for the generator code or for writing clear documentation, so I would now consider that a non-goal.

What is incredibly useful is correlating whitespace suppression to the structure it concerns, which is why my focus is there.

The way to avoid bikeshedding is to avoid renaming things for now while we get the structure right, then renaming things can come last.

I strongly agree. Let's focus on the purpose of the structures first and then later identify the right names for them.

djc · 2023-03-13T13:22:49Z

"just be patient" won't get us going anywhere. Honestly, I don't have clear idea on how to navigate this either. Maybe an idea would be to first transpose the prev_ws/next_ws structure to outer_ws/inner_ws, while otherwise keeping the current structure? Would that be enough to create more of the structure you got to in this PR?

Another approach would be to start by landing smaller parts that we have consensus on already, to at least make some progress.

FWIW, Node was not previously the highest-level in a template's AST either, that honor goes to Vec<Node>, which as you previously commented is a bit awkward.

It's not obvious to me that it's useful/necessary to wrap (the equivalent of) Vec<Node> in another type (except at the top level, where we can attach a parse() function to it).

If I may, a Node is either Literal text or an Askama Tag, and a Block is a sequence of nodes. All Tags have "outer" Ws that applies to the last whitespace before and the first whitespace after. All Blocks have "inner" Ws that applies to the first and last whitespace within.

In my mind what you call Tag is closer to what was referred to as Block before. Some additional structure might be needed in between because that's a helpful way to track whitespace.

djc · 2023-07-31T08:40:05Z

#834 extracts a parser crate and improves the parser API a bunch.

djc · 2023-07-31T08:41:19Z

I'm going to close this, as the stated goal (extract parser crate) has been achieved and the remaining changes will need substantial work to rebase -- it probably makes sense to start a new PR instead.

couchand mentioned this pull request Feb 28, 2023

Extract parser #748

Closed

couchand force-pushed the 2023-02/extract-parser branch from e9782e7 to c58643c Compare February 28, 2023 16:22

Kijewski self-requested a review February 28, 2023 19:25

GuillaumeGomez approved these changes Feb 28, 2023

View reviewed changes

couchand force-pushed the 2023-02/extract-parser branch 2 times, most recently from ee1994a to 109bb62 Compare February 28, 2023 22:22

couchand force-pushed the 2023-02/extract-parser branch 2 times, most recently from 9136054 to 51d49e8 Compare March 1, 2023 05:18

djc reviewed Mar 1, 2023

View reviewed changes

Kijewski reviewed Mar 1, 2023

View reviewed changes

couchand force-pushed the 2023-02/extract-parser branch 3 times, most recently from 3880bf4 to a1481da Compare March 6, 2023 20:00

couchand added 8 commits March 7, 2023 09:41

Remove proc_macro2 dependency from CompileError

62b0207

Move Syntax struct to parser module

9133bf3

Remove dependency on CompileError from parser

58157fe

Add test for parsing Node::Call

1366d70

Extract Call struct from Node

2857a41

Extract Match struct from Node

c32c2ef

Make When a proper struct

21ce716

Make Cond a proper struct

6402dc9

couchand added 20 commits March 7, 2023 09:51

Reify Tag::Super out of Tag::Call

99bd386

Move Extends into Tag

3ab6fcc

Use Block for When block

fbc3aae

Use Block for Loop body and else blocks

cf6a961

Use Block for Macro body

a5d6931

Use Block for BlockDef block

aee4e85

Use Block for Cond block

9ffa1ad

Use Block for parser::parse return value

f1b8b43

Pass whole Block into Generator::handle

0073a21

Handle Ws within Generator::handle

09a264a

Unify parser error cases using nom combinators

63ecf38

Introduce ParseError struct

bdde71f

Move Whitespace and Ws to parser

4b5e3a6

Name fields on Ws struct

a1cd50b

Document ParseError, rename row() to line()

55747de

Document parser::expr module

74825bf

Document parser module items

1dce292

Document parser::node module

9a1fa73

Document parser module

51af19f

Extract askama_parser from askama_derive

26e5eff

couchand force-pushed the 2023-02/extract-parser branch from a1481da to 26e5eff Compare March 7, 2023 14:53

couchand mentioned this pull request Jun 24, 2023

Template node structs & whitespace semantics #829

Open

djc mentioned this pull request Jul 1, 2023

Extract a parser crate #834

Merged

djc closed this Jul 31, 2023

Extract askama_parser from askama_derive #782

Extract askama_parser from askama_derive #782

Conversation

couchand commented Feb 28, 2023

Kijewski commented Feb 28, 2023

GuillaumeGomez commented Feb 28, 2023

couchand commented Feb 28, 2023

couchand commented Feb 28, 2023

couchand commented Mar 1, 2023

djc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

couchand Mar 3, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

couchand commented Mar 3, 2023

couchand commented Mar 6, 2023 • edited Loading

Kijewski commented Mar 6, 2023

djc commented Mar 7, 2023

couchand commented Mar 7, 2023

djc commented Mar 8, 2023 • edited Loading

couchand commented Mar 8, 2023 • edited Loading

djc commented Mar 13, 2023

djc commented Jul 31, 2023

djc commented Jul 31, 2023

couchand Mar 3, 2023 •

edited

Loading

couchand commented Mar 6, 2023 •

edited

Loading

djc commented Mar 8, 2023 •

edited

Loading

couchand commented Mar 8, 2023 •

edited

Loading