Error parsing ternary operator assignment #20

rickymagner · 2024-04-23T14:28:39Z

Hi folks, I think I found a subtle bug in parsing a particular type of ternary operator assignment. Here are the steps to reproduce:

Make the following file called MinimalParse.wdl:

version 1.0

workflow MinimalParse {
    input {
        Int num = 8
    }

    Int new_num=if(num>5) then 1 else 2  # Add space after `if` to parse with wdl crate
}

Try to parse using the wdl crate. Here is a Rust snippet:

// Inside main
    let src = std::env::args().nth(1).expect("missing src");
    let contents = std::fs::read_to_string(src).expect("could not read file contents");

    // Generate the parse tree from the document source.
    let parse_result = grammar::v1::parse(&contents);
    let parse_tree = match parse_result {
        Ok(_) => {
            let inner = parse_result.unwrap();
            println!("Inner is: {:?}", &inner);
            let tree = inner.into_tree().expect("Cannot make tree");
            tree
        }
        Err(e) => {
            println!("{}", e);
            panic!()
        }
    };

The printed output is:

Inner is: Result { concerns: Some(Concerns(NonEmpty { head: ParseError(Error { message: "The following tokens are required: singular_identifier.", location: Position(Position { line_no: 8, col_no: 32, byte_no: 107 }) }), tail: [] })), tree: None }

Add a space after if in the original WDL, i.e. Int new_num=if (num>5) then 1 else 2 and rerun. It now parses successfully.

I'm not defending the gross-looking syntax where there's no space after if, but womtool validate passes and cromwell does in fact run the original WDL, so this seems to be a parsing bug.

Curiously enough, the bug doesn't seem to care about the other shifty looking lack of spaces. Adding a space before or after the = does nothing to fix this. It also doesn't seem to care if you use a space or not after if in normal blocks outside of a ternary assignment.

The text was updated successfully, but these errors were encountered:

claymcleod · 2024-05-02T14:22:54Z

@peterhuene would you mind taking a look at this one?

peterhuene · 2024-05-02T18:45:30Z

The root cause of the issue is that the pest grammar defines the if rule like this:

 "if" ~ (WHITESPACE | COMMENT)+ ~ expression ...

This precludes if(expr). If we change the rule to:

 "if" ~ (WHITESPACE | COMMENT)* ~ expression ...

Then if(expr) would parse, but then this would also parse:

Int new_num=iftrue then 1 else 2

The fix will need to be:

"if" ~ (&ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION | (WHITESPACE | COMMENT)+) ~ expression ...

Where ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION would be the token set:

(
!
+
-
{
[

Parsing a CST with pest can be error-prone as it forces the rules to think about whitespace and comments as part of the rule.

a-frantz · 2024-05-03T17:25:02Z

@peterhuene You might already see this, but reading your fix for the situation I think that would introduce more problems.

As I'm reading this, "if" ~ (ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION | (WHITESPACE | COMMENT)+) ~ expression ..., the ANY_TOKEN rule could change the evaluation of the expression by moving a token with meaning out of the rule which gets evaluated. e.g. if!paired then... would be parsed into an ANY_TOKEN (!) and an expression (paired) which will have the opposite meaning of the intended expression, !paired. Which is not to say what you wrote is wrong, just not the entire fix, which speaks to a larger problem at hand.

Maybe I'm not saying anything you didn't already realize, but this example is proving to me that we need to really fight with pest to get it to parse as we want. It seems to me that pest is simply the wrong tool for the job. I'm moving further and further into the logos+rowan camp.

peterhuene · 2024-05-03T17:29:55Z

You're definitely correct! In the grammar we'd need the & prefix on the rule to match it, but not consume a token, I think.

I also wouldn't call it ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION, either 😄; just an example of what a fix might entail (i.e. we need to lookahead for any expression starting token).

I've updated the original suggestion to include the & so that there's no confusion.

peterhuene · 2024-06-04T21:08:25Z

I can confirm that this is fixed with the new "experimental" parser implementation.

With our intention to move away from the pest-based parser, I don't think it's worth attempting a fix there.

When we switch parser implementations, I'll resolve this as fixed.

claymcleod assigned peterhuene May 2, 2024

claymcleod added the bug Something isn't working label May 2, 2024

claymcleod changed the title ~~[BUG] Error parsing ternary operator assignment~~ Error parsing ternary operator assignment May 2, 2024

peterhuene linked a pull request Jun 13, 2024 that will close this issue

feat: make the new parser implementation the only one. #79

Merged

8 tasks

peterhuene closed this as completed in #79 Jun 13, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Error parsing ternary operator assignment #20

Error parsing ternary operator assignment #20

rickymagner commented Apr 23, 2024

claymcleod commented May 2, 2024

peterhuene commented May 2, 2024 •

edited

Loading

a-frantz commented May 3, 2024

peterhuene commented May 3, 2024 •

edited

Loading

peterhuene commented Jun 4, 2024

Error parsing ternary operator assignment #20

Error parsing ternary operator assignment #20

Comments

rickymagner commented Apr 23, 2024

claymcleod commented May 2, 2024

peterhuene commented May 2, 2024 • edited Loading

a-frantz commented May 3, 2024

peterhuene commented May 3, 2024 • edited Loading

peterhuene commented Jun 4, 2024

peterhuene commented May 2, 2024 •

edited

Loading

peterhuene commented May 3, 2024 •

edited

Loading