Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error parsing ternary operator assignment #20

Closed
rickymagner opened this issue Apr 23, 2024 · 5 comments · Fixed by #79
Closed

Error parsing ternary operator assignment #20

rickymagner opened this issue Apr 23, 2024 · 5 comments · Fixed by #79
Assignees
Labels
bug Something isn't working

Comments

@rickymagner
Copy link

Hi folks, I think I found a subtle bug in parsing a particular type of ternary operator assignment. Here are the steps to reproduce:

  1. Make the following file called MinimalParse.wdl:
version 1.0

workflow MinimalParse {
    input {
        Int num = 8
    }

    Int new_num=if(num>5) then 1 else 2  # Add space after `if` to parse with wdl crate
}
  1. Try to parse using the wdl crate. Here is a Rust snippet:
// Inside main
    let src = std::env::args().nth(1).expect("missing src");
    let contents = std::fs::read_to_string(src).expect("could not read file contents");

    // Generate the parse tree from the document source.
    let parse_result = grammar::v1::parse(&contents);
    let parse_tree = match parse_result {
        Ok(_) => {
            let inner = parse_result.unwrap();
            println!("Inner is: {:?}", &inner);
            let tree = inner.into_tree().expect("Cannot make tree");
            tree
        }
        Err(e) => {
            println!("{}", e);
            panic!()
        }
    };
  1. The printed output is:
Inner is: Result { concerns: Some(Concerns(NonEmpty { head: ParseError(Error { message: "The following tokens are required: singular_identifier.", location: Position(Position { line_no: 8, col_no: 32, byte_no: 107 }) }), tail: [] })), tree: None }
  1. Add a space after if in the original WDL, i.e. Int new_num=if (num>5) then 1 else 2 and rerun. It now parses successfully.

I'm not defending the gross-looking syntax where there's no space after if, but womtool validate passes and cromwell does in fact run the original WDL, so this seems to be a parsing bug.

Curiously enough, the bug doesn't seem to care about the other shifty looking lack of spaces. Adding a space before or after the = does nothing to fix this. It also doesn't seem to care if you use a space or not after if in normal blocks outside of a ternary assignment.

@claymcleod claymcleod added the bug Something isn't working label May 2, 2024
@claymcleod claymcleod changed the title [BUG] Error parsing ternary operator assignment Error parsing ternary operator assignment May 2, 2024
@claymcleod
Copy link
Member

@peterhuene would you mind taking a look at this one?

@peterhuene
Copy link
Collaborator

peterhuene commented May 2, 2024

The root cause of the issue is that the pest grammar defines the if rule like this:

 "if" ~ (WHITESPACE | COMMENT)+ ~ expression ...

This precludes if(expr). If we change the rule to:

 "if" ~ (WHITESPACE | COMMENT)* ~ expression ...

Then if(expr) would parse, but then this would also parse:

Int new_num=iftrue then 1 else 2

The fix will need to be:

"if" ~ (&ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION | (WHITESPACE | COMMENT)+) ~ expression ...

Where ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION would be the token set:

(
!
+
-
{
[

Parsing a CST with pest can be error-prone as it forces the rules to think about whitespace and comments as part of the rule.

@a-frantz
Copy link
Member

a-frantz commented May 3, 2024

@peterhuene You might already see this, but reading your fix for the situation I think that would introduce more problems.

As I'm reading this, "if" ~ (ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION | (WHITESPACE | COMMENT)+) ~ expression ..., the ANY_TOKEN rule could change the evaluation of the expression by moving a token with meaning out of the rule which gets evaluated. e.g. if!paired then... would be parsed into an ANY_TOKEN (!) and an expression (paired) which will have the opposite meaning of the intended expression, !paired. Which is not to say what you wrote is wrong, just not the entire fix, which speaks to a larger problem at hand.

Maybe I'm not saying anything you didn't already realize, but this example is proving to me that we need to really fight with pest to get it to parse as we want. It seems to me that pest is simply the wrong tool for the job. I'm moving further and further into the logos+rowan camp.

@peterhuene
Copy link
Collaborator

peterhuene commented May 3, 2024

You're definitely correct! In the grammar we'd need the & prefix on the rule to match it, but not consume a token, I think.

I also wouldn't call it ANY_TOKEN_THAT_MIGHT_START_AN_EXPRESSION, either 😄; just an example of what a fix might entail (i.e. we need to lookahead for any expression starting token).

I've updated the original suggestion to include the & so that there's no confusion.

@peterhuene
Copy link
Collaborator

I can confirm that this is fixed with the new "experimental" parser implementation.

With our intention to move away from the pest-based parser, I don't think it's worth attempting a fix there.

When we switch parser implementations, I'll resolve this as fixed.

@peterhuene peterhuene linked a pull request Jun 13, 2024 that will close this issue
8 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants