-
Hi! I'm trying to parse nested quoted strings (like the shell does). I have a working recursive descent parser that I'm trying to port to Lark. In zsh, the following double-quoted strings all parse as a single item:

```
echo "foo"
echo "foo "bar" baz"
echo "foo "bar\"k" baz"
echo "foo \" baz"
```

Here is my grammar so far:

```
REGULAR_DQUOTED_CHARS: /([^"\\]|\\[^"\\]|\\[\\"])+/
double_quoted_string: "\"" (REGULAR_DQUOTED_CHARS | double_quoted_string)* "\""
```

It parses basic strings correctly (without nesting), but it cannot handle the nested ones. I'm not really sure how to proceed because I've never used LALR parsers before 😦

I am also willing to write some sort of plugin to handle this, but I'm not sure how to do that. Could you point me to the appropriate documentation or examples?

I would prefer not to use insane regex features like lookahead/lookbehind (I don't really understand those 😉)

EDIT: Similarly, how would I parse balanced parentheses like "(foo (bar) baz)" without caring what is inside?
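For the balanced-parentheses question in the EDIT, the usual trick is to track only the nesting depth and ignore everything else. Python's built-in `re` cannot count nesting, which is why a recursive grammar rule is the natural fit in Lark, something like `parens: "(" (PAREN_TEXT | parens)* ")"` with `PAREN_TEXT: /[^()]+/` (both names are hypothetical and untested here). The sketch below shows the same idea in plain Python; `match_balanced` is an illustrative helper, not part of Lark.

```python
def match_balanced(text: str, start: int = 0) -> int:
    """Return the index just past the ')' that closes the '(' at text[start].

    The contents between the parentheses are skipped without being
    interpreted; only the nesting depth is tracked.
    Raises ValueError if text[start] is not '(' or the parentheses never close.
    """
    if start >= len(text) or text[start] != "(":
        raise ValueError(f"expected '(' at position {start}")
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                return i + 1  # one past the matching ')'
    raise ValueError("unbalanced parentheses")


text = "(foo (bar) baz) rest"
end = match_balanced(text)
print(text[:end])
```

Here `end` is 15, so the slice is exactly `(foo (bar) baz)` and ` rest` is left for whatever comes next.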
-
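What is the pattern that says `"foo "bar" baz"` should be a single terminal? You need to encode that pattern in some way (probably a lookahead regex).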
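One pattern that fits all four zsh examples, without any lookahead, is adjacency: a word is a run of double-quoted segments, backslash escapes and plain characters with no unescaped whitespace between them. In `"foo "bar" baz"` the second `"` closes the first segment, `bar` follows immediately, and `" baz"` is a third segment, so nothing is actually nested. A minimal sketch of that as one regex (the names `WORD` and `EXAMPLES` are illustrative):

```python
import re

# A shell-like "word": one or more adjacent pieces, each of which is
#   - a double-quoted segment, where backslash escapes the next character,
#   - a backslash escape outside quotes, or
#   - any other character except whitespace, '"' and '\'.
# Each alternative starts with a different character, so at any position
# at most one of them can apply; no lookahead is needed.
WORD = re.compile(r'(?:"(?:[^"\\]|\\.)*"|\\.|[^\s"\\])+')

EXAMPLES = [
    r'"foo"',
    r'"foo "bar" baz"',
    r'"foo "bar\"k" baz"',
    r'"foo \" baz"',
]

for s in EXAMPLES:
    # fullmatch: the whole string must be consumed as a single word
    print(s, "->", bool(WORD.fullmatch(s)))

# Splitting a full command line into words
print(WORD.findall(r'echo "foo "bar\"k" baz"'))
```

Keeping the whole word as one terminal matters if the grammar uses `%ignore` for whitespace: split into several terminals, the parse tree would no longer record whether the pieces touched. The same pattern should be usable as a single Lark terminal definition, though that is untested here.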
-
It seems that the pattern is that a closing string quote can't be followed directly by a letter (or a digit too?). I'm guessing this because in this example […]

As for plugins, we have a slightly smarter mechanism. Instead of inserting yourself into the middle of the parse, you can instrument it using this feature: https://lark-parser.readthedocs.io/en/latest/classes.html#lark.Lark.parse_interactive

It uses this class: https://lark-parser.readthedocs.io/en/latest/classes.html#interactiveparser

The API is still a little rough around the edges, but it works, and I don't mind doing a bit of polishing if you decide to use it.
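The guess about what may follow a closing quote can be checked against a POSIX-style word splitter. Python's standard `shlex.split` (POSIX mode by default) joins a quoted segment with whatever non-whitespace follows it, so it also reads each of the four commands as `echo` plus one argument. `shlex` only approximates zsh, so treat this as a sanity check rather than a spec:

```python
import shlex

commands = [
    r'echo "foo"',
    r'echo "foo "bar" baz"',
    r'echo "foo "bar\"k" baz"',
    r'echo "foo \" baz"',
]

for cmd in commands:
    # POSIX rules: quotes are removed, backslash escapes are resolved, and
    # adjacent quoted/unquoted pieces are concatenated into one argument
    print(shlex.split(cmd))
```

For example, the second command yields `['echo', 'foo bar baz']`: the "nested" quotes are really three adjacent pieces of the same word.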