Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stdlib/runtime: Implement a basic parsing library #2100

Open
spotandjake opened this issue Apr 22, 2024 · 1 comment
Open

stdlib/runtime: Implement a basic parsing library #2100

spotandjake opened this issue Apr 22, 2024 · 1 comment

Comments

@spotandjake
Copy link
Member

As mentioned in #2099, in the stdlib there are multiple cases where we have needed to and will need to implement parsers such as currently for url decoding, filepaths and Json, and in the future we might want to support things like TOML, Yaml and possibly other things. It would be useful if we standardized the way we do parsing in the stdlib to make maintaining these libraries easier, this would also help maintain best practices. As we probably don't want this library to be accessible from the stdlib it would probably make sense to put this in the runtime. Alternatively maybe we want to consider adding an stdlib/helpers path for things like this. My thoughts for this library are a very low level parsing framework similar to whats in the Json library, something like a full parser generator seems overkill.

@saolof
Copy link

saolof commented Jun 17, 2024

What is the class of languages that the parsing framework needs to be able to parse?

If the scope is to have an extensible system whose main purpose is to parse configuration languages, restricting the scope to well-matched visibly pushdown grammars (with minor conveniences to translate leading whitespace to pushdown/pop) should make it possible to parse all languages mentioned here.

It would offer non-backtracking parsing without requiring complex grammar analysis like LR or LL parsers, or requiring backtracking like PEGs, by just restricting the scope of what the module is supposed to do. The fundamental idea is to use stack of finite state automata similar to what you would use for parsing a regex, and the main question is whether you want to support nondeterministic grammars (stack of thompson NFA to handle choice with lookahead, produce output on stack pop) or whether you take the predictive/LL1 ones (stack of DFAs, produce output as you parse, bracket delimited recursion makes it trivial to write a deterministic choice operator that just uses the leading token of each rule). The latter may still be enough to parse the mentioned languages while being straightforward to write a combinator package for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants