stdlib/runtime: Implement a basic parsing library #2100

spotandjake · 2024-04-22T17:24:17Z

As mentioned in #2099, in the stdlib there are multiple cases where we have needed to and will need to implement parsers such as currently for url decoding, filepaths and Json, and in the future we might want to support things like TOML, Yaml and possibly other things. It would be useful if we standardized the way we do parsing in the stdlib to make maintaining these libraries easier, this would also help maintain best practices. As we probably don't want this library to be accessible from the stdlib it would probably make sense to put this in the runtime. Alternatively maybe we want to consider adding an stdlib/helpers path for things like this. My thoughts for this library are a very low level parsing framework similar to whats in the Json library, something like a full parser generator seems overkill.

The text was updated successfully, but these errors were encountered:

saolof · 2024-06-17T19:25:32Z

What is the class of languages that the parsing framework needs to be able to parse?

If the scope is to have an extensible system whose main purpose is to parse configuration languages, restricting the scope to well-matched visibly pushdown grammars (with minor conveniences to translate leading whitespace to pushdown/pop) should make it possible to parse all languages mentioned here.

It would offer non-backtracking parsing without requiring complex grammar analysis like LR or LL parsers, or requiring backtracking like PEGs, by just restricting the scope of what the module is supposed to do. The fundamental idea is to use stack of finite state automata similar to what you would use for parsing a regex, and the main question is whether you want to support nondeterministic grammars (stack of thompson NFA to handle choice with lookahead, produce output on stack pop) or whether you take the predictive/LL1 ones (stack of DFAs, produce output as you parse, bracket delimited recursion makes it trivial to write a deterministic choice operator that just uses the leading token of each rule). The latter may still be enough to parse the mentioned languages while being straightforward to write a combinator package for.

spotandjake added stdlib runtime labels Apr 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

stdlib/runtime: Implement a basic parsing library #2100

stdlib/runtime: Implement a basic parsing library #2100

spotandjake commented Apr 22, 2024

saolof commented Jun 17, 2024

stdlib/runtime: Implement a basic parsing library #2100

stdlib/runtime: Implement a basic parsing library #2100

Comments

spotandjake commented Apr 22, 2024

saolof commented Jun 17, 2024