Update grammar for parser unification #750
Comments
@Centril, the recent changes like this and rust-lang/rust#66183 and rust-lang/rust#62550 and rust-lang/rust#68764 and rust-lang/rust#68788 have been making it much harder to document the grammar. There is now a very abstract syntax that is getting further removed from the actual grammar for the language that is relevant to most people (i.e., the grammar for what people actually write). Do you have suggestions on how we can document this in a simple way?

I'd be loath to create two grammars (the parsed one and the semantic one), since they mostly overlap. I also don't like adding huge sections that explain what the restrictions are. There are now many restrictions on the grammar that are hard to express and document. Many portions now have to be singled out as "this is not accepted, because it's unstable" or "because it is shared with this other thing", or "because that's just the way it is".

The AssocItem change is a good example, where there is a parse-time restriction on edition-based trait items. I think the old grammar handled this well, but now... I have no idea how to express that. I don't want to add parameters to the grammar. It could describe the 2015 grammar, and then include a note somewhere about what is rejected in parsing 2018. But this is piling on the complexity of how to define the grammar.

Even documenting subslice patterns, which I consider an important change, is very difficult because there are deep changes to the pattern grammar, and then it needs to be qualified with all the new restrictions. I'm at a loss now as to how to move forward.
I think it's harder because we are using the wrong tools, and have been doing so for a long time. We're documenting the reference under the illusion that Rust is in fact one language, but it is not. In observable terms, we have at minimum two languages, one syntactic and one semantic (and I think you're underestimating how much of AST validation we're not specifying at the moment even before the changes you've linked). But even so, I don't think it makes sense to e.g. document eventual borrow checking rules on something resembling HIR or HAIR; those can only be reasonably explained by something graph-like, e.g. like MIR (we did throw out ast_borrowck for a reason). I also think it's not a good idea to document each concept, e.g. " So yes, I think we should give in to having two "grammars" (where the latter is abstract syntax, not concrete) especially because a real formal specification without it has no hope of success. Transforming from the former to the latter can be explained using a denotational style (we can translate this fairly easily to inference rules as well for a more type-checking like style):

elab_slice_pats : [Syn.Pat] -> ([Abs.Pat], Option Abs.Pat, [Abs.Pat])
elab_slice_pats ([[ .. ]] :: ps) = ([], Some Wild, map elab_pat ps)
elab_slice_pats ([[ b @ .. ]] :: ps) = ([], Some (e_binding b Wild), map elab_pat ps)
elab_slice_pats (p :: ps) = (elab_pat p :: pre, mid, suf)
  where (pre, mid, suf) = elab_slice_pats ps
elab_pat : Syn.Pat -> Abs.Pat
elab_pat [[ .. ]] = error -- propagates implicitly; or we use monads if needed.
elab_pat [[ [ p0, ..., pn ] ]] = elab_slice_pats [[ [ p0, ..., pn] ]]
... -- more cases

Aside from introducing more syntactic shorthands, this is the tersest definition I can think of (way shorter than the actual Rust code, but it is still formal, and nearly executable). My initial versions of the slice patterns report had something like this. We can typeset this appropriately and add comments where interesting things, e.g. not just identical structural transformations, happen. Auxiliary definitions can be defined and we can explain the syntactic conventions first.

However, you asked for a simple way, which I'm going to interpret as "don't rock the boat". To avoid doing so, I suppose we could add a chapter for "restrictions on syntax" where we say in a "declarative" way which productions would result in errors:

visit Trait => item_cx := Assoc(Trait) -- Implicitly following stack discipline.
visit Impl => item_cx := Assoc(Impl)
visit Statement | Module | Crate => item_cx := Free
visit ExternBlock => item_cx := Extern
error
when item_cx == Extern
on FunctionItem f where f.body != ";"

This is a sort of made up DSL for how AST validation behaves (not exactly a clean approach in my view; it's very much "validate" instead of "parse").
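For readers who find the made-up DSL hard to picture, here is a rough Rust sketch of the same idea: a visitor that carries the current item context down the tree and reports an error for a function with a body inside an extern block. Everything here (`ItemCx`, `Item`, `validate`, `validate_crate`, the error text) is hypothetical and invented for illustration; it is not how rustc's AST validation is actually written, and statements are left out for brevity. Passing `cx` as an argument is one way to get the "stack discipline" mentioned above, since the previous context is restored when each call returns.

```rust
// Hypothetical, heavily simplified AST. None of these names come from rustc.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ItemCx {
    Free,       // items at crate or module level
    AssocTrait, // items inside a trait
    AssocImpl,  // items inside an impl
    Extern,     // items inside an extern block
}

#[derive(Debug)]
enum Item {
    Function { name: String, has_body: bool },
    Trait(Vec<Item>),
    Impl(Vec<Item>),
    Module(Vec<Item>),
    ExternBlock(Vec<Item>),
}

// Visit one item under the context `cx`, collecting error messages.
fn validate(item: &Item, cx: ItemCx, errors: &mut Vec<String>) {
    match item {
        Item::Function { name, has_body } => {
            // error when item_cx == Extern on FunctionItem f where f.body != ";"
            if cx == ItemCx::Extern && *has_body {
                errors.push(format!(
                    "function `{}` in an extern block cannot have a body",
                    name
                ));
            }
        }
        // Each container sets the context for its children only.
        Item::Trait(items) => validate_all(items, ItemCx::AssocTrait, errors),
        Item::Impl(items) => validate_all(items, ItemCx::AssocImpl, errors),
        Item::Module(items) => validate_all(items, ItemCx::Free, errors),
        Item::ExternBlock(items) => validate_all(items, ItemCx::Extern, errors),
    }
}

fn validate_all(items: &[Item], cx: ItemCx, errors: &mut Vec<String>) {
    for item in items {
        validate(item, cx, errors);
    }
}

// The crate root starts in the `Free` context.
fn validate_crate(items: &[Item]) -> Vec<String> {
    let mut errors = Vec::new();
    validate_all(items, ItemCx::Free, &mut errors);
    errors
}
```

Calling `validate_crate` on a tree with an `ExternBlock` containing a function whose `has_body` is `true` yields one error for that function, while the same function inside a `Module` is accepted.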
FWIW, the C++ standard has a syntax section that lists the grammar ignoring all the semantic restrictions (e.g. with
Thanks for the detailed response. I really liked the preface to the SML definition; it is very relevant. I'll think this over in the background, but it is unlikely I'll make progress soon.
I'm well aware that the reference grammar is very under-specified (and flat-out wrong in some places). I was hoping wg-grammar would lead towards fixing things, but it seems like there isn't enough momentum. My general feeling is that it may be too difficult to make progress on a high-quality, precise language spec using only volunteer contributors. It takes an uncommon set of skills, and there just don't seem to be the incentives and interest to do it. What you wrote sounds very interesting, but I want to keep our expectations calibrated with what can realistically be done given the involvement we have now (which is close to zero).
Update for rust-lang/rust#67131
Also, #722 has a mistake where visibility is allowed on a macro invocation for a trait item. This is not allowed, and should have been written more like InherentImplItem is.