Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build a stand-alone Miden assembly parser #298

Closed
bobbinth opened this issue Jul 9, 2022 · 8 comments
Closed

Build a stand-alone Miden assembly parser #298

bobbinth opened this issue Jul 9, 2022 · 8 comments
Assignees
Labels
assembly Related to Miden assembly

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Jul 9, 2022

As discussed in #225, it might be a good idea to separate Miden assembly parsing from MAST generation. There are two primary reasons for that:

  1. Serialization: by the time we have a MAST (in a form of a Script struct), some info about the original assembly file is already lost. Specifically, the procedures have already been inlined, and thus, serializing MAST could result in a much bigger output size than the original .masm file.
  2. Performance: parsing (especially large source files) may have a non-negligible performance overhead. But if most of the parsing is already done, and the code is stored in a binary format, the performance could be improved significantly.

Overall, the compilation process could look something like this:

[source .masm] -> [parsed AST] -> [Script]

As the first step, we should probably just implement parsing of source .masm files into simple ASTs. We could use off the shelf parsers such as Pest or Rowan, but given how simple Miden assembly syntax is, my initial thinking is that we can implement the parser ourselves.

In terms of structure, we could probably define a Node enum which could look something like this:

pub enum Node {
    Instruction(Instruction),
    IfElse(Vec<Node>, Vec<Node>),
    Repeat(Vec<Node>),
    While(Vec<Node>),
}

In the above, Instruction could be an enum itself, something like:

pub enum Instruction {
    Assert,
    AssertEq,
    Add,
    AddImm(u64),
    ...
}

A few things left to figure out:

  • How to handle procedures? I think we'll probably need to maintain two lists: one for local procedures (defined within a .masm file) and another one for external procedures (e.g., from stdlib). And there should be a Procedure struct which will contain procedure name, number of locals etc.
  • How exec instruction should work? In the .masm file we provide a name of the procedure to be invoked. For local procedures we should probably replace names with indexes. For external procedures, we should probably replace names with procedure hashes, but I haven't thought through the implications completely.

Once we have the parsing working, we can implement Miden assembly serialization into binary format, and then switch over to building MAST from the AST rather than directly from .masm source.

@bobbinth bobbinth added the assembly Related to Miden assembly label Jul 9, 2022
@tnachen
Copy link
Contributor

tnachen commented Jul 10, 2022

I can extend the existing parser for docs to make it more extensive to handle this.

@bobbinth
Copy link
Contributor Author

I think we should probably use the current .masm parser as the base as it has already a lot of the logic built for this (e.g., tokenization).

@tnachen
Copy link
Contributor

tnachen commented Jul 19, 2022

Got it, I'd like take this on as it seems relatively straight forward. Is there any preference what this binary format we should store AST in?

@bobbinth
Copy link
Contributor Author

bobbinth commented Jul 22, 2022

I think as the first step we can implement the parsing part only. Serialization to binary format could be done in future PRs.

Before we start working on this, might be good to figure out answers to the questions I had in the original post (especially for the exec instruction).

@tnachen
Copy link
Contributor

tnachen commented Aug 13, 2022

Yes seems like exec in a local .masm is replaceable with a local index (though probably sorted as I assume order in the .masm file doesn't matter and shouldn't effect AST).
Is there a reason we don't just store the exec proc name similar to the parser (prefix::proc.label)?

@bobbinth
Copy link
Contributor Author

I think we can still store names in the AST. I was thinking more about the future when we want to serialize to binary and then, we should try to be as compact as possible. So, when serializing, we'd need to replace all local procedure names with indexes (which probably could fit in 1 - or 2 bytes), and I thought it might be easier to handle this from the start.

Another related reason is that we should be able to parse the binary format into an AST as well. And if we implement serialization as described above, procedure names would not be available. Thus, procedure name should probably be stored in the relevant node as Option.

The order of local procedures does matter - so, we should assign indexes in the order in which they were declared in the source file.

@vlopes11
Copy link
Contributor

vlopes11 commented Oct 25, 2022

One option to remove the stdlib as dependency of the assembly is to deprecate use and create a new token import.

This new token will consume a given SourceProvider and will append contents to the current context. We end up being more flexible since we don't import whole modules, but only what is in the imports.

The souce provider will be a mapping str -> str that will normally map from a path to file contents. Being a trait, it can also have test-friendly implementations such as in-memory disk (i.e. HashMap<String, String>) that will provide imports under test contexts.

We can, alternatively, change the behavior of use, but this is a hard downstream breaking change. The advantage of deprecating one in favor of the other is we don't break downstream functionality.

@bobbinth
Copy link
Contributor Author

Closed by #490 and many others.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assembly Related to Miden assembly
Projects
None yet
Development

No branches or pull requests

4 participants
@tnachen @vlopes11 @bobbinth and others