feat(masm): refactor assembler into a more compiler-like architecture #1277

bitwalker · 2024-03-08T00:20:59Z

Important

This is a large PR, with many changes interlinked. I have broken up the changes into many smaller pieces that introduce them piece-by-piece for review, but those pieces may refer to thing that come in later commits, or be incomplete in and of themselves. This is intentional, and you should use the commits to understand the structure of the changes being made, and then attack the review however you prefer from there.

Please read the commit messages, as they also further introduce the work done and what I'm trying to convey in each particular chunk.

This PR is a major refactor of the assembler crate, with various bits of new functionality in the miden-core and miden-assembly crates. There are some changes made in other crates, but these are largely tests, where the interaction with the assembler has either changed in some way, or the output being tested has changed. The following section lists the most significant changes to be aware of - keep in mind, that as I laid out in my note at the top, the changes are described in more detail in the commit messages, so read those for more info, but I'm happy to answer any questions you may have as well.

Summary

Here's a high-level overview of what is contained in this PR:

A completely new MASM frontend (i.e. parser/lexer/diagnostics), which makes use of a formal LALR(1) grammar, and corresponding generated parser. The grammar and parser generator used is lalrpop, and the lexer generator is logos - both are chosen because of my prior experience in compiler projects with them. They are solid, well-built crates, with a decent community around them, and both support no-std builds.
Rewritten source-location tracking, which also uses a representation that the Miden compiler uses, which will make later work to propagate source location information from the compiler frontend through Miden Assembly trivial. Most importantly though, this enables precise diagnostics, with source code snippets. See the tests for examples of what these diagnostics look like in practice.
A new way of expressing errors and diagnostics in the assembler and core crates (and other crates as well, when we want to start using them there). This makes use of the thiserror and miette crates (forked versions currently, for no-std support, until error_in_core is stabilized). These enable very ergonomic error types (via thiserror and #[derive(Error)]), and the ability to layer diagnostics on those same types (via miette and #[derive(Diagnostic)]). The diagnostics infrastructure is quite configurable, and I suspect we may want to take advantage of its JSON reporting backend for running the VM in the browser, so that we can extract error information in JavaScript and render pretty errors in the browser-based editor.
A source code formatter for Miden Assembly and MAST (in its textual form). This is based on the algorithm described by Philip Wadler, commonly called Prettier. You interact with the pretty printer by implementing the PrettyPrint trait, which is implemented in miden-core under core/src/prettier, by describing how you want to lay out the components of the syntax to be rendered. The prettier algorithm then takes that description and renders it using a language-agnostic algorithm. I have implemented pretty printers for both MASM and MAST, but feel free to modify their implementations as you see fit. One of the interesting things you can do with it is describe alternate layouts, which the algorithm will choose between depending on the available width of the output window (which defaults to 80 columns). You can see this in effect with span in the case of MAST, which will alternatively render single- vs multi-line depending on the size of the span.
Syntax tree visitors for the MASM abstract syntax tree, which allows you to concisely express analyses and rewrites over the tree without having to implement the traversals by hand each time. This is used in a few places in the refactored code.
A new semantic analyzer, which performs a variety of first-pass validation checks after parsing MASM source code.
A largely rewritten set of AST nodes. Many of the previous ones are still there in some form (especially Instruction, which hasn't changed much), others are expanded with new functionality or metadata (e.g. InvocationTarget), and others are entirely new, or rewritten so as to be essentially new. Virtually all of these nodes now implement a Spanned trait which returns the source span for that node, or a default one if there is no source code.
A significant rewrite of the Assembler internals. Some parts of this are virtually untouched (particularly the translation to MAST, with the main exception being how procedure calls are handled). The way in which the assembler is instantiated in used is slightly different, but much more powerful now. These changes are all driven off the new "module graph", which is used for global inter-procedural analysis during assembly.

Changes to MASM Syntax

I'd like to devote a section just to this topic, as they are both a motivating reason for some of the changes, as well as a result of reimplementing compilation in a more principled fashion:

You may now express arbitrarily-long identifiers in MASM, e.g. the 255 character limit is removed.
You may now specify identifiers that were previously illegal by quoting them in double-quotes, e.g. export."_RNvCskwGfYPst2Cb_3foo16example_function" (an identifier generated from Rust's mangling scheme). The default "bare" identifiers still have largely the same constraints as before (start with an alphabetic character, contain only alphanumerics and underscores, etc.).
You may now order procedures in a module any way you like, i.e. you do not need to define a procedure before it is referenced. This allows you to organize modules as you see fit to group functionality. The inter-procedural analysis done during compilation will ensure that no cycles in the call graph are introduced.
Similarly, you may introduce imports, constants, and other top-level items in any order, i.e. it is now perfectly acceptable to place constant definitions close to their usages. This means you can also reference definitions out of order, as long as when the names are resolved there are no cycles (which is also detected and raised as a diagnostic).
It is expected that Kernel modules export their syscalls, and proc their private internal helper functions. I actually can't recall now if that was the case before, but it certainly is now.
All instructions which accept immediates, now also accept constant identifiers as well as literals. I think the majority of instructions had support for constants, but it is now universal. There is only one exception to this rule, and it could easily be removed, and that's for the exp.uXX instruction, which currently only accepts a literal.
Imports are now purely syntactic sugar - they have no bearing on the compiled code at all, as the imports are resolved to their actual definitions during compilation (no matter how many re-exports have to be followed, though again, cycles are detected and not allowed). We could easily add support for important constants as well, but I have not implemented that here.

I think that largely covers the changes to the surface syntax, at least that I can recall off the top of my head here.

Changes to MAST/Assembler

One of the major things that falls out of the Assembler refactoring, is that we now always visit procedures in reverse topological order, i.e. callees before callers. As a result, we always know the MAST root for every procedure being called in the program. Instead of inlining the body of every procedure at every call site, the assembler now emits PROXY blocks for procedures called via exec (and CALL for call, but that's not new). This has the effect of making the emitted MAST drastically smaller. I added support in the processor for executing proxy blocks (it was an error previously) - if that is not the correct behavior, we may want to revisit this as part of the larger MAST refactoring. The only downside is that the textual MAST now has proxy.HASH where there used to be inlined code, but even that is something we could easily address in the formatter if we actually have a problem with this. I have added a test helper though that makes expressing tests that expect certain textual MAST output that contains calls much more readable, and less fragile to changes in the standard library, etc.

The other significant change to the way the assembler works is that the procedure cache has been rewritten to be tightly integrated with the module graph. This enables some nice optimizations, and in particular, allows us to use plain integer identifiers rather than having to hash procedure ids. The downside is that the procedure cache cannot be shared across assembler instances - whether that's a problem in practice I think remains to be seen. I do think that we'll want to further refactor the assembler in the future to enable a greater degree of sharing of the module graph and procedure cache, but there is a tradeoff in performance either way, so I erred on the side of single-threaded use cases for now.

Diagnostics

This PR introduces two dependencies to aid in creating error types and diagnostics, thiserror and miette respectively. These are using forks I made from the latest versions of both crates, so as to add support for #![no_std] environments until the error_in_core feature stabilizes. But let's take a brief look at how these are used:

Defining an error

Let's use the parser as an example here, the following is a subset of it's definition that demonstrates how thiserror can be used to define the type, while simultaneously implementing From for any errors it encapsulates, as well as the Display, and Error traits:

#[derive(Debug, Clone, thiserror::Error)]
pub enum ParsingError {
    #[error("invalid token")]
    InvalidToken {
        span: SourceSpan,
    },
    #[error("unrecognized token: expected {}", expected.as_slice().join(", or ")))]
    UnrecognizedToken {
        span: SourceSpan,
        token: String,
        expected: Vec<String>,
    },
    ...
}

That's it! It makes defining error types for each task convenient, and provides a very natural way to display the data associated with an error as part of it's Display implementation.

Now let's extend our ParsingError type for use in our diagnostics system:

#[derive(Debug, Clone, thiserror::Error, Diagnostic)]
pub enum ParsingError {
    #[error("invalid token")]
    #[diagnostic()]
    InvalidToken {
        #[label("occurs here")]
        span: SourceSpan,
    },
    #[error("unrecognized token")]
    #[diagnostic(help("expected {}", expected.as_slice().join(", or ")))]
    UnrecognizedToken {
        #[label("lexed a {token} here")]
        span: SourceSpan,
        token: String,
        expected: Vec<String>,
    },
    ...
}

As you can see, this feels like a natural extension of thiserror, with support for decorating things that are convertible to SourceSpan as "labels" in the diagnostic output. Here's what an instance of the UnrecognizedToken error above looks like when rendered with source code:

  x unrecognized token
         ,-[test1737:2:37]
       1 |
       2 |         use.dummy::math::u64->bigint->invalidname
         :                                     ^|
         :                                      `-- lexed a -> here
       3 |
         `----
        help: expected "begin", or "const", or "export", or "proc", or "use",
      or end of file, or doc comment

I actually lifted the example above from our test suite (the module_alias test). These pretty errors are enabled by associated source spans with specific errors, and then pairing them with a reference to the source code to which those spans are derived, and this can be done in a few different ways.

MASM Serialization

While this PR does not remove serialization of MASM, it does change it, as the AST has changed, and some of the restrictions have been loosened. I think we should plan on removing serialization entirely once the MAST refactoring is done, which should now be trivial with the changes contained in this PR having already set the stage for it.

TODO

There are three things that still need to be done, and they are small, but I want to call them out here:

Need to rebase on the most recent changes in next, I'm ~12 commits behind, and there are certainly conflicts there, but I wanted to tackle that once I got this open first.
There is a single failing test that I need to diagnose (in case one of you knows why it alone is failing!), which can be run with cargo test -p miden-stdlib -- crypto::sha256::sha256_hash_memory. It is failing with FailedAssertion { clk: <some number of cycles>, err_code: 0, err_msg: None }, but no additional info. I'm assuming it is due to my changes, but find it odd that all other tests pass and this one does not.
I would like to confirm that we actually want to support things like begin end or if.true end in the surface Miden Assembly syntax. I have chosen not to allow this in the parser, to keep the grammar simpler, and because it honestly doesn't make sense that anyone would write an empty block by hand, or what the purpose of doing so would be. This obviously has nothing to do with supporting empty blocks in MAST, only in MASM source code.
On a related note, I've also relaxed the behavior with regard to multiple imports of the same module with different aliases (as long as they are both used). This was an error, as of these changes that is no longer the case. We can certainly make it an error easily enough, but I feel there is a valid reason to allow this, and that we would only want to raise a warning/error if an import is unused (e.g. while refactoring a large program, you might move stuff using the old name to a new module that is aliased, while simultaneously importing the new module with its "true" name, and have all new code use that import, thus both imports are valid and useful, despite being overlapping.

TL;DR

This is an enormous PR, but it is more efficient to group these changes together like this than to have made them incrementally (which would have taken many weeks). That said, please take as much time as you need, I don't want this to be a nightmare to review. For that reason I spent an inordinate amount of time breaking this up into smaller commits that can be reviewed piecemeal, but stack in such a way that you can pretty much just start at the oldest commit and work forward and have it go fairly painlessly.

I'm also happy to walk through the changes or any questions you have on a call, in Discord, etc., if you feel it will be easier to have me on hand to answer questions you have. I'll literally pair up for the whole review if you want - I opened this monstrosity, so its only fair I go out of my way to make it easy on the rest of you. Just let me know!

I ran the benchmarks, and here's what I'm seeing compared to next:

❯ cargo bench -p miden-vm -- --save-baseline assembler-refactor
warning: src/github.com/0xpolygonmiden/miden-vm/miden/Cargo.toml: unused manifest key: bench.0.debug
   Compiling miden-vm v0.8.0 (src/github.com/0xpolygonmiden/miden-vm/miden)
    Finished bench [optimized] target(s) in 2.84s
     Running benches/program_compilation.rs (target/release/deps/program_compilation-f49d9b2efbd5811d)
Gnuplot not found, using plotters backend
program_compilation/sha256
                        time:   [5.1606 ms 5.1667 ms 5.1736 ms]
                        change: [-96.568% -96.560% -96.553%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  2 (2.00%) low mild
  5 (5.00%) high mild
  6 (6.00%) high severe

     Running benches/program_execution.rs (target/release/deps/program_execution-2647b9ea5dc36e7d)
Gnuplot not found, using plotters backend
program_execution/sha256
                        time:   [15.236 ms 15.283 ms 15.334 ms]
                        change: [-3.0510% -2.4781% -1.8854%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  3 (3.00%) high mild
  1 (1.00%) high severe

plafer

This is an awesome change! We get all the benefits of using a parser generator - cleaner code, and more flexible MASM programs (e.g. ordering procedures however we like).

I didn't look in detail in how the semantic analyzer works. Unit/integration tests are green, so I'm happy on that front - I'll deep dive if ever I need to make a change.

Otherwise left a couple of nits. Great job!

assembly/src/assembler/context.rs

assembly/src/assembler/id.rs

assembly/src/assembler/mod.rs

assembly/src/parser/error.rs

bobbinth

Looks great! Thank you! I've only started my review and looked through the simple changes so far (e.g., reviewed changes in the stdlib, processor, and verifier crates) - but wanted to leave a few comments already (mostly nits so far).

assembly/Cargo.toml

test-utils/Cargo.toml

core/src/operations/decorators/advice.rs

stdlib/docs/crypto/hashes/blake3.md

assembly/src/assembler/id.rs

hackaugusto · 2024-03-11T12:01:28Z

assembly/src/assembler/callgraph.rs

+        let mut expect_cycle = false;
+        if roots.is_empty() {
+            expect_cycle = true;
+            roots.extend(graph.nodes.keys().next());


question: could we have more than one cycle? if yes, would it make sense to report all possible cycles?

It's certainly possible to have multiple cycles simultaneously, depending on what has changed since the last compilation (if any). However detecting multiple cycles is not as simple as what is being done here, which relies on the simple property that failing to produce a topological sort means there is at least one cycle in the graph, but we don't actually try to work out the actual cycles. To actually identify all the cycles, we'd need to run something like Kosaraju's strongly-connected components algorithm, that identifies subgraphs that contain at least one cycle, but doesn't identify all of the cycles. Johnson's algorithm to find all of the elementary cycles of a graph is also an option, but is somewhat complex to implement.

TL;DR: we probably should hold off on implementing a fancier diagnostic here until we have a sense of how frequently this issue arises, and how often it involves multiple cycles vs accidentally introducing recursion in a new procedure. But if we feel there it is definitely worthwhile to do now, that's fine too.

hackaugusto · 2024-03-11T12:08:42Z

assembly/src/assembler/context.rs

+    /// When true, this permits calls to refer to procedures which are not locally available,
+    /// as long as they are referenced by MAST root, and not by name. As long as the MAST for those
+    /// roots is present when the code is executed, this works fine. However, if the VM tries to
+    /// execute a program with such calls, and the MAST is not available, the program will trap.


So we have a concept of trap in the VM? I think the execution just fails?

I assumed we handled it somewhat sanely rather than panicking, hence my description of the behavior as a "trap", which refers to anything that causes execution to halt unexpectedly, but in a controlled manner (i.e. the VM doesn't just segfault or something wild like that). If that's not the case today, it probably should be, since I anticipate this scenario could arise frequently in practice.

I think we just return an error, i.e. the VM errors

assembly/src/ast/instruction/advice.rs

assembly/src/ast/invocation_target.rs

assembly/src/ast/module.rs

assembly/src/ast/op.rs

hackaugusto · 2024-03-11T13:35:19Z

assembly/src/assembler/instruction/u32_ops.rs

@@ -481,29 +463,29 @@ fn calculate_clz(span: &mut SpanBuilder) -> Result<Option<CodeBlock>, AssemblyEr
    let ops_group_2 = [
        Push(Felt::new(u32::MAX as u64 + 1)), // [2^32, pow2(32 - clz), n, clz, ...]

-        Dup1, Neg, Add, // [2^32 - pow2(32 - clz), pow2(32 - clz), n, clz, ...] 
-                        // `2^32 - pow2(32 - clz)` is equal to `clz` leading ones and `32 - clz` 


@bobbinth can we please add the pre-commit checks to CI? These changes are happening across all our repos. It is cluttering the git history, adding extra work on rebase, and makes PRs like this bigger than necessary because not everyone is running cleanup on their tooling.

Running the pre-commit hooks will fix these issues, since the CI will enforce the author properly cleans their code. (I personally don't care for pre-commit hooks, it is just a tool for this, we can use any other tool that removes whitespace).

Yeah there were quite a few places, almost every module I touched honestly, that had some kind of formatting discrepancy, mostly really small things like this example, but having some kind of CI check would be helpful. I don't think pre-commit hooks are necessary as long as the PR status checks fail due to this stuff (and is more of a impetus for those that don't set up their editors to do this stuff for them, to do so and avoid the headache).

I don't think pre-commit hooks are necessary as long as the PR status checks fail due to this stuff

By pre-commit hook I mean the tool. I have added the configuration for it a while back. People don't have to install the tool if they don't want (and we dont' even have to use this tool to check the format in the CI), it is just an easy way of get this setup since the config is there already.

I think we should add this to the README - I didn't know about that tool, or that we had that configuration already setup. I'll set it up now

Ah gotcha, makes total sense

assembly/src/assembler/id.rs

assembly/src/assembler/mod.rs

assembly/README.md

core/src/utils/mod.rs

hackaugusto · 2024-03-11T14:57:22Z

Cargo.toml

+thiserror = { git = "https://github.com/bitwalker/thiserror", branch = "no-std" }
+miette = { git = "https://github.com/bitwalker/miette", branch = "no-std" }


should we move these forks to the https://github.com/0xPolygonMiden org?

I want to say yes? I guess the downside is that means all of the Polygon org rules are going to apply, so someone else will have to be involved to sign off on any updates we do to those crates, but they aren't likely to change much in the near term.

assembly/src/parser/error.rs

hackaugusto · 2024-03-11T16:06:28Z

assembly/src/parser/grammar.lalrpop

+        "adv.insert_hdword" => Token::AdvInsertHdword,
+        "adv.insert_hperm" => Token::AdvInsertHperm,
+        "adv.insert_mem" => Token::AdvInsertMem,
+        "adv_loadw" => Token::AdvLoadw,
+        "adv_pipe" => Token::AdvPipe,
+        "adv_push" => Token::AdvPush,
+        "adv.push_ext2intt" => Token::AdvPushExt2intt,
+        "adv.push_mapval" => Token::AdvPushMapval,
+        "adv.push_mapvaln" => Token::AdvPushMapvaln,
+        "adv.push_mtnode" => Token::AdvPushMtnode,
+        "adv.push_sig" => Token::AdvPushSig,
+        "adv.push_smtpeek" => Token::AdvPushSmtpeek,
+        "adv.push_smtget" => Token::AdvPushSmtget,
+        "adv.push_smtset" => Token::AdvPushSmtset,
+        "adv.push_u64div" => Token::AdvPushU64Div,


@bobbinth should we standardize the adv_loadw, adv_pipe, adv_push instructions to match the rest? (Or even better, drop . so that it is used to separate only arguments)

I had the same thought, I was actually a bit surprised that those couple of instructions didn't follow the same convention, but didn't think it was appropriate to change that as part of this PR.

The main reason why these are different is to differentiate between real instructions and decorators. So, anything which is adv.xxx is a decorator (i.e., it does not advance the clock cycle of the VM and does not affect the VM state outside of the advice provider). The adv_xxx are actual instructions which affect the state of the operand stack and/or memory.

Is this distinction useful though? And if it is, is a single . the best way to express this?

I think it is useful as something like adv_push has a very different semantic from adv.push_u64div - so, it is good to have syntactic differentiation. Whether using . is the best way to do that - I'm not sure. I picked it because conceptually in adv.[action] the action parametrizes what we should do to the advice provider (this is consistent with how we specify parameters for other instructions). But I'm open to other ways of doing this.

I think my first impression upon seeing it is that I would assume they are both instructions, so we probably should differentiate decorators further somehow, such as prefixing them with @ or something, e.g. @adv.push_u64div visually stands out as something special compared to regular instructions, and also makes it easy to spot them when skimming the code.

If we want to distinguish them somehow, I would suggest a prefix symbol from the set @, !, % or $. I'd be inclined to use @ simply because it is used for attribute/decorator-ish things in other languages, but any of those symbols are free of any ambiguity issues in the current grammar.

hackaugusto · 2024-03-11T16:18:01Z

assembly/src/parser/grammar.lalrpop

+#[inline]
+Assert: Instruction = {
+    "assert" <error_code:MaybeAssertCode> => error_code.map(Instruction::AssertWithError).unwrap_or(Instruction::Assert),
+    "assertz" <error_code:MaybeAssertCode> => error_code.map(Instruction::AssertzWithError).unwrap_or(Instruction::Assertz),
+    "assert_eq" <error_code:MaybeAssertCode> => error_code.map(Instruction::AssertEqWithError).unwrap_or(Instruction::AssertEq),
+    "assert_eqw" <error_code:MaybeAssertCode> => error_code.map(Instruction::AssertEqwWithError).unwrap_or(Instruction::AssertEqw),
+    "u32assert" <error_code:MaybeAssertCode> => error_code.map(Instruction::U32AssertWithError).unwrap_or(Instruction::U32Assert),
+    "u32assert2" <error_code:MaybeAssertCode> => error_code.map(Instruction::U32Assert2WithError).unwrap_or(Instruction::U32Assert2),
+    "u32assertw" <error_code:MaybeAssertCode> => error_code.map(Instruction::U32AssertWWithError).unwrap_or(Instruction::U32AssertW),
+}
+
+MaybeAssertCode: Option<Immediate<u32>> = {
+    "." "err" "=" <code:ImmValue<U32>> => Some(code),
+    => None,
+}


I think we should forbid assert.err=0, because 0 is used as the default for a plain assert. An alternative would be to have the error in the VM be a Option<u32> instead. So the default is set to None instead.

My rationale is that these two error spaces should not collide.

I'd be fine with that as well, though I think what I really want is to associate something more concrete, like a source span, with an assertion error. Even with the full 32-bit range for error codes, it doesn't really help you find where the error was raised (and that assumes that each possible location an error can occur uses a unique error code), and also doesn't account for all of the macro-like instructions that expand to some set of instructions that contain asserts (and is thus not visible in the original source code, nor do we - currently anyway - assign those asserts distinct codes). It's obviously better than nothing, but when I was debugging one issue last week, I definitely felt like I was looking for a needle in a haystack.

Yeah, totally agree. Ideally we would have some stack tracing too. I think the issue with that is that MAST erases the procedure names? But yeah, I'm 100% with you, would love to have additional information for errors.

One of the main motivations for limiting error codes to 32 bits is to keep the size of programs relatively small which is important for putting programs on chain. Otherwise, with this new structure, nothing really prevents us from using strings or more elaborate identifiers. A possible strategy to adopt is to have more elaborate identifiers but then strip them out for on-chain programs during serialization. But that's more related to MAST serialization than this PR.

The SourceSpan type is 32 bits, but in order to be useful, we also need to have the source code (path or actual content), so that we can render an error with a source snippet (see the assembler tests for examples of what these look like if you haven't seen them yet). We don't actually need identifiers (file/line/col) at each site. The source code path (or content) could be provided with the MAST by mapping each MAST root corresponding to a procedure with the source code for that procedure when built with debug info, allowing us to render source code snippets by looking up the source for the current procedure (if available) and then rendering the span using the source span encoded in the instruction.

Of course, the problem occurs when the source code is not available, still, the source span value itself forms a useful and semi-unique error code (i.e. identical spans with different errors are possible, since spans are unique only within a file, but they probably aren't super common).

If we used 64 bits for errors, then 32 bits could be the user-provided code, and 32 bits could be the span. We could even omit the 32 bits used for custom codes when no code is used, and so save space in the MAST. The main thing we'd need to do though is ensure the span is not part of the hash, so as to avoid making two otherwise identical functions have unique hashes. It's not a huge issue if the span ends up in the hash, but it would preclude two functions defined separately from hashing to the same value (unless they just happened to occur in the same place in both files, but that's super unlikely).

assembly/src/parser/grammar.lalrpop

bobbinth

Still not a full review - but I'm making steady progress. I've reviewed changes in all crates except for the assembly crate and started to make headway into the assembly crate as well.

I left some comments inline - most of them a pretty minor.

miden/Cargo.toml

bobbinth · 2024-03-13T08:32:38Z

miden/src/lib.rs

 pub use assembly::{
-    ast::{ModuleAst, ProgramAst},
-    Assembler, AssemblyError, ParsingError,
+    self,
+    ast::{Module, ModuleKind},
+    diagnostics, Assembler, AssemblyError,
 };


Why export both self and submodules from here? Should we just do:

pub use assembly;

That's what I would be inclined to do, but I guess I didn't want to remove some of the "major" types from the crate root, so as to avoid more downstream changes, but honestly it's not a big deal to fix those up, so if you'd prefer to just export the assembly module and drop the top-level re-export of things like Module and Assembler, that's fine by me!

core/src/utils/mod.rs

core/src/prettier/mod.rs

core/src/utils/mod.rs

bobbinth · 2024-03-13T09:24:40Z

stdlib/docs/crypto/hashes/blake3.md

@@ -1,6 +1,6 @@
-
+Initializes four memory addresses, provided for storing initial 4x4 blake3<br />state matrix ( i.e. 16 elements each of 32 -bit ), for computing blake3 2-to-1 hash<br /><br />Expected stack state:<br /><br />[state_0_3_addr, state_4_7_addr, state_8_11_addr, state_12_15_addr]<br /><br />Note, state_`i`_`j`_addr -> absolute address of {state[i], state[i+1], state[i+2], state[i+3]} in memory | j = i+3<br /><br />Final stack state:<br /><br />[...]<br /><br />Initialized stack state is written back to provided memory addresses.<br /><br />Functionally this routine is equivalent to https://github.com/itzmeanjan/blake3/blob/f07d32e/include/blake3.hpp#!L1709-L1713<br />


I think there is still something off with how we process documentation. This comment belongs to an internal procedure and should not be printed out - but instead, it seems to be printed out as module-level comment.

The only way that can happen (unless there is a bug, which I will check here) is if the doc comment for the local procedure starts on the first line of the module. The only way a doc comment gets associated with the module and not a procedure is if it starts on the first line of the module and there is no separate doc comment for the procedure (or if someone manually sets the docs for the module). I had to do it this way because it is otherwise ambiguous whether the first doc comment in a file is associated with a module or the first documentable item in the module.

Rust distinguishes doc comments for modules vs module-scoped items with different prefixes, i.e. //! vs ///. We may want to consider doing that if there are cases where the ambiguity causes a mixup like this.

Fumuran

Looks awesome!

core/src/utils/mod.rs

assembly/src/parser/grammar.lalrpop

bobbinth

I'm still making may way through the parser logic, and I left a few more comments inline.

Also, I pushed a couple of commits which cleaned up comment formatting and added section separators in a few files.

assembly/src/lib.rs

bobbinth · 2024-03-14T00:12:55Z

assembly/src/lib.rs

+pub use self::assembler::{ArtifactKind, Assembler, AssemblyContext};
+pub use self::ast::{Module, ProcedureName};
+pub use self::errors::AssemblyError;
+pub use self::library::{
+    Library, LibraryError, LibraryNamespace, LibraryPath, MaslLibrary, PathError, Version,
+};
+pub use self::parser::{SourceLocation, SourceSpan, Span, Spanned};


Why do we need to use self:: here?

Also, do we need to export ArtifactKind publicly? If so, maybe we should come up with a more "self-describing" name?

Why do we need to use self:: here?

As mentioned in another related comment, this is actually what our conventions as enforced by rustfmt/rust-analyzer dictate (though the configuration is disabled for some reason in this repo). There are other reasons why self is used, which I mentioned in my other comment, but in general it's good practice to use self like this rather than leaving them unqualified - both as a less ambiguous syntax, and to aid in skimming import lists to determine local vs external imports at a glance.

Also, do we need to export ArtifactKind publicly? If so, maybe we should come up with a more "self-describing" name?

A couple of the lower-level public assembler APIs take this as an argument, so we need to export it. We could probably re-use ModuleKind though, however I wanted to distinguish the two because the artifacts produced by the assembler may be a different set than the types of modules that can be defined. I'm open to alternatives though. This kind of thing is typically called an "artifact" or "object" in compiler/assembler/linker contexts, and is inherently a bit general, but we might be able to be more specific in our case.

assembly/build.rs

assembly/src/lib.rs

assembly/src/parser/mod.rs

assembly/src/parser/error.rs

core/src/operations/decorators/mod.rs

core/src/prettier/print.rs

…dule

…lated to cycle check

bitwalker · 2024-04-29T05:44:48Z

This is ready now and has been rebased on latest next, all review comments have been addressed, aside from the removal of the Assembler::*_in_context methods and the AssemblyContext, as that is a larger change that feels like it belongs in a separate PR once this is merged.

A number of improvements and other features/changes have been discussed here, but I believe everything we intended to tackle as part of this PR is more or less complete.

bitwalker added the assembly Related to Miden assembly label Mar 8, 2024

bitwalker requested review from hackaugusto, Fumuran and bobbinth March 8, 2024 00:20

bitwalker self-assigned this Mar 8, 2024

bitwalker force-pushed the bitwalker/assembler-refactor branch 3 times, most recently from b3a3ff5 to 5c8cddf Compare March 8, 2024 05:40

bobbinth requested a review from plafer March 8, 2024 05:40

plafer reviewed Mar 8, 2024

View reviewed changes

bobbinth mentioned this pull request Mar 8, 2024

Implement error codes for the mtree_verify instructions. #1274

Closed

bitwalker force-pushed the bitwalker/assembler-refactor branch 2 times, most recently from dda2489 to e240e4e Compare March 9, 2024 05:12

bitwalker marked this pull request as ready for review March 9, 2024 07:55

bobbinth reviewed Mar 9, 2024

View reviewed changes

bitwalker force-pushed the bitwalker/assembler-refactor branch from 878571e to c1cea0f Compare March 10, 2024 20:39

hackaugusto reviewed Mar 11, 2024

View reviewed changes

bobbinth reviewed Mar 13, 2024

View reviewed changes

Fumuran reviewed Mar 13, 2024

View reviewed changes

core/src/utils/mod.rs Outdated Show resolved Hide resolved

assembly/src/parser/grammar.lalrpop Show resolved Hide resolved

bobbinth reviewed Mar 14, 2024

View reviewed changes

Fumuran reviewed Mar 14, 2024

View reviewed changes

core/src/operations/decorators/mod.rs Outdated Show resolved Hide resolved

core/src/prettier/print.rs Outdated Show resolved Hide resolved

bitwalker force-pushed the bitwalker/assembler-refactor branch from 9d17463 to 3288cc5 Compare March 16, 2024 01:11

bitwalker mentioned this pull request Mar 16, 2024

chore: handle deprecations in winterfell 0.8.3 release 0xPolygonMiden/crypto#290

Merged

bitwalker force-pushed the bitwalker/assembler-refactor branch from 1891586 to b0a8361 Compare March 16, 2024 04:53

This was referenced Mar 17, 2024

feat: implement ilog2/clz/ctz/clo/cto instructions 0xPolygonMiden/compiler#150

Merged

docs: draft spec for mast binary format 0xPolygonMiden/compiler#132

Closed

Rename miden to miden_lib 0xPolygonMiden/miden-base#493

Closed

bitwalker force-pushed the bitwalker/assembler-refactor branch from b0a8361 to 8823033 Compare March 18, 2024 17:47

bobbinth mentioned this pull request Mar 18, 2024

fix: restore re-exports but using alloc 0xPolygonMiden/crypto#291

Closed

3 tasks

bitwalker and others added 19 commits April 29, 2024 01:25

fix: use u8 for index in debug.stack decorator

d64524d

fix: expose procedure.invoked/extend_invoked to downstream crates

cffff84

fix: replace some uses of text formatter with const_text

a6a1888

docs: comment formatting, section headers, minor typo fixes

55c73a3

docs: minor comment fixes

da0569e

chore: clean up some unused code, minor doc corrections

14b8c4e

refactor: move various modulegraph-related items into module_graph mo…

a002a85

…dule

refactor: break up module graph implementation more granularly

2628466

chore: remove code that was extracted to miden-formatting

4ff23af

fix: properly support distinction between errors and warnings

29180c7

docs: update documented behavior of 'exec'

69365d7

fix: add some callgraph tests, remove unused function, fix one bug re…

2676d21

…lated to cycle check

fix: address assorted review comments

55669cc

chore: clean up implementation a bit, address some review feedback

b46aaf0

chore: fix typos, comment formatting, add section headers

475dea1

chore: comment formatting and section headers

307a17d

fix: address a few more review comments

167b143

fix: a few small adjustments from review

aeb5269

fix: build errors due to conflict, address some clippy warnings

c639420

bitwalker force-pushed the bitwalker/assembler-refactor branch from bcfe0e9 to c639420 Compare April 29, 2024 05:39

bitwalker merged commit 356ba87 into next Apr 29, 2024
15 checks passed

bitwalker deleted the bitwalker/assembler-refactor branch April 29, 2024 15:47

This was referenced Apr 29, 2024

Use some nightly rustfmt features #1158

Closed

Fix duplicate procedure name issue. #1256

Closed

Fix unused imports checker #1205

Closed

bitwalker mentioned this pull request May 6, 2024

chore: handle assembler refactoring changes 0xPolygonMiden/compiler#181

Merged

mFragaBA mentioned this pull request May 20, 2024

bug: can't print account code of genesis accounts 0xPolygonMiden/miden-base#694

Closed

Dominik1999 mentioned this pull request Jun 3, 2024

Compiling transaction fails with multiple notes 0xPolygonMiden/miden-base#443

Closed

greenhat mentioned this pull request Jul 23, 2024

Test release process by publishing compiler and Miden SDK crates 0xPolygonMiden/compiler#246

Closed

		thiserror = { git = "https://github.com/bitwalker/thiserror", branch = "no-std" }
		miette = { git = "https://github.com/bitwalker/miette", branch = "no-std" }

		@@ -1,6 +1,6 @@

		Initializes four memory addresses, provided for storing initial 4x4 blake3<br />state matrix ( i.e. 16 elements each of 32 -bit ), for computing blake3 2-to-1 hash<br /><br />Expected stack state:<br /><br />[state_0_3_addr, state_4_7_addr, state_8_11_addr, state_12_15_addr]<br /><br />Note, state_`i`_`j`_addr -> absolute address of {state[i], state[i+1], state[i+2], state[i+3]} in memory \| j = i+3<br /><br />Final stack state:<br /><br />[...]<br /><br />Initialized stack state is written back to provided memory addresses.<br /><br />Functionally this routine is equivalent to https://github.com/itzmeanjan/blake3/blob/f07d32e/include/blake3.hpp#!L1709-L1713<br />

feat(masm): refactor assembler into a more compiler-like architecture #1277

feat(masm): refactor assembler into a more compiler-like architecture #1277

Conversation

bitwalker commented Mar 8, 2024 • edited Loading

Summary

Changes to MASM Syntax

Changes to MAST/Assembler

Diagnostics

Defining an error

MASM Serialization

TODO

TL;DR

plafer left a comment

Choose a reason for hiding this comment

bobbinth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

hackaugusto Mar 11, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bobbinth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Fumuran left a comment

Choose a reason for hiding this comment

bobbinth left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bitwalker commented Apr 29, 2024

bitwalker commented Mar 8, 2024 •

edited

Loading

hackaugusto Mar 11, 2024 •

edited

Loading