Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow multi-module kernels #1436

Open
bobbinth opened this issue Aug 7, 2024 · 4 comments
Open

Allow multi-module kernels #1436

bobbinth opened this issue Aug 7, 2024 · 4 comments
Labels
assembly Related to Miden assembly
Milestone

Comments

@bobbinth
Copy link
Contributor

bobbinth commented Aug 7, 2024

Currently, a kernel library can be assembled only from a single module. For complex kernels (e.g., the transaction kernel in miden-base), this means that to avoid having one giant file we need to introduce a separate library which the kernel then relies on. It would be much more convenient to be able to build a kernel library from multiple modules.

One open question is how to identify a set of procedures exported from the module library vs. exported from modules to be used within the kernel internally. There are probably several options here, a couple that come to mind:

  • Introduce "internal" procedure visibility level. Such procedures would be visible to other modules within the library, but would not be exported out of the library.
  • Mark kernel procedures separately somehow. This could be done via special keword (e.g., sys.foo instead of export.foo) or via annotations described in #1434.
@bobbinth bobbinth added the assembly Related to Miden assembly label Aug 7, 2024
@bobbinth bobbinth added this to the v0.11.0 milestone Aug 7, 2024
@bitwalker
Copy link
Contributor

bitwalker commented Aug 7, 2024

Just one point to clarify: we can create compiled libraries with a subset of exports of all the modules contained in that library, i.e. only the modules passed explicitly to assemble_library have their exports exported from the library.

So I think what we need is actually a way, in the Miden Assembly syntax, to mark a procedure as having syscall visibility. Right now, any export from a kernel module is assumed to have syscall visibility, but we wouldn't want that for "protected" kernel procedures, i.e. those with public visibility exported for use within the kernel only, but we currently have no mechanism by which to support this.

I think we can solve this in one of two ways:

  1. Via annotations, once Add support for procedure annotations to the assembler #1434 is implemented, by decorating exports with an annotation that indicates that they are not to be treated as syscall-able, despite being exported from the kernel. This is a bit hacky IMO, but is an option.
  2. Handle this as part of a general improvement in the Miden Assembly syntax for procedures, so we can handle visibility more precisely, as well as integrate procedure locals and types, allowing us to generate signature metadata for MASM procedures programatically. See below for my proposed syntax:
# The grammar rule for `type` is essentially:
#
#   TYPE := "type" <id:ID> => Type::opaque(id)
#         |   "type" <id:ID> "=" <val:TYPE_VAL> => Type::new(id, val)
#         |   "type" <id:ID> "<" <params:Comma<ID>> ">" "=" <val:TYPE_VAL> => Type::generic(id, params, val)
#
#   TYPE_VAL := <id:ID> => ConcreteType::Ref(id)
#             |  <id:ID> "<" <params:Comma<ID>> ">" => ConcreteType::Instance(id, params)
#             |  "(" <elems:(<TYPE_VAL> ",")> ")" => ConcreteType::Tuple(elems)
#             |  "[" <id:ID> ";" <len:INT> "]" => ConcreteType::Array(id, len)
#
# The built-in "type" system of Miden Assembly would be described as follows:

type u32
type felt
type word = [felt; 4]
type ptr<T> = u32

# A few examples of the new procedure syntax:

# A private procedure
#
# NOTE: Some kind of signature terminator is needed, so that optional parts 
# can be elided, in this case, a return value.
#
# Procedure parameters are expected to appear in order on the operand stack,
# likewise with return values. Parameters are always consumed by the procedure,
# and a procedure that does not consume its parameters must specify them as
# return values in the same order
proc.foo():
    ...
end

# An exported procedure, with parameters and a result
pub.add(a: u32, b: u32) -> u32:
  ...
end

# A procedure exported with `syscall` visibility, with an example of using the
# type system to better describe the inputs/outputs.
type Digest = word

# Note that we're returning a "tuple" here, this is simply how we represent
# returning multiple heterogenous results. The semantics of this type are
# that the elements of the tuple type are laid out on the operand stack in
# order of appearance. Each type is aligned to a field element boundary.
pub(extern "syscall").get_assets_info() -> (Digest, u32):
  ...
end

# Lastly, an example of a procedure with locals. I'd like to make locals typed
# as well, perhaps even named, and make the `local` instructions in MASM
# more like macros, which are lowered to the more primitive forms by the
# parser. This would make locals less error-prone, and more semantically
# meaningful. For now though, I'm preserving the status quo.
pub.random.2() -> felt:
    ...
end

Aside from the type signature aspect, the only other key difference is replacing export with a more precise visibility, based on Rust's syntax somewhat, but I'm not too picky about that part of it. The key thing thing about the above is that an exported procedure is always pub, but the visibility can be constrained. This makes searching/reading code easier.

I believe the above could be implemented in a backwards-compatible fashion, as a result of using pub for exports, it would implicitly parse using the new grammar. The only hairy bit is with local procedures, which would need to dispatch differently based on whether it is "old-style" or not, but a lexer hack can be used to solve this pretty easily.

With a type system in place, we can generate better documentation, and we can generate the necessary metadata to be able to bind to MASM procedures from higher-level languages, without having to manually maintain those type definitions separately.

We don't need to implement any kind of type checker, but it would be an option, as this would allow us to infer the contents of the operand stack, and via abstract interpretation, it would be possible for us to determine whether there are any invalid instruction uses, incorrect arguments for exec, etc., and whether the correct values are being returned on the stack. We could use a success-typing methodology, so that we only raise errors when we know for sure that an operation would fail due to type errors. Annotations could be used to disable warnings/errors on procedure-wide or module-wide basis.

@bobbinth
Copy link
Contributor Author

bobbinth commented Aug 7, 2024

I like this approach but a few questions/comments.

First, I wonder if we should change proc to mean something like "internal" procedure. i.e., procedures which can be used anywhere in the current assembly context but are not exported publicly (i'm not seeing too much value in having procedure which are private to a given module - they all end up in the same MAST forest anyway).

Second, I'd probably prefer single-word visibility modifiers.

So, overall, I'm thinking we could have something like this:

#! internal procedure which can be invoked locally but also from other modules in the current
#! assembly context.
proc.foo:
    ...
end

#! a procedure exported from a library; having such procedures in executable programs or kernel
#! modules would be an error.
pub.bar:
    ...
end

#! an exported kernel procedure; such procedures can exist only in kernel modules.
sys.baz
    ...
end

The one thing missing from the above is ability to specify whether a given procedure is supposed to be call-ed or exec-ed (and I think this would apply only to pub procedures, though maybe to proc procedures as well). This could be done via an annotation, but maybe also there is a good way to specify it using syntax similar to the above.

Regarding adding signature info to procedures, I like this as well. The only potential concern that I have is that people may confuse these as binding signatures while we view them more like documentation (i.e., as you mentioned, these signatures would not be enforced). If the signature is described in the comments, it is much more likely to be interpreted as documentation. But maybe this is OK and the benefits of this clean syntax outweigh the potential confusion downsides.

@bitwalker
Copy link
Contributor

First, I wonder if we should change proc to mean something like "internal" procedure. i.e., procedures which can be used anywhere in the current assembly context but are not exported publicly (i'm not seeing too much value in having procedure which are private to a given module - they all end up in the same MAST forest anyway).

I tend to evaluate a lot of this in the context of what I'd expect from a typical assembly language and associated linker. A standard linker, e.g. lld, supports symbols with a couple different linkage types/visibilities:

  • private, i.e. a definition which is only reachable from within the same module, basically what proc is today
  • external, i.e. a declaration whose definition resides elsewhere; OR a definition which is accessible globally, i.e. what export is today
  • internal/hidden, i.e. similar to private, but can be linked against from any module in the same object. This linkage/visibility type is a bit weird, and there are actually a bunch of variations on it, depending on the semantics needed (mostly due to C/C++).

I think there is a pretty clear value proposition to having symbols which are clearly private, while also being able to share code within the same library, without exposing those procedures in the public API of the library. It is more about conveying intent, and being able to better represent the boundaries of your API, than it is about literally protecting the code, at least in my opinion.

I could be convinced to treat proc as protected, but since we aren't strictly generating MASM, and are actually writing code in it by hand, I'd lean more towards enabling us to represent things more fine-grained, so that we have better tooling for working in MASM by hand. If MASM was just something we emitted from a code generator, there's a number of changes I'd make, but it is in kind of a weird place where it is also the language we're writing some critical libraries in by hand.

Second, I'd probably prefer single-word visibility modifiers.

There are pretty significant downsides to this IMO:

  1. Readability: our eyes get good at scanning for function definitions by recognizing the same keywords appearing at the beginning of each one. The more unique ways you have of starting a definition, the harder it is to scan the code, and the more ad-hoc it feels. Some of this is in the eye of the beholder, but there is a reason that most successful programming languages stick to pub fn/fn (or their more verbose counterparts, public function/function), or def/defp, etc., and familiarity and regularity are the main ones.
  2. Uniformity: the fewer special forms you have in a language, the easier it is to learn, read, and write. It also results in a more regular grammar, which tends to result in more compact generated parsers, and are more efficient to parse.

In any case, I would recommend sticking to one special form for procedure definitions, with an optional visibility specifier - it is dead simple to parse, and will make the language feel more familiar to those coming from typical programming languages that have to get their hands dirty with MASM.

#! internal procedure which can be invoked locally but also from other modules in the current
#! assembly context.

Could you clarify what you mean by "assembly context" here, in terms of inputs to the Assembler? I'm assuming you mean any of the AST modules provided as inputs.

The one thing missing from the above is ability to specify whether a given procedure is supposed to be call-ed or exec-ed (and I think this would apply only to pub procedures, though maybe to proc procedures as well). This could be done via an annotation, but maybe also there is a good way to specify it using syntax similar to the above.

I think annotations are probably the most appropriate way to represent that particular distinction, since it is only actionable from a high-level language compiler, though I can think of some ways that one could use abstract interpretation to determine if any inputs from the caller violate the conventions (i.e. mem_load'ing an operand that was passed as an argument, when the procedure is decorated with #[callable] (or whatever), could result in an error diagnostic being raised).

Regarding adding signature info to procedures, I like this as well. The only potential concern that I have is that people may confuse these as binding signatures while we view them more like documentation (i.e., as you mentioned, these signatures would not be enforced). If the signature is described in the comments, it is much more likely to be interpreted as documentation. But maybe this is OK and the benefits of this clean syntax outweigh the potential confusion downsides.

Well, I said that they would not have to be enforced, but obviously it would be desirable to do so once we have them. The main reason I want it, is to generate bindings for the compiler (it is in fact critical to be able to do so against hand-written MASM modules if you want to call them from e.g. Rust). So to that end, it is actually best that people think of them as binding, and not simply as documentation. We can definitely drive type checking off of them, what I was mostly trying to do was convey that it isn't essential to the implementation of the type system and syntax for it, but instead a natural consequence of having types, is that we will want to start using them to help us check our programs. In other words, I didn't want to make the implementation of it contingent on the type checker being implemented at the same time, since backwards-compatibility would be a necessary aspect of the design (hence the system would necessarily be gradually-typed, with a success typing approach, so as to maximally benefit from types when available, without requiring them to be added everywhere at once).

@bobbinth
Copy link
Contributor Author

I think there is a pretty clear value proposition to having symbols which are clearly private, while also being able to share code within the same library, without exposing those procedures in the public API of the library. It is more about conveying intent, and being able to better represent the boundaries of your API, than it is about literally protecting the code, at least in my opinion.

I could be convinced to treat proc as protected, but since we aren't strictly generating MASM, and are actually writing code in it by hand, I'd lean more towards enabling us to represent things more fine-grained, so that we have better tooling for working in MASM by hand. If MASM was just something we emitted from a code generator, there's a number of changes I'd make, but it is in kind of a weird place where it is also the language we're writing some critical libraries in by hand.

I am fine with having privately-scoped procedures - though, one thing I was trying to avoid is overloading the meaning of pub (specifically, having it mean "public" for regular libraries and "internal" for kernel libraries) or introducing another modifier for internal procedures.

In any case, I would recommend sticking to one special form for procedure definitions, with an optional visibility specifier - it is dead simple to parse, and will make the language feel more familiar to those coming from typical programming languages that have to get their hands dirty with MASM.

OK - I'm convinced. Let's use only proc and pub with an optional modifier - though, I'd use a slightly different format here (see the sketch later in this comment).

#! internal procedure which can be invoked locally but also from other modules in the current
#! assembly context.

Could you clarify what you mean by "assembly context" here, in terms of inputs to the Assembler? I'm assuming you mean any of the AST modules provided as inputs.

Yes - correct, I meant the AST modules only (i.e., not including any compiled libraries).

The one thing missing from the above is ability to specify whether a given procedure is supposed to be call-ed or exec-ed (and I think this would apply only to pub procedures, though maybe to proc procedures as well). This could be done via an annotation, but maybe also there is a good way to specify it using syntax similar to the above.

I think annotations are probably the most appropriate way to represent that particular distinction, since it is only actionable from a high-level language compiler, though I can think of some ways that one could use abstract interpretation to determine if any inputs from the caller violate the conventions (i.e. mem_load'ing an operand that was passed as an argument, when the procedure is decorated with #[callable] (or whatever), could result in an error diagnostic being raised).

I actually think that we can enforce some checks on this in the assembler as well. For example, in the MASM code, if somebody tries to exec.foo when foo was declared as only "callable", we can raise an error.

Putting the above together, here is another sketch of how things could look like:

#! A private procedure which can be invoked only from within the module in which it was defined.
#!
#! Such procedures can be invoked only using the `exec` instruction.
proc.foo:
    ...
end

#! An internal procedure which can be invoked locally but also from other modules in the current
#! assembly context.
#!
#! Such procedures can be invoked only using the `exec` instruction.
proc(internal).foo:
    ...
end

#! A procedure exported from a library and is intended to be invoked using the `exec` instruction.
#!
#! Having such procedures in executable or kernel modules would be an error.
export.foo:
    ...
end

#! A procedure exported from a library and is intended to be invoked using the `call` instruction.
#!
#! Having such procedures in executable or kernel modules would be an error.
export(call).foo:
    ...
end

#! An exported kernel procedure; such procedures can exist only in kernel modules.
#!
#! Such procedures can be invoked only using the `syscall` instruction.
export(sys).foo:
    ...
end

A couple of additional comments on this:

  1. I've used export instead of pub above as this basically makes the changes almost entirely backward-compatible with what we have now. But if there is a strong preference for using pub, I'm fine using that too.
  2. In general, I like that we'd have a clear way of specifying what is exported from a library (i.e., via pub/export) and what is internal to the library (i.e., proc).
  3. As mentioned above, I'm still not sure that we actually need private procedures. But I guess it doesn't hurt to have them.

Well, I said that they would not have to be enforced, but obviously it would be desirable to do so once we have them. The main reason I want it, is to generate bindings for the compiler (it is in fact critical to be able to do so against hand-written MASM modules if you want to call them from e.g. Rust). So to that end, it is actually best that people think of them as binding, and not simply as documentation. We can definitely drive type checking off of them, what I was mostly trying to do was convey that it isn't essential to the implementation of the type system and syntax for it, but instead a natural consequence of having types, is that we will want to start using them to help us check our programs. In other words, I didn't want to make the implementation of it contingent on the type checker being implemented at the same time, since backwards-compatibility would be a necessary aspect of the design (hence the system would necessarily be gradually-typed, with a success typing approach, so as to maximally benefit from types when available, without requiring them to be added everywhere at once).

Makes sense - though, I'd probably create a separate issue for procedure signatures as I think we'd tackle procedure visibility modifiers and more comprehensive procedure signatures in different PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
assembly Related to Miden assembly
Projects
None yet
Development

No branches or pull requests

3 participants