-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIP: introducing the Filecoin Virtual Machine. #288
Conversation
f46de6c
to
0897e69
Compare
0897e69
to
b27ab8c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The IPLD model raises some interesting questions. My impression is that it's landed somewhere in an awkward middle ground, where the encoding is done in actors, but the VM must be able to interpret that encoded data and fundamentally models reachability.
I think either extreme of the VM being oblivious, or the VM actually implementing the IPLD model could be better. Oblivious means actors APIs and state are just bytes, the VM is simpler, but actors would have to do some work to ensure they don't store unreachable blocks, etc. The VM actually implementing IPLD is very opinionated (which... I think we want?) but efficient and a great programming model for actors, at the cost of more complex VM. The middle ground looks like it's going to need most of the complexity anyway, but with added complexity/cost of needing to validate encoding done in the actors.
FIPS/fip-introducing_fvm.md
Outdated
Wasm with 32-bit memories supports 64-bit integer types and arithmetic, it just | ||
can't address memory beyond 4GiB. This size is deemed sufficient for blockchain | ||
state transition use cases, but may be limiting for future and more generalised | ||
applications of the FVM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For discussion: this might limit possible universality of FVM as an execution environment outside the blockchain state transition function. For example, an FVM rollup could have far greater demands on hardware, including memory, to enable much more complex and resource-intensive programs to be written, while retaining Filecoin's blockchain security. Similarly, for compute-over-data projects there's a potential world in which something very like the FVM is the target for code to run inside compute providers, and support very large memory execution environments. It could be very powerful to have seamless, binary interop between rollups and off-chain compute. Taken to the extreme, it might be that we develop a zk-FVM which could do both.
I get the current requirements are rightly focussed on the blockchain state use case, but is there a big gain there from taking the 32-bit route? Is there a possible path to 64-bit in the future (like a compiled contract indicating the wordsize it's built for)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64-bit memory addressing is a proposed extension to the WASM spec; it doesn't seem to be final yet. This is the latest I could find on a quick search: https://twitter.com/wvo/status/1362458368107941888.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that other use cases that need virtualized runtimes would benefit from 64-bit addressing. However, I'm more skeptical on whether the FVM is the exact virtualized runtime they'd adopt. The FVM is built for state tree manipulation within a blockchain, uses blockchain messages as units of execution, and exposes blockchain-related operations through syscalls.
However, I do feel that the FVM is an instantiation of a "thin waist" interplanetary VM substrate that is WASM-based and IPLD-native. In the future, I would be agreeable to formally definining such VM substrate (by extracting the relevant parts from this spec), and rebasing the FVM spec on top of it. Would probably get more groups involved.
Specialized VMs for compute-over-data (off-chain), Lurk, etc. would also be instantiations of that VM, albeit with support for different WASM features (memory64, SIMD, interface types, threads, etc.) and different syscalls available to hosted code.
FIPS/fip-introducing_fvm.md
Outdated
The signature of an actor entrypoint is: | ||
|
||
```rust | ||
pub fn invoke(params_block_id: u32) -> u32 { } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To implement internal dispatch in an extensible way, compatible with standard APIs etc, I guess that actors will all need to define something like:
struct Params {
MethodName string
Params []byte // CBOR-ended, to be decoded by receiver
}
- An SDK can wrap this, but for interoperability we should probably define a standard for how actors should interpret parameters.
- This would be incompatible with the current built-in actors' parameter shapes.
This is good in a sense of giving actors full control over their API (they don't even have to use IPLD/CBOR), but a bit awkward. For ease of use and better interop between the built-in and new actors, perhaps we should consider the VM taking a slightly more active role in dispatch? E.g. the VM could interpret that params
struct above, along with the method number, and then invoke the target actor with something like invoke(method_name, method_params)
? Perhaps actors could declare which calling convention they use, leaving room for others (including the "raw" method currently specced)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We definitely need sane defaults, but I'd really like actors to be able to specify their own ABI and API via some form of schema. For example:
type Bar struct {
...
}
type MyActor interface {
func Transfer(foo int, ...) Bar (id:1)
} representation "inline"
The interface would likely compile to some mapping between methods and message "templates". E.g.:
Transfer(foo)
->{"method": 1, "foo": $foo}
Transfer(foo) ->
(1, $foo)`
This gives us some flexibility as actors can use method numbers, names, hashed names, polymorphic whatever, etc.
That being said... as long as we have method numbers, there's no reason not to pass them in invoke. This is more of a "if we find a nice way to get rid of them".
E.g. the VM could interpret that params struct above, along with the method number, and then invoke the target actor with something like invoke(method_name, method_params)?
Note: passing data into actors via invoke is non-trivial because the actor can't access the host's memory. The current approach lets the actor "read" the data from the system, giving it a chance to allocate the necessary space.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Encapsulating all call data into a single object enables call options like proxying/delegation, pattern matching, etc. Furthermore, having to choose how to identify methods in an actor whose code can be updated (assuming we do commit to implement code updates) makes the user think upfront about how they'd identify/deal with each invocable behaviour across updates.
Long term, the idea is to have an IPLD-based IDL that actors can advertise as metadata to enable discoverability by other developers (e.g. for composabilitiy) and by tooling (e.g. message inspectors). The IDL will express which behaviours an actor offers, its interface (inputs, outputs, errors), and how those behaviours can be called (call patterns). Because the method number is so ingrained in the protocol, it's likely that it will remain there forever, and it'll be a glorified call pattern available for users.
FIPS/fip-introducing_fvm.md
Outdated
|
||
/// Creates a new block, returning the block's ID. The block's children must be in the reachable | ||
/// set. The new block isn't added to the reachable set until the CID is computed. | ||
pub fn create(codec: u64, data: *const u8, len: u32) -> Result<u32>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you confirm my inference here that encoding of the block is happening in contract space, prior to this call? Thus the codec parameter is expected to match the data (how is it checked?), but isn't instructing the VM how to do the encoding internally. So any (?) codec, including future ones is allowed? This will make tracking the links difficult.
It appears that we must specify a supported list of codecs that the VM has code to read. And if we're doing that, it's a small step to also do the encoding in the VM, rather than in WASM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IPLD blocks are encoded actor-side and checked VM-side. For now, CBOR will be the only expected codec but we expect to add support for other codecs later.
This will make tracking the links difficult.
At a minimum, the VM will need a single function (either native or some WASM module) per-codec to extract links from blocks.
And if we're doing that, it's a small step to also do the encoding in the VM, rather than in WASM.
Unfortunately not. Encoding VM-side is non-trivial as we need to get the data across the FFI boundary somehow. We'd need to specify something like serde's builder pattern (and will lose a fair amount of performance by repeatedly crossing the FFI boundary and not being able to inline anything).
I responded more fully below in #288 (comment).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sidenote: the codec passed here is later used when calculating the CID for the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A visitor pattern over FFI sounds slow. The VM doing the encoding doesn't seem plausible at all, unless it has knowledge of the memory layout and data types of the programming language the actor was developed with, which is undesirable. Alternative we could marshal the data across the boundary in some canonical form.... but wait, isn't that stepping on the toes of IPLD? 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A canonical form doesn't sound that bad to me. E.g. something like the Python struct
module.
I don't think IPLD is trying to be an efficient in-memory format. It's a storage/distribution format.
FIPS/fip-introducing_fvm.md
Outdated
pub fn receiver() -> Result<u64>; | ||
|
||
/// Returns the method number from the message. | ||
pub fn method_number() -> Result<u64>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about passing this as parameter to invoke
instead? Do we want to be opinionated about its use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We didn't do that because we'd like to find a way for this to go away (using only the parameters). But if that's too difficult (from a network upgrade standpoint), I agree that passing this to invoke makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, I meant passing it to invoke and removing this method, thus making it harder for anyone to misuse and setting a path for it to go away.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For clarity, I meant passing it to invoke and removing this method, thus making it harder for anyone to misuse and setting a path for it to go away.
Well, we'd like to make the method number itself go away, not just the syscall.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anorth That would further bless the "method number" call pattern into the FVM, something we're trying to avoid entirely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with trying to get rid of it entirely. I don't see how having a syscall that's accessible from any actor code all the time is less of a blessing than passing it only to a single entry point method and forcing the callee to do more complex programming if they want to propagate it. But ok, this is your design call.
FIPS/fip-introducing_fvm.md
Outdated
IPLD blocks are dynamically-sized blobs of data. When loading a block by CID, | ||
the size is unknown, so the actor cannot allocate memory upfront. Moreover, | ||
host-side memory is inaccessible to Wasm code, so it's not possible for the | ||
Kernel to return a pointer to host-side memory for Wasm code to read. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the VM decodes, the actor can allocate the structure to decode into up-front.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without a lot of back-and-forth. I.e.:
- Actor: Load CID1.
- VM: Found a list of length 5.
- Actor: Ok.
- VM: The first item is a number.
- Actor: Write the number to
*some_ptr
. - VM: The second item is a string of length 10.
- Actor: write it to
*some_other_ptr
. - etc, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok I see why that API would be troublesome, but can't we do better? How about:
- Actor: Load CID1, expecting schema A
- VM: Ok, here's a pointer to newly allocated memory, which you can cast
Obviously I glossed over details, but would it be unreasonable for the VM to, say, allocate memory dynamically to decode into? Maybe with a gas limit for the call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not without making a lot of assumptions about the actor's programming language (or defining some form of flat-buffer abstraction).
- Currently, the VM never allocates memory because the VM knows nothing about the actor's memory layout (i.e., the actor "owns" its entire memory). To allocate, it would have to call recursively call back into the actor (which I'd like to avoid if possible).
- Different programming languages have different concepts of strings, vectors, data layout, etc. We can pass trivial structs (structs of ints, arrays of ints, etc.) back and forth, but nothing more complex without encoding through something like CBOR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I ack that it's not a simple problem to solve a language-independent in-memory form for this. Fine to leave it as something to maybe investigate later.
FIPS/fip-introducing_fvm.md
Outdated
## Backwards Compatibility | ||
|
||
Existing built-in actor and system functionality will not be affected by the | ||
introduction of the Filecoin Virtual Machine. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there's more to say here. Perhaps spell out that this FIP actually doesn't represent a protocol change, only a suggested re-architecture. Point to the future FIPs that will break compatibility.
FIPS/fip-introducing_fvm.md
Outdated
## Test Cases | ||
|
||
FVM implementations can subject themselves to the FVM-specific test vectors | ||
hosted at: https://github.com/filecoin-project/fvm-test-vectors. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link doesn't resolve.
Because this proposal doesn't actually change the protocol, should we also point to (and execute) all the existing test vectors? Also all the existing actor unit and integration tests?
Worth noting here too that the roll-out plan is to demonstrate compatibility with the existing chain, resyncing since the last actors version. Actually, I just realised this FIP doesn't address the intention to trigger a network upgrade that switches the canonical spec of the actors to the Rust/WASM version? Am I right to infer that belongs here (because the next FIP will be the gas model change)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link doesn't resolve.
Try again? It does for me.
Because this proposal doesn't actually change the protocol, should we also point to (and execute) all the existing test vectors? Also all the existing actor unit and integration tests?
- Most of the existing test vectors tested prior network versions. The FVM only actors v6 (and soon, v7).
- Some of the test vectors were extracted from specs-actors.
- In terms of integration tests, lotus integration is underway but the tests won't pass until we have v7 support.
Actually, I just realised this FIP doesn't address the intention to trigger a network upgrade that switches the canonical spec of the actors to the Rust/WASM version? Am I right to infer that belongs here (because the next FIP will be the gas model change)?
Raul is working on a separate FIP proposing a network upgrade. This FIP is setting design direction: switching Filecoin over to a WASM/IPLD VM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Link works now.
Does the Rust actors codebase faithfully replicate all the unit tests from the Go one? Or did it rely on the comprehensiveness of the Go code for logical correctness checking, and focus on conformance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rust actor codebase has very few unit tests. Pretty much all of our test coverage currently comes from extracted test vectors (both from the specs-actors tests and from the network).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a problem we should solve on the road to M1. Given that Rust actors are likely becoming the canonical actors, they should be as well-tested as Go actors, if not better.
Specifically, it must be able to:
If we keep everything inside actors, managing IPLD data becomes tricky:
Doing all IPLD operations outside is more workable, but will likely lead to worse performance:
The third option is to implement the codecs as separate WASM modules. That way:
|
|
||
- Explicit messages are sent by secp256k1 or BLS account actors, appear on-chain | ||
when executed, and carry a signature to prove their authenticity. | ||
- Implicit messages are generated by the system, do not appear on chain, and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Projects such as Lily (and consumers of its data) must be able to observe implicit and internal messages (e.g. sends from multisig actors that do not appear on chain). Currently this is handled via the ExecMonitor
interface. Can we ensure the switch to the FVM doesn't break this observability requirement?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I investigated this, and concluded that the FVM doesn't need to know about the ExecMonitor
callback. This is entirely handled at the node level:
- on every chain message application: https://github.com/filecoin-project/lotus/blob/9ec1abf8808422366443e3bea06973a23e644a2c/chain/consensus/filcns/compute_state.go#L197
- on cron tick: https://github.com/filecoin-project/lotus/blob/9ec1abf8808422366443e3bea06973a23e644a2c/chain/consensus/filcns/compute_state.go#L130
- on reward tick: https://github.com/filecoin-project/lotus/blob/9ec1abf8808422366443e3bea06973a23e644a2c/chain/consensus/filcns/compute_state.go#L230
However, the FVM does need to generate execution traces, but I'd be inclined to say that this is an optional feature which Lotus supports. I'm not sure we need to specify it in the spec, but I'll give it some extra thought.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tracking here now: filecoin-project/ref-fvm#318.
a6e8820
to
5d7d0cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With my FIP editor hat, approving this as a well-formed and -motivated proposal.
As a note to myself and other collaborators, almost all the comments made here would have been better made in the associated discussion thread #287. That way, the discussion would remain discoverable and in-flight while the FIP progresses. We should attempt to limit comments on FIP documents to those about the appropriateness, form and completeness of a proposal, rather than its substantive content.
The first of several FIP submissions concerning the introduction of the Filecoin Virtual Machine as the new execution layer of the Filecoin blockchain.
This is an early draft that needs some refinement.
See discussion at #287.
Thanks to @Stebalien and @anorth for early reviews.