Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0178: SBPF Static Syscalls #178

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
146 changes: 146 additions & 0 deletions proposals/0178-static-syscalls.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,146 @@
---
simd: '0178'
title: SBPF Static Syscalls
authors:
- Alessandro Decina
- Alexander Meißner
- Lucas Steuernagel
category: Standard
type: Core
status: Review
created: 2024-09-27
---

## Summary

This SIMD introduces a new instruction syscall in the SBPF instruction set to
represent syscalls. Such a change aims to remove relocations when resolving
syscalls and simplify the instruction set, allowing for the straightforward
differentiation between external and internal calls. In addition, it proposes
a new `return` instruction to supersede the `exit` instruction.

## Motivation

The resolution of syscalls during ELF loading requires relocating addresses,
which is a performance burden for the validator. Relocations require an entire
copy of the ELF file in memory to either relocate addresses we fetch from the
symbol table or offset addresses to after the start of the virtual machine’s
memory. Moreover, relocations pose security concerns, as they allow the
arbitrary modification of program headers and programs sections. A new
separate opcode for syscalls modifies the behavior of the ELF loader, allowing
us to resolve syscalls without relocations.
Comment on lines +24 to +31

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉


## New Terminology

None.

## Detailed Design

The following must go into effect if and only if a program indicates the SBPF
version XX or higher in its ELF header e_flags field, according to the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we specify which version XX is?

specification of SIMD-0161.

### New syscall instruction

We introduce a new instruction in the SBPF instruction set, which we call
`syscall`. It must be associated with all syscalls in the SBPF format. Its
encoding consists of an opcode `0x95` and an immediate, which must refer to a
previously registered syscall. For more reference on the SBF ISA format, see
the
[spec document](https://github.com/solana-labs/rbpf/blob/main/doc/bytecode.md).

For simplicity, syscalls must be represented as a natural number greater than
zero, so that they can be organized in a lookup table. This choice allows for
quick retrieval of syscall information from integer indexes. An instruction
`syscall 2` must represent a call to the function registered at position two
in the lookup table.

Consequently, system calls in the Solana SDK and in any related compiler tools
must be registered as function pointers, whose address is a natural number
greater than zero, representing their position in a syscall lookup table. The
verifier must enforce that the immediate of a syscall instruction points to a
valid syscall, and throw `VerifierError::InvalidFunction` otherwise.

This new instruction comes together with modifications in the verification
phase. `call imm` (opcode `0x85`) instructions must only refer to internal
calls and its immediate field must only be interpreted as a relative address
to jump from the program counter.
Comment on lines +65 to +67

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean there is no longer a need to hash the immediates?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does and is the intention.


### New return instruction

The opcode `0x9D` must represent the return instruction, which supersedes the
`exit` instruction. The opcode (opcode `0x95`), previously assigned to the
`exit` instruction, must now be interpreted as the new syscall instruction.
Comment on lines +71 to +73
Copy link

@topointon-jump topointon-jump Oct 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation behind changing this?

Copy link

@topointon-jump topointon-jump Nov 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, changing the name from exit to return when it is the same instruction could be confusing. I have already seen this confused in other SIMDs.

Copy link

@topointon-jump topointon-jump Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note - we should bundle large sets of proposed ISA changes together into the same SBPF version upgrade, so that clients don't have to support a mis-mash of ISAs based on feature flags. I believe this is the intent of #161, but just re-iterating 🙏

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation is that exit was occupying the slot in the instruction class for controlflow with immediate values and it does not take an immediate value. The new syscall opcode however does, so it took its place.


The verifier must detect an SBPF V1 program containing the `0x9D` opcode and
throw a `VerifierError::UnknowOpCode`. Likewise, if, by any means, a V1
program reaches the execution stage containing the `0x9D` opcode, an
`EbpfError::UnsupportedInstruction` must be raised.

### Syscall numbering convention

Syscalls must be represented by a unique integer to maintain a dense lookup
table data structure for indexing and dispatch. For a clear correlation
between the existing syscalls and their respective identification number,
syscalls must strictly follow the numbering below.

| Syscall name | Number |
|------------------------------------------|----------|
| abort | 1 |
| sol_panic_ | 2 |
| sol_memcpy_ | 3 |
| sol_memmove_ | 4 |
| sol_memset_ | 5 |
| sol_memcmp_ | 6 |
| sol_log | 7 |
LucasSte marked this conversation as resolved.
Show resolved Hide resolved
| sol_log_64 | 8 |
| sol_log_pubkey | 9 |
| sol_log_compute_units_ | 10 |
| sol_alloc_free_ | 11 |
| sol_invoke_signed_c | 12 |
| sol_invoke_signed_rust | 13 |
| sol_set_return_data | 14 |
| sol_get_return_data | 15 |
| sol_log_data | 16 |
| sol_sha256 | 17 |
| sol_keccak256 | 18 |
| sol_secp256k1_recover | 19 |
| sol_blake3 | 20 |
| sol_poseidon | 21 |
| sol_get_processed_sibling_instruction | 22 |
| sol_get_stack_height | 23 |
| sol_curve_validate_point | 24 |
| sol_curve_group_op | 25 |
| sol_curve_multiscalar_mul | 26 |
| sol_curve_pairing_map | 27 |
| sol_alt_bn128_group_op | 28 |
| sol_alt_bn128_compression | 29 |
| sol_big_mod_exp | 30 |
| sol_remaining_compute_units | 31 |
| sol_create_program_address | 32 |
| sol_try_find_program_address | 33 |
| sol_get_sysvar | 34 |
| sol_get_epoch_stake | 35 |
| sol_get_clock_sysvar | 36 |
| sol_get_epoch_schedule_sysvar | 37 |
| sol_get_last_restart_slot | 38 |
| sol_get_epoch_rewards_slot | 39 |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching that. Fixed!

| sol_get_fees_sysvar | 40 |
LucasSte marked this conversation as resolved.
Show resolved Hide resolved
| sol_get_rent_sysvar | 41 |
|------------------------------------------|----------|

## Alternatives Considered

None.

## Impact

The changes proposed in this SIMD are transparent to dApp developers. The
compiler toolchain will emit correct code for the specified SBF version.
Static syscalls obviate relocations for call instructions and move the virtual
machine closer to eliminating relocations altogether, which can bring
considerable performance improvements.

## Security Considerations

None.
Loading