Clarifications and bug fixes (#4773)
gcolvin authored and pull[bot] committed Aug 27, 2023
1 parent 7e39d87 commit fb520e2
Showing 1 changed file with 30 additions and 29 deletions.
EIPS/eip-3779.md: 59 changes (30 additions, 29 deletions)
````diff
@@ -38,12 +38,14 @@ Dynamic jumps, where the destination of a `JUMP` or `JUMPI` is not known until r
 
 #### *Dynamic Jumps, Static Jumps, and Subroutines*
 
-Dynamic jumps need not always impede control-flow analysis. In the simplest and most common case
+The easiest thing to do would be to just deprecate `JUMP` and `JUMPI`, but dynamic jumps need not always impede control-flow analysis.
+
+Consider the simplest and most common case.
 ```
 PUSH address
 JUMP
 ```
-is effectively a static jump.
+This is effectively a static jump.
 
 Another important use of `JUMP` is to implement the return jump from a subroutine. So consider this example of calling and returning from a minimal subroutine:
 ```
@@ -65,7 +67,7 @@ SQUARE:
 swap1
 jump
 ```
-The return address -`RTN_SQUARE` - and the destination address - `SQUARE` - are pushed on the stack as constants and remain unchanged as they move on the stack, such that only those constants are passed to each `JUMP`. They are effectively static. *We do not need unconstrained dynamic jumps to implement subroutines.*
+The return address -`RTN_SQUARE` - and the destination address - `SQUARE` - are pushed on the stack as constants and remain unchanged as they move on the stack, such that only those constants are passed to each `JUMP`. They are effectively static. We can track the motion of constants on the `data stack` at validation time, so *we do not need unconstrained dynamic jumps to implement subroutines.*
 
 Finally, the static relative jumps of [EIP-4200](./eip-4200) and the simple subroutines of [EIP-2315](./eip-2315) provide static jumps directly.
 
````
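The hunks above argue that a `JUMP` is effectively static when its destination is a constant that was pushed and then only moved around the `data stack`. The Go sketch below illustrates that idea on a single execution path. It assumes a toy, already-decoded instruction set. The types `Op`, `Instr` and `slot` and the function `staticJumps` are hypothetical and not part of EIP-3779. Each stack slot is either a known constant or unknown, and computed values are treated as unknown. A `JUMP` whose destination slot is unknown is reported as dynamic. The sketch follows one path only and does not merge paths at block boundaries, which a real validator must do.

```go
package main

import "fmt"

// Op is a toy opcode set standing in for decoded EVM instructions.
type Op int

const (
	PUSH Op = iota
	DUP1
	SWAP1
	MUL
	JUMP
)

// Instr is a hypothetical decoded instruction; Arg is used only by PUSH.
type Instr struct {
	Op  Op
	Arg uint64
}

// slot is an abstract stack item: a known constant or an unknown value.
type slot struct {
	known bool
	val   uint64
}

// arity is the number of stack items each op needs (PUSH needs none).
var arity = map[Op]int{DUP1: 1, SWAP1: 2, MUL: 2, JUMP: 1}

// staticJumps walks one execution path (jumps already followed), tracking
// which stack slots hold pushed constants. It returns the destination of
// every JUMP, keyed by its index in the path, or an error if a JUMP
// destination is not a known constant or the stack underflows.
func staticJumps(path []Instr) (map[int]uint64, error) {
	var stack []slot
	dests := map[int]uint64{}
	for i, in := range path {
		if len(stack) < arity[in.Op] {
			return nil, fmt.Errorf("instruction %d: stack underflow", i)
		}
		n := len(stack)
		switch in.Op {
		case PUSH:
			stack = append(stack, slot{known: true, val: in.Arg})
		case DUP1:
			stack = append(stack, stack[n-1])
		case SWAP1:
			stack[n-1], stack[n-2] = stack[n-2], stack[n-1]
		case MUL:
			// Computed values are conservatively treated as unknown.
			stack = append(stack[:n-2], slot{})
		case JUMP:
			top := stack[n-1]
			stack = stack[:n-1]
			if !top.known {
				return nil, fmt.Errorf("instruction %d: dynamic jump", i)
			}
			dests[i] = top.val
		}
	}
	return dests, nil
}

func main() {
	// Hypothetical offsets: RTN_SQUARE = 10, SQUARE = 20.
	path := []Instr{
		{PUSH, 10}, // push RTN_SQUARE
		{PUSH, 2},  // argument to square
		{PUSH, 20}, // push SQUARE
		{JUMP, 0},  // call: jump to SQUARE
		{DUP1, 0},  // SQUARE body
		{MUL, 0},
		{SWAP1, 0}, // bring RTN_SQUARE to the top
		{JUMP, 0},  // return: jump to RTN_SQUARE
	}
	dests, err := staticJumps(path)
	fmt.Println(dests, err) // map[3:20 7:10] <nil>
}
```

On this call-and-return path both jumps resolve to pushed constants. A path such as `PUSH 2, PUSH 3, MUL, JUMP` is rejected as a dynamic jump.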
````diff
@@ -100,7 +102,7 @@ The *execution* of each instruction is defined in the [Yellow Paper](https://eth
 
 *In practice*, we must be able to validate *code* in linear time to avoid denial of service attacks. And we must support dynamically-priced instructions, loops, and recursion, which can use arbitrary amounts of gas and stack.
 
-Thus our validation cannot consider concrete computations -- it only performs a limited symbolic execution of the _code_. This means we will reject programs if we detect any invalid execution paths, even if those paths are not reachable at runtime. And we will count as valid programs that do not in fact produce correct results.
+Thus our validation cannot consider concrete computations -- it only performs a limited symbolic execution of the _code_. This means we will reject programs if we detect any invalid execution paths, even if those paths are not reachable at runtime. And we will count as valid programs that may not always produce correct results.
 
 We can detect only _non-recursive_ stack overflows at *validation time*, so we must check for the first two states at _runtime_:
 * `out of gas` and
@@ -132,7 +134,7 @@ We define a `JUMP` or `JUMPI` instruction to be *static* if its `jumpsrc` argume
 
 The `RJUMP`, `RJUMPI` and `RJUMPSUB`instructions take their destination as an immediate argument, so they are *static*.
 
-The Yellow Paper has the `stack pointer` (`SP`) pointing just past the top item on the `data stack`. We define the `available items` as the number of stack items between the current `SP` and the `SP` on entry to the most recent basic block.
+The Yellow Paper has the `stack pointer` `SP` pointing just past the top item on the `data stack`. We define the `available items` as the number of stack items between the current `SP` and the `SP` on entry to the most recent basic block.
 
 Taken together, these rules allow for code to be validated by traversing the control-flow graph, in time and space linear in the size of the code, following each edge only once.
 
@@ -144,11 +146,7 @@ Bounding the stack pointers catches all `data stack` and non-recursive`return st
 
 Requiring consistently `available items` on the `data stack` prevents stack underflow. It can also catch such errors as misaligned stacks due to irreducible control flows and calls to subroutines with the wrong number of arguments.
 
-And relative rather than absolute jump destinations are consistent with the other `RJUMP` instructions, so that code remains position-independent.
-
-_Note: The definition of *static* here is the bare minimum needed to implement subroutines. Deeper analyses could be proposed that would validate a larger and probably more useful set of jumps, at the cost of more expensive (but still linear) validation._
-
-_Note: Requiring the valid destinations of dynamic jumps to be enumerated at every jump instruction allows for tractable bytecode validation: a jump vector takes up space proportional to the number of destinations, so attempting to attack the validation algorithm with large numbers of jumps will proportionally reduce the available space for those jumps._
+_Note: The definition of 'static' here is the bare minimum needed to implement subroutines. Deeper analyses could be proposed that would validate a larger and probably more useful set of jumps, at the cost of more expensive (but still linear) validation._
 
 ## Backwards Compatibility
 
````
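These hunks state that validation follows each edge of the control-flow graph only once. They also state that keeping the stack consistent on entry to each basic block prevents underflow and catches misaligned stacks. The Go sketch below illustrates one simplified form of that check and assumes the control-flow graph has already been built. The `Block` type, its fields, and `checkHeights` are hypothetical. Tracking a single absolute entry height per block is a simplification made here, not the EIP's exact `available items` rule. Each block is queued only on first arrival, so the traversal does O(blocks + edges) work.

```go
package main

import "fmt"

// Block is a hypothetical summary of one basic block, assumed to be
// produced by an earlier pass that splits the code at jumps and JUMPDESTs.
type Block struct {
	Needs int   // items the block reads from below its entry height
	Delta int   // net change in stack height across the block
	Succs []int // indices of successor blocks
}

const maxStack = 1024 // EVM data stack limit, in items

// checkHeights traverses the CFG from block 0, processing each reachable
// block once and each edge once. Every path into a block must arrive with
// the same stack height. It returns each block's entry height, or -1 for
// unreachable blocks.
func checkHeights(blocks []Block) ([]int, error) {
	if len(blocks) == 0 {
		return nil, nil
	}
	height := make([]int, len(blocks))
	for i := range height {
		height[i] = -1
	}
	height[0] = 0
	work := []int{0}
	for len(work) > 0 {
		b := work[len(work)-1]
		work = work[:len(work)-1]
		h, out := height[b], height[b]+blocks[b].Delta
		if h < blocks[b].Needs || out < 0 {
			return nil, fmt.Errorf("block %d: stack underflow", b)
		}
		if out > maxStack {
			return nil, fmt.Errorf("block %d: stack overflow", b)
		}
		for _, s := range blocks[b].Succs {
			switch {
			case height[s] == -1: // first arrival: record height, queue block
				height[s] = out
				work = append(work, s)
			case height[s] != out: // a second path disagrees
				return nil, fmt.Errorf("block %d: entry height %d vs %d", s, height[s], out)
			}
		}
	}
	return height, nil
}

func main() {
	// Block 0 pushes a counter, block 1 is a loop that keeps the height
	// unchanged, and block 2 pops the counter and exits.
	ok := []Block{{0, 1, []int{1}}, {1, 0, []int{1, 2}}, {1, -1, nil}}
	fmt.Println(checkHeights(ok)) // [0 1 1] <nil>

	// A loop body that pushes on every iteration is rejected.
	bad := []Block{{0, 1, []int{1}}, {1, 1, []int{1, 2}}, {1, -1, nil}}
	_, err := checkHeights(bad)
	fmt.Println(err) // block 1: entry height 1 vs 2
}
```

Rejecting the growing loop at validation time illustrates the misaligned-stack case. Otherwise the stack would only overflow at runtime.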
````diff
@@ -162,26 +160,26 @@ This algorithm performs a symbolic execution of the program that recursively tra
 
 It runs in time equal to `O(vertices + edges)` in the program's control-flow graph, where edges represent control flow and the vertices represent _basic blocks_ -- thus the algorithm takes time proportional to the size of the _code_.
 
-_Note: Because valid code has a control-flow graph that can be traversed in linear time there are other static analyses and code transformations that might otherwise require quadratic time can also be written to run in linear time, including those which must traverse or construct the control-flow graph._
+_Note: Because valid code has a control-flow graph that can be traversed in linear time some other static analyses and code transformations that might otherwise require quadratic time can also be written to run in near-linear time._
 
 ### Validation Function
 
-For simplicity's sake we assume that _jumpdest analysis_ has been done and that we have a few helper functions.
-* `is_valid_instruction(pc)` returns true if `pc` points at valid instruction
+For simplicity's sake we assume that _jumpdest analysis_ has been done and that we have some helper functions.
+* `is_valid_instruction(pc)` returns true if `pc` points at a valid instruction
 * `is_valid_jumpdest(dest)` returns true if `dest` is a valid jumpdest
 * `is_immediate_data(pc)` returns true if `pc` points at immediate data
-* `immediate_data(pc)` returns the immediate data for an instruction.
-* `advance_pc()` advances the pc, skipping any immediate data.
-* `removed_items(pc)`returns the number of items removed from the `data_stack` by an instruction.
-* `added_items(pc)` returns the number of items added to the `data_stack` by an instruction.
 
+* `immediate_data(pc)` returns the immediate data for the instruction at `pc`.
+* `advance_pc(pc)` returns next `pc`, skipping any immediate data.
+* `remove_items(pc)`returns the new `sp` after items are removed from the `data_stack` by the instruction at `pc`.
+* `add_items(pc)`returns the new SP after items are added to the `data_stack` by the instruction at `pc`.
 ```
 var code [code_len]byte
 var avail_items [code_len]int
 var return_stack [1024]int = { -1 }
-var data_stack [1024]uint256 = { INVALID }
+var data_stack [1024]uint256 = { -1 }
 // return the maximum stack used or else the PC and an error
-func validate_path(pc := 0, sp := 0, bp := 0, rp := 0) int, error {
+func validate(pc := 0, sp := 0, bp := 0, rp := 0) int, error {
 used_items := 0
 for pc < code_len {
 if !is_valid_instruction(pc) {
````
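The helper functions in this hunk are assumed rather than defined. The Go sketch below shows one way `is_immediate_data`, `is_valid_jumpdest`, `advance_pc(pc)` and `immediate_data(pc)` could be written for legacy bytecode. Here they are named `analyze`, `isValidJumpdest`, `advancePC` and `immediateData`, which are hypothetical names. The sketch only handles the immediates of `PUSH1` (`0x60`) through `PUSH32` (`0x7f`). Immediate arguments of other instructions, such as those introduced by EIP-4200, are out of scope.

```go
package main

import (
	"fmt"
	"math/big"
)

const (
	opJUMPDEST = 0x5b
	opPUSH1    = 0x60
	opPUSH32   = 0x7f
)

// immediateSize returns how many immediate bytes follow op.
// Only PUSH1..PUSH32 are handled in this sketch.
func immediateSize(op byte) int {
	if op >= opPUSH1 && op <= opPUSH32 {
		return int(op-opPUSH1) + 1
	}
	return 0
}

// advancePC returns the pc of the next instruction, skipping immediate data.
func advancePC(code []byte, pc int) int {
	return pc + 1 + immediateSize(code[pc])
}

// analyze is a jumpdest analysis: it marks every byte that is immediate
// data, so a 0x5b byte inside PUSH data is not treated as a JUMPDEST.
func analyze(code []byte) []bool {
	isImmediate := make([]bool, len(code))
	for pc := 0; pc < len(code); pc = advancePC(code, pc) {
		for i := pc + 1; i < advancePC(code, pc) && i < len(code); i++ {
			isImmediate[i] = true
		}
	}
	return isImmediate
}

// isValidJumpdest reports whether dest holds a JUMPDEST opcode, not data.
func isValidJumpdest(code []byte, isImmediate []bool, dest int) bool {
	return dest >= 0 && dest < len(code) && !isImmediate[dest] && code[dest] == opJUMPDEST
}

// immediateData returns the value pushed by the PUSHn at pc. Bytes past
// the end of the code read as zero, matching EVM semantics.
func immediateData(code []byte, pc int) *big.Int {
	buf := make([]byte, immediateSize(code[pc]))
	copy(buf, code[pc+1:])
	return new(big.Int).SetBytes(buf)
}

func main() {
	code := []byte{0x61, 0x5b, 0x00, 0x5b} // PUSH2 0x5b00; JUMPDEST
	isImmediate := analyze(code)
	fmt.Println(isImmediate)                                // [false true true false]
	fmt.Println(isValidJumpdest(code, isImmediate, 1))      // false: inside PUSH data
	fmt.Println(isValidJumpdest(code, isImmediate, 3))      // true
	fmt.Println(advancePC(code, 0), immediateData(code, 0)) // 3 23296
}
```

Returning the next `pc` from `advancePC`, rather than mutating state, matches the `pc = advance_pc(pc)` form used in the updated pseudocode.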
````diff
@@ -214,15 +212,19 @@ func validate_path(pc := 0, sp := 0, bp := 0, rp := 0) int, error {
 return used_items, nil
 case SELFDESTRUCT:
 return used_items, nil
+case REVERT:
+return used_items, nil
 case INVALID:
 return pc, invalid_opcode
-// track constants on stack
+// track constants pushed on data stack
 case PUSH1 <= code[pc] && code[pc] <= PUSH32 {
 sp++
 if (sp > 1023) {
 return pc, stack_overflow
 }
 data_stack[sp] = immediate_data(pc)
-advance_pc()
+pc = advance_pc(pc)
 continue
 case JUMP:
@@ -237,7 +239,7 @@ func validate_path(pc := 0, sp := 0, bp := 0, rp := 0) int, error {
 }
 pc = jumpdest
 continue
 case JUMPI:
 // will enter basic block at destination
@@ -257,14 +259,14 @@ func validate_path(pc := 0, sp := 0, bp := 0, rp := 0) int, error {
 if is_immediate_data(jumpdest) {
 return pc, invalid_destination
 }
-left_used, err = validate_path(jumpdest, sp, bp, rp)
+left_used, err = validate(jumpdest, sp, bp, rp)
 if err {
 return pc, err
 }
 // recurse to validate false side of conditional
 pc = advance_pc(pc)
-right_used, err = validate_path(pc, sp, bp, rp)
+right_used, err = validate(pc, sp, bp, rp)
 if err {
 return pc, err
 }
@@ -300,21 +302,21 @@ func validate_path(pc := 0, sp := 0, bp := 0, rp := 0) int, error {
 if is_immediate_data(jumpdest) {
 return pc, invalid_destination
 }
-left_used, err = validate_path(jumpdest, sp, bp, rp)
+left_used, err = validate(jumpdest, sp, bp, rp)
 if err {
 return pc, err
 }
 // recurse to validate false side of conditional
 pc = advance_pc(pc)
-right_used, err = validate_path(pc, sp, bp, rp)
+right_used, err = validate(pc, sp, bp, rp)
 if err {
 return pc, err
 }
 // both sides valid, check stack and return used_items
 used_items += max(left_used, right_used)
-if (sp += used_items > 1023) {
+if (sp += used_items > `1023`) {
 return pc, stack_overflow
 }
 return used_items, nil
````
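In the `JUMPI` case, the validator recurses into both the jump target and the fall-through path. It then adds the larger of the two results to `used_items`. The Go sketch below shows a minimal version of that pattern. It assumes a toy, already-decoded program with only forward jumps, so no loop detection is needed. The names `Op`, `Instr`, `validatePath` and `maxInt` are hypothetical. In this sketch `sp` counts items, and the limit is 1024 items. Overflow is checked with a plain comparison at each push, not with the compound assignment written in the pseudocode.

```go
package main

import "fmt"

// Op is a toy opcode set standing in for decoded EVM instructions.
type Op int

const (
	PUSH  Op = iota
	POP
	JUMPI // pops a condition; Target is a static destination index
	STOP
)

// Instr is a hypothetical decoded instruction; Target is used by JUMPI.
type Instr struct {
	Op     Op
	Target int
}

const maxItems = 1024 // EVM data stack limit

func maxInt(a, b int) int {
	if a > b {
		return a
	}
	return b
}

// validatePath returns the maximum number of items the path starting at pc
// holds above its entry height sp, or an error. At JUMPI it validates both
// successors and keeps the larger requirement.
func validatePath(code []Instr, pc, sp int) (int, error) {
	entry, used := sp, 0
	for ; pc < len(code); pc++ {
		switch in := code[pc]; in.Op {
		case PUSH:
			if sp++; sp > maxItems {
				return 0, fmt.Errorf("pc %d: stack overflow", pc)
			}
			used = maxInt(used, sp-entry)
		case POP:
			if sp == 0 {
				return 0, fmt.Errorf("pc %d: stack underflow", pc)
			}
			sp--
		case STOP:
			return used, nil
		case JUMPI:
			if sp == 0 {
				return 0, fmt.Errorf("pc %d: stack underflow", pc)
			}
			sp-- // pop the condition
			if in.Target <= pc || in.Target >= len(code) {
				return 0, fmt.Errorf("pc %d: only forward jumps are handled", pc)
			}
			left, err := validatePath(code, in.Target, sp) // taken branch
			if err != nil {
				return 0, err
			}
			right, err := validatePath(code, pc+1, sp) // fall-through branch
			if err != nil {
				return 0, err
			}
			// Both sides are valid: keep the larger stack requirement.
			return maxInt(used, sp-entry+maxInt(left, right)), nil
		}
	}
	return used, nil
}

func main() {
	code := []Instr{
		{PUSH, 0}, {JUMPI, 4}, // push a condition, branch to index 4
		{PUSH, 0}, {STOP, 0}, // fall-through side uses 1 item
		{PUSH, 0}, {PUSH, 0}, {PUSH, 0}, {STOP, 0}, // taken side uses 3 items
	}
	fmt.Println(validatePath(code, 0, 0)) // 3 <nil>
}
```

Taking the maximum, not the sum, reflects that only one branch executes at runtime. The combined result is still a safe bound for either path.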
````diff
@@ -373,4 +375,3 @@ This EIP is intended to ensure an essential level of safety for EVM code deploye
 
 ## Copyright
 Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
-
````