Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

VM, Common: implement eip-3540 (EOF1) #1719

Merged
merged 34 commits into from
Mar 15, 2022
Merged

VM, Common: implement eip-3540 (EOF1) #1719

merged 34 commits into from
Mar 15, 2022

Conversation

acolytec3
Copy link
Contributor

Fixes #1704

This adds support for EIP-3540. This EIP is not part of any hardfork and can only be activated in concert with 3541. At present, this implements EVM Object Format V1(EOF1) checks as a single utility function in src/evm/opcodes/util.ts. There was some significant discussion around keeping all of the specific parameters governing the EOF1 checks as parameters in the EIP specification in Common but after working through it, it feels like it makes more sense (at least based on the current status of the EIP) that we just incorporate all of those parameters as hard coded values in a utility function that validates the code passed in based on the EOF1 rules (rather than try to generalize to a generic EOF validation function). If/when future EOF versions come up for review/implementation, we can revisit this.

There are a couple of other changes made to the VM internals, notably:

  • The interpreter's runState.code is now set based on whether or not the bytecode passed with the message is EOF1 compliant or not (code section of EOF1 container if EOF1 bytecode or entire message.code for legacy code)
  • This allows for a change in the PUSH opcode handler that checks to see if it goes out of bounds and reverts if so (this doesn't seem to break API tests and gives a guard in EOF1 compliant bytecode to going out of bounds)
  • LIkewise Jump Destination analysis is now limited to the runState.code which will be set as the code section of the EOF1 container if the bytecode is EOF1 compliant.

The ethereum/tests PR containing tests for this EIP is still WIP as well so we should probably wait to merge that PR gets merged or else feels stable.

@acolytec3
Copy link
Contributor Author

To run the ethereum/tests in the PR, clone them locally and then use the below command to activate the EIP and run them:
npm run tester -- --blockchain --fork=London+3540 --customTestsPath=path/to/your/local/directory

@codecov
Copy link

codecov bot commented Feb 16, 2022

Codecov Report

Merging #1719 (8f2d17b) into master (b1bf67c) will increase coverage by 0.04%.
The diff coverage is 86.07%.

Impacted file tree graph

Flag Coverage Δ
block 85.57% <ø> (ø)
blockchain 83.82% <ø> (ø)
client 75.50% <ø> (ø)
common 93.90% <ø> (ø)
devp2p 82.43% <ø> (+0.20%) ⬆️
ethash 90.76% <ø> (ø)
trie 80.02% <ø> (ø)
tx 88.20% <ø> (ø)
util 92.62% <ø> (ø)
vm 81.25% <86.07%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

sectionSizes.code = (code[4] << 8) | code[5]
} else if (
code.length > 10 &&
code[3] === secCode &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EIP does not specify an order of the sections, so code[3] could also be secData. I would suggest a loop over these sections, this is also forward-compatible if new sections get introduced.

Copy link
Contributor Author

@acolytec3 acolytec3 Feb 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The EIP does not specify an order of the sections, so code[3] could also be secData

I don't believe this is correct. The EIP validation rules (excerpted below) require an order. Also, I'm following the reference implementation here.
"
Exactly one code section MUST be present.
The code section MUST be the first section.
A single data section MAY follow the code section.
"

I can't link to the section directly but if you look under the validation rules, it details the specifics about EOFv1 rules so let me know if I've misinterpreted something.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, sorry, you are right, it must be the first section.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out by @acolytec3 the first section should be the code section so in version 1 of EOF then the second section can only be the data section.

sectionSizes.code = (code[4] << 8) | code[5]
sectionSizes.data = (code[7] << 8) | code[8]
}
if (code.length !== codeSize) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no check for the data size, what if this underflows or overflows?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, the naming here is imprecise. codeSize is the computed length of the entire bytestring based on the size of the header + the size of the code section and the data section. I'll clean up the naming to make things a little more clear.

@jochem-brouwer
Copy link
Member

@axic I have raised some questions about this EIP on https://ethereum-magicians.org/t/evm-object-format-eof/5727/28 😄

result = { ...result, ...INVALID_BYTECODE_RESULT(message.gasLimit) }
}
// EIP-3540 EOF1 checks
if (!eof1CodeAnalysis(result.returnValue)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I like this function signature, so this code path will run if the eof1CodeAnalysis returns undefined, but I'm also not super sure what would make this more clean... 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could throw instead though I'd have to rework the helper. Seems like that might be slower than just checking for undefined. Other alternatives (like constructing an execResult with the right error the helper) didn't quite feel right to me either.

@acolytec3
Copy link
Contributor Author

I would suggest a loop over these sections, this is also forward-compatible if new sections get introduced.

I want to discuss this question more generally than in the specific code comment. I'm open to something like this (and my original implementation was more along this line) but I decided after going through it that it made more sense (at least to me) that we implement separate functions for each EOF version to check code validity. That way, we don't have to be parsing common parameters each time we do this code check. Given we don't know what the future EOFs may look like, there may be additional handling that is not currently envisaged that we will need to incorporate and it feels like unnecessary work at the present time to try and predict what form that will take and generalize our code in some other way. My thought is if/when a new EOF v2 is added to the protocol, we can just look at that version byte in the header and call the appropriate code validation helper.

@@ -261,3 +261,35 @@ export function updateSstoreGas(
return new BN(common.param('gasPrices', 'sstoreSet'))
}
}

export const eof1CodeAnalysis = (code: Buffer) => {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think code as input here is misleading. I would suggest changing it to container this is the term the EIP uses as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will update again. Had set it to byteCode but agree container makes more sense.

@jochem-brouwer
Copy link
Member

Cool, great start, will use this as a general comment as I think certain parts of the EIP are not present.

  • section_size MUST NOT be 0 -> not present?
  • Version check is not implemented
  • Program going out-of-bounds (into data section or at end of code section) not present
  • Not entirely sure if PC opcode now returns the right value (but I think the implementation with this new code field is the way to go on the runState)

@acolytec3
Copy link
Contributor Author

  • section_size MUST NOT be 0 -> not present?

Ah, thanks for the catch. Addressed in my latest commit.

  • Version check is not implemented

This is implemented here, I fixed the names so it would be more obvious

  • Program going out-of-bounds (into data section or at end of code section) not present

I guess I thought this would be taken care of by the PC counter checks in the interpreter.run method but will think a little more on this. I'll have to think on this some more.

  • Not entirely sure if PC opcode now returns the right value (but I think the implementation with this new code field is the way to go on the runState)

@ryanio ryanio changed the title Implement eip3540 vm: implement eip-3540 (EOF1) Feb 16, 2022
@jochem-brouwer
Copy link
Member

This PR was merged into this one: #1743

return false
}
}
}
const terminatingOpcodes = new Set([0x00, 0xd3, 0xfd, 0xfe, 0xff])
const terminatingOpcodes = new Set([0x00, 0xf3, 0xfd, 0xfe, 0xff])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh wow, missed this 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, me too. But writing those new tests you suggested helped me catch it 🎆

@acolytec3
Copy link
Contributor Author

Hmm, it seems that I somehow mixed up develop and master in these branches. 🤔 Will try and clean it up with an interactive rebase.

@jochem-brouwer
Copy link
Member

There are some steps left to do for this PR which are essentially listed in this #1719 (comment) comment. I can help creating that CREATE/CREATE2 bytecode. We should also update the case where the "intrinsic gas" is consumed (data fee + upfront fee) in case we try to de a create transaction where the EOF format is invalid. I have pinged an author to clarify/update the EIP regarding what "abort" means in terms of gas.

(Also seems that there is a linting problem)

Side note: once the EIP-specific tests get merged in Ethereum/tests we should also add those forks to CI in case we run "test all hardforks".

@jochem-brouwer
Copy link
Member

Have added invalid eof initcode tests for tx/create/create2.

jochem-brouwer
jochem-brouwer previously approved these changes Mar 12, 2022
Copy link
Member

@jochem-brouwer jochem-brouwer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I am concerned this is ready to merge, let's wait for one more review.

@holgerd77
Copy link
Member

Ah, ok, interesting (and: cool!). 👍 😄 🎉

I would then have a strong tendency to integrate this in the current round of releases, seems to be a very good occasion to directly get this out.

Will do a final review on this and merge if everything is ok and add to #1774.

I would then try to do the releases itself tomorrow, Tuesday (CET) morning - also depending on the outcome of #1781 and a final "go!" from @g11tech and @ryanio, and then do a first Twitter thread directly that day solely focusing on the Kiln aspects of the release round.

This would be followed up by a bugfix release round either at the end of the week or early next week, fixing some things from Kiln + adding the merge blocks and stuff and on that release occasion I would do another Twitter thread (more or less) solely focusing on all other release features (EIP-3860, Extended VM Genesis, EIP-3540,...).

So far on the current plan [TM] 😎.

}
return sectionSizes
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking on so many conditions in such a condensed form that it is extremely hard to review and generally follow. Can we have this dramatically [TM] more commented, so basically to a comment with a condition on more or less every line of code? Guess that would make things easier to grasp.

Also: one technique @ryanio often applies and I guess which would help here as well is to instead of checking for the positive condition (if (container[1] === magic && container[2] === version) {) and then nesting further down check for the negative (if (container[1] !== magic || container[2] !== version) {) and the directly do the return. This prevents further nesting and could be minimally applied on this example I made but I guess also on other code parts.

code.slice(0, 1).equals(Buffer.from('ef', 'hex'))
) {
if (code.slice(1, 2).equals(Buffer.from('00', 'hex'))) {
if (code.slice(2, 3).equals(Buffer.from('01', 'hex'))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here: negation and direct return.

@holgerd77
Copy link
Member

Oh, that's a really great update! Thanks a lot! 👍 😍

@holgerd77
Copy link
Member

I can't find/identify all parts of spec integrations in this huge EIP TBH, so I'll go with some basic questions for @acolytec3 just for re-assurance for a second review:

So first round would be: are you confident that you have "caught" all execution semantics changes from this list:

  1. JUMPDEST analysis is only run on the code.
  2. Execution starts at the first byte of the code, and PC is set to 0.
  3. If PC goes outside of the code section bounds, execution aborts with failure.
  4. PC returns the current position within the code.
  5. JUMP/JUMPI uses an absolute offset within the code.
  6. CODECOPY/CODESIZE/EXTCODECOPY/EXTCODESIZE/EXTCODEHASH keeps operating on the entire container.
  7. The input to CREATE/CREATE2 is still the entire container.

Or is there one where one should have another eye upon and do some verification?

if (
!eof1ValidOpcodes(
result.returnValue.slice(codeStart, codeStart + eof1CodeAnalysisResults.code)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is from the EIP-3670 spec:

At contract creation time both initcode and code are iterated instruction-by-instruction

Might it be that you are doing validation only on the code and not the initcode? Or am I missing something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is checking the code returned by the create contract to verify its valid . The initcode is checked when it is run through the runInterpreter function call.

@acolytec3
Copy link
Contributor Author

Thanks for the feedback. I'll address these as best I can but happy to dig into things further(and @jochem-brouwer keep me honest on this).

  1. JUMPDEST analysis is only run on the code.
  2. Execution starts at the first byte of the code, and PC is set to 0.
  3. If PC goes outside of the code section bounds, execution aborts with failure.
  4. PC returns the current position within the code.
  5. JUMP/JUMPI uses an absolute offset within the code.

I believe these 5 are all implicitly addressed by [this line of code]((

)

If the code passed to the interpreter contains an EOF1 header, we define the runState.code to be interpreted as just the code section of the container so JUMPDEST analysis, code execution starting point, PC position, and JUMP/JUMPI offset are all conducted on this code only.

For JUMP/JUMPI offsets, I think you might have a point that we missed a step in checking here since check against runState.eei.getCodeSize() is looking at the entire code and not the code section. I think this can be fixed by just checking against runState.code.length. That said, this check feels redundant since it would throw on an invalid jump destination anyway the next line down. @jochem-brouwer - what do you think on this?

  1. CODECOPY/CODESIZE/EXTCODECOPY/EXTCODESIZE/EXTCODEHASH keeps operating on the entire container.

I looked at all 4 of these function handlers again and they are all checking against runState.eei.getCodeSize/runState.eei.getCode or equivalents so they are operating on the container. The EOF1 code section is found in runState.code so we should be good here.

  1. The input to CREATE/CREATE2 is still the entire container.

From looking at the opcode handlers, the inputs are entirely dependent on what is passed to them as here for CREATE. Whatever offset/length are popped from the stack determines what code is pulled in but once that code is passed in, it is treated as an entire container and analyzed for EOF1 validity in the same way as a basic create transaction. So, the long answer to your short question is "yes" as far as "the entire container" corresponds to whatever code is passed to _executeCreate as determined by the inputs to the CREATE/CREATE2 opcode handler.

I'd also like @jochem-brouwer to comment on this one. He wrote the tests that validate the CREATE/CREATE2 behavior.

Let me know if this addresses your questions.

@holgerd77
Copy link
Member

Let me know if this addresses your questions.

Totally, thanks a lot, explanations couldn't be better!

Ok, I guess I am confident enough here that I would do the following, since timing of this whole release round is really a bit challenging with all different requirements: I will merge this in now and - directly - release with the Kiln updates.

In case that we stumble upon some additional things post-release we can still add this to the very imminent bugfix releases (also necessary for some Kiln updates), to be released directly on Friday or early next week.

I guess that should catch all release requirements best, so we have the Kiln release out in time and also do not miss this release opportunity on the work here but still have the time for a short-termed correction if necessary (which should be very much ok given that we release as experimental anyhow + we are already somewhat confident on the implementation).

//cc @g11tech @ryanio @jochem-brouwer

Copy link
Member

@holgerd77 holgerd77 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@holgerd77 holgerd77 changed the title vm: implement eip-3540 (EOF1) VM, Common: implement eip-3540 (EOF1) Mar 15, 2022
@holgerd77 holgerd77 merged commit 010322d into master Mar 15, 2022
@holgerd77 holgerd77 deleted the implement-eip3540 branch March 15, 2022 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

VM: Implement EVM Object Format (EOF) (EIP-3540)
4 participants