-
Notifications
You must be signed in to change notification settings - Fork 5.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Side-effect-free Yul interpreter #15464
base: develop
Are you sure you want to change the base?
Side-effect-free Yul interpreter #15464
Conversation
Thank you for your contribution to the Solidity compiler! A team member will follow up shortly. If you haven't read our contributing guidelines and our review checklist before, please do it now, this makes the reviewing process and accepting your contribution smoother. If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! I won't manage to do a proper review until next week, but for now here's some initial feedback from a quick pass over it.
static std::set<std::string> const NON_INSTRUCTION_BUILTIN_NAME = { | ||
"datasize", | ||
"dataoffset", | ||
"datacopy", | ||
"memoryguard", | ||
"loadimmutable", | ||
"setimmutable", | ||
"linkersymbol" | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of hard-coding those, you should just check that the instruction
field in BuiltinFunctionForEVM
is empty.
m_config.maxTraceSize = m_reader.sizetSetting("maxTraceSize", 128); | ||
m_config.maxExprNesting = m_reader.sizetSetting("maxExprNesting", 64); | ||
m_config.maxSteps = m_reader.sizetSetting("maxSteps", 512); | ||
m_config.maxRecursionDepth = m_reader.sizetSetting("maxRecursionDepth", 64); | ||
|
||
m_printHex = m_reader.boolSetting("printHex", false); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For testing, the existing
YulInterpreterTest
was not used. This is because this pure interpreter has no memory/storage access, it can not use the same trace as the current interpreter from 'test/libyul/tools'.
That's actually easy to solve. Just add a setting you can set to mark a test as one that requires access to memory or storage. Then add it to those tests that do not pass without it. As you can see in the snippet above, adding settings is pretty easy.
For this we could call the setting sideEffectFree
and it true
by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand how to mark a test that requires memory access. However, I’d still insist to keep the two test suites separate. While they do share the parsing part, the settings and trace printing are quite different. I’m concerned that merging the two might make it harder to manage the tests.
I also believe the two interpreter implementations might grow in different directions, so separating the test suites makes more sense to me.
// Increment step for each loop iteration for loops with | ||
// an empty body and post blocks to prevent a deadlock. | ||
if (_forLoop.body.statements.size() == 0 && _forLoop.post.statements.size() == 0) | ||
if (auto terminated = incrementStatementStep()) return *terminated; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some general notes on style to get that out of the way:
- We always put
return
on a new line. - Please also avoid abbreviations in names. Things like
fun
,cnt
,x
,g
,vec
,res
are hard to read and often ambiguous. TBH, we have a lot of bad names already since the code is old and it's also hard to enforce, so some of that might have been just copied, but when writing new code, please try to avoid them. - Constructor spacing:
ExecutionOk{ControlFlowState::Default};
- 4-space indents, including in
.yul
files. - You don't have to include the empty string as message in asserts. We made it optional at some point, but did not strip it from everywhere. When writing new code, you can omit the message.
- I'd avoid making files hidden (with a leading dot in the name).
- In Python code we use the same style for splitting long calls as in C++. I.e.
gen_test( 'sgt', param_cnt=2, calc=lambda p: u2s(p[0]) > u2s(p[1]) )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for commenting on the styling. For this part, I have a question about the formatter.
For C++ code style, I see that there is .clang-format
configuration included. However, using clang-format changes the code drastically. So I deliberately turned off the formatter. Should I use clang format for the new code, or should I try to adjust the code manually? And also the same question applied to Python code.
The is the only questions. I will address the other comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the code style, is it OK to use macro for the terminated case? I think that check is repetitive, and because it also return
immediately, I can only think of macro to clean up the code.
case Instruction::SWAP16: | ||
{ | ||
yulAssert(false, ""); | ||
return EvaluationOk(0); | ||
} | ||
} | ||
|
||
yulAssert(false, "Unknown instruction with opcode " + std::to_string(static_cast<uint8_t>(_instruction))); | ||
return EvaluationOk(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
case Instruction::SWAP16: | |
{ | |
yulAssert(false, ""); | |
return EvaluationOk(0); | |
} | |
} | |
yulAssert(false, "Unknown instruction with opcode " + std::to_string(static_cast<uint8_t>(_instruction))); | |
return EvaluationOk(0); | |
case Instruction::SWAP16: | |
yulAssert(false, "Instruction not allowed in strict assembly."); | |
} | |
util::unreachable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for pointing out the util::unreachable
. I actually don't know about it.
However, I think the assert with error message should be used. Currently all the instruction are checked with a switch
statement, which is not guaranteed to be exhaustive check at compile time. I am not sure if there is a way to ensure its correctness at compile time. For the runtime check, I think it should fail loudly. When failure happens, there must be a new instruction. It should then be added to this function.
std::vector<Expression> const& /* _arguments */, // This was required to execute some builtin. | ||
// But all of them are impure. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are they though? I'd expect dataoffset()
, datasize()
and linkersymbol()
at least to be pure. We won't be able to evaluate them without information that's only available at bytecode generation time, but technically they end up being just constants :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand the point. I think my comment here is misleading too, so I will change that.
I think for the implementation, we can try returning special values for these cases. But that sounds more complicated, and it is go beyond the point of code interpretation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd drop the tools/
dir and put it simply under interpreter/
. The old interpreter is under test/tools/
because it's a part of yulrun
, which is a testing tool (and compiles into a separate executable). The new interpreter becomes a part of libyul
instead, which goes into solc
.
using solidity::util::h160; | ||
using solidity::util::h256; | ||
|
||
using u512 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<512, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not place this next to the other types like this we already have, like u256
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not notice that type was there. I was too focused on modifying the original interpreter 🙏 .
I will move it.
This interpreter implementation is following this specification from #15435 , and is intended to use in #15358 .
This PR has followed spec for the implementation.
For testing, the existing
YulInterpreterTest
was not used. This is because this pure interpreter has no memory/storage access, it can not use the same trace as the current interpreter from 'test/libyul/tools'. But to keep it minimal, theYulInterpreterTest
test suit was copied and applied minimal changes. The new test suit now uses the function call trace and the outter most variable values for snapshot comparison. Most of the existing test from theYulInterpreterTest
was ported to the new test suit. New tests were also added to test all return status, as well as to test all evm instructions.