Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Side-effect-free Yul interpreter #15464

Open
wants to merge 78 commits into
base: develop
Choose a base branch
from

Conversation

quangloc99
Copy link

This interpreter implementation is following this specification from #15435 , and is intended to use in #15358 .

This PR has followed spec for the implementation.

For testing, the existing YulInterpreterTest was not used. This is because this pure interpreter has no memory/storage access, it can not use the same trace as the current interpreter from 'test/libyul/tools'. But to keep it minimal, the YulInterpreterTest test suit was copied and applied minimal changes. The new test suit now uses the function call trace and the outter most variable values for snapshot comparison. Most of the existing test from the YulInterpreterTest was ported to the new test suit. New tests were also added to test all return status, as well as to test all evm instructions.

Copy link

Thank you for your contribution to the Solidity compiler! A team member will follow up shortly.

If you haven't read our contributing guidelines and our review checklist before, please do it now, this makes the reviewing process and accepting your contribution smoother.

If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.

Copy link
Member

@cameel cameel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! I won't manage to do a proper review until next week, but for now here's some initial feedback from a quick pass over it.

Comment on lines +297 to +305
static std::set<std::string> const NON_INSTRUCTION_BUILTIN_NAME = {
"datasize",
"dataoffset",
"datacopy",
"memoryguard",
"loadimmutable",
"setimmutable",
"linkersymbol"
};
Copy link
Member

@cameel cameel Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of hard-coding those, you should just check that the instruction field in BuiltinFunctionForEVM is empty.

Comment on lines +58 to +63
m_config.maxTraceSize = m_reader.sizetSetting("maxTraceSize", 128);
m_config.maxExprNesting = m_reader.sizetSetting("maxExprNesting", 64);
m_config.maxSteps = m_reader.sizetSetting("maxSteps", 512);
m_config.maxRecursionDepth = m_reader.sizetSetting("maxRecursionDepth", 64);

m_printHex = m_reader.boolSetting("printHex", false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For testing, the existing YulInterpreterTest was not used. This is because this pure interpreter has no memory/storage access, it can not use the same trace as the current interpreter from 'test/libyul/tools'.

That's actually easy to solve. Just add a setting you can set to mark a test as one that requires access to memory or storage. Then add it to those tests that do not pass without it. As you can see in the snippet above, adding settings is pretty easy.

For this we could call the setting sideEffectFree and it true by default.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand how to mark a test that requires memory access. However, I’d still insist to keep the two test suites separate. While they do share the parsing part, the settings and trace printing are quite different. I’m concerned that merging the two might make it harder to manage the tests.

I also believe the two interpreter implementations might grow in different directions, so separating the test suites makes more sense to me.

// Increment step for each loop iteration for loops with
// an empty body and post blocks to prevent a deadlock.
if (_forLoop.body.statements.size() == 0 && _forLoop.post.statements.size() == 0)
if (auto terminated = incrementStatementStep()) return *terminated;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some general notes on style to get that out of the way:

  • We always put return on a new line.
  • Please also avoid abbreviations in names. Things like fun, cnt, x, g, vec, res are hard to read and often ambiguous. TBH, we have a lot of bad names already since the code is old and it's also hard to enforce, so some of that might have been just copied, but when writing new code, please try to avoid them.
  • Constructor spacing: ExecutionOk{ControlFlowState::Default};
  • 4-space indents, including in .yul files.
  • You don't have to include the empty string as message in asserts. We made it optional at some point, but did not strip it from everywhere. When writing new code, you can omit the message.
  • I'd avoid making files hidden (with a leading dot in the name).
  • In Python code we use the same style for splitting long calls as in C++. I.e.
        gen_test(
            'sgt',
            param_cnt=2,
            calc=lambda p: u2s(p[0]) > u2s(p[1])
        )

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for commenting on the styling. For this part, I have a question about the formatter.

For C++ code style, I see that there is .clang-format configuration included. However, using clang-format changes the code drastically. So I deliberately turned off the formatter. Should I use clang format for the new code, or should I try to adjust the code manually? And also the same question applied to Python code.

The is the only questions. I will address the other comments.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides the code style, is it OK to use macro for the terminated case? I think that check is repetitive, and because it also return immediately, I can only think of macro to clean up the code.

Comment on lines +271 to +279
case Instruction::SWAP16:
{
yulAssert(false, "");
return EvaluationOk(0);
}
}

yulAssert(false, "Unknown instruction with opcode " + std::to_string(static_cast<uint8_t>(_instruction)));
return EvaluationOk(0);
Copy link
Member

@cameel cameel Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
case Instruction::SWAP16:
{
yulAssert(false, "");
return EvaluationOk(0);
}
}
yulAssert(false, "Unknown instruction with opcode " + std::to_string(static_cast<uint8_t>(_instruction)));
return EvaluationOk(0);
case Instruction::SWAP16:
yulAssert(false, "Instruction not allowed in strict assembly.");
}
util::unreachable();

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for pointing out the util::unreachable. I actually don't know about it.

However, I think the assert with error message should be used. Currently all the instruction are checked with a switch statement, which is not guaranteed to be exhaustive check at compile time. I am not sure if there is a way to ensure its correctness at compile time. For the runtime check, I think it should fail loudly. When failure happens, there must be a new instruction. It should then be added to this function.

Comment on lines +284 to +285
std::vector<Expression> const& /* _arguments */, // This was required to execute some builtin.
// But all of them are impure.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are they though? I'd expect dataoffset(), datasize() and linkersymbol() at least to be pure. We won't be able to evaluate them without information that's only available at bytecode generation time, but technically they end up being just constants :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the point. I think my comment here is misleading too, so I will change that.

I think for the implementation, we can try returning special values for these cases. But that sounds more complicated, and it is go beyond the point of code interpretation.

Copy link
Member

@cameel cameel Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd drop the tools/ dir and put it simply under interpreter/. The old interpreter is under test/tools/ because it's a part of yulrun, which is a testing tool (and compiles into a separate executable). The new interpreter becomes a part of libyul instead, which goes into solc.

using solidity::util::h160;
using solidity::util::h256;

using u512 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<512, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not place this next to the other types like this we already have, like u256?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not notice that type was there. I was too focused on modifying the original interpreter 🙏 .

I will move it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants