Side-effect-free Yul interpreter #15464

quangloc99 · 2024-09-29T10:57:03Z

This interpreter implementation follows the specification from #15435 and is intended for use in #15358.

For testing, the existing YulInterpreterTest was not used. Because this pure interpreter has no memory/storage access, it cannot use the same trace as the current interpreter from 'test/libyul/tools'. To keep things minimal, the YulInterpreterTest test suite was copied and adapted with minimal changes. The new test suite uses the function call trace and the outermost variable values for snapshot comparison. Most of the existing tests from YulInterpreterTest were ported to the new test suite. New tests were also added to cover all return statuses and all EVM instructions.

github-actions · 2024-09-29T10:57:15Z

Thank you for your contribution to the Solidity compiler! A team member will follow up shortly.

If you haven't read our contributing guidelines and our review checklist before, please do so now; this makes the review process and accepting your contribution smoother.

If you have any questions or need our help, feel free to post them in the PR or talk to us directly on the #solidity-dev channel on Matrix.

cameel

Thanks for the PR! I won't manage to do a proper review until next week, but for now here's some initial feedback from a quick pass over it.

cameel · 2024-10-01T23:24:06Z

libyul/tools/interpreter/PureEVMInstructionInterpreter.cpp

+	static std::set<std::string> const NON_INSTRUCTION_BUILTIN_NAME = {
+		"datasize",
+		"dataoffset",
+		"datacopy",
+		"memoryguard",
+		"loadimmutable",
+		"setimmutable",
+		"linkersymbol"
+	};


Instead of hard-coding those, you should just check that the instruction field in BuiltinFunctionForEVM is empty.
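A minimal sketch of the check suggested above, assuming the builtin descriptor carries an optional instruction field. The `Instruction` enum, the `BuiltinFunction` struct, and both helper functions are simplified stand-ins introduced for illustration; they are not the actual libyul definitions.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for the EVM instruction enum; only two opcodes are modelled.
enum class Instruction { ADD, MLOAD };

// Hypothetical, simplified stand-in for BuiltinFunctionForEVM.
struct BuiltinFunction
{
    std::string name;
    // Empty for builtins such as datasize() that do not map to a single opcode.
    std::optional<Instruction> instruction;
};

// Returns true if the builtin corresponds directly to an EVM instruction.
inline bool isInstructionBuiltin(BuiltinFunction const& _builtin)
{
    return _builtin.instruction.has_value();
}

// Returns the instruction behind a builtin.
// Callers are expected to check isInstructionBuiltin() first.
inline Instruction instructionOf(BuiltinFunction const& _builtin)
{
    assert(_builtin.instruction.has_value());
    return *_builtin.instruction;
}
```

With a check of this shape, a newly added non-instruction builtin would be covered without having to extend a hard-coded list of names.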

cameel · 2024-10-01T23:27:49Z

test/libyul/YulPureInterpreterTest.cpp

+	m_config.maxTraceSize = m_reader.sizetSetting("maxTraceSize", 128);
+	m_config.maxExprNesting = m_reader.sizetSetting("maxExprNesting", 64);
+	m_config.maxSteps = m_reader.sizetSetting("maxSteps", 512);
+	m_config.maxRecursionDepth = m_reader.sizetSetting("maxRecursionDepth", 64);
+
+	m_printHex = m_reader.boolSetting("printHex", false);


> For testing, the existing YulInterpreterTest was not used. This is because this pure interpreter has no memory/storage access, it can not use the same trace as the current interpreter from 'test/libyul/tools'.

That's actually easy to solve. Just add a setting you can set to mark a test as one that requires access to memory or storage. Then add it to those tests that do not pass without it. As you can see in the snippet above, adding settings is pretty easy.

For this we could call the setting sideEffectFree and make it true by default.

I understand how to mark a test that requires memory access. However, I’d still insist on keeping the two test suites separate. While they do share the parsing part, the settings and trace printing are quite different. I’m concerned that merging the two might make it harder to manage the tests.

I also believe the two interpreter implementations might grow in different directions, so separating the test suites makes more sense to me.
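For reference, the sideEffectFree setting proposed earlier in this thread could look roughly like the sketch below. `SettingsReader` and `usesPureInterpreter` are hypothetical, simplified stand-ins; only the setting name and its default of true come from the discussion, and the real test framework reader may behave differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for the reader of per-test settings.
class SettingsReader
{
public:
    explicit SettingsReader(std::map<std::string, std::string> _settings):
        m_settings(std::move(_settings))
    {}

    // Returns the value of a boolean setting, or _default if it is not present.
    bool boolSetting(std::string const& _name, bool _default) const
    {
        auto it = m_settings.find(_name);
        if (it == m_settings.end())
            return _default;
        assert(it->second == "true" || it->second == "false");
        return it->second == "true";
    }

private:
    std::map<std::string, std::string> m_settings;
};

// A test that needs memory or storage access would set sideEffectFree to false.
inline bool usesPureInterpreter(SettingsReader const& _reader)
{
    return _reader.boolSetting("sideEffectFree", true);
}
```

Tests that omit the setting would run against the pure interpreter, so only the tests that need memory or storage would have to be annotated.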

cameel · 2024-10-01T23:40:06Z

libyul/tools/interpreter/PureInterpreter.cpp

+		// Increment step for each loop iteration for loops with
+		// an empty body and post blocks to prevent a deadlock.
+		if (_forLoop.body.statements.size() == 0 && _forLoop.post.statements.size() == 0)
+			if (auto terminated = incrementStatementStep()) return *terminated;


Some general notes on style to get that out of the way:

We always put return on a new line.

Please also avoid abbreviations in names. Things like fun, cnt, x, g, vec, res are hard to read and often ambiguous. TBH, we have a lot of bad names already since the code is old and it's also hard to enforce, so some of that might have been just copied, but when writing new code, please try to avoid them.

Constructor spacing: ExecutionOk{ControlFlowState::Default};

4-space indents, including in .yul files.

You don't have to include the empty string as message in asserts. We made it optional at some point, but did not strip it from everywhere. When writing new code, you can omit the message.

I'd avoid making files hidden (with a leading dot in the name).

In Python code we use the same style for splitting long calls as in C++, i.e.:
gen_test(
    'sgt',
    param_cnt=2,
    calc=lambda p: u2s(p[0]) > u2s(p[1])
)

Thank you for commenting on the styling. For this part, I have a question about the formatter.

For C++ code style, I see that a .clang-format configuration is included. However, using clang-format changes the code drastically, so I deliberately turned off the formatter. Should I use clang-format for the new code, or should I adjust the code manually? The same question applies to the Python code.

That is my only question. I will address the other comments.

Besides the code style, is it OK to use a macro for the terminated case? That check is repetitive, and because it also returns immediately, a macro is the only way I can think of to clean up the code.
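To make the question concrete, here is a minimal sketch of the repeated "return if terminated" pattern wrapped in a macro. The `Terminated` enum, the `RETURN_IF_TERMINATED` macro, and the `StepCounter` class are hypothetical names for illustration, and the return type is simplified compared to the PR. Whether a macro is acceptable in this codebase is left open here.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>

// Hypothetical result type: set when execution must stop early.
enum class Terminated { StepLimitReached };

// Hypothetical helper macro: evaluates the expression once and returns from
// the enclosing function if it yields a termination result.
#define RETURN_IF_TERMINATED(expression) \
    do \
    { \
        if (auto terminated = (expression)) \
            return terminated; \
    } while (false)

class StepCounter
{
public:
    explicit StepCounter(std::size_t _maxSteps): m_maxSteps(_maxSteps)
    {
        assert(_maxSteps > 0);
    }

    // Counts one step and reports termination once the limit is exceeded.
    std::optional<Terminated> incrementStatementStep()
    {
        ++m_steps;
        if (m_steps > m_maxSteps)
            return Terminated::StepLimitReached;
        return std::nullopt;
    }

    // Models a loop with empty body and post blocks: each iteration counts
    // as one step, so the loop cannot run forever.
    std::optional<Terminated> runEmptyLoop(std::size_t _iterations)
    {
        for (std::size_t i = 0; i < _iterations; ++i)
            RETURN_IF_TERMINATED(incrementStatementStep());
        return std::nullopt;
    }

private:
    std::size_t m_maxSteps;
    std::size_t m_steps = 0;
};
```

The `do { ... } while (false)` wrapper lets the macro be used like a single statement, and the `return` inside it sits on its own line, in keeping with the style note above.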

cameel · 2024-10-01T23:46:37Z

libyul/tools/interpreter/PureEVMInstructionInterpreter.cpp

+	case Instruction::SWAP16:
+	{
+		yulAssert(false, "");
+		return EvaluationOk(0);
+	}
+	}
+
+	yulAssert(false, "Unknown instruction with opcode " + std::to_string(static_cast<uint8_t>(_instruction)));
+	return EvaluationOk(0);


Suggested change

-	case Instruction::SWAP16:
-	{
-		yulAssert(false, "");
-		return EvaluationOk(0);
-	}
-	}
-
-	yulAssert(false, "Unknown instruction with opcode " + std::to_string(static_cast<uint8_t>(_instruction)));
-	return EvaluationOk(0);
+	case Instruction::SWAP16:
+		yulAssert(false, "Instruction not allowed in strict assembly.");
+	}
+
+	util::unreachable();

Thank you for pointing out util::unreachable. I actually didn't know about it.

However, I think the assert with an error message should be used. Currently all the instructions are checked with a switch statement, which is not guaranteed to be exhaustive at compile time. I am not sure if there is a way to ensure its correctness at compile time. For the runtime check, I think it should fail loudly. If a failure happens, there must be a new instruction, which should then be added to this function.
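On the compile-time question, one common approach is sketched below, under the assumption that the switch is over an enum and has no default label. The three-opcode `Instruction` enum and `instructionName` are hypothetical stand-ins. A plain `assert` is used for the runtime check because `yulAssert` and `util::unreachable` are not defined in this standalone sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical three-opcode subset standing in for the full instruction enum.
enum class Instruction: std::uint8_t { ADD = 0x01, MUL = 0x02, SWAP16 = 0x9f };

// Returns a printable name for _instruction.
// There is deliberately no default label: GCC and Clang can then warn
// (-Wswitch, part of -Wall) when an enumerator has no case, and
// -Werror=switch turns that warning into a compile error.
inline std::string instructionName(Instruction _instruction)
{
    switch (_instruction)
    {
    case Instruction::ADD:
        return "add";
    case Instruction::MUL:
        return "mul";
    case Instruction::SWAP16:
        return "swap16";
    }

    // Only reachable if _instruction holds a value outside the enum.
    assert(false && "Unknown instruction");
    return "";
}
```

With this layout, adding a new enumerator without a matching case is flagged at build time when those warnings are enabled, while the check after the switch still fails loudly at runtime for out-of-range values in builds with assertions on.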

cameel · 2024-10-01T23:51:32Z

libyul/tools/interpreter/PureEVMInstructionInterpreter.cpp

+	std::vector<Expression> const& /* _arguments */,  // This was required to execute some builtin.
+													  // But all of them are impure.


Are they though? I'd expect dataoffset(), datasize() and linkersymbol() at least to be pure. We won't be able to evaluate them without information that's only available at bytecode generation time, but technically they end up being just constants :)

I understand the point. I think my comment here is misleading too, so I will change it.

For the implementation, we could try returning special values for these cases. But that sounds more complicated, and it goes beyond the scope of code interpretation.

cameel · 2024-10-01T23:53:39Z

libyul/tools/interpreter/PureEVMInstructionInterpreter.cpp

I'd drop the tools/ dir and put it simply under interpreter/. The old interpreter is under test/tools/ because it's a part of yulrun, which is a testing tool (and compiles into a separate executable). The new interpreter becomes a part of libyul instead, which goes into solc.

cameel · 2024-10-01T23:57:28Z

libyul/tools/interpreter/PureEVMInstructionInterpreter.cpp

+using solidity::util::h160;
+using solidity::util::h256;
+
+using u512 = boost::multiprecision::number<boost::multiprecision::cpp_int_backend<512, 256, boost::multiprecision::unsigned_magnitude, boost::multiprecision::unchecked, void>>;


Why not place this next to the similar types we already have, like u256?

I did not notice that type was there. I was too focused on modifying the original interpreter 🙏 .

I will move it.

quangloc99 added 30 commits September 18, 2024 22:12

Copy interpreter tool from test as basis

36cbf76

Add new Result type

b69895e

Change interface of the interpreter to have custom return type

821a5e0

Implement visit function with new return type

8c23fcc

Remove runExternalCall

bbc1211

Minor comment edit

cfd22d0

Implement visit(Expression) for ExpressionEvaluator

30e799c

Force incrementStep logic to be done at visit function

e251707

Merge ExpressionEvaluator into Interpreter

d57b0bc

Reorder methods

2b1cf63

Reset expressionNestingLevel after each incrementStatementStep

ae58a7c

Remove redundant makeInterpreterNew function

109caa6

Remove unused disableMemoryTrace

b75de8d

Remove state with side effect from InterpreterState

0880bda

Move Interpreter Config to a separate struct

17dc623

Remove redundant numInstance

ec032d9

Add recursion depth check

6751981

Remove functions with side-effect from EVMInstructionInterpreter

6d9ac08

Declare ImpureBuiltinEncountered and EVMInstructionInterpretedResult

168047b

Return ImpureBuiltinEncountered for all builtin with side-effect

220ab2c

Use EVMInstructionInterpretedResult in Interpreter

22ccc63

Bring back empty body and post check when visiting ForLoop

7662ebf

Remove unused function in EVMInstructionInterpreter

2f1f412

Move result types into a separate file

f505219

Add Pure* prefix

1882cc4

Move PureInterpreterState into a separate file

8b61c79

Add Results and PureInterpreterState into CMakeLists

f745b08

Separate function definition and variable declaration in Scope

f37d2f8

Make Scope parent constant pointer

3be2783

Move Scope to a separate file

7775bbe

quangloc99 added 14 commits September 29, 2024 15:24

Filter and add existing interpreter tests to pure interpreterTest

f7fdc92

Add tests for all impure instructions

c3b2ba5

Add test for all pure instructions

3f82ab5

Change expr_nesting_depth_not_exceeded to run expression multiple time

84b6198

Add test for StepLimitReached

46f8979

Add test for impure non instructions

13d8001

Add check for verbatim

c1cd467

Add verbatim test

0f1e1d5

Remove redundant public:

9fcd79f

Optimize map usage

ec0dd94

Remove redundant terminated check

5aa7dc6

Remove redundant terminated result

a5e6d92

Add test for stop instruction

02ea3a7

Merge branch 'develop' into yul-side-effect-free-interpreter

24c2fe4

github-actions bot added the external contribution ⭐ label Sep 29, 2024

quangloc99 added 9 commits September 29, 2024 18:01

Update tests to remove trailing space

4b4e463

Fix styling for addTrace

c32754f

Fix spelling

7b2206b

Fix linting error for generate-test.py

5289dd3

Update EVMVersion in tests

30538f2

Add encoding when open file when generate test

a490149

Fix spelling Expection -> Expectation

a93794a

Remove virtual modifier from PureInterpreter functions

7640a22

Remove unused m_state

53a39bc

cameel reviewed Oct 1, 2024
