Make it obvious that we're running Yul optimizer twice via IR #14172

Closed
wants to merge 1 commit

Conversation

@cameel (Member) commented Apr 29, 2023

While working on #14149 I discovered that we're running the Yul optimizer twice for the IR codegen: first in IRGenerator and then again in CompilerStack on that already optimized output.

Can anyone confirm my observations or am I missing some important detail? Maybe the second run is intentional after all?

Details

The second run looks superfluous to me and I expected it to have no effect, but that is not the case. It does have some minimal effects, e.g. it can eliminate some variables with constant values (see the diff below). What's surprising is that this is apparently enough to make several external tests start failing with "Stack too deep" when that second run is removed. See CI run 29491 on the do-not-optimize-twice-via-ir branch, where I removed it; it makes bleeps and trident fail.

The second run might be understandable if we were doing some changes in between (this happens e.g. for ewasm, where we optimize -> translate -> optimize again). Weirdly, that is not the case here: we get the same effect even if we put the two runs right next to each other with nothing in between (as this PR does).

An interesting coincidence is that the difference should have shown up in the via-IR equivalence tests. These tests do fail on the do-not-optimize-twice-via-ir branch. They only work on develop because we use --optimize both for the solc command that outputs IR and later again for the second command that does the assembling - which amounts to optimizing twice, just like we do in one-step compilation via IR.

The cost of the second run of the whole optimizer seems quite significant. When I remove it, the chains.sol benchmark improves by 30% (from 36 s to 24 s). The compilation time of OpenZeppelin external test drops from 6.5 min to 5.5 min (though, oddly, the test suite as a whole got slower so it's not visible by just looking at CI times - you have to check the compilation timing in the CLI output).

Given how slight the changes from the second run are, I suspect we might be able to get the same effect by adjusting the optimization sequence, with a much smaller performance hit. That needs some investigation though, and this PR is not such a fix. Instead it just moves the second run to IRGenerator so that it's more obvious that we're running it twice. The effect is that this second run now happens before we generate the IR output, so its effects are visible in the --ir-optimized output.
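For illustration, the two ways of invoking the compiler that get compared here look roughly like this (placeholder file names; these are not the exact commands used by the equivalence test scripts):

    # One-step compilation via IR; internally the Yul optimizer currently runs twice.
    solc --via-ir --optimize --bin Contract.sol

    # Two-step flow, as in the equivalence tests: emit optimized IR, then assemble it.
    # Because --optimize is passed in both steps, the IR is optimized a second time
    # during assembly, which is what makes the result match the one-step output.
    solc --ir-optimized --optimize Contract.sol    # save the printed IR as Contract.yul
    solc --strict-assembly --optimize Contract.yul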

@cameel self-assigned this Apr 29, 2023
@hrkrshnn (Member)

> first in IRGenerator and then again in CompilerStack on that already optimized output.

The first one could be a vestige from where Yul is used in the legacy codegen. If this is the case, we could remove one of them in the via-ir codegen?

I'm not surprised that running it twice vs. once at the end produces some deviations in the output, but I think they should be minimal.

@nikola-matic (Collaborator)

As Hari mentioned, I'm not surprised that running twice instead of once will produce different output, since the second time, you are effectively running on code that was transformed by the first optimizer run. Still though, no idea why we're running it twice, although this does indicate that it was on purpose:

> they only work on develop because we use --optimize both for the solc command that outputs IR and later again for the second command that does the assembling - which amounts to optimizing twice, just like we do in one-step compilation via IR.

@@ -1467,7 +1467,6 @@ void CompilerStack::generateEVMFromIR(ContractDefinition const& _contract)
 m_debugInfoSelection
 );
 stack.parseAndAnalyze("", compiledContract.yulIROptimized);
-stack.optimize();

 //cout << yul::AsmPrinter{}(*stack.parserResult()->code) << endl;
Review comment (Collaborator):

Unrelated, but we have a few of these scattered around; should probably remove them.

@cameel (Member, Author) commented May 1, 2023

I don't know about the one in IRGenerator, but the one in CompilerStack was added by @axic in #10147. I reviewed that PR back then, but I did not notice this myself (even though the name of the input - compiledContract.yulIROptimized - should have been a dead giveaway). Still, I don't know if he added it on purpose or just somehow did not realize that IRGenerator runs it too.

> although this does indicate that it was on purpose:

Haha, no. I would know because I wrote that test (#12141) :) This part is definitely an accident. Back then I think I assumed that double --optimize can't hurt because the optimization must be happening only on one end anyway. I did not dig into it because the fact that I was getting the output exactly matching the single-step compilation seemed to confirm that.

@nikola-matic (Collaborator)

A happy accident in that case :). I guess we're gonna have to wait for input from @chriseth or @axic in any case; I wouldn't mind approving this per se, but if Chris or axic can shed some light on this, it would still be nice.

@cameel (Member, Author) commented May 1, 2023

> As Hari mentioned, I'm not surprised that running twice instead of once will produce different output, since the second time, you are effectively running on code that was transformed by the first optimizer run.

I was a little surprised, because I thought that our sequence was chosen in a way that prevents this. Also, I did a small test by running the optimizer three times, and the effect was the same as running it twice - so this seemed like an anomaly. I only checked that on this equivalence test though, so it might not hold in general (but it seemed to indicate that it does).

> If this is the case, we could remove one of them in the via-ir codegen?

I totally expect @ekpyron to come to this thread at any moment with a perfectly logical explanation of why all of this is wrong and it has to stay that way and just close it :)

But yeah, if it is superfluous, then it's a very low-hanging fruit in terms of performance optimization of the IR pipeline. We get 15-30% speedup almost for free. It's just that we can't simply remove it; we also have to adjust the sequence because of those failing external tests.

Though I see now that these tests are running into "Stack too deep" errors even with the optimizer, because they don't use memory-safe assembly blocks. So maybe we could even disregard that. Still, this means that the double optimization was actually giving us slightly better stack optimization.

@cameel (Member, Author) commented May 1, 2023

By the way, here's the benchmark showing how it affects gas usage for IR.

As expected it does not change much, but it's still interesting that the differences are non-zero. This PR does not disable the second optimizer run, so they should actually all be zero. It could be because these external test suites generally don't seem to be fully deterministic, but the differences show up in quite a few tests, so that's not certain. I tried to verify it by rerunning the tests and checking whether the numbers change randomly, but unfortunately I'm getting errors when parsing the gas report.

ir-optimize-evm+yul (relative)

| project          | bytecode_size | deployment_gas | method_gas |
|------------------|---------------|----------------|------------|
| bleeps           | 0%            | 0%             | 0%         |
| brink            | 0%            |                |            |
| chainlink        | 0%            | 0%             | +0%        |
| colony           | 0%            |                |            |
| elementfi        | 0%            |                |            |
| ens              | 0%            | +0%            | 0%         |
| euler            | 0%            | -0%            | +0%        |
| gnosis           |               |                |            |
| gp2              | 0%            | -0%            | +0%        |
| perpetual-pools  | 0%            | +0%            | -0.01% ✅  |
| pool-together    | 0%            | -0%            | +0%        |
| prb-math         | 0%            | 0%             | 0%         |
| trident          | 0%            | +0%            | +0%        |
| uniswap          | 0%            | +0%            | +0%        |
| yield_liquidator | 0%            | -0%            | 0%         |
| zeppelin         | !V            | !V             | !V         |

ir-optimize-evm+yul (absolute)

| project          | bytecode_size | deployment_gas | method_gas |
|------------------|---------------|----------------|------------|
| bleeps           | 0             | 0              | 0          |
| brink            | 0             |                |            |
| chainlink        | 0             | 0              | 483        |
| colony           | 0             |                |            |
| elementfi        | 0             |                |            |
| ens              | 0             | 24             | 0          |
| euler            | 0             | -12            | 24138      |
| gnosis           |               |                |            |
| gp2              | 0             | -48            | 5          |
| perpetual-pools  | 0             | 34             | -74351     |
| pool-together    | 0             | -12            | 55         |
| prb-math         | 0             | 0              | 0          |
| trident          | 0             | 24             | 269        |
| uniswap          | 0             | 12             | 1170       |
| yield_liquidator | 0             | -12            | 0          |
| zeppelin         | !V            | !V             | !V         |

@ekpyron (Member) commented May 2, 2023

I don't think this is intentional - it's comparatively old code and I don't see a good reason for this behaviour. The fact that two-step compilation works due to this also seems rather coincidental. That is an issue of course, since ideally we'd want to preserve it, but that will in fact be a bit tricky without this weird behaviour.

@hrkrshnn (Member) commented May 2, 2023

> But yeah, if it is superfluous, then it's a very low-hanging fruit in terms of performance optimization of the IR pipeline. We get 15-30% speedup almost for free. It's just that we can't simply remove it; we also have to adjust the sequence because of those failing external tests.

Yeah, I wonder if we can run only a simple subset of the sequence first (especially something that doesn't have a dependency on the dataflow analyzer) and get the same optimized output. @cameel

> but it's still interesting that the differences are non-zero

I think it's okay. There are many things like inlining heuristics that would affect things like these.

@ekpyron (Member) commented May 2, 2023

I wouldn't be so concerned about those tests failing. The first uses unsafe assembly and changing that would fix it; the other actually seems to be a bug in the code transform / stack-to-memory, which we'll fix separately.

The real problem here is preserving the fact that we can compile separately in stages producing the same result as a one-step compilation.

@ekpyron (Member) commented May 2, 2023

In #12141 we should really have used the non-optimized IR output as input for the second stage though, shouldn't we? That would probably be the way to keep this working... it's a bit of a harsh change now, but given the benefits...

Comment on lines 115 to +116
 asmStack.optimize();
+asmStack.optimize(); // FIXME: We should be getting good results without running this twice
Review comment (Member):

This is the place where I'd really rather remove this... why does this function deal with any of this in the first place :-)? It's an IR generator, it should just, you know, generate IR, shouldn't it :-)?
It's probably there solely for the purpose of the --ir-optimized output... which we may change in the future anyways, since it's horribly confusing (unfortunately breaking, though)...

So I'd suggest removing this entire mess here, dealing with the --ir-optimized output on-demand-only in the compiler stack instead, and only dealing with unoptimized IR here, which would then be optimized only in generateEVMFromIR directly on the other side as well.

@ekpyron (Member) commented May 15, 2023

I'd suggest doing the following:

  • We drop the optimizer run in IRGenerator. The requested output for --ir-optimized can be generated using separate logic in the compiler stack (we'll probably change the mechanisms for that and --ir anyways eventually, so it doesn't need to be pretty).
  • We change the two-step compilation tests to generate unoptimized IR first and then assemble from there - that should still work if we drop the additional optimization.
  • We can, in a second step, add some means of assembling with the optimizing assembler but without re-optimizing the Yul code (we can decide upon option names etc. separately), such that taking optimized IR and assembling with that option enabled still yields the same result as one-step compilation.

And we should definitely try to get all of this into the next release.
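In CLI terms, the first two points would amount to something like this (a rough sketch with placeholder file names, not the actual test commands; the option mentioned in the third point does not exist yet):

    # Step 1: emit unoptimized IR only.
    solc --ir Contract.sol                          # save the printed IR as Contract.yul

    # Step 2: optimize and assemble it in a single step.
    solc --strict-assembly --optimize Contract.yul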

@ekpyron (Member) commented May 15, 2023

Note that the plan I just proposed will affect #14177, so ideally we'll get that merged beforehand and then adjust this together.

@cameel (Member, Author) commented May 15, 2023

Do we want to merge this PR though? I think that making the extra transformations coming from the second optimizer run visible in --ir-optimized is a good thing, and it does not hurt to do all the changes you're proposing on top of it. At least we'll get this one in if for some reason we don't manage to get the rest into 0.8.21 (though it seems straightforward enough that it probably won't be a problem).

Also, it would probably be a good idea to put your plan into an issue so that we can properly track it in a milestone.

@ekpyron (Member) commented May 15, 2023

> Do we want to merge this PR though? I think that making the extra transformations coming from the second optimizer run visible in --ir-optimized is a good thing, and it does not hurt to do all the changes you're proposing on top of it. At least we'll get this one in if for some reason we don't manage to get the rest into 0.8.21 (though it seems straightforward enough that it probably won't be a problem).
>
> Also, it would probably be a good idea to put your plan into an issue so that we can properly track it in a milestone.

I'd just do it cleanly according to the plan - the place where the PR duplicates the run is the place at which we don't want it at all in the future :-). And yeah: feel free to open an issue for tracking the plan!

@cameel (Member, Author) commented May 17, 2023

OK then. I opened an issue (#14241) and I'll submit a new PR. I'm closing this one.
