Make it obvious that we're running Yul optimizer twice via IR #14172

Closed
wants to merge 1 commit

Conversation

@cameel (Member) commented Apr 29, 2023

While working on #14149 I discovered that we're running the Yul optimizer twice for the IR codegen: first in IRGenerator and then again in CompilerStack on that already optimized output.

Can anyone confirm my observations or am I missing some important detail? Maybe the second run is intentional after all?

Details

The second run looks superfluous to me and I expected it to have no effect, but that is not the case. It does have some minimal effects, e.g. it can eliminate some variables with constant values (see the diff below). What's surprising is that this is apparently enough to make several external tests start failing with "Stack too deep" when that second run is removed. See CI run 29491 on the do-not-optimize-twice-via-ir branch, where I removed it; it makes bleeps and trident fail.

The second run might be understandable if we were doing some changes in between (this happens e.g. for ewasm, where we optimize -> translate -> optimize again). Weirdly, that is not the case here: we get the same effect even if we put the two runs right next to each other with nothing in between (as this PR does).

An interesting coincidence is that the difference should have shown up in the via-IR equivalence tests. These tests do fail on the do-not-optimize-twice-via-ir branch. They only work on develop because we use --optimize both for the solc command that outputs IR and later again for the second command that does the assembling - which amounts to optimizing twice, just like we do in one-step compilation via IR.

The cost of the second run of the whole optimizer seems quite significant. When I remove it, the chains.sol benchmark improves by 30% (from 36 s to 24 s). The compilation time of OpenZeppelin external test drops from 6.5 min to 5.5 min (though, oddly, the test suite as a whole got slower so it's not visible by just looking at CI times - you have to check the compilation timing in the CLI output).

Given how slight the changes from the second run are, I suspect we might be able to get the same effect by adjusting the optimization sequence, with a much smaller performance hit. That needs some investigation though, and this PR is not such a fix. Instead it just moves the second run to IRGenerator so that it's more obvious that we're running it twice. The effect is that this second run now happens before we generate the IR output, so its effects are visible in the --ir-optimized output.
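For illustration, the two ways of invoking the compiler that get compared here look roughly like this (placeholder file names; these are not the exact commands used by the equivalence test scripts):

    # One-step compilation via IR; internally the Yul optimizer currently runs twice.
    solc --via-ir --optimize --bin Contract.sol

    # Two-step flow, as in the equivalence tests: emit optimized IR, then assemble it.
    # Because --optimize is passed in both steps, the IR is optimized a second time
    # during assembly, which is what makes the result match the one-step output.
    solc --ir-optimized --optimize Contract.sol    # save the printed IR as Contract.yul
    solc --strict-assembly --optimize Contract.yul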

@cameel self-assigned this Apr 29, 2023
@hrkrshnn (Member)

> first in IRGenerator and then again in CompilerStack on that already optimized output.

The first one could be a vestige from where Yul is used in the legacy codegen. If this is the case, we could remove one of them in the via-ir codegen?

I'm not surprised that running it twice vs. once at the end produces some deviations in the output, but I think they should be minimal.

@nikola-matic (Collaborator)

As Hari mentioned, I'm not surprised that running twice instead of once will produce different output, since the second time, you are effectively running on code that was transformed by the first optimizer run. Still though, no idea why we're running it twice, although this does indicate that it was on purpose:

> they only work on develop because we use --optimize both for the solc command that outputs IR and later again for the second command that does the assembling - which amounts to optimizing twice, just like we do in one-step compilation via IR.

@@ -1467,7 +1467,6 @@ void CompilerStack::generateEVMFromIR(ContractDefinition const& _contract)
 m_debugInfoSelection
 );
 stack.parseAndAnalyze("", compiledContract.yulIROptimized);
-stack.optimize();

 //cout << yul::AsmPrinter{}(*stack.parserResult()->code) << endl;
Review comment (Collaborator):

Unrelated, but we have a few of these scattered around; should probably remove them.

@cameel (Member, Author) commented May 1, 2023

I don't know about the one in IRGenerator, but the one in CompilerStack was added by @axic in #10147. I reviewed that PR back then, but I did not notice this myself (even though the name of the input - compiledContract.yulIROptimized - should have been a dead giveaway). Still, I don't know if he added it on purpose or just somehow did not realize that IRGenerator runs it too.

> although this does indicate that it was on purpose:

Haha, no. I would know because I wrote that test (#12141) :) This part is definitely an accident. Back then I think I assumed that double --optimize can't hurt because the optimization must be happening only on one end anyway. I did not dig into it because the fact that I was getting the output exactly matching the single-step compilation seemed to confirm that.

@nikola-matic (Collaborator)

A happy accident in that case :). I guess we're gonna have to wait for input from @chriseth or @axic in any case; I wouldn't mind approving this per se, but if Chris or axic can shed some light on this, it would still be nice.

@cameel (Member, Author) commented May 1, 2023

> As Hari mentioned, I'm not surprised that running twice instead of once will produce different output, since the second time, you are effectively running on code that was transformed by the first optimizer run.

I was a little surprised, because I thought that our sequence was chosen in a way that prevents this. Also, I did a small test by running the optimizer three times, and the effect was the same as running it twice - so this seemed like an anomaly. I only checked that on this equivalence test though, so it might not hold in general (but it seemed to indicate that it does).

> If this is the case, we could remove one of them in the via-ir codegen?

I totally expect @ekpyron to come to this thread at any moment with a perfectly logical explanation of why all of this is wrong and it has to stay that way and just close it :)

But yeah, if it is superfluous, then it's a very low-hanging fruit in terms of performance optimization of the IR pipeline. We get 15-30% speedup almost for free. It's just that we can't simply remove it; we also have to adjust the sequence because of those failing external tests.

Though I see now that these tests are running into "Stack too deep" errors even with the optimizer, because they don't use memory-safe assembly blocks. So maybe we could even disregard that. Still, this means that the double optimization was actually giving us slightly better stack optimization.

@cameel (Member, Author) commented May 1, 2023

By the way, here's the benchmark showing how it affects gas usage for IR.

As expected it does not change much, but it's still interesting that the differences are non-zero. This PR does not disable the second optimizer run, so they should actually all be zero. It could be because these external test suites generally don't seem to be fully deterministic, but the differences show up in quite a few tests, so that's not certain. I tried to verify it by rerunning the tests and checking whether the numbers change randomly, but unfortunately I'm getting errors when parsing the gas report.

ir-optimize-evm+yul (relative)

| project          | bytecode_size | deployment_gas | method_gas |
|------------------|---------------|----------------|------------|
| bleeps           | 0%            | 0%             | 0%         |
| brink            | 0%            |                |            |
| chainlink        | 0%            | 0%             | +0%        |
| colony           | 0%            |                |            |
| elementfi        | 0%            |                |            |
| ens              | 0%            | +0%            | 0%         |
| euler            | 0%            | -0%            | +0%        |
| gnosis           |               |                |            |
| gp2              | 0%            | -0%            | +0%        |
| perpetual-pools  | 0%            | +0%            | -0.01% ✅  |
| pool-together    | 0%            | -0%            | +0%        |
| prb-math         | 0%            | 0%             | 0%         |
| trident          | 0%            | +0%            | +0%        |
| uniswap          | 0%            | +0%            | +0%        |
| yield_liquidator | 0%            | -0%            | 0%         |
| zeppelin         | !V            | !V             | !V         |

ir-optimize-evm+yul (absolute)

| project          | bytecode_size | deployment_gas | method_gas |
|------------------|---------------|----------------|------------|
| bleeps           | 0             | 0              | 0          |
| brink            | 0             |                |            |
| chainlink        | 0             | 0              | 483        |
| colony           | 0             |                |            |
| elementfi        | 0             |                |            |
| ens              | 0             | 24             | 0          |
| euler            | 0             | -12            | 24138      |
| gnosis           |               |                |            |
| gp2              | 0             | -48            | 5          |
| perpetual-pools  | 0             | 34             | -74351     |
| pool-together    | 0             | -12            | 55         |
| prb-math         | 0             | 0              | 0          |
| trident          | 0             | 24             | 269        |
| uniswap          | 0             | 12             | 1170       |
| yield_liquidator | 0             | -12            | 0          |
| zeppelin         | !V            | !V             | !V         |

@ekpyron (Member) commented May 2, 2023

I don't think this is intentional - it's comparatively old code and I don't see a good reason for this behaviour. The fact that two-step compilation works due to this also seems rather coincidental. That is an issue of course, since ideally we'd want to preserve it, but that will in fact be a bit tricky without this weird behaviour.

@hrkrshnn (Member) commented May 2, 2023

> But yeah, if it is superfluous, then it's a very low-hanging fruit in terms of performance optimization of the IR pipeline. We get 15-30% speedup almost for free. It's just that we can't simply remove it; we also have to adjust the sequence because of those failing external tests.

Yeah, I wonder if we can run only a simple subset of the sequence first (especially something that doesn't have a dependency on the dataflow analyzer) and get the same optimized output. @cameel

> but it's still interesting that the differences are non-zero

I think it's okay. There are many things like inlining heuristics that would affect things like these.

@ekpyron (Member) commented May 2, 2023

I wouldn't be so concerned about those tests failing. The first uses unsafe assembly and changing that would fix it; the other actually seems to be a bug in the code transform / stack-to-memory, which we'll fix separately.

The real problem here is preserving the fact that we can compile separately in stages producing the same result as a one-step compilation.

@ekpyron (Member) commented May 2, 2023

In #12141 we should really have used the non-optimized IR output as input for the second stage though, shouldn't we? That would probably be the way to keep this working... it's a bit of a harsh change now, but given the benefits...

Comment on lines 115 to +116
 asmStack.optimize();
+asmStack.optimize(); // FIXME: We should be getting good results without running this twice
Review comment (Member):

This is the place where I'd really rather remove this... why does this function deal with any of this in the first place :-)? It's an IR generator, it should just, you know, generate IR, shouldn't it :-)?
It's probably there solely for the purpose of the --ir-optimized output... which we may change in the future anyways, since it's horribly confusing (unfortunately breaking, though)...

So I'd suggest removing this entire mess here, dealing with the --ir-optimized output on-demand-only in the compiler stack instead, and only dealing with unoptimized IR here, which would then be optimized only in generateEVMFromIR directly on the other side as well.

@ekpyron (Member) commented May 15, 2023

I'd suggest doing the following:

  • We drop the optimizer run in IRGenerator. The requested output for --ir-optimized can be generated using separate logic in the compiler stack (we'll probably change the mechanisms for that and --ir anyways eventually, so it doesn't need to be pretty).
  • We change the two-step compilation tests to generate unoptimized IR first and then assemble from there - that should still work if we drop the additional optimization.
  • We can, in a second step, add some means of assembling with the optimizing assembler but without re-optimizing the Yul code (we can decide upon option names etc. separately), such that taking optimized IR and assembling with that option enabled still yields the same result as one-step compilation.

And we should definitely try to get all of this into the next release.
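In CLI terms, the first two points would amount to something like this (a rough sketch with placeholder file names, not the actual test commands; the option mentioned in the third point does not exist yet):

    # Step 1: emit unoptimized IR only.
    solc --ir Contract.sol                          # save the printed IR as Contract.yul

    # Step 2: optimize and assemble it in a single step.
    solc --strict-assembly --optimize Contract.yul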

@ekpyron (Member) commented May 15, 2023

Note that the plan I just proposed will affect #14177, so ideally we'll get that merged beforehand and then adjust this together.

@cameel (Member, Author) commented May 15, 2023

Do we want to merge this PR though? I think that making the extra transformations coming from the second optimizer run visible in --ir-optimized is a good thing, and it does not hurt to do all the changes you're proposing on top of it. At least we'll get this one in if for some reason we don't manage to get the rest into 0.8.21 (though it seems straightforward enough that it probably won't be a problem).

Also, it would probably be a good idea to put your plan into an issue so that we can properly track it in a milestone.

@ekpyron (Member) commented May 15, 2023

> Do we want to merge this PR though? I think that making the extra transformations coming from the second optimizer run visible in --ir-optimized is a good thing, and it does not hurt to do all the changes you're proposing on top of it. At least we'll get this one in if for some reason we don't manage to get the rest into 0.8.21 (though it seems straightforward enough that it probably won't be a problem).
>
> Also, it would probably be a good idea to put your plan into an issue so that we can properly track it in a milestone.

I'd just do it cleanly according to the plan - the place where the PR duplicates the run is the place at which we don't want it at all in the future :-). And yeah: feel free to open an issue for tracking the plan!

@cameel (Member, Author) commented May 17, 2023

OK then. I opened an issue (#14241) and I'll submit a new PR. I'm closing this one.
