Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use tests in BlockchainTests #35

Closed
wborgeaud opened this issue Aug 16, 2023 · 14 comments · Fixed by #37
Closed

Use tests in BlockchainTests #35

wborgeaud opened this issue Aug 16, 2023 · 14 comments · Fixed by #37
Assignees

Comments

@wborgeaud
Copy link
Contributor

We currently use the tests in the GeneralStateTests folder of the Ethereum tests repo.
Each such test seems to be replicated in the BlockchainTests/GeneralStateTests folder, in a different format. This format includes useful information like the roots of the receipts and transactions tries, the Bloom filter, and the withdrawals root.

It would be helpful to modify the runner to use these tests instead, so that we can verify these data. An immediate application would be to verify the receipts trie root produced by 0xPolygonZero/plonky2#1097.

@BGluth BGluth self-assigned this Aug 28, 2023
@BGluth
Copy link
Contributor

BGluth commented Aug 30, 2023

Hmm yeah I agree. The tests under BlockchainTests are the expanded versions of the root GeneralStateTests. I remember we deliberated a bit on which to parse when we wrote this initially, but I don't remember why we chose to parse this format instead. As far as I can tell, no information is lost through the expansion, so we should be fine to parse this instead.

@Nashtare
Copy link
Contributor

I think one difference is the absence of variant cases on the BlockchainTests side. For example comparing the Shanghai version for BlockchainTests/Call1MB1024Calldepth.json vs GeneralStateTests/Call1MB1024Calldepth.json, there's only 1 txn vs 2 (the second one is the same).

@BGluth
Copy link
Contributor

BGluth commented Aug 31, 2023

Hmm yeah. I think what actually ends up happening during the test expansion is each of these variants gets expanded into separate "test sections" (for lack of a better word). So like in this case, the two txns are there, but they have their own sections Call1MB1024Calldepth_d0g0v0_Shanghai & "Call1MB1024Calldepth_d0g1v0_Berlin", where g* corresponds to which gas price the entry is using from here:

"Shanghai" : [
    {
        "hash" : "0x5de8ccc907a2cf4b21c0ee18f6ccd171f7b6a517b149cc7fb5398ec40d619e0f",
        "indexes" : {
            "data" : 0,
            "gas" : 1,
            "value" : 0
        },
        "logs" : "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
        "txbytes" : "0xf861800a840ee6b28094bbbf5374fce5edbc8e2a8697c15331677e6ebf0b0a801ba0462186579a4be0ad8a63224059a11693b4c0684b9939f6c2394d1fbe045275f2a059d73f99e037295a5f8c0e656acdb5c8b9acd28ec73c320c277df61f2e2d54f9"
    },
    {
        "hash" : "0x485947362d7a26bfe59efa9a2581036f48cbdb09f9b704130ecd0dc2bf969ec7",
        "indexes" : {
            "data" : 0,
            "gas" : 0,
            "value" : 0
        },
        "logs" : "0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347",
        "txbytes" : "0xf860800a830249f094bbbf5374fce5edbc8e2a8697c15331677e6ebf0b0a801ba06843bbbe573744c46cf47f1c126c11a53cffa1bcc3eeb0e0d328e6a73ecc5447a02062c2fbb34bfea606f3cf268b5a94a81d68e748d4d896c514061c03e8acb774"
    }
]

@Nashtare
Copy link
Contributor

Ah good point, I had missed this! All good then 😄

@Nashtare
Copy link
Contributor

Nashtare commented Sep 6, 2023

In addition to the added support for the receipt trie, the latest PRs have also added a few fields to the block_metadata, namely

  • txn_number_before
  • gas_used_before
  • gas_used_after
  • block_bloom_before
  • block_bloom_after

some of which seem kind of cumbersome to parse ahead of proving

@BGluth
Copy link
Contributor

BGluth commented Sep 6, 2023

Thanks for the headsup @Nashtare.

Sorry that I haven't finished this yet. I have something in-progress, but I keep getting pulled to work on other tasks. I'm aiming to have this PR done by the end of tomorrow.

@Nashtare
Copy link
Contributor

Nashtare commented Sep 7, 2023

And actually another topic regarding recent updates on plonky2 side. Now that we observe public values in the challenger, we can't process tests that have a block_gaslimit not fitting in 32 bits (before we could, as long as they were not spawning a new context which would have caused an overflow when exiting the kernel).

I know that @wborgeaud wasn't inclined in supporting 2-limb gaslimit as kinda needless in practice, though that means there would be now many tests that aren't supported natively (we could manually alter their block_gaslimit but for most of those it seems setting it to u32::MAX would result in an OOG error), and this isn't ideal to ensure proper coverage of all edge-cases.

Do you have any opinion on the matter?

@BGluth
Copy link
Contributor

BGluth commented Sep 7, 2023

Hmm... So I'm a bit naive when it comes to a lot of the plonky2 internals, but maybe could we clamp the gas limit in these cases to u32::MAX? I don't know if this ends up making some tests sort of miss the point of what they are trying to test, but if most of them never hit the gas limit (and clamping the gas limit doesn't affect any other output like hashes or whatever), then maybe this is an option.

@Nashtare
Copy link
Contributor

Nashtare commented Sep 7, 2023

Hmm, we could test on a case-by-case basis but for the few of which I actually tried setting this to u32::MAX, it seemed they actually needed to go over that limit, as I wrote above

we could manually alter their block_gaslimit but for most of those it seems setting it to u32::MAX would result in an OOG error

hence I couldn't find a really satisfactory solution.

@Nashtare
Copy link
Contributor

Nashtare commented Sep 27, 2023

@wborgeaud I was wondering if your opinion had changed on the matter:

Now that we observe public values in the challenger, we can't process tests that have a block_gaslimit not fitting in 32 bits (before we could, as long as they were not spawning a new context which would have caused an overflow when exiting the kernel).
I know that @wborgeaud wasn't inclined in supporting 2-limb gaslimit as kinda needless in practice, though that means there would be now many tests that aren't supported natively.

I am currently working on updating the runner to latest plonky2, but the fact that so many tests are uncheckable (altering block_gaslimit leads to failure anyway in most cases) kind of concerns me, especially with all the heavy changes that happened on the zkEVM the past few months.
AFAIK, the overhead for having a 2-limb gaslimit would be light (we'd only have an additional Target to connect to the PublicValues but this is fairly negligible). Did you have other reserves about this?

@wborgeaud
Copy link
Contributor Author

Yeah sounds good, I agree the performance overhead is pretty negligible.
So let's change it to 2 limbs with a comment mentioning this is necessary for testing.

@Nashtare
Copy link
Contributor

Should we change block_gas_used also?
Out of the 18567 test I can parse so far (some are still TODO), I'm having:

  • 3940 unprovable because of gas_used overflowing u32,
  • out of the 14627 remaining ones, 3442 unprovable because of gaslimit.

@wborgeaud
Copy link
Contributor Author

Yes sounds good too. Just one question, do you have an idea if most of these tests will run in a reasonable time? Like I just want to make sure these tests are not stupidly large and will make the runner run out of memory anyways.

@Nashtare
Copy link
Contributor

Most of them are runnable on non-beefy machines yeah. They are already run with the current instance, we just "Ignore" them if the prover fails with GasLimit error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants