
Unexpected trie node error occurs after initial snap sync #28587

Closed
rjl493456442 opened this issue Nov 23, 2023 · 2 comments · Fixed by #28595
rjl493456442 commented Nov 23, 2023

System information

Geth version: 1.13.5

Issue description

Ref: original ticket #27983 (comment)

Nov 22 12:33:19 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:19.850] Initialized transaction indexer          limit=2,350,000
Nov 22 12:33:19 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:19.850] Loaded local transaction journal         transactions=0 dropped=0
Nov 22 12:33:19 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:19.851] Regenerated local transaction journal    transactions=0 accounts=0
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: WARN [11-22|12:33:20.370] Switch sync mode from snap sync to full sync reason="snap sync complete"
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:20.370] Chain post-merge, sync via beacon client
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: INFO [11-22|12:33:20.370] Gasprice oracle is ignoring threshold set threshold=2
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: ERROR[11-22|12:33:20.389] Unexpected trie node in disk             owner=5cc0a4..667982 path="[12 5 9 3 7]" expect=8b09b1..e87152 got=99f9a0..b9f78f
Nov 22 12:33:20 ip-10-0-0-11.ec2.internal geth[30414]: ERROR[11-22|12:33:20.389] State snapshotter failed to iterate trie err="missing trie node 8b09b17b3a4e17de5274c52cc6387cf42c1fb25fd97effda757bb9a2cde87152 (owner 5cc0a47442e6bc69eb1ec9e2ff1fe0c9657c26dfa5836f560fd7141038667982) (path 0c05090307) unexpected node, loc: disk, node: (5cc0a47442e6bc69eb1ec9e2ff1fe0c9657c26dfa5836f560fd7141038667982 [12 5 9 3 7]), 8b09b17b3a4e17de5274c52cc6387cf42c1fb25fd97effda757bb9a2cde87152!=99f9a0c9f954cd0d8cf5bb7df9c2b5e529a1652fcc97824ee446ba9300b9f78f, blob: 0xf87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080"

The node is reported as invalid, with:

  • owner: 5cc0a47442e6bc69eb1ec9e2ff1fe0c9657c26dfa5836f560fd7141038667982
  • address: 0x32400084C286CF3E17e7B677ea9583e60a000324
  • path: [12 5 9 3 7]
  • content: 0xf87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080
  • expected hash: 8b09b17b3a4e17de5274c52cc6387cf42c1fb25fd97effda757bb9a2cde87152
  • got hash: 99f9a0c9f954cd0d8cf5bb7df9c2b5e529a1652fcc97824ee446ba9300b9f78f

After retrieving the correct node from our benchmark machine, I ran rlpdump on both:

correct node

(base) ➜  ~ rlpdump -hex 0xf8518080808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080
[
  "",
  "",
  "",
  "",
  "",
  "",
  "",
  "",
  b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae,
  "",
  "",
  "",
  2ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e7,
  "",
  "",
  "",
  "",
]

corrupted node

(base) ➜  ~ rlpdump -hex 0xf87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080
[
  "",
  df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9,
  "",
  "",
  "",
  "",
  "",
  "",
  b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae,
  "",
  "",
  "",
  2ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e7,
  "",
  "",
  "",
  "",
]

The corrupted node has one extra child at index 1.
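The difference between the two dumps can also be checked mechanically. Below is a minimal, stdlib-only sketch (not geth's actual decoder) that strips the RLP list header from each node blob and reports which of the 17 branch slots are occupied; it only handles branch nodes whose children are hash references or empty strings, which is the case for these two blobs.

```python
# Blobs copied verbatim from the rlpdump commands above.
CORRECT = bytes.fromhex("f8518080808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080")
CORRUPTED = bytes.fromhex("f87180a0df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9808080808080a0b1902b4fca66415f63634e3ddeae1bfa7b877a1db5ed4c029730e166ba2031ae808080a02ded9e78076e79e96fcd5562c7951f678d22a167429cc75c17d30a08705bb6e780808080")

def branch_children(blob: bytes) -> list[bytes]:
    # Strip the RLP list header: 0xf8+ = long list (length-of-length
    # follows), 0xc0..0xf7 = short list (length in the tag byte).
    if blob[0] >= 0xf8:
        n = blob[0] - 0xf7
        plen = int.from_bytes(blob[1:1 + n], "big")
        payload = blob[1 + n:1 + n + plen]
    else:
        payload = blob[1:1 + blob[0] - 0xc0]
    items, i = [], 0
    while i < len(payload):
        b = payload[i]
        if b <= 0x7f:            # single-byte item
            items.append(payload[i:i + 1]); i += 1
        elif b <= 0xb7:          # string of length b - 0x80 (0x80 = empty)
            n = b - 0x80
            items.append(payload[i + 1:i + 1 + n]); i += 1 + n
        else:
            raise ValueError("unsupported item encoding for this sketch")
    return items

print([i for i, c in enumerate(branch_children(CORRECT)) if c])    # [8, 12]
print([i for i, c in enumerate(branch_children(CORRUPTED)) if c])  # [1, 8, 12]
```

Both decode to 17 items, and the only difference is the occupied slot at index 1 in the corrupted blob.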


I also dumped the parent nodes of this one: they are all full nodes, with no shortNode in the middle of the path, so this is not related to the shortNode trick at all.

This storage trie is quite large, with 1.8M slots inside.


I analyzed the contract; there are two functions that can mutate its state:

  • finalizeEthWithdrawal (0x6c0960f9): example from etherscan
  • requestL2Transaction (0xeb672419): example from etherscan

Both of them only create new storage slots; neither ever deletes a slot.


There are a few possibilities for this situation:

  • The state sync target is forked: the transaction which creates the trie node at index 1 was reorged out and never accepted.
    I don't think that's the case here. Geth uses head-64 as the sync target, which is extremely unlikely to be reorged on the proof-of-stake network.

  • A programmatic problem?


The log is attached.

Here it is -- I had a few false starts in the log, but each time I purged the DB:

sync-log.zip

rjl493456442 commented Nov 23, 2023

The rlpdump of the node corresponding to hash df5465feffb831b1f31a6184b1efdf75f10f13b2b4900956c22f41a6108c45c9 is:

(base) ➜  ~ rlpdump -hex 0xf8419e20a8ccaf952498df75fd7dfb93763a000cc976d1512a01bc3afbbbe2ba92a1a06a62a088d03375c29f8c41b3cd5d4e350f25031c4a712bd7ec2f6555b3365cc5
[
  20a8ccaf952498df75fd7dfb93763a000cc976d1512a01bc3afbbbe2ba92,
  a06a62a088d03375c29f8c41b3cd5d4e350f25031c4a712bd7ec2f6555b3365cc5,
]
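For reference, the first item of that node, 20a8ccaf…, uses the trie's hex-prefix ("compact") key encoding described in the Ethereum Yellow Paper: the high nibble of the first byte carries two flag bits (0x2 = leaf node, 0x1 = odd nibble count). A stdlib-only sketch of the decoder shows this child is a leaf with an even-length key:

```python
def decode_compact(key: bytes):
    # Hex-prefix decoding: high nibble of byte 0 holds the flags,
    # bit 0x2 = leaf node, bit 0x1 = odd number of key nibbles.
    flag = key[0] >> 4
    is_leaf = bool(flag & 2)
    # For odd-length keys, the low nibble of byte 0 is the first key nibble.
    nibbles = [key[0] & 0x0F] if flag & 1 else []
    for b in key[1:]:
        nibbles += [b >> 4, b & 0x0F]
    return is_leaf, nibbles

# First item of the dumped node: 0x20 prefix -> leaf, even-length key.
item = bytes.fromhex("20a8ccaf952498df75fd7dfb93763a000cc976d1512a01bc3afbbbe2ba92")
is_leaf, nibbles = decode_compact(item)
print(is_leaf, nibbles[:4])  # True [10, 8, 12, 12]
```

So the child hanging off branch index 1 is itself a leaf node holding a single storage slot.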

rjl493456442 commented Nov 24, 2023

The "unexpected" node was created in this transaction https://etherscan.io/tx/0xadf288639e9921292c8847c5b0ff1c66513f5472dd92b43e17d9451f235c0bfd in block https://etherscan.io/block/18583652

The "unexpected" node was deleted in this transaction
https://etherscan.io/tx/0x9f1cd1a48c9bf3066d6af8aaa2c9c3b84f7c332e0e6a49fdf3f2e8c01bb191bc in block 18590132

According to the sync log, at block 18583652 the node was still in the snap sync stage:

Nov 16 10:10:22 ip-10-0-0-11.ec2.internal geth[18425]: INFO [11-16|10:10:22.645] Loaded last snap-sync pivot marker       number=18,583,669
Nov 16 10:10:22 ip-10-0-0-11.ec2.internal geth[18425]: ERROR[11-16|10:10:22.645] Failed to reset txpool state             err="missing trie node d7f8974fb5ac78d9ac099b9ad5018bedc2ce0a72dad1827a1709da30580f0544 (path ) layer stale"
Nov 16 10:10:22 ip-10-0-0-11.ec2.internal geth[18425]: ERROR[11-16|10:10:22.645] Failed to reset blobpool state           err="missing trie node d7f8974fb5ac78d9ac099b9ad5018bedc2ce0a72dad1827a1709da30580f0544 (path ) layer stale"
Nov 16 10:10:22 ip-10-0-0-11.ec2.internal geth[18425]: INFO [11-16|10:10:22.646] Syncing: state download in progress      synced=72.97% state=212.22GiB accounts=174,259,772@33.40GiB slots=838,358,902@173.44GiB codes=842,563@5.37GiB    eta=5h57m18.862s
Nov 16 10:10:22 ip-10-0-0-11.ec2.internal geth[18425]: INFO [11-16|10:10:22.706] Syncing: state download in progress      synced=72.97% state=212.22GiB accounts=174,259,772@33.40GiB slots=838,358,902@173.44GiB codes=842,563@5.37GiB    eta=5h57m18.801s

At block 18590132 the node was still in the state healing stage:

Nov 17 07:49:20 ip-10-0-0-11.ec2.internal geth[18425]: INFO [11-17|07:49:20.997] Loaded last snap-sync pivot marker       number=18,590,147
Nov 17 07:49:20 ip-10-0-0-11.ec2.internal geth[18425]: ERROR[11-17|07:49:20.998] Failed to reset txpool state             err="missing trie node d7f8974fb5ac78d9ac099b9ad5018bedc2ce0a72dad1827a1709da30580f0544 (path ) layer stale"
Nov 17 07:49:20 ip-10-0-0-11.ec2.internal geth[18425]: INFO [11-17|07:49:20.998] Syncing: state healing in progress       accounts=270,254@13.06MiB     slots=492,602@37.49MiB        codes=569@4.10MiB        nodes=2,353,275@569.59MiB pending=1
Nov 17 07:49:20 ip-10-0-0-11.ec2.internal geth[18425]: ERROR[11-17|07:49:20.998] Failed to reset blobpool state           err="missing trie node d7f8974fb5ac78d9ac099b9ad5018bedc2ce0a72dad1827a1709da30580f0544 (path ) layer stale"
Nov 17 07:49:21 ip-10-0-0-11.ec2.internal geth[18425]: INFO [11-17|07:49:21.109] Syncing: state healing in progress       accounts=270,254@13.06MiB     slots=492,602@37.49MiB        codes=569@4.10MiB        nodes=2,353,276@569.59MiB pending=17
