Client: sync breaking "Peer error: invalid rlp: total length is larger than the data" #1050

Closed
holgerd77 opened this issue Jan 18, 2021 · 2 comments

Comments

@holgerd77
Member

holgerd77 commented Jan 18, 2021

This may be the last remaining error that regularly breaks client sync after some time. I'm in the process of debugging it right now and will drop my findings here, since I have to stop soon.

Error is showing up in the following way:

image

I was able to get a stack trace by adding a console.log(error) statement in peerpool.ts at the point where the error is received as an event from a peer:

Error: invalid rlp: total length is larger than the data
    at _decode (/EthereumJS/ethereumjs-vm/node_modules/rlp/dist/index.js:169:19)
    at Object.decode (/EthereumJS/ethereumjs-vm/node_modules/rlp/dist/index.js:56:19)
    at ETH._handleMessage (/EthereumJS/ethereumjs-vm/packages/devp2p/dist/eth/index.js:65:29)
    at Peer._handleBody (/EthereumJS/ethereumjs-vm/packages/devp2p/dist/rlpx/peer.js:422:26)
    at Peer._onSocketData (/EthereumJS/ethereumjs-vm/packages/devp2p/dist/rlpx/peer.js:457:30)

So this is a malformed incoming RLP message: the original error comes from the RLP.decode() function and is thrown during ETH protocol message handling, in the payload decoding in ETH._handleMessage(). The _handleMessage() function is called in a protocol-agnostic way in peer.ts, in Peer._handleBody().
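For context on what "total length is larger than the data" means, here is a minimal sketch of the length check that an RLP header implies. It is not the rlp package's code; `declaredItemLength`, `longLength` and `assertComplete` are hypothetical helpers written only for illustration. The header of an RLP item states how many bytes the item spans, and a decoder has to reject input that is shorter than that.

```typescript
// Hypothetical helpers for illustration only, not part of the rlp package.

// Long form: the first byte is followed by `lengthOfLength` bytes that hold
// the payload length as a big-endian integer. A sketch: lengths beyond
// Number.MAX_SAFE_INTEGER are not handled.
function longLength(input: Uint8Array, lengthOfLength: number): number {
  if (input.length < 1 + lengthOfLength) {
    throw new Error('header truncated')
  }
  let payloadLength = 0
  for (let i = 1; i <= lengthOfLength; i++) {
    payloadLength = payloadLength * 256 + input[i]
  }
  return 1 + lengthOfLength + payloadLength
}

// Reads only the header of the first RLP item and returns the number of
// bytes the item claims to span (header plus payload).
function declaredItemLength(input: Uint8Array): number {
  if (input.length === 0) {
    throw new Error('empty input')
  }
  const first = input[0]
  if (first < 0x80) return 1 // single byte, is its own encoding
  if (first <= 0xb7) return 1 + (first - 0x80) // short string
  if (first <= 0xbf) return longLength(input, first - 0xb7) // long string
  if (first <= 0xf7) return 1 + (first - 0xc0) // short list
  return longLength(input, first - 0xf7) // long list
}

// Throws when the header promises more bytes than were actually received.
function assertComplete(input: Uint8Array): void {
  const declared = declaredItemLength(input)
  if (declared > input.length) {
    throw new Error(
      `declared length ${declared} exceeds the ${input.length} bytes available`
    )
  }
}
```

For example, `assertComplete(new Uint8Array([0xc2, 0x01, 0x02]))` passes, while `assertComplete(new Uint8Array([0xf8, 0x40, 0x01]))` throws, because the header announces a 64-byte list payload and only one payload byte follows.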

The error is correctly propagated to the client, so the three lines in the catch clause of the last linked code block are all run. I tested this with console.log() statements.

Not sure yet why and how the client gets into an inconsistent / blocking state through this call, since the peer connection should be closed. My current assumption is that the place in the client where this originally happens is relevant (it is not shown in the stack trace), since this might bring the client into some deadlock state or similar (setting a variable unluckily, whatever).
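To illustrate the kind of "variable set unluckily" situation assumed above, here is a purely hypothetical sketch. None of these names (`Job`, `busy`, `runFragile`, `runSafe`, `tryRun`) exist in the client; the sketch only shows how a flag set before a throwing call can stay set, and how `try`/`finally` avoids that.

```typescript
// All names here are hypothetical; this is not the client's code.
class Job {
  busy = false

  // Fragile variant: if handle() throws, `busy` is never reset, so anything
  // that waits for `busy === false` would wait forever.
  runFragile(handle: () => void): void {
    this.busy = true
    handle()
    this.busy = false
  }

  // Safer variant: `finally` resets the flag on both success and failure.
  runSafe(handle: () => void): void {
    this.busy = true
    try {
      handle()
    } finally {
      this.busy = false
    }
  }
}

// Stand-in for message handling that fails on a malformed payload.
const failingHandler = (): void => {
  throw new Error('invalid rlp: total length is larger than the data')
}

// Stand-in for an outer layer that catches the error and reports it
// elsewhere (for example as an error event), so the caller keeps running.
function tryRun(run: () => void): void {
  try {
    run()
  } catch (e) {
    // Intentionally swallowed in this sketch.
  }
}

const fragile = new Job()
tryRun(() => fragile.runFragile(failingHandler))

const safe = new Job()
tryRun(() => safe.runSafe(failingHandler))
```

After these calls `fragile.busy` is still `true` while `safe.busy` is back to `false`. Whether anything like this actually happens in the client is exactly what still needs to be checked.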

This needs some further investigation.

@holgerd77
Member Author

This is the first time I have observed a client run hit the issue above and then continue with both sync and execution:

image

So there is a slight hope that this has been solved by #1064 (which would be a good candidate for fixing this kind of error, where the client gets stuck) 😄

Will leave this open for some time for some further confirmation though.

@holgerd77
Member Author

This hasn't occurred again, or when it did happen it no longer broke sync. Will close.
