`nimbus-reth-1` on `devnet-12` sporadically getting `"code": -38001` from `engine_getPayloadV2` #5623

tersec · 2023-11-29T14:24:50Z

Describe the bug

The EF runs a JSON-RPC snooper, and this has happened twice in the first hour or so:

Nov 29 13:16:12.069 2023 REQUEST
{
  "jsonrpc": "2.0",
  "method": "engine_forkchoiceUpdatedV2",
  "params": [
    {
      "headBlockHash": "0xbce64b5dd363e32660f40a3345bb71a068255fb5ae977132bea9e4c776bb6689",
      "safeBlockHash": "0x0000000000000000000000000000000000000000000000000000000000000000",
      "finalizedBlockHash": "0x0000000000000000000000000000000000000000000000000000000000000000"
    },
    {
      "timestamp": "0x6567399C",
      "prevRandao": "0xafa4c7c736bc9c9e43bbb375824d36342fab2f917b8e659dab637b01ef58f10c",
      "suggestedFeeRecipient": "0xf97e180c050e5ab072211ad2c213eb5aee4df134",
      "withdrawals": []
    }
  ],
  "id": 444
}

Nov 29 13:16:12.071 2023 RESPONSE (status 200 OK)
{
  "jsonrpc": "2.0",
  "result": {
    "payloadStatus": {
      "status": "VALID",
      "latestValidHash": "0xbce64b5dd363e32660f40a3345bb71a068255fb5ae977132bea9e4c776bb6689",
      "validationError": null
    },
    "payloadId": "0x9aa4047b10975fa9"
  },
  "id": 444
}

Nov 29 13:16:12.573 2023 REQUEST
{
  "jsonrpc": "2.0",
  "method": "engine_getPayloadV2",
  "params": [
    "0x9aa4047b10975fa9"
  ],
  "id": 445
}

Nov 29 13:16:12.574 2023 RESPONSE (status 200 OK)
{
  "jsonrpc": "2.0",
  "error": {
    "code": -38001,
    "message": "Unknown payload"
  },
  "id": 445
}

Nov 29 13:28:24.063 2023 REQUEST
{
  "jsonrpc": "2.0",
  "method": "engine_forkchoiceUpdatedV2",
  "params": [
    {
      "headBlockHash": "0x0834fb9253e33ec9ecc641f08d4673f8d28d2e74e657d0bafee609223a7b8bfe",
      "safeBlockHash": "0x5b9529839af4542cc8841ceb10b30df56dd96cc206af61f8717139cc86fa8627",
      "finalizedBlockHash": "0x0fb6d76c70424225a23b2e1e9af0e3cc2b66ce03c76079b5ad5d5cab706367a6"
    },
    {
      "timestamp": "0x65673C78",
      "prevRandao": "0x90ad47b968263571877557fc54c3c3a95014f52c35a9c97138b722246db0b0d0",
      "suggestedFeeRecipient": "0xf97e180c050e5ab072211ad2c213eb5aee4df134",
      "withdrawals": []
    }
  ],
  "id": 749
}

Nov 29 13:28:24.064 2023 RESPONSE (status 200 OK)
{
  "jsonrpc": "2.0",
  "result": {
    "payloadStatus": {
      "status": "VALID",
      "latestValidHash": "0x0834fb9253e33ec9ecc641f08d4673f8d28d2e74e657d0bafee609223a7b8bfe",
      "validationError": null
    },
    "payloadId": "0x2c43dd4b9fd860b2"
  },
  "id": 749
}

Nov 29 13:28:24.566 2023 REQUEST
{
  "jsonrpc": "2.0",
  "method": "engine_getPayloadV2",
  "params": [
    "0x2c43dd4b9fd860b2"
  ],
  "id": 750
}

Nov 29 13:28:24.567 2023 RESPONSE (status 200 OK)
{
  "jsonrpc": "2.0",
  "error": {
    "code": -38001,
    "message": "Unknown payload"
  },
  "id": 750
}

Steps to reproduce

Consult devnet-12

Node logs

Node logs available on `nimbus-reth-1.dencun-devnet-12.ethpandaops.io` for both Nimbus and Reth.

Platform(s)

No response

What version/commit are you on?

c49cda6

What database version are you on?

No response

What type of node are you running?

None

What prune config do you use, if any?

No response

If you've built Reth from source, provide the full command you used

No response

Code of Conduct

I agree to follow the Code of Conduct

The text was updated successfully, but these errors were encountered:

mattsse · 2023-11-29T16:49:33Z

from the logs

2023-11-29T13:16:00.191668Z INFO reth_payload_builder::service: New payload job created id=0x9aa4047b10975fa9 parent=0xbce64b5dd363e32660f40a3345bb71a068255fb5ae977132bea9e4c776bb6689

and then 12 seconds later

2023-11-29T13:16:12.071105Z DEBUG reth_payload_builder::service: Payload job already in progress, ignoring. id=0x9aa4047b10975fa9 parent=0xbce64b5dd363e32660f40a3345bb71a068255fb5ae977132bea9e4c776bb6689

same situation with the other

2023-11-29T13:28:12.144440Z INFO reth_payload_builder::service: New payload job created id=0x2c43dd4b9fd860b2 parent=0x0834fb9253e33ec9ecc641f08d4673f8d28d2e74e657d0bafee609223a7b8bfe

2023-11-29T13:28:24.064648Z DEBUG reth_payload_builder::service: Payload job already in progress, ignoring. id=0x2c43dd4b9fd860b2 parent=0x0834fb9253e33ec9ecc641f08d4673f8d28d2e74e657d0bafee609223a7b8bfe

then unknown block at

13:28:24.566

curious why these are so far apart. The second logs are preceded by:

Block is already canonical, ignoring. block_hash=0x0834fb9253e33ec9ecc641f08d4673f8d28d2e74e657d0bafee609223a7b8bfe

It appears reth gets the same FCU twice.

I need to check the timeouts of the payload building job

tersec · 2023-11-29T17:42:05Z

This is also a Nimbus bug, though I'm not yet certain what the spec says about this, so this uncovered something useful on the Nimbus end too.

Nimbus is sending the same fcU twice, and the reason why is also explains why it'd stop in Deneb. It caches the PayloadAttributes it used in the "lookahead" forkchoiceUpdated (the first one in each of the pairs, immediately upon setting the parent block to head), then compares this cached information when it's about to request a payload with what it knows about the payload attributes then.

The trouble is that it fills in a placeholder parent beacon block root (Deneb-only payload attribute) for previous forks, as the zero hash. This never matches the actual beacon block root later on, so it sends a duplicate forkchoiceUpdated.

I'm not sure yet what the spec says should happen here, but regardless, it's among other things a Nimbus bug which should be fixed.

That said, it's a bit strange to respond with a payload ID and then immediately not have it available, as well.

mattsse · 2023-11-29T18:34:42Z

That said, it's a bit strange to respond with a payload ID and then immediately not have it available, as well.

agree, the reth bug was that the job's deadline was set too tight: fixed 12s starting when it is created, though this is incorrect according to the spec. This would explain why there's a race condition if the two FCU are <=12s apart and the getPayload is >12s after the first FCU

see #5626

tersec · 2023-12-01T00:43:18Z

status-im/nimbus-eth2#5635 should address this on the Nimbus side.

tersec · 2023-12-06T15:56:28Z

This has been resolved.

tersec added C-bug An unexpected or incorrect behavior S-needs-triage This issue needs to be labelled labels Nov 29, 2023

mattsse mentioned this issue Nov 29, 2023

chore: add more payload tracing #5624

Merged

mattsse mentioned this issue Nov 29, 2023

fix: derive job deadline from attributes timestamp #5626

Merged

tersec mentioned this issue Dec 1, 2023

stop checking beacon parent root for fcU caching status-im/nimbus-eth2#5635

Merged

tersec closed this as completed Dec 6, 2023

DaniPopes removed the S-needs-triage This issue needs to be labelled label Apr 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`nimbus-reth-1` on `devnet-12` sporadically getting `"code": -38001` from `engine_getPayloadV2` #5623

`nimbus-reth-1` on `devnet-12` sporadically getting `"code": -38001` from `engine_getPayloadV2` #5623

tersec commented Nov 29, 2023

mattsse commented Nov 29, 2023

tersec commented Nov 29, 2023

mattsse commented Nov 29, 2023 •

edited

Loading

tersec commented Dec 1, 2023

tersec commented Dec 6, 2023

nimbus-reth-1 on devnet-12 sporadically getting "code": -38001 from engine_getPayloadV2 #5623

nimbus-reth-1 on devnet-12 sporadically getting "code": -38001 from engine_getPayloadV2 #5623

Comments

tersec commented Nov 29, 2023

Describe the bug

Steps to reproduce

Node logs

Platform(s)

What version/commit are you on?

What database version are you on?

What type of node are you running?

What prune config do you use, if any?

If you've built Reth from source, provide the full command you used

Code of Conduct

mattsse commented Nov 29, 2023

tersec commented Nov 29, 2023

mattsse commented Nov 29, 2023 • edited Loading

tersec commented Dec 1, 2023

tersec commented Dec 6, 2023

`nimbus-reth-1` on `devnet-12` sporadically getting `"code": -38001` from `engine_getPayloadV2` #5623

`nimbus-reth-1` on `devnet-12` sporadically getting `"code": -38001` from `engine_getPayloadV2` #5623

mattsse commented Nov 29, 2023 •

edited

Loading