EIP-2464: eth/65: transaction announcements and retrievals
karalabe committed Jan 13, 2020
1 parent 6f5a49e commit 8ea8691
Showing 1 changed file with 74 additions and 0 deletions.
74 changes: 74 additions & 0 deletions EIPS/eip-2464.md
@@ -0,0 +1,74 @@
---
eip: 2464
title: "eth/65: transaction announcements and retrievals"
author: Péter Szilágyi <peterke@gmail.com>, Gary Rong <garyrong0905@gmail.com>
discussions-to: https://github.com/ethereum/EIPs/issues/2465
status: Draft
type: Standards Track
category: Networking
created: 2020-01-13
requires: 2364
---

## Abstract

The `eth` network protocol has two ways to propagate a newly mined block: it can be broadcast to a peer in its entirety (via `NewBlock (0x07)` [in `eth/64` and prior](https://github.com/ethereum/devp2p/blob/master/caps/eth.md)) or it can be announced only (via `NewBlockHashes (0x01)`). This duality allows nodes to do the high-bandwidth broadcasting (10s-100s KB) for a logarithmic number of peers; and the low-bandwidth announcing (10s-100s B) for the remaining linear number of peers. The logarithmic broadcast is enough to reach all well connected nodes, but the linear announce is needed to get across degenerate topologies. This works well.
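
The broadcast/announce split described above can be sketched as follows. This is a minimal illustration, not any client's actual code: the function name `split_peers`, the base-2 logarithm, and the choice of simply taking the first peers in the list are all assumptions made for the example. A real client may use a different sublinear function and a different peer selection.

```python
import math
from typing import List, Tuple


def split_peers(peers: List[str]) -> Tuple[List[str], List[str]]:
    """Split peers into a small full-broadcast set and a hash-only announce set.

    Hypothetical helper: the broadcast set has ceil(log2(N)) members
    (at least one), everyone else only receives the announcement.
    """
    if not peers:
        return [], []
    # Assumption: base-2 logarithm, rounded up, never below 1.
    k = max(1, math.ceil(math.log2(len(peers))))
    broadcast = peers[:k]   # would receive the full block (NewBlock)
    announce = peers[k:]    # would receive only the hash (NewBlockHashes)
    return broadcast, announce


peers = [f"peer-{i}" for i in range(16)]
broadcast, announce = split_peers(peers)
print(len(broadcast), len(announce))
```

With 16 peers this sends the full block to 4 of them and only the hash to the other 12.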

The `eth` protocol, however, does not have a similar dual mechanism for propagating transactions, so nodes need to rely on broadcasting (via `Transactions (0x02)`). To cater for degenerate topologies, transactions cannot be broadcast logarithmically, rather they need to be transferred linearly to all peers. With N peers, each node will transfer the same transaction N times (counting both directions), whereas 1 would be enough in a perfect world. This is a significant waste.

A similar issue arises when a new network connection is made between two nodes, as they need to sync up their transaction pools, but the pool is just a soup of dangling transactions. Without a way to deduplicate transactions remotely, each node is forced to naively transfer its entire list of transactions to the other side. With pools containing thousands of transactions, a naive transfer amounts to 10s-100s MB, most of which is useless. The current protocol, however, offers no better way.

This EIP introduces two additional message types into the `eth` protocol (releasing a new version, `eth/65`): `NewPooledTransactionHashes (0x08)` to announce a set of transactions without their content; and `GetPooledTransactions (0x09)` to request a batch of transactions by their announced hash. This permits reducing the bandwidth used for transaction propagation from linear complexity in the number of peers to a logarithmic one; and also reducing the initial transaction exchange from 10s-100s MB to `len(pool) * 32B ~= 128KB` (for a pool of about 4096 transactions).
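
The `len(pool) * 32B ~= 128KB` estimate can be checked with a few lines of arithmetic. The sketch below is illustrative only: the helper names are hypothetical, the pool size of 4096 is the value implied by the 128KB figure, and the RLP size calculation ignores `devp2p` framing and compression.

```python
HASH_SIZE = 32  # bytes per transaction hash (B_32)


def raw_hash_bytes(pool_size: int) -> int:
    """Bytes of hash data needed to announce every transaction in a pool."""
    return pool_size * HASH_SIZE


def rlp_list_header_len(payload_len: int) -> int:
    """Length of an RLP list header for a payload of the given size."""
    if payload_len <= 55:
        return 1
    # Long lists: one prefix byte plus the big-endian payload length.
    return 1 + (payload_len.bit_length() + 7) // 8


def rlp_announcement_bytes(pool_size: int) -> int:
    """RLP-encoded size of a list of 32-byte hashes.

    Each 32-byte string carries a one-byte RLP prefix, so it takes 33 bytes.
    """
    payload = pool_size * (HASH_SIZE + 1)
    return rlp_list_header_len(payload) + payload


print(raw_hash_bytes(4096))          # 131072 bytes, i.e. 128 KiB
print(rlp_announcement_bytes(4096))  # 135172 bytes once RLP-encoded
```

The RLP overhead is about 3%, so the 128KB figure remains a fair approximation of the announcement size.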

## Motivation

With transaction throughput (and size) picking up in Ethereum, transaction propagation is currently the dominant component of network resource usage. Most of these resources are, however, wasted, as the `eth` protocol does not have a mechanism to deduplicate transactions remotely, so the same data is transferred over and over again across all network connections.

This EIP proposes a tiny extension to the `eth` protocol, which permits nodes to agree on the set of transactions that need to be transferred across a network connection, before doing the costly exchange. This should help reduce the global (operational) bandwidth usage of the Ethereum network by at least an order of magnitude.

## Specification

Add two new message types to the `eth` protocol:
* `NewPooledTransactionHashes (0x08): [hash_0: B_32, hash_1: B_32, ...]`
  * Specify one or more transactions that have appeared in the network and which have **not yet been included in a block**. To be maximally helpful, nodes should inform peers of all transactions that they may not be aware of.
  * There is **no protocol-violating hard cap** on the number of hashes a node may announce to a remote peer (apart from the 10MB `devp2p` network packet limit), but 4096 seems a sane chunk (128KB) to avoid a single packet hogging a network connection.
  * Nodes should only announce hashes of transactions that the remote peer could reasonably be considered not to know, but it is better to be overzealous than to have a nonce gap in the pool.
* `GetPooledTransactions (0x09): [hash_0: B_32, hash_1: B_32, ...]`
  * Specify one or more transactions to retrieve from a remote peer's **transaction pool**.
  * There is **no protocol-violating hard cap** on the number of transactions a node may request from a remote peer (apart from the 10MB `devp2p` network packet limit), but the recipient may enforce an arbitrary cap on the reply (size or serving time), which **must not** be considered a protocol violation.
  * If a node decides to return only a subset of the requested transactions, those **must** be the prefix of the requested hash list to permit the requestor to differentiate between missing and deliberately not transferred transactions.
  * A peer may respond with an empty reply **iff** none of the hashes match transactions in its pool. A node may announce a transaction and later fail to serve it if the transaction is included in a block in the meantime.
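
The rules above can be sketched roughly as follows. This is one possible reading of the specification, not the reference implementation: all function names are hypothetical, the pool is modelled as a plain dictionary from hash to encoded transaction, the reply cap is a byte budget, and SHA-256 stands in for Keccak-256 (which Python's `hashlib` does not provide).

```python
import hashlib
from typing import Callable, Dict, List, Tuple

ANNOUNCE_CHUNK = 4096  # soft chunk size suggested by the EIP, not a hard cap


def stand_in_hash(tx: bytes) -> bytes:
    # Assumption: SHA-256 as a stand-in; Ethereum hashes with Keccak-256.
    return hashlib.sha256(tx).digest()


def chunk_announcements(hashes: List[bytes],
                        chunk: int = ANNOUNCE_CHUNK) -> List[List[bytes]]:
    """Split hashes into NewPooledTransactionHashes-sized batches."""
    return [hashes[i:i + chunk] for i in range(0, len(hashes), chunk)]


def serve_pooled_transactions(pool: Dict[bytes, bytes],
                              requested: List[bytes],
                              max_bytes: int) -> List[bytes]:
    """Answer a GetPooledTransactions request from the pool only.

    Walks the request in order, skips hashes that are not pooled, and stops
    once the byte budget is hit, so the served items come from a prefix of
    the request. The first match is always served, which keeps the reply
    non-empty whenever at least one hash is in the pool.
    """
    reply: List[bytes] = []
    size = 0
    for h in requested:
        tx = pool.get(h)
        if tx is None:
            continue  # not in the pool (e.g. mined in the meantime)
        if reply and size + len(tx) > max_bytes:
            break  # deliberate cut-off; the rest is not transferred
        reply.append(tx)
        size += len(tx)
    return reply


def classify_reply(requested: List[bytes], reply: List[bytes],
                   tx_hash: Callable[[bytes], bytes] = stand_in_hash
                   ) -> Tuple[List[bytes], List[bytes], List[bytes]]:
    """Return (delivered, missing, unserved) hashes for a reply."""
    if not reply:
        # Empty reply: none of the hashes were in the remote pool.
        return [], list(requested), []
    got = {tx_hash(tx) for tx in reply}
    last = max(i for i, h in enumerate(requested) if h in got)
    delivered = [h for h in requested if h in got]
    missing = [h for h in requested[:last + 1] if h not in got]
    unserved = requested[last + 1:]  # past the cut-off; may be re-requested
    return delivered, missing, unserved
```

Under this reading, the requester treats gaps before the last delivered transaction as missing from the remote pool, and everything after it as not transferred, which it may request again from the same or another peer.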

## Rationale

**Q: Why limit `GetPooledTransactions` to retrieving items from the pool?**

Apart from the transaction pool, transactions in Ethereum are always bundled together by the hundreds in block bodies and existing network retrievals honor this data layout. Allowing direct access to individual transactions in the database has no actionable use case, but would expose costly database reads into the network.

For transaction propagation purposes there is no reason to allow disk access, as any transaction finalized to disk will be broadcast inside a block anyway, so at worst a node receives the transaction with a delay of a few hundred milliseconds.

Block propagation could be made somewhat more efficient by transferring the contained transactions only on demand, but that is a whole EIP in itself, so it is better to relax the protocol once all the requirements are known rather than in advance. It would probably be enough to maintain a set of transactions included in recent blocks in memory.

**Q: Should `NewPooledTransactionHashes` deduplicate from disk?**

Similarly to `GetPooledTransactions`, `NewPooledTransactionHashes` should also only operate on the transaction pool and should ignore the disk altogether. During healthy network conditions, a transaction propagates through the network much faster than it is included in a block, so it is very unlikely that a newly announced transaction is already on disk. By avoiding disk deduplication, we also avoid DoS griefing via remote transaction announcements.

If we want to be really correct and avoid even the slightest data race when deduplicating announcements, we can use the same recently-included-transactions trick that we discussed above to discard announcements that have recently become stale.
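
A minimal sketch of that in-memory trick, under stated assumptions: the class name `RecentlyIncluded`, the fixed capacity, and the oldest-first eviction policy are all hypothetical choices for illustration, and the pool is modelled as a set of hashes.

```python
from collections import OrderedDict
from typing import Iterable, List, Set


class RecentlyIncluded:
    """Bounded in-memory set of hashes of recently mined transactions."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._seen: "OrderedDict[bytes, None]" = OrderedDict()

    def add_block(self, tx_hashes: Iterable[bytes]) -> None:
        """Record the transactions of a newly imported block."""
        for h in tx_hashes:
            self._seen[h] = None
            self._seen.move_to_end(h)
            while len(self._seen) > self._capacity:
                self._seen.popitem(last=False)  # evict the oldest entry

    def __contains__(self, h: bytes) -> bool:
        return h in self._seen


def filter_announcement(announced: List[bytes], pool: Set[bytes],
                        recent: RecentlyIncluded) -> List[bytes]:
    """Keep only hashes worth requesting: not pooled and not just mined.

    No disk lookup happens here; only in-memory structures are consulted.
    """
    return [h for h in announced if h not in pool and h not in recent]


recent = RecentlyIncluded(capacity=2)
recent.add_block([b"a", b"b", b"c"])  # b"a" is evicted, capacity is 2
print(filter_announcement([b"b", b"p", b"n"], {b"p"}, recent))  # [b'n']
```

Because both lookups are in memory, a peer flooding announcements cannot force database reads.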

## Backwards Compatibility

This EIP extends the `eth` protocol in a backwards incompatible way and requires rolling out a new version, `eth/65`. However, `devp2p` supports running multiple versions of the same wire protocol side-by-side, so rolling out `eth/65` does not require client coordination, since non-updated clients can keep using `eth/64`.
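
The side-by-side rollout can be illustrated with a simplified view of capability matching: each side advertises `(name, version)` pairs and the highest version both share is used. The function below is a hypothetical sketch of that idea and not the `devp2p` handshake itself.

```python
from typing import List, Optional, Tuple

Capability = Tuple[str, int]  # (protocol name, version), e.g. ("eth", 65)


def negotiate(local: List[Capability], remote: List[Capability],
              name: str = "eth") -> Optional[int]:
    """Return the highest version of `name` supported by both sides."""
    ours = {v for n, v in local if n == name}
    theirs = {v for n, v in remote if n == name}
    return max(ours & theirs, default=None)


updated = [("eth", 64), ("eth", 65)]
legacy = [("eth", 63), ("eth", 64)]
print(negotiate(updated, updated))  # 65
print(negotiate(updated, legacy))   # 64
```

An updated node therefore speaks `eth/65` with other updated nodes and falls back to `eth/64` with nodes that have not upgraded.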

This EIP does not change the consensus engine, thus does _not_ require a hard fork.

## Test Cases

TODO

## Implementation

Geth: https://github.com/ethereum/go-ethereum/pull/20234

## Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
