[P2P] known TxOrphanage problems

TxOrphanage was not designed to be a super reliable data structure, so these are not necessarily "bugs", but definitely limitations that we can improve upon.

Easy Churnability

We evict randomly. This means an attacker can make it more likely for us to drop useful orphans simply by sending us large volumes of orphans (which can be transactions that spend nonexistent UTXOs).

Some solutions we've discussed:

Limiting the amount of in-flight orphan resolutions or orphanage usage per peer. We didn't like this idea since transactions, blocks, and likely orphan resolutions are usually disproportionately provided to us by the same few outbound peers. So an in-flight limit would likely slow things down in the average case. Draft.
Changing our eviction algorithm to be more clever and try to load-balance across peers. IIRC I started implementing this but found it a bit too complicated.
For each peer, protecting a certain amount of the orphans provided by them. We can do this via some token bucket scheme in which each peer gets some number of "tokens" (a counter in a std::map), each good for X bytes of orphanage storage. The "tokens" would be returned for successful orphan acceptance, not returned for invalid orphans, and replenished at some slow rate to a certain ceiling. Draft in #27742.

One per txid

after we have placed a transaction into our orphanage, any tx that has the same txid is considered "already in orphanage" and we won't add it. All 1p1c submissions are composed of a parent and a transaction taken from the orphanage. This means an attacker could send us a mutated version of a child transaction in a 1p1c package and we'd fail to accept the parent+child package until that transaction is evicted from orphanage.

Solution: check (and erase) by wtxid. See #30000

No effort to retry parent fetching

When we have an orphan transaction, we request missing parents from the sender and hope that they send it to us. If we get notfound or no response for the parents, we make no effort to try again. This means that an attacker can block us from hearing about the parent in a 1p1c package by being the first sender of the child, unless our other peers also announce it.

Solution: retry with the other peers who can provide us the parent (presumably that is everyone who announced the orphan, since they must have had the parent in order to validate the child). This brings us to the next limitation...

Only tracking 1 fromPeer

We only track 1 fromPeer for each orphan entry. When we add a tx to orphanage, we just fill in fromPeer with the NodeId of the peer who sent it. We don't add the other announcers of the tx. Similarly, if we receive an inv for a tx in orphanage, we just drop it and don't track that this peer potentially knows the parent.

When we decide to request an orphan's missing parents, we call m_txrequest.ForgetTxHash (so that we don't continue requesting/downloading this transaction that we've already seen and are keeping in orphanage). So we have also forgotten about those announcers if/when we get to the point where we have failed to resolve the orphan's missing inputs with the first peer.

See #28031

No effort to try all package combinations

The 1p1c logic gathers all candidate children from the orphanage and tries submitting a maximum of 1 before dropping it on the floor. Limiting to 1 package is good to bound computation, but we may fail to accept a package if there are a lot of other candidates. This means an attacker may try to block us from accepting a package by sending many invalid children of the parent transaction.

Solution: add a work queue for 1p1cs. This requires storing the parent transaction somewhere (probably also in orphanage) and bounding the memory used for this purpose.

Provide feedback

Saved searches