Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(hyp-ethereum): gas escalator middleware #3852

Merged
merged 14 commits into from
Jul 3, 2024

Conversation

daniel-savu
Copy link
Contributor

Description

Wraps evm providers in a gas escalator middleware as part of the build_with_signer step, so signed txs can have their gas price bumped if stuck in the mempool.

Gas price is increased every 60s by 12.5%, and is capped at the max_fee_per_gas set in tx overrides.

Drive-by changes

  • changes to our fork of ethers:
    • removes the M: Clone bound, since it's not actually required and conflicts with the middleware we use
    • increases the level of the escalated log in the middleware to debug (so we can see it working in prod without enabling trace)

Related issues

Backward compatibility

Yes

Testing

e2e

@daniel-savu daniel-savu requested a review from tkporter as a code owner May 27, 2024 17:55
Copy link

changeset-bot bot commented May 27, 2024

⚠️ No Changeset found

Latest commit: ee9910f

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

Copy link

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 0.00%. Comparing base (befc38d) to head (ee9910f).
Report is 7 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (befc38d) and HEAD (ee9910f). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (befc38d) HEAD (ee9910f)
2 1
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #3852       +/-   ##
==========================================
- Coverage   60.56%   0.00%   -60.57%     
==========================================
  Files         129       1      -128     
  Lines        1656      14     -1642     
  Branches      176       0      -176     
==========================================
- Hits         1003       0     -1003     
+ Misses        626      14      -612     
+ Partials       27       0       -27     
Components Coverage Δ
core ∅ <ø> (∅)
hooks ∅ <ø> (∅)
isms ∅ <ø> (∅)
token ∅ <ø> (∅)
middlewares ∅ <ø> (∅)

Copy link
Collaborator

@tkporter tkporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After diving a bit deeper, imo like many things in ethers, it feels like not many people actually use these middlewares in production / long-running services, and that it may need some work to meet our bar

The worst thing is that it doesn't seem to have EIP-1559 support at all (see #3852 (comment)), and it returns errors if you try to use this middleware with 1559 txs. This feels like a dealbreaker imo in its current state - either we'd need to change it to just ignore 1559 txs but still allow them, or we'd need to change it to accommodate 1559 txs.

Though pls correct me if I'm wrong here because I would've expected e2e to fail if this were the case? Maybe the transactions submitted by the relayer on e2e are all legacy transactions?

Some other things that make me nervous:

  1. In our fork, it holds the lock across potentially a lot of async operations, which feels questionable: https://github.com/hyperlane-xyz/ethers-rs/blob/master/ethers-middleware/src/gas_escalator/mod.rs#L177

In upstream this seems to have been fixed fwiw https://github.com/gakonst/ethers-rs/blob/master/ethers-middleware/src/gas_escalator/mod.rs

  1. The task can silently error https://github.com/hyperlane-xyz/ethers-rs/blob/master/ethers-middleware/src/gas_escalator/mod.rs#L223 (I think it's still possible to hit this edge case Investigate ethers-rs middleware for "stuck" txs #2959 (comment)). Upstream at least it indicates that's happening https://github.com/gakonst/ethers-rs/blob/master/ethers-middleware/src/gas_escalator/mod.rs#L343

  2. We'd probably want some way to test things to certainly hit the code path that results in a gas escalation - not sure if you did this, but being able to test this in e2e or at least ad-hoc would be nice to derisk things

@daniel-savu
Copy link
Contributor Author

E2E fails because validators can't send the announce txs. The error is invalid chain id for signer, originating here I think, since e2e txs are using EIP-1559.

This means that the chain ID in the txData is different from the chain ID in the signature for some reason. Should look into why

daniel-savu added a commit to hyperlane-xyz/ethers-rs that referenced this pull request Jun 27, 2024
Fixes some issues @tkporter brought up in
hyperlane-xyz/hyperlane-monorepo#3852 (review):
1. Holding mutex lock across gas escalations (fixed upstream, I just
merged the changes)
2. Silently ignoring nonce related errors: added a `warn` log. These
error are only expected to occur if the original tx landed onchain
though
3. I ran the unit test that comes with this middleware, and similar to
what Trevor pointed out, sending a tx returns a `PendingTransaction`
whose tx hash will not reflect the hash of the tx that lands onchain.
However I manually checked by polling anvil and emitting logs that the
gas is escalated and the final tx is finalized
Copy link
Collaborator

@tkporter tkporter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some tiny suggestions

@daniel-savu daniel-savu enabled auto-merge July 3, 2024 11:15
@daniel-savu daniel-savu added this pull request to the merge queue Jul 3, 2024
Merged via the queue into main with commit c3c002a Jul 3, 2024
37 of 38 checks passed
@daniel-savu daniel-savu deleted the dan/gas-escalator-middleware branch July 3, 2024 12:10
daniel-savu added a commit that referenced this pull request Jul 4, 2024
github-merge-queue bot pushed a commit that referenced this pull request Jul 4, 2024
…4098)

This reverts commit c3c002a.

### Description

The gas escalator took up to 28gb of memory, and even though it was only
ever released on RC, it must have taken up too many cluster resources,
ending up starving the `hyperlane` relayer. Reverting for now until we
have a better idea

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

<!--
- Fixes #[issue number here]
-->

### Backward compatibility

<!--
Are these changes backward compatible? Are there any infrastructure
implications, e.g. changes that would prohibit deploying older commits
using this infra tooling?

Yes/No
-->

### Testing

<!--
What kind of testing have these changes undergone?

None/Manual/Unit Tests
-->
github-merge-queue bot pushed a commit that referenced this pull request Dec 3, 2024
### Description

Follow up to
#3852 and
#4098.

This PR brings major fixes to the escalator:
- A revamped escalator implementation, that was forked from ethers-rs
and had various fixes applied to it. The most notable bug caused
"infinite" gas price escalations to occur within a very short timespan,
until the max configured gas price was reached. Full diff here:
hyperlane-xyz/ethers-rs@950dd34...edf703a
- Support for escalating EIP-1559 txs, in addition to the existing
Legacy tx type support
- Memory leak fix. Since each escalator instance spawns a thread and we
create transient providers for each message, this created a memory leak
of escalator tasks that would build up over time - on RC, memory usage
had reached 27gb before we noticed. This PR merges in the changes from
#4208 to stop
the leak. Profiling results:
- before the fix (62mb peak usage): ![Screenshot_2024-07-29_at_12 19
26](https://github.com/user-attachments/assets/5b52fecb-6733-481c-96fb-10f505b9bf8a)
- after the fix (16mb peak usage): ![Screenshot_2024-07-29_at_12 19
53](https://github.com/user-attachments/assets/a78b9452-78cb-4b5b-9aa6-fe7f3a6ac683)

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

- Fixes #4108

### Backward compatibility

Yes

### Testing

Manual testing using the unit test setup from ethers-rs, and using it on
RC to observe it in the wild. Prom chain returns a `could not replace
existing tx` error when a tx is resubmitted so the escalator won't be
effective there, but this can be fixed later

---------

Co-authored-by: Trevor Porter <tkporter4@gmail.com>
tiendn pushed a commit to tiendn/hyperlane-monorepo that referenced this pull request Jan 11, 2025
### Description

Follow up to
hyperlane-xyz#3852 and
hyperlane-xyz#4098.

This PR brings major fixes to the escalator:
- A revamped escalator implementation, that was forked from ethers-rs
and had various fixes applied to it. The most notable bug caused
"infinite" gas price escalations to occur within a very short timespan,
until the max configured gas price was reached. Full diff here:
hyperlane-xyz/ethers-rs@950dd34...edf703a
- Support for escalating EIP-1559 txs, in addition to the existing
Legacy tx type support
- Memory leak fix. Since each escalator instance spawns a thread and we
create transient providers for each message, this created a memory leak
of escalator tasks that would build up over time - on RC, memory usage
had reached 27gb before we noticed. This PR merges in the changes from
hyperlane-xyz#4208 to stop
the leak. Profiling results:
- before the fix (62mb peak usage): ![Screenshot_2024-07-29_at_12 19
26](https://github.com/user-attachments/assets/5b52fecb-6733-481c-96fb-10f505b9bf8a)
- after the fix (16mb peak usage): ![Screenshot_2024-07-29_at_12 19
53](https://github.com/user-attachments/assets/a78b9452-78cb-4b5b-9aa6-fe7f3a6ac683)

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

- Fixes hyperlane-xyz#4108

### Backward compatibility

Yes

### Testing

Manual testing using the unit test setup from ethers-rs, and using it on
RC to observe it in the wild. Prom chain returns a `could not replace
existing tx` error when a tx is resubmitted so the escalator won't be
effective there, but this can be fixed later

---------

Co-authored-by: Trevor Porter <tkporter4@gmail.com>
tiendn pushed a commit to tiendn/hyperlane-monorepo that referenced this pull request Jan 11, 2025
### Description

Follow up to
hyperlane-xyz#3852 and
hyperlane-xyz#4098.

This PR brings major fixes to the escalator:
- A revamped escalator implementation, that was forked from ethers-rs
and had various fixes applied to it. The most notable bug caused
"infinite" gas price escalations to occur within a very short timespan,
until the max configured gas price was reached. Full diff here:
hyperlane-xyz/ethers-rs@950dd34...edf703a
- Support for escalating EIP-1559 txs, in addition to the existing
Legacy tx type support
- Memory leak fix. Since each escalator instance spawns a thread and we
create transient providers for each message, this created a memory leak
of escalator tasks that would build up over time - on RC, memory usage
had reached 27gb before we noticed. This PR merges in the changes from
hyperlane-xyz#4208 to stop
the leak. Profiling results:
- before the fix (62mb peak usage): ![Screenshot_2024-07-29_at_12 19
26](https://github.com/user-attachments/assets/5b52fecb-6733-481c-96fb-10f505b9bf8a)
- after the fix (16mb peak usage): ![Screenshot_2024-07-29_at_12 19
53](https://github.com/user-attachments/assets/a78b9452-78cb-4b5b-9aa6-fe7f3a6ac683)

### Drive-by changes

<!--
Are there any minor or drive-by changes also included?
-->

### Related issues

- Fixes hyperlane-xyz#4108

### Backward compatibility

Yes

### Testing

Manual testing using the unit test setup from ethers-rs, and using it on
RC to observe it in the wild. Prom chain returns a `could not replace
existing tx` error when a tx is resubmitted so the escalator won't be
effective there, but this can be fixed later

---------

Co-authored-by: Trevor Porter <tkporter4@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Investigate ethers-rs middleware for "stuck" txs
2 participants