Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eth RPC: Address performance of EthGetTransactionCount #10538

Closed
maciejwitowski opened this issue Mar 22, 2023 · 8 comments · Fixed by #10700
Closed

Eth RPC: Address performance of EthGetTransactionCount #10538

maciejwitowski opened this issue Mar 22, 2023 · 8 comments · Fixed by #10700
Assignees
Labels
area/api kind/enhancement Kind: Enhancement P1 P1: Must be resolved

Comments

@maciejwitowski
Copy link
Contributor

We identified EthGetTransactionCount as one of the hotspots on Glif production nodes. The goal of this issue is to investigate the reason and fix it.

image

@jennijuju
Copy link
Member

@snissn i think we should look into MpoolGetNonce - which exchanges have been reporting a 6-8 sec slow response. If we can narrow down the issue there - it should improve eth get tx count too

@snissn
Copy link
Contributor

snissn commented Mar 23, 2023

i'm having trouble reproducing on master -- i'm running a lotus node and pummeling the node with the bash script below the response time is between 20 and 50 ms. Could there just be other slow endpoints causing their performance issues but eth_getTransactionCount is fast?

while true; do
        {
                time curl  http://localhost:1234/rpc/v1  -X POST -H "Content-Type: application/json" --data '{"method":"eth_getTransactionCount","params":["0x7C2540eE20773995a7304A2ab6fd785cBBef8354", "latest"],"id":1,"jsonrpc":"2.0"}'
        } 2>&1 |grep real|cut -f2 -dm |cut -f1 -ds
sleep 0.1
done

@jennijuju
Copy link
Member

jennijuju commented Mar 23, 2023

We might need to spin up a local net, then send L many messages and having them in mpool, and then call the api with the same account multiple times to repro

@snissn
Copy link
Contributor

snissn commented Mar 23, 2023

this is interesting - an LRU cache for nonce with only 256 slots in it https://github.com/filecoin-project/lotus/blob/master/chain/messagepool/messagepool.go#L374

@Stebalien
Copy link
Member

So, the issue is that

act, err := mp.api.GetActorAfter(addr, ts)
computes the tipset under a lock.

The answer is to not call GetActorAfter. Instead, we need to look at the tipset iself:

  1. Check if the nonce is cached.
  2. If not, load the tipset messages, look for the relevent message, and fill the cache with all nonces found.
  3. Check the state before the tipset (i.e., GetActor, not GetActorAfter).
  4. Update the cache.

At the same time, we'll definitely want to increase the size of this cache. Also, for the current head, we could pre-cache in HeadChange. We already process all messages to update the message pool, so this operation should be basically free.

@maciejwitowski
Copy link
Contributor Author

Thanks for coming up with this @Stebalien! ❤️

@Stebalien
Copy link
Member

I've tested this with @snissn's test above and it pauses for 10 seconds every so often, so it's clearly related to tipset computation.

@maciejwitowski
Copy link
Contributor Author

Deploying to Glif canary nodes today to get production data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/api kind/enhancement Kind: Enhancement P1 P1: Must be resolved
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants