Running the chain with IBFT PoA: the chain crashes while spamming many transactions #1850
Comments
wonder if you've tried this with polybft? @JDawg287
Hey, buddy :) Did you try the pandoras-box to test your nodes? It is a pretty decent tool and has EOA, ERC-20, and ERC-721 modes 💯 However, I have a custom polygon-edge node implementation, and the node is configured to have like 20 transactions per batch. When I test with 1000-1400 transactions, it works fine, but when I increase it to 2000 or more, then some start to get stuck in the TX pool and I receive the following errors:
It seems like I have a similar problem to yours. Did you find a way to handle the issue? Another strange thing is that I achieve 120 TPS, but I see in the Polygon-edge's docs they have nearly 2.5k TPS.
What impl are you using? Did you change the --max-enqueued-transactions on the server run params?
Nah, it uses the default one. I suppose increasing it will do the trick, but what is the recommended number of max enqueued txs?
No reference to that in the documentation, but we've been using it with pretty high values. Again, what implementation are you using?
It is a modified version of the polybft. If this is what you are asking.
Is it publicly available?
Yes, it is a public repo already - https://github.com/Hydra-Chain/hydragon-node Btw, I have just checked that the default enqueue transactions is 128, atm.
Description
While stress testing the Polygon Edge client v1.1.0, I came across a strange behaviour. To explain a bit: I want to use a private network using the IBFT PoA consensus protocol. In order to assess the stability of the Polygon Edge client under stress, I made a network using AWS EC2 instances and wrote a script to spam EOA-to-EOA transactions to the network.
Environment
v1.1.0
Chain Specs
Setup
The setup is fairly simple. I am running 4 validator nodes and each node has 1000 secrets generated for it. I premine an amount for each Ethereum address (for the secrets) in the genesis.json. This is to prevent having many transactions in the mempool with different nonces for any single Ethereum address at a single time.
Here is what the setup looks like:
The script to spam the transactions is as follows:
Each validator node is spammed with transactions using a single HTTP connection, and the transactions are sent via JSON-RPC. This is to simulate a real use case where the network would potentially need to handle many transactions at the same time. The idea is to keep spamming transactions at a steady rate. The script also takes care of managing the nonce for each address, since the eth_sendTransaction method is not yet supported by the Polygon Edge client.
I start the test by firing up the nodes and letting the network produce some blocks. Note: if I just let the chain run, it keeps growing indefinitely.
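The nonce handling described above can be sketched roughly as follows. This is not the original script: NonceTracker and rawTxRequest are hypothetical names, and the sketch assumes that transactions are signed locally and submitted with the standard eth_sendRawTransaction JSON-RPC method, with each counter seeded from an eth_getTransactionCount result. Signing itself is left out.

```javascript
// Hypothetical local nonce tracker: one counter per sender address.
class NonceTracker {
  constructor() {
    this.next = new Map(); // lower-cased address -> next nonce to use
  }

  // Seed with the on-chain count (assumed to come from eth_getTransactionCount).
  seed(address, onChainCount) {
    this.next.set(address.toLowerCase(), onChainCount);
  }

  // Return the nonce for the next transaction and advance the counter.
  take(address) {
    const key = address.toLowerCase();
    if (!this.next.has(key)) {
      throw new Error(`nonce for ${address} was never seeded`);
    }
    const nonce = this.next.get(key);
    this.next.set(key, nonce + 1);
    return nonce;
  }
}

// Build a JSON-RPC 2.0 request body for an already-signed transaction.
function rawTxRequest(id, signedTxHex) {
  return { jsonrpc: "2.0", id, method: "eth_sendRawTransaction", params: [signedTxHex] };
}

// Usage with a made-up sender address and a placeholder payload.
const sender = "0xAbC" + "0".repeat(36) + "1";
const tracker = new NonceTracker();
tracker.seed(sender, 0);
console.log(tracker.take(sender)); // 0
console.log(tracker.take(sender.toLowerCase())); // 1
console.log(JSON.stringify(rawTxRequest(1, "0x...")));
```

Keeping the counter local avoids one eth_getTransactionCount round trip per transaction, but it also means that a rejected transaction leaves a nonce gap for that address unless the counter is re-seeded.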
Then I set the environment variables, which are stored in a .env file.
After that, I run the send_batch.js script on each node using the following command:
Where secrets-* represents the file containing the secrets to be used by each node (e.g. secrets-1.json for validator 1, secrets-2.json for validator 2, etc.). The send_batch.js script has a timeout to stop the machine from hanging.
Expected behaviour
The script works fine and I am able to predict the output throughput (TPS) when the BATCH_SIZE is kept low. I can calculate that spamming with a given BATCH_SIZE would produce a certain number of transactions in a block. For example, setting the BATCH_SIZE to 250 and the TIMEOUT_BATCH_SEND to 3000 (3s, the same as the block time), I can calculate
Which can be seen from the data I collected:
Note: ignoring the batch creation time in this case since it is insignificant
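The expected number of transactions per block can be estimated with a small calculation. This is only a sketch, not the original formula. It assumes that all 4 validators are sent batches concurrently and that each sender waits TIMEOUT_BATCH_SEND plus the batch creation time between two batches. The function name expectedTxPerBlock and its parameter names are hypothetical.

```javascript
// Hypothetical helper: estimates how many transactions land in one block
// when every node is sent one batch per send interval.
function expectedTxPerBlock({ nodes, batchSize, timeoutMs, batchCreationMs = 0, blockTimeMs }) {
  // Time between two consecutive batches sent to the same node.
  const sendIntervalMs = timeoutMs + batchCreationMs;
  // Transactions sent by all nodes during one block interval.
  return (nodes * batchSize * blockTimeMs) / sendIntervalMs;
}

// BATCH_SIZE 250, TIMEOUT_BATCH_SEND 3000, 3s block time, creation time ignored.
const small = expectedTxPerBlock({ nodes: 4, batchSize: 250, timeoutMs: 3000, blockTimeMs: 3000 });
console.log(small); // 1000 transactions per block
console.log((small / 3).toFixed(1)); // 333.3 TPS at a 3s block time

// BATCH_SIZE 3000 with 2.2s of batch creation time added to the interval.
const large = expectedTxPerBlock({
  nodes: 4, batchSize: 3000, timeoutMs: 3000, batchCreationMs: 2200, blockTimeMs: 3000,
});
console.log(Math.round(large)); // 6923
```

Under these assumptions the second case comes out close to the figure of around 6900 transactions per block expected for a BATCH_SIZE of 3000.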
Now the problem arises when I increase the BATCH_SIZE. With a bigger BATCH_SIZE, I also account for the batch creation time, since it takes longer for each batch to be created. For this particular scenario, I increased the BATCH_SIZE to 3000 and added 2.2s to the timeout (which can be calculated from the script above). I was expecting around 6900 transactions in a block (which the Polygon Edge chain should be able to handle, judging by the old tests from your team here), but I was unable to see that number. I was also getting empty blocks for some reason. After letting the chain run for a while, it ceases to produce any new blocks. I started noticing this behaviour from a BATCH_SIZE of 2500 and above, seemingly at random block heights.
Any clue as to why there are no new blocks under load? I also tried changing the block time to 6 seconds, but the behaviour is more or less the same.
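To pin down where the empty blocks and the stall occur, a small check over collected block data may help. This is a sketch under assumptions: the record shape { number, timestamp, txCount } is hypothetical (for instance decoded from eth_getBlockByNumber responses), and the threshold of 5 block times for calling the chain stalled is an arbitrary choice.

```javascript
// Hypothetical diagnostic: list empty blocks and flag a stalled chain.
// blocks: array of { number, timestamp (seconds), txCount }, ordered by height.
function summarizeBlocks(blocks, nowSec, blockTimeSec, stallFactor = 5) {
  const emptyBlocks = blocks.filter((b) => b.txCount === 0).map((b) => b.number);
  const last = blocks.length > 0 ? blocks[blocks.length - 1] : null;
  // With no blocks at all, treat the chain as stalled.
  const secondsSinceLast = last ? nowSec - last.timestamp : Infinity;
  return {
    emptyBlocks,
    lastBlock: last ? last.number : null,
    stalled: secondsSinceLast > stallFactor * blockTimeSec,
  };
}

// Made-up sample data: the last block is 54s old with a 3s block time.
const sample = [
  { number: 100, timestamp: 1000, txCount: 1000 },
  { number: 101, timestamp: 1003, txCount: 0 },
  { number: 102, timestamp: 1006, txCount: 950 },
];
console.log(summarizeBlocks(sample, 1060, 3));
```

Running this against the heights at which each validator last produced a block would show whether all nodes stop at the same height or diverge.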
Logs
The logs from one of the validators can be found here. Had to upload it separately as the file is too large.