RPC randomly hangs for no reason #10344
It's currently a hard blocker for me because I have a hard deadline in two weeks, when my contract ends. If I can't get parity working by that date then I just don't know what to do... Probably try ...
@Pzixel I don't think no-persistent-txqueue will help for your issue - can you share any of the ...? Specifically things like the ...
I only wanted to say that the only thing persisted between restarts is ... My current config:
[parity]
chain = "/parity/config/chain.json"
[rpc]
interface = "0.0.0.0"
cors = ["all"]
hosts = ["all"]
apis = ["web3", "eth", "net", "parity", "traces", "rpc", "personal", "parity_accounts", "signer", "parity_set"]
[network]
bootnodes = [
"enode://147573f46fe9f5cc38fbe070089a31390baec5dd2827c8f2ef168833e4d0254fbee3969a02c5b9910ea5d5b23d86a6ed5eabcda17cc12007b7d9178b6c697aa5@172.16.0.10:30303",
"enode://1412ee9b9e23700e4a67a8fe3d8d02e10376b6e1cb748eaaf8aa60d4652b27872a8e1ad65bb31046438a5d3c1b71b00ec3ce0b4b42ac71464b28026a3d0b53af@172.16.0.11:30303",
"enode://9076c143a487aa163437a86f7d009f257f405c50bb2316800b9c9cc40e5a38fef5b414a47636ec38fdabc8a1872b563effa8574a7f8f85dc6bde465c368f1bf5@172.16.0.12:30303"
]
[account]
password = ["/parity/authority.pwd"]
[mining]
reseal_on_txs = "none"
gas_floor_target = "0x165A0BC00"
tx_queue_size = 16384
tx_queue_mem_limit = 1024
Can you perhaps try setting ...?
Gonna try it out. I see it might speed things up a bit, and I think it's unlikely to be the solution. However, it's a good spot indeed.
@Pzixel Are you running with pruning (i.e. default settings)? Could you re-run with ...? I suspect it might be because of some underlying DB compaction that happens. Can you check the load on the machine (CPU and IO) and what threads are doing work?
I don't recall setting this, so I do believe it's the default. I'll retry with all logging on and will be monitoring resources as well. Brb when done. Thank you for the attention.
P.S. When I run with ...
Is that intended?
With ..., RPC responds with nothing (no failures etc.); the entire process just hangs. Maybe it's something like invalid client-side signing?
You can merge multiple logging flags like this: ...
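For instance, a hypothetical invocation, assuming Parity's -l/--logging option takes comma-separated target=level pairs and that rpc and txqueue are valid log targets (the target names here are an assumption, not the exact flags from this thread):
parity -l rpc=trace,txqueue=trace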
It's client-side nonce calculation, as suggested in #8829 (comment). It was never an issue before, but I suspect my own library that performs the nonce generation. Needs deeper investigation, and so do I.
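For context, a minimal sketch of the kind of client-side nonce management being described (an illustration built on Nethereum's public API; the class and method names below are made up for the example, not the actual library code): fetch the account's transaction count once, then hand out incrementing nonces locally - if any sent transaction is later dropped by the node, every higher locally-assigned nonce can never be mined.

using System.Numerics;
using System.Threading.Tasks;
using Nethereum.Web3;
using Nethereum.RPC.Eth.DTOs;

// Hypothetical helper: keeps a local nonce counter instead of asking the node each time.
public class LocalNonceCounter
{
    private readonly Web3 _web3;
    private readonly string _address;
    private long _next = -1;

    public LocalNonceCounter(Web3 web3, string address)
    {
        _web3 = web3;
        _address = address;
    }

    public async Task<BigInteger> NextAsync()
    {
        if (_next < 0)
        {
            // Seed the counter once from the node's pending transaction count.
            var count = await _web3.Eth.Transactions.GetTransactionCount
                .SendRequestAsync(_address, BlockParameter.CreatePending());
            _next = (long)count.Value;
        }
        // From here on nonces are assigned purely locally; a dropped tx leaves a gap
        // that blocks every later nonce from being mined.
        return _next++;
    }
}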
I see tons of requests/responses in the logs:
It looks like it generated several transactions and is then trying to poll their receipts, which never get created.
Code that gets executed (for reference):
var receipt = await createRequest.SendDefaultTransactionAndWaitForReceiptAsync(params);
Where ...
Where ...
How I see it:
The transaction queue is empty, so this process is infinite.
P.S. I actually see several hundred txs in the tx queue, but it doesn't get smaller.
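Roughly, the kind of polling loop this amounts to looks like the sketch below (an illustration using Nethereum's public API with made-up helper names; it is not the actual implementation of SendDefaultTransactionAndWaitForReceiptAsync). If the pool silently drops the transaction, no receipt ever appears and the loop never exits:

using System;
using System.Threading.Tasks;
using Nethereum.Web3;
using Nethereum.RPC.Eth.DTOs;

public static class ReceiptPolling
{
    // Hypothetical helper: poll eth_getTransactionReceipt until a receipt shows up.
    public static async Task<TransactionReceipt> WaitForReceiptAsync(Web3 web3, string txHash)
    {
        TransactionReceipt receipt = null;
        // No timeout and no check that the tx is still known to the node:
        // a transaction dropped from the queue means this loops forever.
        while (receipt == null)
        {
            receipt = await web3.Eth.Transactions.GetTransactionReceipt.SendRequestAsync(txHash);
            if (receipt == null)
                await Task.Delay(TimeSpan.FromSeconds(1));
        }
        return receipt;
    }
}

A bounded wait (a deadline plus re-checking the pending state of the transaction) would surface a dropped transaction instead of hanging.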
@Pzixel that's quite possible. If the ..., having a huge number of such requests will definitely affect the performance of other requests. We need to figure out WHY the transactions were lost (either because of some nonce issue or a bug). Could you please collect more logs so that we can analyze the path of one particular transaction that gets lost? We need info:
This will require collecting logs with ...
@tomusdrw right, I had the same thoughts :) Unfortunately, when I do ... I'll send them on Monday, if you don't mind.
@Pzixel yeah, you need to find the transaction hash (maybe log it in your application) and then ...
I see the following:
So getting ...
@Pzixel Right, sorry. I incorrectly thought that receipts are available for pool transactions as well, and that pool data is only considered for ...
So it still doesn't solve the original issue - could you provide some logs from the time the hang actually happens?
I'm not sure. For example, here are my txs sent in 200-tx batches. Surrounding logs were trimmed.
You can see that the tx was in the queue for 1 minute but wasn't mined. It may be a nonce problem, but I'm not sure how to check it.
I've provided the entire log so you can grep it yourself if you want.
Yeah, I can clearly see that the transaction is not considered for the pending block, but since a lot of the logs are trimmed I can't figure out what actually happened with it. What I see is:
What I want to know:
EDIT: just noticed the full logs, analysing now.
@Pzixel what pool flags do you have? It seems that you are reaching the per-sender limit and the transactions replace each other in the pool:
Please make sure that ...
It happens because transactions are received out of order (most likely due to multiple json-rpc server threads), which causes a transaction with ...
Interestingly, since it seems the transactions are local, after #9002 local transactions should be accepted above all possible limits, so it needs some investigation why they're being dropped. As a workaround, running with some high value of ...
Yes, I see. It seems that with client-side nonce management you are almost guaranteed to lose a tx. And if you lose any, then your mining hangs because the client doesn't expect a tx to suddenly disappear, and an infinite loop occurs. I just tried to run a batch of 1000 with ... I could configure it via TOML.
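A sketch of what that TOML might look like, assuming the per-sender limit is exposed under [mining] as tx_queue_per_sender (mirroring the --tx-queue-per-sender CLI option); the key name and the value below are assumptions for illustration, not settings confirmed in this thread:

[mining]
# assumed key mirroring --tx-queue-per-sender; set well above the largest batch size
tx_queue_per_sender = 4096
tx_queue_size = 16384
tx_queue_mem_limit = 1024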
I ran it with batch size 1000 (which was almost always failing before) and ... Thank you for your great analysis. I'll close it for now and I'll come back if it doesn't help. My infinite gratitude is yours (personally ❤️ :) ) P.S. I hope #10324 will get some attention :)
@tomusdrw sorry, it seems that it's not working yet. As I said, the symptoms were different, and it indeed wasn't fixed by the queue length. I started a stress test several hours ago and it failed with status code 134 when it was running in docker.
Nope, it's a completely different thing. Or maybe I should create another issue. My guess is that parity shuts down under load in some cases.
@Pzixel yeah, let's open a new issue if it's unrelated to this one.
Got it. I'm going to reproduce this once again, and if so I'll open a new issue.
For some reason, after several hours of intensive block generation, parity just hangs.
Friday night I ran my batcher, which just calls some contract in batches (50 items per batch).
Here are my logs:
As you can see, it worked fine for almost 10 hours. My setup uses a 5 sec block time, so it took 10 sec (or two blocks) to insert the entire batch.
Then things get weirder, because now it takes almost an hour to insert the same amount of data into the blockchain (I used a "retry forever" policy with 30 sec waits between calls).
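As an illustration of that retry policy, a sketch assuming a Polly-style policy on the client side (the policy shape and the sendBatchAsync delegate below are made-up examples, not the actual batcher code):

using System;
using System.Threading.Tasks;
using Polly;

public static class BatchSender
{
    // Hypothetical wrapper: retry forever with a fixed 30-second delay between attempts.
    public static Task SendWithRetryAsync(Func<Task> sendBatchAsync)
    {
        var retryForever = Policy
            .Handle<Exception>() // e.g. RPC timeouts surfacing as exceptions
            .WaitAndRetryForeverAsync(_ => TimeSpan.FromSeconds(30));

        return retryForever.ExecuteAsync(sendBatchAsync);
    }
}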
When I go to parity logs I see:
Nothing more appears; it just writes that, and after 20 sec my client fails with an RPC timeout. Then it retries, I get another bunch of logs, then again nothing more is written, and so on.
I restarted all nodes, I tried leaving a single node to prevent desync, I double-checked the system time. I don't know what else I can do.
Here is the code that gets called:
edit: I tried another contract and got exactly the same behaviour. It seems that the actual code doesn't matter. Maybe some kind of deadlock?
It just does a lookup in a map and then inserts one item into an array. Nothing complicated that should take hours to complete.
I've provided the logs and all the info I possess. If I can do anything else, I'll be rigorous in providing all required additional information. It just randomly stops working for no sane reason. It seems to depend on the database somehow, because I restarted the service with --no-persistent-queue, but I still see the same behaviour. I already did all kinds of optimizations: client-side signing and so on, but it still doesn't work.
cc @tomusdrw, could you take a look, please?