Skip to content
This repository has been archived by the owner on May 11, 2024. It is now read-only.

Current Capacity is not working properly #404

Closed
davaymne opened this issue Sep 22, 2023 · 19 comments
Closed

Current Capacity is not working properly #404

davaymne opened this issue Sep 22, 2023 · 19 comments
Assignees

Comments

@davaymne
Copy link

What: Prover concurrently is generating proofs for more blocks as it was supposed to and status API returns wrong numner of available capacity.

Config:

$ more .env | grep ZKEVM_CHAIN_INSTANCES_NUM
ZKEVM_CHAIN_INSTANCES_NUM=3

From log we can see that there are 5 ongoing proof generating (it was supposed to be max 3).
Log:
docker logs -f simple-taiko-node-taiko_client_prover_relayer-1

INFO [09-22|21:38:05.006] Proof generating                         height=47463 time=8m50.443885432s
INFO [09-22|21:38:08.931] Proof generating                         height=47547 time=6m10.430408028s
INFO [09-22|21:38:08.966] Proof generating                         height=47461 time=11m31.820360114s
INFO [09-22|21:38:09.379] Proof generating                         height=47550 time=2m20.105616946s
INFO [09-22|21:38:10.749] Proof generating                         height=47549 time=4m50.883719328s

Status returns incorrect number of available capacity (it was supposed to be 0).

Status:
$ curl http://172.23.0.7:9876/status

{"minProofFee":10,"maxExpiry":3600,"currentCapacity":1}
@davaymne
Copy link
Author

davaymne commented Sep 22, 2023

My prover keeps starting generating proofs for proposed blocks for "external" proposers while it is not accepting new assign requests on port 9876 (closed).

INFO [09-22|21:59:42.867] Proposed block information               blockID=48309 prover=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC proposedAt=1,695,416,604
INFO [09-22|21:59:42.867] Proposed block is provable               blockID=48309 prover=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC proofWindowExpired=false
INFO [09-22|21:59:42.907] Request proof from zkevm-chain proverd service blockID=48309 proposer=0xd1B2cdD43DcF548c1E1f63b6807ce2124c4000Bd height=48309 hash=bcabf8..061a7e
INFO [09-22|21:59:42.908] Proof generating                         height=48309 time="356.529µs"

Current height is 49086

I suspect: prover got enormous number (up to total emount of TTKOj) of assignements due to some issue, and currrently is trying to catchup with proof generating.

@davaymne
Copy link
Author

Today node experiences the same.
Status:

curl http://192.168.32.9:9876/status
{"minProofFee":10,"maxExpiry":3600,"currentCapacity":3}

Prover log:

INFO [09-23|19:32:26.734] Proof generating                         height=67865 time=7m0.739871303s
INFO [09-23|19:32:32.990] Proof generating                         height=67938 time=20.001931266s
INFO [09-23|19:32:36.738] Proof generated                          height=67865 degree=22 time=7m10.744321422s
INFO [09-23|19:32:40.642] Proof generating                         height=67956 time="303.7µs"
INFO [09-23|19:32:42.994] Proof generating                         height=67938 time=30.005771258s

Proposer log:

INFO [09-23|19:30:02.352] Prover assigned                          address=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,202
INFO [09-23|19:30:17.478] "📝 Propose transactions succeeded"       txs=17
INFO [09-23|19:31:25.495] Start fetching L2 execution engine's transaction pool content 
INFO [09-23|19:31:25.496] Current base fee                         fee=1
INFO [09-23|19:31:25.557] Transactions lists count                 count=1
INFO [09-23|19:31:25.560] Proposer account information             chainHead=4,349,365 nonce=12826
INFO [09-23|19:31:25.562] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,285
INFO [09-23|19:31:25.570] Prover assigned                          address=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,285
INFO [09-23|19:31:40.629] "📝 Propose transactions succeeded"       txs=31
INFO [09-23|19:31:52.632] Start fetching L2 execution engine's transaction pool content 
INFO [09-23|19:31:52.633] Current base fee                         fee=1
INFO [09-23|19:31:52.642] Transactions lists count                 count=1
INFO [09-23|19:31:52.644] Proposer account information             chainHead=4,349,368 nonce=12827
INFO [09-23|19:31:52.646] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,312
INFO [09-23|19:31:52.649] Prover assigned                          address=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,312
INFO [09-23|19:32:01.676] "📝 Propose transactions succeeded"       txs=16
INFO [09-23|19:32:42.678] Start fetching L2 execution engine's transaction pool content 
INFO [09-23|19:32:42.680] Current base fee                         fee=1
INFO [09-23|19:32:42.733] Transactions lists count                 count=1
INFO [09-23|19:32:42.734] Proposer account information             chainHead=4,349,372 nonce=12828
INFO [09-23|19:32:42.735] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,362
INFO [09-23|19:32:42.737] Prover assigned                          address=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,362
INFO [09-23|19:32:51.759] "📝 Propose transactions succeeded"       txs=24
INFO [09-23|19:33:07.760] Start fetching L2 execution engine's transaction pool content 
INFO [09-23|19:33:07.761] Current base fee                         fee=1
INFO [09-23|19:33:07.800] Transactions lists count                 count=1
INFO [09-23|19:33:07.801] Proposer account information             chainHead=4,349,374 nonce=12829
INFO [09-23|19:33:07.802] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,387
INFO [09-23|19:33:07.804] Prover assigned                          address=0xfbfd4F6993BC0D3481B9bf61AD0892f817a2e7aC endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,499,387
INFO [09-23|19:33:16.823] "📝 Propose transactions succeeded"       txs=29

@davaymne
Copy link
Author

time to time client returns

{"minProofFee":10,"maxExpiry":3600,"currentCapacity":4}

While it is ZKEVM_CHAIN_INSTANCES_NUM=3 configured and it is currently proving 3-4 blocks

@davaymne
Copy link
Author

davaymne commented Sep 24, 2023

I identified culprit:

As per log below we can see that capacity ReleaseOneCapacity() was called 2 times:

  • on submitProofOp() - which make sense
  • onBlockProven() - block still is being mapped as being proved (
    delete(p.currentBlocksBeingProven, proofWithHeader.Meta.Id)
    )
INFO [09-23|23:07:33.370] Release one capacity (submitProofOp)     blockID=72902
INFO [09-23|23:07:33.378] Curent capacity before release (ReleaseOneCapacity) Capacity=4
INFO [09-23|23:07:33.379] Curent capacity after release (ReleaseOneCapacity) Capacity=5
INFO [09-23|23:07:49.273] Cancel proof (onBlockProven) isValidProof BlockId=72902
INFO [09-23|23:07:49.273] Cancel proof generation for block        blockId=72902
INFO [09-23|23:07:49.273] Release one capacity (cancelProof)       blockID=72902
INFO [09-23|23:07:49.273] Curent capacity before release (ReleaseOneCapacity) Capacity=5
INFO [09-23|23:07:49.273] Curent capacity after release (ReleaseOneCapacity) Capacity=6

It is caused by #402

@davidtaikocha
Copy link
Member

Should have been fixed by #405, plz try again :) @davaymne

@Motko222
Copy link

Did not fix for me.

simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:33.422] Proof generating                         height=105,943 time=15m10.092780615s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:36.267] Proof generating                         height=105,636 time=27m40.20338758s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:36.626] Proof generating                         height=106,031 time=8m30.070991504s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:37.456] Proof generating                         height=105,921 time=15m20.1005304s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:39.135] Proof generating                         height=105,627 time=27m40.234377349s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:41.113] Proof generating                         height=105,920 time=18m30.133288472s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:41.386] Proof generating                         height=105,922 time=15m20.11519491s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:43.423] Proof generating                         height=105,943 time=15m20.093920692s

Capacity at 1

root@CHIA1:~/simple-taiko-node# curl http://localhost:9876/status
{"minProofFee":1,"maxExpiry":3600,"currentCapacity":1}

Instances set to 1

root@CHIA1:~/simple-taiko-node# cat .env | grep ZKEVM_CHAIN_INSTANCES_NUM
ZKEVM_CHAIN_INSTANCES_NUM=1

Thanks.

@davidtaikocha
Copy link
Member

Did not fix for me.

simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:33.422] Proof generating                         height=105,943 time=15m10.092780615s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:36.267] Proof generating                         height=105,636 time=27m40.20338758s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:36.626] Proof generating                         height=106,031 time=8m30.070991504s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:37.456] Proof generating                         height=105,921 time=15m20.1005304s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:39.135] Proof generating                         height=105,627 time=27m40.234377349s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:41.113] Proof generating                         height=105,920 time=18m30.133288472s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:41.386] Proof generating                         height=105,922 time=15m20.11519491s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-25|09:11:43.423] Proof generating                         height=105,943 time=15m20.093920692s

Capacity at 1

root@CHIA1:~/simple-taiko-node# curl http://localhost:9876/status
{"minProofFee":1,"maxExpiry":3600,"currentCapacity":1}

Instances set to 1

root@CHIA1:~/simple-taiko-node# cat .env | grep ZKEVM_CHAIN_INSTANCES_NUM
ZKEVM_CHAIN_INSTANCES_NUM=1

Thanks.

emmm did u restart ur simple-taiko-node?

@Motko222
Copy link

Yes, sir. It appears that ZKEVM_CHAIN_INSTANCES_NUM is respected right after start. After proof is generated, 2 proofs are started. And so on.

@0xCryptic1
Copy link

0xCryptic1 commented Sep 26, 2023

Proof generating is stuck and I have restarted 3 times. It works better for sometime and then it encounters the same error.

simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:21.659] Proof generating                         height=129,194 time=2h10m13.137740774s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:21.849] Proof generating                         height=129,245 time=1h54m22.47554411s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:21.907] Proof generating                         height=129,380 time=1h37m21.507755067s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:22.238] Proof generating                         height=129,860 time=1h8m41.956685482s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:22.590] Proof generating                         height=130,889 time=6m10.125888695s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:22.948] Proof generating                         height=129,445 time=1h31m51.59970451s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:23.062] Proof generating                         height=129,268 time=1h48m3.557507004s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:23.082] Proof generating                         height=130,805 time=13m10.331939648s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:24.413] Proof generating                         height=130,133 time=51m52.86056534s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:24.591] Proof generating                         height=129,288 time=1h42m32.441641005s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:24.698] Proof generating                         height=129,199 time=2h5m32.675979997s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:24.747] Proof generating                         height=131,108 time=1m10.009478176s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:24.801] Proof generating                         height=130,340 time=46m41.401231267s
simple-taiko-node-taiko_client_prover_relayer-1  | INFO [09-26|05:16:25.349] Proof generating                         height=130,605 time=24m50.459255812s

I have set instances to 2

cat .env | grep ZKEVM_CHAIN_INSTANCES_NUM
ZKEVM_CHAIN_INSTANCES_NUM=2

Current capacity is 5.

# curl http://localhost:9876/status
{"minProofFee":10,"maxExpiry":3600,"currentCapacity":5}

@cyberhorsey
Copy link
Contributor

Additional fix in #412 should fix issues.

@Motko222
Copy link

Motko222 commented Sep 27, 2023

Looks much better now! However, I found this case:

Screenshot 2023-09-27 at 11 02 33

It looks like when proof is cancelled, capacity is released twice.

@Motko222
Copy link

This is being logged as an error, maybe it should be a warning.
simple-taiko-node-taiko_client_prover_relayer-1 | ERROR[09-27|08:53:11.720] Block batch iterator callback error error="no prover capacity available

@davaymne
Copy link
Author

davaymne commented Sep 28, 2023

Set --proposeInterval = 1s and got issue with capacity manager, where Proposer was able to assign prover up to x10 times, before capacityManager started taking out capacity.

INFO [09-28|01:00:29.049] "📝 Propose transactions succeeded"       txs=1
INFO [09-28|01:00:30.051] Start fetching L2 execution engine's transaction pool content 
INFO [09-28|01:00:30.052] Current base fee                         fee=1
INFO [09-28|01:00:30.059] Transactions lists count                 count=1
INFO [09-28|01:00:30.060] Proposer account information             chainHead=4,378,378 nonce=14072
INFO [09-28|01:00:30.060] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,630
INFO [09-28|01:00:30.062] Prover assigned                          address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83 endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,630
INFO [09-28|01:00:30.062] Prover's deposited taikoTokenBalance     balance=50000000000000000000 address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83
INFO [09-28|01:00:51.076] "📝 Propose transactions succeeded"       txs=7
INFO [09-28|01:00:52.076] Start fetching L2 execution engine's transaction pool content 
INFO [09-28|01:00:52.077] Current base fee                         fee=1
INFO [09-28|01:00:52.086] Transactions lists count                 count=1
INFO [09-28|01:00:52.087] Proposer account information             chainHead=4,378,380 nonce=14073
INFO [09-28|01:00:52.088] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,652
INFO [09-28|01:00:52.089] Prover assigned                          address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83 endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,652
INFO [09-28|01:00:52.090] Prover's deposited taikoTokenBalance     balance=47500000000000000000 address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83
INFO [09-28|01:01:01.099] "📝 Propose transactions succeeded"       txs=10
INFO [09-28|01:01:02.099] Start fetching L2 execution engine's transaction pool content 
INFO [09-28|01:01:02.100] Current base fee                         fee=1
INFO [09-28|01:01:02.107] Transactions lists count                 count=1
INFO [09-28|01:01:02.108] Proposer account information             chainHead=4,378,381 nonce=14074
INFO [09-28|01:01:02.109] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,662
INFO [09-28|01:01:02.110] Prover assigned                          address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83 endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,662
INFO [09-28|01:01:02.110] Prover's deposited taikoTokenBalance     balance=45000000000000000000 address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83
INFO [09-28|01:01:14.123] "📝 Propose transactions succeeded"       txs=10
INFO [09-28|01:01:15.123] Start fetching L2 execution engine's transaction pool content 
INFO [09-28|01:01:15.124] Current base fee                         fee=1
INFO [09-28|01:01:15.129] Transactions lists count                 count=1
INFO [09-28|01:01:15.130] Proposer account information             chainHead=4,378,382 nonce=14075
INFO [09-28|01:01:15.130] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,675
INFO [09-28|01:01:15.132] Prover assigned                          address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83 endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,675
INFO [09-28|01:01:15.133] Prover's deposited taikoTokenBalance     balance=42500000000000000000 address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83
INFO [09-28|01:01:27.140] "📝 Propose transactions succeeded"       txs=13
INFO [09-28|01:01:28.141] Start fetching L2 execution engine's transaction pool content 
INFO [09-28|01:01:28.142] Current base fee                         fee=1
INFO [09-28|01:01:28.149] Transactions lists count                 count=1
INFO [09-28|01:01:28.150] Proposer account information             chainHead=4,378,383 nonce=14076
INFO [09-28|01:01:28.151] Attempting to assign prover              endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,688
INFO [09-28|01:01:28.152] Prover assigned                          address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83 endpoint=http://taiko_client_prover_relayer:9876 fee=10 expiry=1,695,864,688
INFO [09-28|01:01:28.153] Prover's deposited taikoTokenBalance     balance=40000000000000000000 address=0x0fb12735757D2Bc87964c32f261F018E2Ee17d83
taiko@grassets-2:~/simple-taiko-node$ curl http://172.23.0.10:9876/status
{"minProofFee":10,"maxExpiry":3600,"currentCapacity":3}
INFO [09-28|01:00:20.035] Reading capacity                         capacity=3
INFO [09-28|01:00:30.061] Reading capacity                         capacity=3
INFO [09-28|01:00:52.089] Reading capacity                         capacity=3
INFO [09-28|01:01:02.109] Reading capacity                         capacity=3
INFO [09-28|01:01:15.132] Reading capacity                         capacity=3
INFO [09-28|01:01:19.126] Reading capacity                         capacity=3
INFO [09-28|01:01:20.227] Reading capacity                         capacity=3
INFO [09-28|01:01:28.152] Reading capacity                         capacity=3
INFO [09-28|01:01:38.065] Reading capacity                         capacity=3
INFO [09-28|01:01:38.173] Reading capacity                         capacity=3
INFO [09-28|01:01:51.203] Reading capacity                         capacity=3
INFO [09-28|01:02:04.269] Reading capacity                         capacity=3
INFO [09-28|01:03:29.949] Took one capacity                        capacityAfterTaking=2
INFO [09-28|01:03:30.480] Took one capacity                        capacityAfterTaking=1
INFO [09-28|01:03:30.574] Took one capacity                        capacityAfterTaking=0

@cyberhorsey
Copy link
Contributor

Looks much better now! However, I found this case:

Screenshot 2023-09-27 at 11 02 33

It looks like when proof is cancelled, capacity is released twice.

I have pushed a fix here:
bdca930

this should resolve both yours and @davaymne issues.

this introduces a new "tempCapacity" on the server, to solve davaymes issue, and it ties the capacity to a blockID, preventing it from being released twice now, to solve your issue. Please let me know if any other issues arise from these changes, thank you for the feedback!

@davaymne
Copy link
Author

I can observe that capacity Manager is partially working, but under certain condition it is still failing to control capacity.

Set --proposeInterval = 1s and got issue with capacity manager, where Proposer was able to assign prover 10 times (again), before capacityManager started taking out capacity.

Please see proposer logs and screenshot of txs:

@davaymne
Copy link
Author

Please modify log as current version could mislead.
Should be: max capacity and current usage

INFO [09-29|21:39:35.284] Reading capacity                         capacity=3
INFO [09-29|21:39:35.732] Could not take one temp capacity         capacity=0 tempCapacity=0

I read Reading capacity capacity=3 - as max capacity rather than usage.

@cyberhorsey
Copy link
Contributor

Please modify log as current version could mislead. Should be: max capacity and current usage

INFO [09-29|21:39:35.284] Reading capacity                         capacity=3
INFO [09-29|21:39:35.732] Could not take one temp capacity         capacity=0 tempCapacity=0

I read Reading capacity capacity=3 - as max capacity rather than usage.

I will improve the logs.

Regarding the capacity being taken, I am wondering if your prover was running and caught up when it was assigned? Right now we have the concept of tempCapacity, which is only taken for 36seconds, which makes sense as your logs assign a prover at xx::xx::45, then at xx:xx::21, it can assign again, indicating 36 seconds passed. However, if your prover was active, it should have found those blocks when proposed, and taken a capacity.

I am wondering if the issue is potentially here:

	// If there is newly generated proofs, we need to submit them as soon as possible.
	if len(p.proofGenerationCh) > 0 {
		end()
		return nil
	}

which is code we have in the onBlockProposed handler, which may not work anymore as intended, because the prover will no longer take the capacity in this case. Not sure though.

Do you have the prover logs from this same proposer spam opration, that would maybe help me diagnose?

@davaymne
Copy link
Author

davaymne commented Oct 3, 2023

Please see attached logs from prover and proposer, where proposer were able to assign and submit 5 x Proposes.

Screenshot 2023-10-04 at 00 49 54

prover.log
proposer.log

@davidtaikocha
Copy link
Member

Capacity manager will be removed for A6: #478

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
Status: Done
Development

No branches or pull requests

5 participants