Large pruning parameter still not honored despite substrate fix. #6378
I can see the previously reported limit still in master: https://github.com/paritytech/substrate/blob/master/client/service/src/builder.rs#L279 However, I also note these PR comments:
There is also the possibility that the fix only works on parity-db; I did not specify any DB in the node setup. However, I do not see any memory overflow as previously observed. In any case, and for the record, my node was run with the following parameters and version:
I tried to verify this after the fixes included in released Substrate (
@jasl I did not specify any DB. However, I did not see any memory overflow (perhaps due to --sync fast?). I will restart the node with ParityDB; perhaps I will find the same issue you are experiencing.
I don't use it. Yes, Kusama, but it seems Kusama (the relay chain) doesn't have the problem; that issue's reporters all point to parachains.
I removed
@rvalle Are the blocks past the 4096 most recent still missing with these settings? How exactly do you check if the block is present or missing?
@arkpar the node is still syncing, 2.1M KSM blocks so far. It will take 1 day, I guess. I normally try to retrieve a block hash by number and then the block by hash, with:
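Roughly something like the following (a sketch as a standalone @polkadot/api script; the endpoint and block number are placeholders):

```ts
// Sketch: look up a block hash by number, then fetch the block by that hash.
// The endpoint and block number are placeholders, not the values from this node.
import { ApiPromise, WsProvider } from '@polkadot/api';

async function main(): Promise<void> {
  const api = await ApiPromise.create({
    provider: new WsProvider('ws://127.0.0.1:9944'),
  });

  const blockNumber = 9_200_000; // placeholder block number
  const hash = await api.rpc.chain.getBlockHash(blockNumber);
  console.log(`hash for #${blockNumber}: ${hash.toHex()}`);

  const signedBlock = await api.rpc.chain.getBlock(hash);
  console.log(`extrinsics in #${blockNumber}: ${signedBlock.block.extrinsics.length}`);

  await api.disconnect();
}

main().catch(console.error);
```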
I run these from the polkadot.js developer tab. If the block is there, I get a hash, and afterwards I try to request the block content using its hash. I have tried to query the node right now, while syncing, which I think I could do with full archive nodes, but it does not seem to work. Everything returns 0 or empty. Even the following call:
which seems odd to me. As I understood it, while syncing at block N one should be able to query the node as if it were at block N.
@arkpar it is synchronizing unusually slowly. It's been 2 days, and it is still at 9M KSM blocks. The bottleneck seems to be the CPU.
@arkpar it is still taking ages, now at block 10M. BUT it is more promising now.
I will wait until full sync and do some additional testing. I might just double the CPUs of this VM to complete the sync. I think it will take the whole week to finish, whereas a full sync as an archive node took 2 days.
An alternative way to test could be to fast sync and wait 4096 blocks (not having the history just after a warp/fast sync is expected, but after 4096 blocks things should behave as in a full sync).
@cheme but what we are testing is whether raising that 4096 limit to 1.2M works. Do you mean I could fast sync up to the last 1.2M blocks? Or do you imply that the previous 4096-block history is built at runtime, and, say, if you restart the node you need to wait 4096 blocks for it to be available?
Oh yes, you are right, testing this way would require a fast sync and then waiting 1.2M blocks of normal sync afterwards, which is certainly too long.
It is just that when warp syncing or fast syncing you don't have state history (by design), so the history for the last N blocks will be built afterwards during the next N blocks of normal syncing (there is also a background process that does the syncing from the start, but waiting for that one is the same as doing a full sync).
@cheme the changes I am trying to test resort to the DB instead of memory for history blocks. Perhaps this has now changed in some way. Still, fast syncing all but the last 1.5M blocks, to be on the safe side, would have been better. I am also concerned about what is going to happen after a node restart. Is the state history persistent? I guess it is, right?
Hmm, @cheme I stopped the node to double its CPUs, and now I can no longer query block 9.2M, as I could right before. I need to gain a better understanding of how the history works. Perhaps the history is being rebuilt, as you mentioned. Is there any API to query the status of the history?
In any case, this rebuilding would take place from data that is already local, right? I guess it would be less effort.
It should be persistent, yes. The history is simply written to the DB and removed when it falls outside the pruning range.
Not sure, I think it should just be: block head number - pruning history. Thinking about it, my test would be:
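As a worked example of that rule (the numbers are illustrative assumptions, not taken from this node):

```ts
// Worked example of the "head number - pruning history" rule.
// The numbers are illustrative assumptions, not values from this thread.
const pruning = 1_200_000;  // configured pruning history
const head = 12_000_000;    // current best block number

const oldestRetained = head - pruning; // 10_800_000
console.log(`state expected to be queryable for blocks ${oldestRetained}..${head}`);
```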
@arkpar @cheme This is the test I did: In any case, I will wait for the full sync and do a final test, but my bet is that the node will not be able to reply to queries about blocks far beyond the 4096 limit.
@arkpar yes! That definitely shows the expected behavior. Still, the sync is tremendously slow, still at 11.2M. I am going to instantiate an archive node to compare.
The archive node is syncing at a comparable speed, so I will just wait for the final sync and verify that these patches actually provide the expected result.
@arkpar I can now confirm that the fixes have worked as expected. I tested setting a prune list of 1.2M, and it was possible to run a SubQuery project querying those last 1M blocks, even while the blockchain is syncing. The only thing is that, for some reason, the sync process is tremendously slow; I estimate it is at least 3x slower. In parallel I have set up another node in archive mode (with the default DB), and it took 3 days in our environment. The node syncing with --prune 1.2M and ParityDB is still at 14M blocks after one week; I expect it to take 9 days. This is a "minor" problem compared to not being able to set up a node with a large prune list.
I reported in the past (#5807) that the pruning parameter was not honored: setting a large pruning value (2M, for example) would result in only 4096 blocks being accessible.
I also reported (#5804) that setting a large pruning parameter would result in nodes running out of memory.
It was noted that:
@arkpar said:
@arkpar said:
This led to the issue being fixed in paritytech/substrate#11911 by using information in the database instead of using memory:
@arkpar said:
I have run a node with pruning set to 1M blocks and indeed memory consumption remains minimal, so it looks like @NingLin-P's Substrate PR paritytech/substrate#11980 made it into Polkadot and produced the expected result.
However, I believe the hard maximum of (IIRC) 4096 blocks is still in place even though it no longer applies, as I can still only query 4096 blocks despite setting a pruning of 1M.
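For illustration, a rough sketch of the kind of check that shows this (not my exact test; the endpoint, offsets, and state query are assumptions):

```ts
// Sketch: with pruning set to 1M, state at head - 10_000 should be queryable,
// yet with the old hard cap only roughly the last 4096 blocks respond.
// The endpoint, offsets, and the particular state query are illustrative assumptions.
import { ApiPromise, WsProvider } from '@polkadot/api';

async function probe(): Promise<void> {
  const api = await ApiPromise.create({
    provider: new WsProvider('ws://127.0.0.1:9944'),
  });

  const head = (await api.rpc.chain.getHeader()).number.toNumber();

  for (const offset of [4_000, 10_000]) {
    const n = head - offset;
    const hash = await api.rpc.chain.getBlockHash(n);
    try {
      const apiAt = await api.at(hash);            // fails if the state was pruned
      const events = await apiAt.query.system.events();
      console.log(`#${n} (head - ${offset}): state available, ${events.length} events`);
    } catch {
      console.log(`#${n} (head - ${offset}): state not available`);
    }
  }

  await api.disconnect();
}

probe().catch(console.error);
```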
CC: @jasl