This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

Large pruning parameter still not honored despite substrate fix. #6378

Closed
rvalle opened this issue Dec 2, 2022 · 21 comments

Comments


rvalle commented Dec 2, 2022

I reported in the past (#5807) that the pruning parameter was not honored: setting a large pruning value (2M, for example) would result in only 4096 blocks being accessible.

I also reported (#5804) that setting a large pruning parameter would cause nodes to run out of memory.

It was noted that:

@arkpar said:

Pruning setting indeed has a hard maximum setting of IIRC 4096 blocks. This has to do with the fact that the current implementation requires some in-memory upkeep for tracking live branches, and setting it too high will use too much memory and degrade performance.

@arkpar said:

It should be possible to get rid of in-memory bookkeeping for state pruning with some changes to the implementation.

This led to the issue being fixed in paritytech/substrate#11911 by using information in the database instead of memory:

@arkpar said:

It is required indeed, but we don't need to keep it in memory. On each block import a journal record is written to the database here which contains a list of deleted keys. When it is time to prune a block we can get that record from the database.
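To make that mechanism concrete, below is a minimal conceptual sketch of journal-based pruning. It is written in TypeScript purely for illustration; it is not the actual Substrate code, and the key layout, Db interface, and function names are invented:

    // Conceptual sketch only -- not the actual Substrate implementation.
    // Idea: on block import, persist a journal entry listing the state keys the
    // block made obsolete; at prune time, read that entry back from the database
    // instead of keeping the bookkeeping in memory.
    interface JournalRecord {
      blockNumber: number;
      deletedKeys: string[];
    }

    // Hypothetical key-value store standing in for the node's database.
    interface Db {
      put(key: string, value: string): void;
      get(key: string): string | undefined;
      delete(key: string): void;
    }

    function journalKey(n: number): string {
      return `state_pruning_journal:${n}`;
    }

    // Written once per imported block; nothing needs to stay resident in memory.
    function writeJournal(db: Db, record: JournalRecord): void {
      db.put(journalKey(record.blockNumber), JSON.stringify(record.deletedKeys));
    }

    // Called when blockNumber falls out of the pruning window.
    function pruneBlock(db: Db, blockNumber: number): void {
      const raw = db.get(journalKey(blockNumber));
      if (raw === undefined) return;
      for (const key of JSON.parse(raw) as string[]) {
        db.delete(key);                      // drop the stale state entries
      }
      db.delete(journalKey(blockNumber));    // and the journal record itself
    }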

I have run a node with pruning set to 1M blocks and indeed, memory consumption remains minimal, so it looks like the @NingLin-P substrate PR paritytech/substrate#11980 made it into Polkadot and produced the expected result.

However, I believe the hard maximum setting of IIRC 4096 blocks is still in place even though it is no longer applicable, as I can still only query 4096 blocks despite setting a pruning of 1M.

CC: @jasl


rvalle commented Dec 2, 2022

I can see the previously reported limit is still in master:

https://github.com/paritytech/substrate/blob/master/client/service/src/builder.rs#L279

However, I also note this comment from the PR:

Note that the queue required the backend database to support reference counting (i.e. parity-db); databases that do not support reference counting (i.e. rocksdb) still need to keep all blocks in memory.

There is also the possibility that the fix only works with ParityDB; I did not specify any DB in the node setup. However, I do not see the memory overflow previously observed.

In any case, and for the record, my node (version v0.9.33) was run with the following parameters:

            "Cmd": [
                "--name",
                "privazio-kusama-archive-node",
                "--chain",
                "kusama",
                "--pruning",
                "1000000",
                "--rpc-cors",
                "all",
                "--rpc-external",
                "--ws-external",
                "--prometheus-external",
                "--out-peers",
                "2",
                "--in-peers",
                "6",
                "--max-parallel-downloads",
                "3",
                "--sync",
                "fast"
            ],


jasl commented Dec 2, 2022

I tried to verify this after the fixes were included in a Substrate release (polkadot-0.9.30, as I remember).
The pruning fix must be used with ParityDB, or all un-pruned blocks will be kept in memory, which makes the node consume extremely large amounts of memory and get killed by the OS.
However, I ran into another problem, paritytech/substrate#12613, so my verification got stuck.


rvalle commented Dec 2, 2022

@jasl I did not specify any DB. However, I did not see any memory overflow (perhaps due to --sync fast?).

I will restart the node with ParityDB. Perhaps I will find the same issue you are experiencing.
I am running on Kusama, and you?


jasl commented Dec 2, 2022

@jasl I did not specify any DB. However, I did not see any memory overflow (perhaps due to --sync fast?).

I will restart the node with ParityDB. Perhaps I will find the same issue you are experiencing. I am running on Kusama, and you?

I don't use --sync fast

Yes, Kusama, but it seems Kusama (the relay chain) doesn't have the problem; that issue's reporters all point to parachains.


rvalle commented Dec 2, 2022

I removed --sync fast and added --database paritydb with --pruning 1300000. Memory is behaving well, staying below 500 MB after 400K blocks.


arkpar commented Dec 2, 2022

I removed --sync fast and added --database paritydb with --pruning 1300000. Memory is behaving well, staying below 500 MB after 400K blocks.

@rvalle Are blocks older than the 4096 most recent ones still missing with these settings? How exactly do you check whether a block is present or missing?


rvalle commented Dec 2, 2022

@arkpar the node is still syncing. 2.1M KSM blocks so far. Will take 1 day, I guess.

I normally try to retrieve a block hash by number and then the block by hash, with:

storage.system.blockHash(N)

From polkadot.js development tab.

If the block is there, I get a hash, and afterwards I try to request the block content using its hash.

I have tried to query the node right now, while it is syncing, which I think I could do with full archive nodes, but it does not seem to work. Everything returns 0 or empty, even the following call:

storage.system.number()

which seems odd to me, as I understood that while syncing at block N one could query the node as if it were at block N.
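For reference, this is roughly the check described above as a polkadot-js API script; the endpoint and the block number are placeholders:

    import { ApiPromise, WsProvider } from '@polkadot/api';

    // Roughly the check described above: fetch the hash for block N from the
    // system pallet's storage, then request the block body by that hash.
    async function checkBlock(n: number): Promise<void> {
      const api = await ApiPromise.create({
        provider: new WsProvider('ws://127.0.0.1:9944'), // placeholder endpoint
      });

      const hash = await api.query.system.blockHash(n);
      console.log(`system.blockHash(${n}) = ${hash.toHex()}`);

      if (!hash.isEmpty) {
        const block = await api.rpc.chain.getBlock(hash.toHex());
        console.log(`block ${n} has ${block.block.extrinsics.length} extrinsics`);
      }

      await api.disconnect();
    }

    checkBlock(4_000_000).catch(console.error);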


rvalle commented Dec 4, 2022

@arkpar it is synchronizing unusually slowly. It's been 2 days, and it is still at 9M KSM blocks. The bottleneck seems to be the CPU.


rvalle commented Dec 5, 2022

@arkpar it is still taking ages, now at block 10M.

BUT it is more promising now.

system.number() returned 10,052,608

I will wait until full sync, and do some additional testing.

I might just double the CPUs of this VM to complete the sync. I think it will take the whole week to finish, while a full sync as an archive node took 2 days.


cheme commented Dec 5, 2022

An alternative way to test could be to fast sync and wait 4096 blocks (not having the history right after warp/fast sync is expected, but after 4096 blocks things should behave as in a full sync).


rvalle commented Dec 5, 2022

@cheme but what we are testing is whether raising that 4096 limit to 1.2M works.
I have the setting as --pruning 1200000.
And the 4096 limit is supposed to be gone, at least when using ParityDB.

Or do you mean I could fast sync up to the last 1.2M blocks?

Do you imply that the previous 4096-block history is built at runtime and, say, if you restart the node you need to wait 4096 blocks for it to be available?


cheme commented Dec 5, 2022

Oh yes, you are right: testing this way would require a fast sync and then waiting 1.2M blocks to sync afterward, which is certainly too long.

Do you imply that the previous 4096-block history is built at runtime and, say, if you restart the node you need to wait 4096 blocks for it to be available?

Just that when warp syncing or fast syncing you don't have state history (by design), so the last N blocks of history will be built afterward during the next N normally synced blocks (there is also a background process that syncs from the start, but waiting for it is the same as doing a full sync).


rvalle commented Dec 5, 2022

@cheme the changes I am trying to test resort to the DB instead of memory for block history. Perhaps this has now changed in some way.

Still, fast syncing all but the last 1.5M blocks, to be on the safe side, would have been better.

I am also concerned about what is going to happen after a node restart. Is the state history persistent? I guess it is, right?


rvalle commented Dec 5, 2022

mmmmm

@cheme I stopped the node to double its CPUs, and now I can no longer query block 9.2M, as I could right before.

I need to gain a better understanding of how the history works. Perhaps the history is being rebuilt as you mentioned.

Is there any API to query the status of the history?


rvalle commented Dec 5, 2022

there is also a background process that syncs from the start, but waiting for it is the same as doing a full sync

In any case this rebuilding would take place from data that is already local, right? I guess it would be less effort.


cheme commented Dec 5, 2022

I am also concerned about what is going to happen after a node restart. Is the state history persistent? I guess it is, right?

It should be persistent, yes. The history is simply written to the db, and removed when it falls outside the pruning range.
The latest changes were related to non-finalized blocks that were previously dropped (the pruning range was small enough that we could simply resync them), so observing that would require a synced chain with a big pruning range and no finalization happening.

Is there any API to query the status of the history?

Not sure; I think it should just be: block head number - pruning history.
Warp sync and fast sync are an exception to this rule, as we want to be able to use the chain before syncing the history.
Generally I would just run a state query for a known key at a given block (just querying the block hash does not indicate whether your state is available).

Thinking about it, my test would be:
fast sync or warp sync with a very large pruning range, wait for more than 4096 blocks, and then ensure I can access state from the initial fast sync block number to head. At some point, observe pruning happening (when head - pruning range > fast sync initial block number).
But using a fully synced chain would be better.
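A sketch of the state probe mentioned above (querying a known key at a given block) with the polkadot-js API could look like the following; the endpoint is a placeholder and reading system.number() at the historical block is just one example of a known value:

    import { ApiPromise, WsProvider } from '@polkadot/api';

    // Probe whether state is still available at a historical block: querying the
    // block hash alone is not enough, so read an actual storage item at that block.
    async function stateAvailableAt(blockNumber: number): Promise<boolean> {
      const api = await ApiPromise.create({
        provider: new WsProvider('ws://127.0.0.1:9944'), // placeholder endpoint
      });
      try {
        const hash = await api.rpc.chain.getBlockHash(blockNumber);
        const apiAt = await api.at(hash);          // API instance pinned to that block
        const numberAtBlock = await apiAt.query.system.number();
        return numberAtBlock.toNumber() === blockNumber;
      } catch {
        // A pruned block typically surfaces as a state-unavailable RPC error.
        return false;
      } finally {
        await api.disconnect();
      }
    }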


rvalle commented Dec 5, 2022

@arkpar @cheme
I have been testing further, and I think that the history is still limited to 4096 blocks despite pruning of 1.2M now working.

Here is the test I did:
While syncing around 10M, I can ask for system.blockHash(N), where N is within 4096 of system.number(), and a hash is returned.
Now I wait for the sync to advance to another block M > N + 4096, reload the polkadot-js page to stop any local cache from interfering, and request system.blockHash(N) again; now the response is "empty".
There is clearly a window of 4096 blocks affecting this API call.
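As a script, the comparison above would be roughly the following (endpoint and block numbers are placeholders):

    import { ApiPromise, WsProvider } from '@polkadot/api';

    // Query system.blockHash(N) twice: once while N is within 4096 blocks of the
    // head, and again after the head has advanced past N + 4096.
    async function hashFor(api: ApiPromise, n: number): Promise<string> {
      const hash = await api.query.system.blockHash(n);
      return hash.isEmpty ? 'empty' : hash.toHex();
    }

    async function main(): Promise<void> {
      const api = await ApiPromise.create({
        provider: new WsProvider('ws://127.0.0.1:9944'), // placeholder endpoint
      });
      const n = (await api.query.system.number()).toNumber() - 100; // recent block N
      console.log(`while recent: ${await hashFor(api, n)}`);
      // ...wait until the head has advanced past N + 4096, then:
      console.log(`after window: ${await hashFor(api, n)}`);
      await api.disconnect();
    }

    main().catch(console.error);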

In any case, I will wait for the full sync and do a final test, but my bet is that the node will not be able to reply to queries about blocks older than the 4096 limit.


arkpar commented Dec 5, 2022

system.blockHash and system.number query the latest chain state. The system pallet maintains a few recent block hashes in storage, which you can query; the pruning setting, however, does not control this. What you probably want to do is query the block hash not from the storage, but from the chain database directly, using the chain.getBlockHash RPC call (or the corresponding section in the polkadot-js UI).
Once you have the hash, you can query the state of any block within the pruning window by passing the block hash to the RPC calls that accept the at parameter, e.g. state.getStorage or state.call.
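As a sketch of that flow with the polkadot-js API (the endpoint is a placeholder, and reading the System::Number storage item is just one example of a state query with an at parameter):

    import { ApiPromise, WsProvider } from '@polkadot/api';

    // Get the hash from the chain database via the chain.getBlockHash RPC, then
    // read state at that hash via state.getStorage with the "at" parameter.
    async function queryOldBlock(n: number): Promise<void> {
      const api = await ApiPromise.create({
        provider: new WsProvider('ws://127.0.0.1:9944'), // placeholder endpoint
      });

      // Not limited to the few recent hashes kept by the system pallet.
      const hash = await api.rpc.chain.getBlockHash(n);

      // Raw storage key for System::Number, read as of that block.
      const key = api.query.system.number.key();
      const raw = await api.rpc.state.getStorage(key, hash);
      console.log(`state at block ${n} (${hash.toHex()}): ${raw.toHex()}`);

      await api.disconnect();
    }

    queryOldBlock(10_000_000).catch(console.error);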


rvalle commented Dec 6, 2022

@arkpar yes!

That definitely shows the expected behavior.

Still, sync is tremendously slow; it is at 11.2M so far.

I am going to instantiate an Archive node to compare.


rvalle commented Dec 7, 2022

The archive node is syncing at a comparable speed, so I will just wait for the final sync and verify that these patches actually provide the expected result.


rvalle commented Dec 12, 2022

@arkpar I can now confirm that the fixes have worked as expected.

I tested setting a prune list of 1.2M blocks, and it was possible to run a SubQuery project querying those last 1M blocks, even while the blockchain is syncing.

The only thing is that, for some reason, the sync process is tremendously slow; I estimate it is at least 3x slower.

In parallel I have set up another node in archive mode (and the default db), and it took 3 days in our environment. The node syncing with --prune 1.2M and ParityDB is still at 14M blocks after one week; I expect it to take 9 days. That is a "minor" problem compared to not being able to set up a node with a large prune list.

rvalle closed this as completed on Dec 12, 2022