Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pinning: Notification block pinning limit reached for warp-sync #4389

Closed
lexnv opened this issue May 6, 2024 · 6 comments · Fixed by #5129
Closed

pinning: Notification block pinning limit reached for warp-sync #4389

lexnv opened this issue May 6, 2024 · 6 comments · Fixed by #5129

Comments

@lexnv
Copy link
Contributor

lexnv commented May 6, 2024

The following warning is continuously printed for a warp-sync kusama node (running litep2p backend) at around ~20 seconds intervals.

This warning did not reproduce however for a full-syncing node.

This was discovered during a long-running node triage (around 2 weeks running litep2p backend).

It seems that during the warp-sync the pinning limit is reached, then the pinning always keeps around 1k pinned blocks. When a new block is discovered, the LRU block is unpinned causing an warning.
It might be the case that references to pinned blocks are kept around for too long (maybe because we have too many blocks).

2024-05-06 13:06:12.494  WARN tokio-runtime-worker db::notification_pinning: Notification block pinning limit reached. Unpinning block with hash = 0x354f531933d7a20b92e652ad0ec2034a74f645923f13ce2651056b2a951a7692

Warning is coming from:

if *references > 0 {
log::warn!(
target: LOG_TARGET,
"Notification block pinning limit reached. Unpinning block with hash = {key:?}"
);
if let Some(backend) = self.backend.upgrade() {
(0..*references).for_each(|_| backend.unpin_block(*key));
}

cc @paritytech/sdk-node @paritytech/networking

@skunert
Copy link
Contributor

skunert commented May 7, 2024

I will warp sync a new kusama node on a dev machine and see if I can find anything fishy.

@skunert
Copy link
Contributor

skunert commented May 13, 2024

I was able to reproduce this. As a quick recap, we do not prune blocks that have FinalityNotifications or ImportNotifications floating around. Only when the UnpinHandle is dropped, the block can be pruned in the backend.

It looks like during initialization, the beefy worker is expecting a given header ancestry to be available and is basically waiting for headers to be available here:

info!(
target: LOG_TARGET,
"🥩 Parent of header number {} not found. \
BEEFY gadget waiting for header sync to finish ...",
current.number()
);
tokio::time::sleep(delay).await;

In my test example it took 6 hours for this loop to continue. But in the meantime we have a finality stream that is not getting polled here:

let mut finality_notifications = client.finality_notification_stream().fuse();

So there, the notifications pile up and the pinning cache limit is reached.

I am not too familiar with the beefy logic, but maybe we could drain the stream while working for the headers or, if we need to act on every notification, we could map them to a different type that does not include the handle.

@acatangiu What do you think?

@bkchr
Copy link
Member

bkchr commented May 13, 2024

As I already proposed here: #3945 (comment) the beefy networking should not be registered before beefy is active.

@bkchr
Copy link
Member

bkchr commented May 13, 2024

(Basically this entire code should not do anything until BEEFY is ready.)

@acatangiu
Copy link
Contributor

I believe this is a different case that would still be a problem even if BEEFY registers networking only after being active.

In case of warp sync, only mandatory headers get synced, and chain "normal operation" starts while rest of the headers sync in the background.

In either case, BEEFY worker needs to register for finality notifications because it is exclusively driven by GRANDPA finality. Simplified initialization steps:

  1. It checks for each finalized block if BEEFY protocol was initialized on chain (beefy_genesis set),
  2. Block N is finalized where BEEFY was initialized: beefy_genesis <= N,
    1. if beefy_genesis == N it's simple, just get initial BEEFY validators from the header,
    2. beefy_genesis < N, we need to get initial validators from beefy_genesis header - BUT when warp-syncing, this header is not yet available (coming in later), so without BEEFY: add BEEFY-specific warp proofs so we can warp sync when using BEEFY #1118 the workaround was to "just wait" for the sync to complete then resume processing GRANDPA finality stream (this comment explanation). We can't continue processing without the initial validator set, we also can't drop finality notifications without processing them.

Therefore, not helped by #3945 (comment)

@acatangiu
Copy link
Contributor

I am not too familiar with the beefy logic, but maybe we could drain the stream while working for the headers or, if we need to act on every notification, we could map them to a different type that does not include the handle.

Yes, either drain the notifications and map/enqueue just the relevant data or just full header to some BEEFY worker queue - once the sync finished, worker needs to process the queue then switch back to processing notifications once it caught up.

github-merge-queue bot pushed a commit that referenced this issue Jul 26, 2024
This should prevent excessive pinning of blocks while we are waiting for
the block ancestry to be downloaded after gap sync.
We spawn a new task that gets polled to transform finality notifications
into an unpinned counterpart. Before this PR, finality notifications
were kept in the notification channel. This led to pinning cache
overflows.

fixes #4389

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
TarekkMA pushed a commit to moonbeam-foundation/polkadot-sdk that referenced this issue Aug 2, 2024
)

This should prevent excessive pinning of blocks while we are waiting for
the block ancestry to be downloaded after gap sync.
We spawn a new task that gets polled to transform finality notifications
into an unpinned counterpart. Before this PR, finality notifications
were kept in the notification channel. This led to pinning cache
overflows.

fixes paritytech#4389

---------

Co-authored-by: Bastian Köcher <git@kchr.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants