Investigate tokio panic (assertion failed: self.ref_count() > 0) #7162
90% confident that it's not a neon bug.
There are two similar upstream issues:
assertion failed: self.ref_count() > 0
Turns out we are on the latest tokio version: e62baa9
@VladLazar I'm investigating this right now on the tokio side. We think this is a memory corruption issue unrelated to tokio, which tokio just happens to be the first to yell about. Have you folks done any other dependency updates recently that line up? It's also possible that this is actually a recent neon issue, but I'm strongly leaning towards it being another common crate, likely in tokio's dependency tree, given the number of reports.
Got another panic today (sentry link). The stack trace is identical to the other throttling panic. This makes me suspect that something is wrong on our side (or in leaky_bucket). I'll try to throttle aggressively under pagebench; perhaps it reproduces.
Interesting! Keep me in the loop here. If this ends up being an issue on your end, that helps us narrow down what could be going on with our other reports, as our current theory is that there's a common issue.
Interesting observation: Peter's benchmarks seem to reproduce this. We have seen the throttling-rooted panic two times in staging in the past 7 days. Both times it was on Peter's tenants. Metric collection is broken during the benchmarks, so we are a bit blind, but CPU usage is high.
Got a new panic in prod: eu-central-1@pageserver-4
This is my current understanding as well. I found a way to trigger both of the panics seen here, but I cannot find any other way at the moment to trigger the same behaviour with any code in tokio 1.37.0. When investigating dependencies, we should be looking for potential double-free bugs.
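To make concrete what that class of bug looks like: below is a minimal, hypothetical sketch (not tokio's actual internals) of a reference count guarded by the same kind of invariant. A logical double-release of a reference, e.g. caused by a double-free or use-after-free elsewhere, is exactly the sort of thing that trips an assertion of the form `self.ref_count() > 0`.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Toy reference count mirroring the invariant tokio asserts on:
/// the count must be strictly positive whenever a reference is released.
struct ToyRefCount {
    refs: AtomicUsize,
}

impl ToyRefCount {
    fn new() -> Self {
        Self { refs: AtomicUsize::new(1) }
    }

    fn incr(&self) {
        self.refs.fetch_add(1, Ordering::Relaxed);
    }

    fn decr(&self) {
        // Analogous to `assert!(self.ref_count() > 0)`: releasing a reference
        // that was already released (a logical double-free) trips the check.
        let prev = self.refs.fetch_sub(1, Ordering::AcqRel);
        assert!(prev > 0, "assertion failed: self.ref_count() > 0");
    }
}

fn main() {
    let rc = ToyRefCount::new();
    rc.incr();
    rc.decr();
    rc.decr(); // count reaches 0: fine
    rc.decr(); // extra release, e.g. from memory corruption: panics like the reports above
}
```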
Haven't seen any recent instances of this. Can we close this issue out @VladLazar? |
Sentry link: https://neondatabase.sentry.io/issues/5076699787/?alert_rule_id=13489026&alert_type=issue&notification_uuid=821438c5-fe8b-4aef-94a9-4f932dd3c959&project=4504220031582208&referrer=slack
Slack thread: https://neondb.slack.com/archives/C033RQ5SPDH/p1710750099615559
Current theories are:
- leaky_bucket bug

Call site: neon/pageserver/src/tenant/throttle.rs, line 146 at 77f3a30
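For context, here is a minimal sketch of how a leaky_bucket-based throttle is typically driven. This is an assumption about the general shape of such a call site, not the actual code in throttle.rs, and the rate values are made up:

```rust
use std::time::Duration;
use leaky_bucket::RateLimiter;

#[tokio::main]
async fn main() {
    // Hypothetical configuration; the real parameters live in
    // pageserver/src/tenant/throttle.rs and are not reproduced here.
    let limiter = RateLimiter::builder()
        .max(100)                              // bucket capacity
        .initial(0)                            // start with an empty bucket
        .refill(10)                            // tokens added per interval
        .interval(Duration::from_millis(100))  // refill interval
        .build();

    for request in 0..50 {
        // Each request waits until a token is available. An await of this
        // shape is the kind of call site the backtrace points at.
        limiter.acquire(1).await;
        println!("request {request} admitted");
    }
}
```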
Backtrace: