Node.js v10.15.0 segfault in BackgroundRunner → CancelableTask::Run → ConcurrentMarking::Run #25814
Comments
Just wondering where the main thread is! Its state at the time of the fault may be the key.
@Cabalbl4 - could you do |
Core dump pid was 14894
thanks! but unfortunately that does not reveal anything - main thread is supposed to be having Could you try with |
No, can not find
I examined the other dumps I have and see the same picture, but with fewer threads.
@Cabalbl4 - thanks.
If so, could you please do On the fact that the main thread is not showing any symbols, one reason could be that it is executing JIT-compiled code, but If we can ascertain that the main thread is indeed in JS land, that would eliminate one of the suspects I had for the crash. By any chance, are these crashes observed when the application is about to close? Or is it a web app with a service loop? or
or
Please let me know which way you want to go. also pinging @nodejs/v8 to see if they have a better proposal. |
@gireeshpunathil I will try to get all dependencies from production and load them into the debugger. It is really hard to pinpoint the JS source of the problem, since the program is more than 10k lines :( This will take some time (I have some urgent tasks); I will ping you once everything is ready, most likely at the start of next week. Sorry for that.
no issues, thanks! meanwhile we may also hear from others if they have a say on this. |
@gireeshpunathil I re-created the container locally, installed gdb, and produced more useful stacks, since all production deps are now loaded.
I attach those as files. |
I see some "Backtrace stopped: previous frame inner to this frame (corrupt stack?)" in those new stacks. |
to see if all 3 dumps show the same pattern / location, can you issue: |
Looks like same problem for threads where backtrace failed:
@Cabalbl4 - sorry; but in this case none of the 3 threads are the failing threads! We are interested in the disassembly of only the failing one. Can you dump that? thanks! |
@gireeshpunathil you need the threads with segfaults, right? Will do. |
Thanks for the quick reply; every one of these looks like a bad instruction sequence to me, so I suspect a wild branch from frame 1.
sorry if it is tedious! also |
No problem, will do once I have time, today or tomorrow at worst.
PID: 14894
PID: 32247
PID: 476
@Cabalbl4 - if you can just dump If the crash is the same as my suspect, we should find a string value there: there is only one callsite in the whole of inlined method The wild branch upon invoking This is reminiscent of the issues discovered in master (#25007), with a context matching #25007 (comment). But I will wait for @Cabalbl4's output to confirm.
@gireeshpunathil
PID 32247
PID 476
@gireeshpunathil #25061 (specifically 4da7e6e) has not landed on 10.x, was that a mistype? |
@MylesBorins - no, I meant #25061 itself. What I meant to say is: we want 4da7e6e for sure, but I am not sure what else would be needed, as we had a number of race-related issues and I don't have a mapping between each issue and the commit that resolved it.
Insert a NULLCHECK prior to return. Ideally we do this in the caller, but the TraceController object is somewhat special as: 1. It is accessed by most threads 2. It's life cycle is managed by Agent::Agent 3. It's getter is invoked through Base Methods (upstream) Refs: nodejs#25814
Insert a NULLCHECK prior to return. Ideally we do this in the caller, but the TraceController object is somewhat special as: 1. It is accessed by most threads 2. It's life cycle is managed by Agent::Agent 3. It's getter is invoked through Base Methods (upstream) Refs: nodejs#25814 PR-URL: nodejs#25943 Reviewed-By: James M Snell <jasnell@gmail.com> Reviewed-By: Masashi Hirano <shisama07@gmail.com> Reviewed-By: Richard Lau <riclau@uk.ibm.com> Reviewed-By: Anna Henningsen <anna@addaleax.net>
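The idea in the commit message above (check for null inside the getter rather than in every caller) can be sketched roughly as follows. This is a minimal illustration, not the actual patch: `TracingController`, `Agent`, `Start`, `Stop`, `HasTracingController` and `DemoUsage` are simplified, hypothetical stand-ins, and the real Node.js classes and check macro may look different.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for the controller object; not the real Node.js class.
class TracingController {
 public:
  int id() const { return 42; }
};

// Hypothetical owner that manages the controller's life cycle.
class Agent {
 public:
  void Start() { controller_ = &storage_; }  // controller becomes available
  void Stop() { controller_ = nullptr; }     // controller is torn down

  bool HasTracingController() const { return controller_ != nullptr; }

  // The null check lives in the getter, so every thread that goes through it
  // fails loudly at a known location instead of dereferencing a null pointer
  // somewhere further down the call chain.
  TracingController* GetTracingController() const {
    if (controller_ == nullptr) {
      std::fprintf(stderr, "TracingController is not available\n");
      std::abort();
    }
    return controller_;
  }

 private:
  TracingController storage_;
  TracingController* controller_ = nullptr;
};

// Example use: returns the controller id once the agent has been started.
int DemoUsage() {
  Agent agent;
  agent.Start();
  assert(agent.HasTracingController());
  return agent.GetTracingController()->id();
}
```

Putting the check in the getter does not remove a race between threads and teardown; it only turns a wild crash into a deterministic abort, which is easier to diagnose from a core dump.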
@Cabalbl4 - |
There is also a A service which we suspect had the same issue has been running without error with |
ping @Cabalbl4 |
inactive, closing. please re-open if this is still outstanding. |
We are running Node.js in Docker on CentOS nodes:
Recently, we migrated our image to a new Node.js version:
FROM node:8.12.0-alpine → FROM node:10.15.0-alpine
We started to observe lots of segfaults in prod:
We use node to spawn a lot of puppeteer scrapers (adding this, because puppeteer/puppeteer#2872 may be related)
I was able to get a few core dumps from inside the container; here is the stack:
Other core dumps also contained ConcurrentMarking::Run as the last frame; ~PromiseWrap was not always there.
Env parameters that may be useful: