Double free or corruption on Raspbian #10523
Comments
This is blocking the PSCore6.1 release and is a regression from .NET Core 2.0 (and thus PSCore6.0) |
The code that allocates and frees the affected pointer hasn't changed since .NET Core 1.0, so this looks more like memory corruption than a double free. @anmenaga can you please share the core file with me? |
@janvorli In our case this is happening on a Pi where PowerShell is working with special hardware. It is a continuously running PS instance where different scripts are launched 24x7. |
@anmenaga I have an additional question. When it crashes, is the call stack always the same / similar to the one you've shown or is it completely random? |
@janvorli this is the first time we attached a debugger before the crash. The machine is currently holding a live debug session at the point of the crash. |
Here is another one:
|
I've got similar reports from users running my project. Sadly I don't have any stack traces, only generic reports. Just wanted to confirm this one with an entirely different project, thanks.
|
Hi,
However, when running with .NET Core 2.0, this does not appear. This issue is currently blocking us from moving to .NET Core 2.1. Unfortunately, at the moment I don't know how to get further diagnostic information about this crash. Is there something I can do to help diagnose this? Thanks! |
@kpreisser Enable core dumps by running |
Hi @jkotas, thank you! I enabled core dumps and ran the application today, where it crashed after a few hours with:
This is the output from
I do not yet have more stack traces, but I will continue to run the application and then post the stacks if they are different. Thanks! |
Yesterday the application crashed with a different error:
gdb:
|
I have the same problem. I had it in 2.1.300 and updated to 2.1.301 to see if it would stop, but the problem persists. |
In my case this issue was not resolved in 2.1.301 |
Same here. I have 2 stack traces and they are the same.
Here is my .NET Core version: `Host (useful for support): .NET Core SDKs installed: .NET Core runtimes installed: To install additional .NET Core runtimes or SDKs:` Built on a Windows PC with the following version info: `.NET Core SDK (reflecting any global.json): Runtime Environment: Host (useful for support): .NET Core SDKs installed: .NET Core runtimes installed:` |
This could be some kind of corruption or a bad free somewhere. The object being deleted would have been allocated not long before the point of failure. Could someone please share the stacks of all threads when this occurs (the double-free issue)? If it's a bad free that's happening elsewhere the stacks of other threads may help to narrow down where the bad free might be. If someone could share a core dump of the double-free issue that may be useful as well. |
It may also be useful to see what values were extracted from the memory before the delete, and the pointer value itself, to see if there is any obvious corruption, though it may not be reliable info. |
Here is a backtrace of all threads. I could give you access to my Raspberry Pi if needed as well. |
Hi, For the "double free or corruption" issue: Backtrace. For the "Segmentation fault" issue: Backtrace. The core files are about 300 MB each; I don't think I can share them publicly as they might contain private code, but maybe I can share them privately. In addition to these two crashes, we have also discovered that sometimes the application does not crash but gets stuck at 100% CPU usage (which doesn't happen with .NET Core 2.0). Thank you! |
Hello,
How can I get the backtrace?
I use dotnet for an app called PTMagic and it crashes every 6 or 12 hours.
Respectfully
|
We're running into the same scenario as this:
Working on getting dumps. |
Adding to @rquackenbush's comment: here is what happens in our situation after about 4 hours of runtime:
This particular error was on Raspbian as well, but it occurs on all ARM devices where our applications are executing. |
I get that same error
*** Error in 'Application': double free or corruption (fasttop): 0x73630b18 ***
|
This repo contains a This is the code that is running: |
In the seg fault stack trace it looks like a different recently heap-allocated object's data is corrupted. The point of failure is close to that of the double-free issue, so they may be caused by the same underlying issue. I didn't gather much from the other threads' stacks; they look typical. The point of failure is probably too late to show the cause, so I'll look into getting a repro for the PS issue meanwhile. If you have some repro steps that I could use with any of the apps above (even if it takes a while before the crash, just something that works), that would be helpful. |
@kouvel - thanks for looking at that. Were you able to identify what code the problematic thread was running (or trying to run)? Any help there would help us in creating a repro. |
@rquackenbush I wasn't able to identify a suspicious thread from the native stacks that were posted. I haven't yet figured out how to look at the core dump you shared, it seems like more things would be needed (executable image, dependency modules, maybe more). If you're looking for managed stacks in order to determine what code is running in each thread it may be easier for you to open the core dump with lldb-5.0 on the same machine, "plugin load libsosplugin.so" (should be alongside the loaded libcoreclr.so), and "clrstack" for each thread as described here. |
@kouvel - unfortunately it looks like lldb-5.0 isn't available on ARM quite yet. I've tried running lldb-3.9 per the instructions, but the sosplugin doesn't appear to be compatible per this issue: https://github.com/dotnet/coreclr/issues/18889 I'm attempting to build lldb locally on the Pi, but that is taking an eternity. It's also not clear how I'll be able to compile a matching |
Oh, it looks like the cross-build script for ARM installs the lldb 3.6 dev package by default: And it looks like the sos plugin build would use the latest lldb headers that are installed, so I'm not sure which version it would be built against, probably 3.6. I'm not sure it actually works, though: based on other issues linked to the issue above, the sos plugin appears to have problems on ARM. I'll have to try this out myself. |
I got a repro now on 2.1.2 with the same stack trace, it just took longer. |
It's pretty elusive. I think once we find the root cause, it will be easy to get a test case that reproduces it faster. My original thought was that it was a ThreadPool cleanup issue: somehow a thread was getting cleaned up twice. But I have nothing to base that on other than 5 lines of a stack trace. :) But at least you have something now. |
Hello,
Do you need us to recreate the crash?
I can show you how my dotnet crashes every 6 hours.
Respectfully,
|
@js8749 I'm sure multiple test cases can always help. You should definitely upload something that they can try. |
This is good news (depending on how you look at it). Hopefully your tests prove your theory and we can get this into the next update.
Roy
|
Hmmm.. Not showing up here for some reason, but just saw a reply in my email from @kouvel that stated he may have found a possible culprit.
|
Yes that's most likely it, I deleted my comments, will update once I test a fix |
My repro scenario involves very frequent operations with Thread creation/deletion; |
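The general shape of such a scenario (many short-lived threads created and torn down in a tight loop) can be sketched in C++ as below. This is only an illustration of thread churn under stated assumptions: it is not the actual repro, which ran managed code on the .NET Core runtime, and `churn_threads` is a hypothetical helper name.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stress helper (not from the thread): each round starts
// `threads_per_round` short-lived threads and joins them all before the
// next round begins. Returns how many thread bodies actually ran.
long churn_threads(int rounds, int threads_per_round) {
    std::atomic<long> ran{0};
    for (int r = 0; r < rounds; ++r) {
        std::vector<std::thread> batch;
        batch.reserve(threads_per_round);
        for (int t = 0; t < threads_per_round; ++t) {
            // Each thread does a trivial amount of work and exits at once,
            // so most of the time is spent creating and destroying threads.
            batch.emplace_back([&ran] {
                ran.fetch_add(1, std::memory_order_relaxed);
            });
        }
        for (auto& th : batch) {
            th.join();  // join() also makes the increments visible here
        }
    }
    return ran.load();
}
```

Calling `churn_threads(50, 8)` starts and joins 400 threads in total; a real stress run would use far larger numbers and run for hours. |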
I've tested a fix that seems to work, working on getting a fix in |
Yesterday I updated my RPi and dotnet. My dotnet apps have not crashed since yesterday.
My CPU usage stays low.
|
@kouvel That's great news. How does that typically work on things like this? If approved it just goes into the next release (e.g, 2.1.3), or does it take much longer (e.g., 2.2)? |
@js8749 the timing in my runs was also unpredictable; sometimes it took days and sometimes hours. @RoySalisbury, it would go into 2.1, 2.2, and master for 3.0. Which version of 2.1 is not clear at the moment, hopefully the next one. |
Fixes https://github.com/dotnet/coreclr/issues/18486 - Lock release needs to be at least volatile
Fix for https://github.com/dotnet/coreclr/issues/18486 - Lock release needs to be at least volatile coreclr master PR: dotnet/coreclr#19604
/cc @MichaelSimons |
The fix is currently targeting a September release for 2.1. Closing based on dotnet/coreclr#19606. |
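For readers wondering what "lock release needs to be at least volatile" can mean in practice, here is a minimal sketch of a spin lock whose release is an explicit release store. It is an assumption-laden illustration, not the CoreCLR PAL code and not the actual change in dotnet/coreclr#19606: the `SpinLock` class and `locked_increment_demo` are hypothetical names, and it uses `std::atomic` rather than `volatile`. The idea it shows is that the write that frees the lock must not be reordered ahead of the writes made inside the critical section, which matters on weakly ordered CPUs such as ARM.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock, for illustration only.
class SpinLock {
public:
    void lock() {
        // Acquire: accesses inside the critical section cannot be
        // reordered to before the lock is taken.
        while (locked_.exchange(true, std::memory_order_acquire)) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Release: everything written inside the critical section is
        // visible to the next thread that acquires the lock. A plain,
        // non-atomic store here would give no such guarantee.
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Several threads increment a plain (non-atomic) counter under the lock.
// With correct acquire/release ordering no increment is lost.
long locked_increment_demo(int thread_count, int iterations) {
    SpinLock lock;
    long counter = 0;  // protected only by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < thread_count; ++t) {
        workers.emplace_back([&lock, &counter, iterations] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

With 4 threads and 10000 iterations each, `locked_increment_demo(4, 10000)` returns 40000. If the release ordering is missing, data protected by the lock can be seen in a stale or half-updated state by the next owner, which is the kind of heap corruption that would fit the symptoms reported in this thread. |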
Could we get a small notification in this issue when a runtime that includes this fix is released? Thank you in advance 👍 |
@kouvel Can you confirm that 2.1.4 runtime has dotnet/coreclr#19606 fix included? Thank you. |
According to this, it's not included... https://github.com/dotnet/core/blob/master/release-notes/2.1/2.1.4/2.1.4-commits.md |
Based on the commit hash of coreclr.dll it looks like it did not make it into 2.1.4 though it was expected to be at the time (latest commit included was on Aug 13). It should be included in 2.1.5, which is scheduled for October. Apologies for the confusion. |
Thank you for further explanation. Let's hope it's going to make it in 2.1.5 then 🙂 |
Looks like this DID make it into 2.1.5: 2018-08-28 - [9663131aec] Fix a PAL spin lock issue (#19606) |
After the move from .NET Core 2.0 to 2.1, this started happening very frequently.
PowerShell running on Raspberry Pi 3 Model B ("Raspbian GNU/Linux 9 (stretch)")
crashes with:
*** Error in "pwsh": double free or corruption (fasttop): 0x6e800fe0 ***
stack from gdb:
Can share the core file with above stack.