Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Free the tls memory on thread termination #95362

Merged
merged 10 commits into from
Nov 30, 2023
Merged

Conversation

kunalspathak
Copy link
Member

The TLS memory that we used to store the TLS data per thread was never freed on thread termination. This results in memory leak.

@ghost ghost assigned kunalspathak Nov 28, 2023
@kunalspathak kunalspathak changed the title Free the tls memory Free the tls memory on thread termination Nov 28, 2023
@kunalspathak kunalspathak marked this pull request as ready for review November 29, 2023 22:14
@kunalspathak
Copy link
Member Author

@jkotas

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look ok to me. I think @AaronRobinsonMSFT is also taking a look.

@@ -7702,6 +7708,15 @@ void Thread::DeleteThreadStaticData()
CONTRACTL_END;

m_ThreadLocalBlock.FreeTable();

delete[] t_ThreadStatics.NonGCThreadStaticBlocks;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so these were thought to be implicitly deleted but that isnt the case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

correct.

@carlossanlop
Copy link
Member

If we intend to backport this fix to 8.0 so it gets included in the January Release, please target the release/8.0 branch directly, not staging, so we can flow it to internal quickly. Use:

/backport to release/8.0

@AndyAyersMS
Copy link
Member

@kunalspathak no symbolization on the stack traces so I can only guess what's behind the failures -- maybe the cleanup needs to be conditional on the TLS not being nullptr? Linux may be more tolerant of calling delete on nulls.

@kunalspathak
Copy link
Member Author

maybe the cleanup needs to be conditional on the TLS not being nullptr

Here is what I found for 1 repro, but it doesn't always stay like that. Notice the JIT_GetSharedGCThreadStaticBaseOptimized at the end, so my first guess was it has to do with CONSISTENCY_CHECK(staticBlock != NULL);, but adding printf doesn't help as it is not consistent.

CORECLR! CHECK::Trigger + 0x1E6 (0x00007ffe`907a7356)
CORECLR! CLRVectoredExceptionHandlerPhase3 + 0x4B5 (0x00007ffe`900d7135)
CORECLR! CLRVectoredExceptionHandlerPhase2 + 0x91 (0x00007ffe`900d6a11)
CORECLR! CLRVectoredExceptionHandler + 0xDFD (0x00007ffe`900d696d)
CORECLR! CLRVectoredExceptionHandlerShim + 0x1B4 (0x00007ffe`900d73b4)
NTDLL! RtlGetLengthWithoutLastFullDosOrNtPathElement + 0x3DA (0x00007fff`c6e67a8a)
NTDLL! RtlFindCharInUnicodeString + 0x872 (0x00007fff`c6e0e242)
NTDLL! KiUserExceptionDispatcher + 0x2E (0x00007fff`c6e9340e)
CORECLR! JIT_GetSharedGCThreadStaticBaseOptimized + 0x4CE (0x00007ffe`903f8e9e)
<no module>! <no symbol> + 0x0 (0x00007ffe`375eb155)
    File: D:\git\runtime2\src\coreclr\vm\excep.cpp Line: 7177
    Image: E:\helix\95362_2\correlation-payload\test-runtime-net9.0-windows-Release-x64\dotnet.exe

Under windbg, I always hit this kind of AV, which I think is coming because staticBlock is nullptr? The AV in windbg is consistent but I don't understand why I don't see if windbg is not attached:

0007ffe`4a379c9e c5fe7f65e0      vmovdqu ymmword ptr [rbp-20h],ymm4
00007ffe`4a379ca3 48894d10        mov     qword ptr [rbp+10h],rcx
00007ffe`4a379ca7 488b4d10        mov     rcx,qword ptr [rbp+10h]
00007ffe`4a379cab 488b4908        mov     rcx,qword ptr [rcx+8]
00007ffe`4a379caf 48894df8        mov     qword ptr [rbp-8],rcx
00007ffe`4a379cb3 488b4df8        mov     rcx,qword ptr [rbp-8]
00007ffe`4a379cb7 48baf8597e4afe7f0000 mov rdx,7FFE4A7E59F8h
00007ffe`4a379cc1 e89a14435c      call    coreclr!JIT_ClassProfile32 (00007ffe`a67ab160)
00007ffe`4a379cc6 488b4df8        mov     rcx,qword ptr [rbp-8]
00007ffe`4a379cca 48894de8        mov     qword ptr [rbp-18h],rcx
00007ffe`4a379cce 488b4de8        mov     rcx,qword ptr [rbp-18h]
00007ffe`4a379cd2 488b45e8        mov     rax,qword ptr [rbp-18h]
00007ffe`4a379cd6 488b00          mov     rax,qword ptr [rax] ds:00000000`00000000=???????????????? ; <----
00007ffe`4a379cd9 488b4060        mov     rax,qword ptr [rax+60h]
00007ffe`4a379cdd ff5018          call    qword ptr [rax+18h]
00007ffe`4a379ce0 85c0            test    eax,eax

@kunalspathak
Copy link
Member Author

I just pushed the conditional checking as well as reset the other two variables.

@kunalspathak
Copy link
Member Author

I just pushed the conditional checking as well as reset the other two variables.

I verified locally and MSVC doesn't complain if using delete[] on nullptr.

@kunalspathak
Copy link
Member Author

alright, verified the AV in jit code that I shared earlier and I can see it repros even on my local build from main, so perhaps it is from the test itself.

@jkotas
Copy link
Member

jkotas commented Nov 30, 2023

Is the DeleteThreadStaticData method guaranteed to be called on the terminating thread? Can it be called on some other thread?

@kunalspathak
Copy link
Member Author

Is the DeleteThreadStaticData method guaranteed to be called on the terminating thread? Can it be called on some other thread?

Looking at the comments about OnThreadTerminate, it seems to be the case:

// One way OnThreadTerminate() is called is when the thread finishes doing useful
// work. This case always happens on the correct thread.
//
// The other way OnThreadTerminate() is called is during product shutdown. We do
// a "best effort" to eliminate all threads except the Main thread before shutdown
// happens. But there may be some background threads or external threads still
// running.
)

One of the caller DestroyThread of OnThreadTerminate has this assert:

_ASSERTE (th == GetThread());

but the fact that inside OnThreadTerminate() has this code just few lines after DeleteThreadStaticData() makes me wonder if the terminating thread is the one that calls DeleteThreadStaticData:

// Guaranteed to NOT be a shutdown case, because we tear down the heap before
// we tear down any threads during shutdown.
if (ThisThreadID == CurrentThreadID)

I don't want to disturb the CI, but once it is done, I can try adding this check inside DeleteThreadStaticData

@JulieLeeMSFT
Copy link
Member

@mangod9 for his input:

but the fact that inside OnThreadTerminate() has this code just few lines after DeleteThreadStaticData() makes me wonder if the terminating thread is the one that calls DeleteThreadStaticData:

@kunalspathak
Copy link
Member Author

kunalspathak commented Nov 30, 2023

but the fact that inside OnThreadTerminate() has this code just few lines after DeleteThreadStaticData() makes me wonder if the terminating thread is the one that calls DeleteThreadStaticData:

alright, I added an assert locally and I hit that assert.

     m_ThreadLocalBlock.FreeTable();

+    Thread *pCurrentThread = GetThreadNULLOk();
+    DWORD CurrentThreadID = pCurrentThread?pCurrentThread->GetThreadId():0;
+    DWORD ThisThreadID = GetThreadId();
+
+    _ASSERTE(CurrentThreadID == ThisThreadID);
Assert failure(PID 84504 [0x00014a18], Thread: 99544 [0x184d8]): CurrentThreadID == ThisThreadID

CORECLR! Thread::DeleteThreadStaticData + 0x13F (0x00007ffe`ad4f7b6f)
CORECLR! Thread::OnThreadTerminate + 0x22A (0x00007ffe`ad500f6a)
CORECLR! Thread::CleanupDetachedThreads + 0x219 (0x00007ffe`ad4f5429)
CORECLR! Thread::DoExtraWorkForFinalizer + 0x1A7 (0x00007ffe`ad4f9227)
CORECLR! FinalizerThread::FinalizerThreadWorker + 0x8C2 (0x00007ffe`ad610952)
CORECLR! ManagedThreadBase_DispatchInner + 0xB4 (0x00007ffe`ad4ffd84)
CORECLR! ManagedThreadBase_DispatchMiddle + 0xB6 (0x00007ffe`ad4ffe86)
CORECLR! ``ManagedThreadBase_DispatchOuter'::`11'::__Body::Run'::`5'::__Body::Run + 0x7B (0x00007ffe`ad50265b)
CORECLR! `ManagedThreadBase_DispatchOuter'::`11'::__Body::Run + 0x93 (0x00007ffe`ad502953)
CORECLR! ManagedThreadBase_DispatchOuter + 0xDD (0x00007ffe`ad4fff9d)
    File: D:\git\runtime2\src\coreclr\vm\threads.cpp Line: 7716
    Image: E:\helix\95362_2\correlation-payload\test-runtime-net9.0-windows-Release-x64\dotnet.exe

So given that I can guard the deletion only if we are on current thread, where can we delete the memory for detached threads? Is there code that detached threads execute before termination?

@kunalspathak
Copy link
Member Author

Is there code that detached threads execute before termination?

Seems DetachThread could be the right method.

_ASSERTE (this == GetThread());

@JulieLeeMSFT
Copy link
Member

JulieLeeMSFT commented Nov 30, 2023

@kouvel, please see Kunal's question. We need to backport this PR tonight.

@kunalspathak
Copy link
Member Author

/backport to release/8.0

Copy link
Contributor

Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/7042036275

Comment on lines +1788 to +1797
if (t_ThreadStatics.NonGCThreadStaticBlocks != nullptr)
{
delete[] t_ThreadStatics.NonGCThreadStaticBlocks;
t_ThreadStatics.NonGCThreadStaticBlocks = nullptr;
}
if (t_ThreadStatics.GCThreadStaticBlocks != nullptr)
{
delete[] t_ThreadStatics.GCThreadStaticBlocks;
t_ThreadStatics.GCThreadStaticBlocks = nullptr;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: None of these null checks are needed.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants