-
Notifications
You must be signed in to change notification settings - Fork 582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
application unable to process requests #273
Comments
This appears to happen as a result of node.js application recycle when iisnode detects that node.js files on disk have changed. Hypothesis: stack traces suggest a deadlock between adding and removing entries in CFileWatcher. There is a possible logical issue in removing file watches in https://github.com/tjanczuk/iisnode/blob/master/src/iisnode/cfilewatcher.cpp#L453. This line of code may cause incomplete removal of WatchedFile entires associated with a given CNodeApplication. (There may be many WatchedFile entries within and across WatchedDirectories associated with a single CNodeApplication). Since the CNodeApplication is later disposed, subsequent file changed events may be trying to acquire a lock over a disposed CRITICAL_SECTION, possibly leading to a deadlock. Cases that cause an AV might have escaped detection since they result in IIS spinning another instance of the worker process. |
This is a classic deadlock situation that can arise when a new message arrives during recycling of a node.js application as a result of changes in the watched files. There are two locks in the scenario: FileWatcher lock (FW) and Application Manager lock (AM). The file watcher thread holds the FW lock to scan files for changes. It detects a file has changed, and under FW lock, calls into the CNodeApplicationManager to recycle the affected application. The CNodeApplicationManager attempts to acquire the AM lock to perform that operation:
In the meantime, a new message arrives on another thread. That thread acquires the AW lock to create a new node.js application to process it. Under the AW lock, as part of the initialization process it attempts to acquire the FW lock to register a new file watch:
So we have one thread waiting on AW lock under FW, and another waiting on FW lock under AW. Deadlock. Open question: Why do application exist in the CFileWatcher that CNodeApplicationManager is unaware of (since it attempts to create a new CNodeApplication for the incoming message? Is that a side effect of the issue mentioned in the previous comment of not cleaning up CFileWatcher structres properly in https://github.com/tjanczuk/iisnode/blob/master/src/iisnode/cfilewatcher.cpp#L453? |
Repro steps: server.js:
(self-killing the process every 1s is not related to the issue but appears to shorten the time to repro) iisnode.yml:
update.js:
Start the updater and use curl as a stress test client:
You will see curl outputting PIDs of node.exe processes interlaced with some HTTP 500 messages, which is expected. When curl appears to hang, the w3wp.exe is in the deadlocked state. This takes a few minutes on my box. Attach VS and party on. |
The fix in e907682 fixes the problem for configurations with only one node.js application within an IIS worker process (i.e. one entry point to a node.js applicaiton, e.g. server.js). This is the vast majority of cases. In fact, I am unaware of anyone utilizing the ability of iisnode to host more than one node.js application in a single IIS worker process (i.e. server1.js and server2.js). This fix fixes the deadlock issue for deployments with single node.js application running in an IIS worker process. It does not solve the deadlock issue for deployments with multiple node.js applications running in a single IIS worker process. Such fix would be more risky in terms of regressions to other scenarios. Given no evidence anyone needs it I am postponing this fix indefinitely. Please reactivate if you hit the issue. |
The fix seems to have resolved the deadlock problem, but uncovered an AV that occurs about ~15 mins into stress testing the app above by auto-updating it every 1.3 seconds. Below are the details: FAULTING_IP: EXCEPTION_RECORD: ffffffff -- (.exr 0xffffffffffffffff) FAULTING_THREAD: 0000148c PROCESS_NAME: w3wp.exe ERROR_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s. EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s. EXCEPTION_PARAMETER1: 00000000 EXCEPTION_PARAMETER2: 00000018 READ_ADDRESS: 00000018 FOLLOWUP_IP: NTGLOBALFLAG: 2 APPLICATION_VERIFIER_FLAGS: 0 APP: w3wp.exe MANAGED_STACK: !dumpstack -EE MANAGED_BITNESS_MISMATCH: BUGCHECK_STR: APPLICATION_FAULT_NULL_CLASS_PTR_READ_STACK_POINTER_MISMATCH PRIMARY_PROBLEM_CLASS: NULL_CLASS_PTR_READ_STACK_POINTER_MISMATCH DEFAULT_BUCKET_ID: NULL_CLASS_PTR_READ_STACK_POINTER_MISMATCH LAST_CONTROL_TRANSFER: from 71360989 to 7135aa6b STACK_TEXT: FAULTING_SOURCE_LINE: e:\projects\iisnode\src\iisnode\cnodeeventprovider.cpp FAULTING_SOURCE_FILE: e:\projects\iisnode\src\iisnode\cnodeeventprovider.cpp FAULTING_SOURCE_LINE_NUMBER: 63 SYMBOL_STACK_INDEX: 0 SYMBOL_NAME: iisnode!CNodeEventProvider::Log+1b FOLLOWUP_NAME: MachineOwner MODULE_NAME: iisnode IMAGE_NAME: iisnode.dll DEBUG_FLR_IMAGE_TIMESTAMP: 5180077c STACK_COMMAND: ~15s ; kb FAILURE_BUCKET_ID: NULL_CLASS_PTR_READ_STACK_POINTER_MISMATCH_c0000005_iisnode.dll!CNodeEventProvider::Log BUCKET_ID: APPLICATION_FAULT_NULL_CLASS_PTR_READ_STACK_POINTER_MISMATCH_iisnode!CNodeEventProvider::Log+1b WATSON_STAGEONE_URL: http://watson.microsoft.com/StageOne/w3wp_exe/8_0_9200_16384/5010885f/iisnode_dll/0_2_5_0/5180077c/c0000005/0000aa6b.htm?Retriage=1 Followup: MachineOwner 0:015> !uniqstack -bvpn . 14 Id: ad4.10e8 Suspend: 1 Teb: fee4e000 Unfrozen ChildEBP RetAddr Args to Child00 0382f6e0 776d1129 0000049c 00000001 00000000 ntdll!NtWaitForSingleObject+0xc . 15 Id: ad4.148c Suspend: 1 Teb: fee4b000 Unfrozen ChildEBP RetAddr Args to Child00 038afb7c 71360989 7138fa28 00000005 00a80b04 iisnode!CNodeEventProvider::Log+0x1b [e:\projects\iisnode\src\iisnode\cnodeeventprovider.cpp @ 63] . 16 Id: ad4.d3c Suspend: 1 Teb: fee45000 Unfrozen ChildEBP RetAddr Args to Child00 039afb80 77718092 000004a8 039afbe8 039afbcc ntdll!NtRemoveIoCompletion+0xc . 18 Id: ad4.1570 Suspend: 1 Teb: fee6c000 Unfrozen ChildEBP RetAddr Args to Child00 006cfb38 776dc752 00000001 01370ce8 00000000 ntdll!NtWaitForMultipleObjects+0xc . 23 Id: ad4.1184 Suspend: 1 Teb: fedd0000 Unfrozen ChildEBP RetAddr Args to Child00 0407fdd0 776d1129 00002bac 00000001 00000000 ntdll!NtWaitForSingleObject+0xc |
Closing this issue and tracking the new issue in #274 |
The node.js app is unable to process requests. Below is the dump of w3wp.exe threads. It appears iisnode is blocked waiting on critical sections which prevents processing of requests.
ntdll!NtWaitForKeyedEvent+15
ntdll!RtlAcquireSRWLockExclusive+12e
iisnode!CNodeApplicationManager::RecycleApplication+45 c:\projects\iisnode\src\iisnode\cnodeapplicationmanager.cpp @ 298 + a
iisnode!CNodeApplicationManager::OnScriptModified+11 c:\projects\iisnode\src\iisnode\cnodeapplicationmanager.cpp @ 336
iisnode!CFileWatcher::ScanDirectory+1c0 c:\projects\iisnode\src\iisnode\cfilewatcher.cpp @ 565 + 19
iisnode!CFileWatcher::Worker+bb c:\projects\iisnode\src\iisnode\cfilewatcher.cpp @ 504
msvcr100!_endthreadex+3f
msvcr100!_endthreadex+ce
kernel32!BaseThreadInitThunk+e
ntdll!__RtlUserThreadStart+70
ntdll!_RtlUserThreadStart+1b
We got other threads like below stuck wanting to get that critical section lock
ntdll!ZwWaitForSingleObject+15
ntdll!RtlpWaitOnCriticalSection+13e
ntdll!RtlEnterCriticalSection+150
iisnode!CFileWatcher::RemoveWatch+3f c:\projects\iisnode\src\iisnode\cfilewatcher.cpp @ 408 + a
iisnode!CNodeApplication::~CNodeApplication+13 c:\projects\iisnode\src\iisnode\cnodeapplication.cpp @ 11 + 10
iisnode!CNodeProcessManager::GracefulShutdown+e5 c:\projects\iisnode\src\iisnode\cnodeprocessmanager.cpp @ 343
msvcr100!_endthreadex+3f
msvcr100!_endthreadex+ce
kernel32!BaseThreadInitThunk+e
ntdll!__RtlUserThreadStart+70
ntdll!_RtlUserThreadStart+1b
ntdll!ZwWaitForSingleObject+15
ntdll!RtlpWaitOnCriticalSection+13e
ntdll!RtlEnterCriticalSection+150
iisnode!CFileWatcher::WatchFiles+33b c:\projects\iisnode\src\iisnode\cfilewatcher.cpp @ 139 + 10
iisnode!CNodeApplication::Initialize+fc c:\projects\iisnode\src\iisnode\cnodeapplication.cpp @ 49 + 26
iisnode!CNodeApplicationManager::GetOrCreateNodeApplicationCore+ad c:\projects\iisnode\src\iisnode\cnodeapplicationmanager.cpp @ 380 + c
iisnode!CNodeApplicationManager::GetOrCreateNodeApplication+98 c:\projects\iisnode\src\iisnode\cnodeapplicationmanager.cpp @ 653 + 11
iisnode!CNodeApplicationManager::Dispatch+111 c:\projects\iisnode\src\iisnode\cnodeapplicationmanager.cpp @ 205 + 12
iisnode!CNodeHttpModule::OnExecuteRequestHandler+97 c:\projects\iisnode\src\iisnode\cnodehttpmodule.cpp @ 26 + 11
iiscore!NOTIFICATION_CONTEXT::RequestDoWork+128
iiscore!NOTIFICATION_CONTEXT::CallModulesInternal+305
iiscore!NOTIFICATION_CONTEXT::CallModules+28
iiscore!W3_CONTEXT::DoStateRequestExecuteHandler+36
iiscore!W3_CONTEXT::DoWork+d7
iiscore!W3_MAIN_CONTEXT::StartNotificationLoop+49
iiscore!W3_CONTEXT::ExecuteRequest+20c
rewrite!RegisterModule+4fd4
rewrite!RegisterModule+817e
rewrite!RegisterModule+1c24f
rewrite!RegisterModule+3e65
rewrite!RegisterModule+5eb
iiscore!NOTIFICATION_CONTEXT::RequestDoWork+d5
iiscore!NOTIFICATION_CONTEXT::CallModulesInternal+305
iiscore!NOTIFICATION_CONTEXT::CallModules+28
iiscore!W3_CONTEXT::DoStateRequestStart+63
iiscore!W3_CONTEXT::DoWork+8d
iiscore!W3_MAIN_CONTEXT::StartNotificationLoop+49
iiscore!W3_MAIN_CONTEXT::OnNewRequest+47
w3dt!UL_NATIVE_REQUEST::DoStateProcess+26
w3dt!UL_NATIVE_REQUEST::DoWork+60
w3dt!OverlappedCompletionRoutine+1a
w3tp!THREAD_POOL_DATA::ThreadPoolThread+89
w3tp!THREAD_POOL_DATA::ThreadPoolThread+24
w3tp!THREAD_MANAGER::ThreadManagerThread+39
kernel32!BaseThreadInitThunk+e
ntdll!__RtlUserThreadStart+70
ntdll!_RtlUserThreadStart+1b
ntdll!NtWaitForKeyedEvent+15
ntdll!RtlAcquireSRWLockShared+105
iisnode!CNodeApplicationManager::Dispatch+96 c:\projects\iisnode\src\iisnode\cnodeapplicationmanager.cpp @ 186 + a
iisnode!CNodeHttpModule::OnExecuteRequestHandler+97 c:\projects\iisnode\src\iisnode\cnodehttpmodule.cpp @ 26 + 11
iiscore!NOTIFICATION_CONTEXT::RequestDoWork+128
iiscore!NOTIFICATION_CONTEXT::CallModulesInternal+305
iiscore!NOTIFICATION_CONTEXT::CallModules+28
iiscore!W3_CONTEXT::DoStateRequestExecuteHandler+36
iiscore!W3_CONTEXT::DoWork+d7
iiscore!W3_MAIN_CONTEXT::StartNotificationLoop+49
iiscore!W3_CONTEXT::ExecuteRequest+20c
rewrite!RegisterModule+4fd4
rewrite!RegisterModule+22aa9
rewrite!RegisterModule+1c3f2
rewrite!RegisterModule+3e65
rewrite!RegisterModule+5eb
iiscore!NOTIFICATION_CONTEXT::RequestDoWork+d5
iiscore!NOTIFICATION_CONTEXT::CallModulesInternal+305
iiscore!NOTIFICATION_CONTEXT::CallModules+28
iiscore!W3_CONTEXT::DoStateRequestStart+63
iiscore!W3_CONTEXT::DoWork+8d
iiscore!W3_MAIN_CONTEXT::StartNotificationLoop+49
iiscore!W3_MAIN_CONTEXT::OnNewRequest+47
w3dt!UL_NATIVE_REQUEST::DoStateProcess+26
w3dt!UL_NATIVE_REQUEST::DoWork+60
w3dt!OverlappedCompletionRoutine+1a
w3tp!THREAD_POOL_DATA::ThreadPoolThread+89
w3tp!THREAD_POOL_DATA::ThreadPoolThread+24
w3tp!THREAD_MANAGER::ThreadManagerThread+39
kernel32!BaseThreadInitThunk+e
ntdll!__RtlUserThreadStart+70
ntdll!_RtlUserThreadStart+1b
The text was updated successfully, but these errors were encountered: