Alpine Linux PAL test suite -> 12 failures #6177
@jkotas, @janvorli, I did some research and was told by the #musl folks over IRC that the first two sscanf failures are due to a corner-case bug in glibc: musl libc conforms to the C99 + POSIX standard here, while the glibc bug was discovered late and won't be fixed due to compatibility concerns: https://sourceware.org/bugzilla/show_bug.cgi?id=1765 Self-contained example of the problem: https://ideone.com/ReFzeO To fix this, I can think of two options:
While this will fix the test bug... is there anything that needs to be fixed in the src in order to have consistent behavior across the various libc implementations for the end user? |
It should be ok to remove the tests for these two corner cases - it does not look like we depend on them anywhere. |
Thanks @jkotas. dotnet/coreclr#5873 presents the fix. Regarding VirtualAlloc: Alpine Linux's grsecurity/PaX-hardened kernel enforces the same memory protection as iOS, Windows Phone, etc., where a process can either generate code or execute it, but not both. In order to circumvent the protection at test time, we need to PaX-mask the test binaries (patching the kernel's memory-page protections for them).

Syntax (install paxctl first):

paxctl -c test_binary
paxctl -psm test_binary

All commands:

paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test2/paltest_virtualprotect_test2
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test3/paltest_virtualprotect_test3
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test4/paltest_virtualprotect_test4
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test3/paltest_virtualalloc_test3
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test4/paltest_virtualalloc_test4
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test5/paltest_virtualalloc_test5
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test15/paltest_virtualalloc_test15
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test16/paltest_virtualalloc_test16
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test17/paltest_virtualalloc_test17
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test2/paltest_virtualprotect_test2
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test3/paltest_virtualprotect_test3
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test4/paltest_virtualprotect_test4
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test3/paltest_virtualalloc_test3
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test4/paltest_virtualalloc_test4
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test5/paltest_virtualalloc_test5
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test15/paltest_virtualalloc_test15
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test16/paltest_virtualalloc_test16
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test17/paltest_virtualalloc_test17

@barthalion, what I have understood is that if we paxmark the aports package, we won't have to specify this manually and it becomes a package-time concern (the build machine would need paxctl installed). Is this correct? @jkotas, in the final product, would we only need to paxmark one executable, corerun? Or are there other executables that require memory protection turned off? |
With the PaX matter sorted out, we are left with one failure in the PAL suite:

The following test(s) failed:
threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1
PAL Test Results:
Passed: 807
Failed: 1

Probing further on the named mutex yielded:

alp:~/coreclr$ ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/threading/NamedMutex/test1/paltest_namedmutex_test1
'paltest_namedmutex_test1' failed at line 703. Expression: WaitForSingleObject(m, FailTimeoutMilliseconds) == WAIT_ABANDONED_0
'paltest_namedmutex_test1' failed at line 875. Expression: AbandonTests_Parent() |
Any process that hosts CoreCLR would need this, e.g. dotnet host (https://github.com/schellap/core-setup/tree/master/src/corehost) would need this too. |
@jkotas, noted, thanks! |
I have seen intermittent failures of this test before, but this looks like a different issue from the failure on ARM. Based on the line numbers, this is the scenario:
|
Thank you very much @kouvel for giving me the rundown! I am not aware of the deep internals of multithreading on Alpine; however, I can dig in and try to spot the issue (by debugging the test case, etc.). @jyoungyun mentioned on #6014 that setting

Failing test:

How to install Alpine, build CoreCLR on it, and get to the point to repro this issue: |
@kouvel, replied to you on gist (in case GitHub hasn't sent notification for gist comment). |
Thanks @jasonwilliams200OK, that worked well. It looks like in these cases:
Then process B's wait is being released with ENOTRECOVERABLE instead of EOWNERDEAD. The former is a terminal state, so the mutex can't be used afterward. I'll see if I can write a test to detect this and switch to file locks. |
Thanks for looking into it, @kouvel! Yesterday I captured the strace log for the IRC conversation: https://gist.github.com/jasonwilliams200OK/eaa719ab096bba3e7b6d4bc29c5c8213 (forgot to post it here) |
The log says the futex wait in process B was released successfully, but I didn't find anything about the mutex state changing at the time process A exits or at the time process B is resumed from the wait. |
ENOTRECOVERABLE should only happen if, after process A exits, another thread takes the mutex (getting EOWNERDEAD) and unlocks it without calling pthread_mutex_consistent on it. Perhaps there's a bug in the test where this is happening or a bug in musl causing it. I'll check to see if there are any obvious bugs on our side (musl). |
In this test, process B is the only other thread/process waiting on the mutex at the time process A exits, and it seems to be getting ENOTRECOVERABLE right away. I'm working on making a small repro for this, will post it here once I have it. |
If it looks like a bug on our side and you have a small (plain C) test case to reproduce it, that would be great and I can probably find and fix the issue right away. |
Update: Void Linux has two variants: glibc and musl libc. After getting the last missing package, lttng-ust, ported (https://github.com/voidlinux/void-packages/issues/4377), I installed the musl x86_64 variant, successfully built CoreCLR master (without any patch), and ran the PAL tests. It yields identical results:

[root@void coreclr]# src/pal/tests/palsuite/runpaltests.sh $(pwd)/bin/obj/Linux.x64.Debug
...
...
...'paltest_namedmutex_test1' failed at line 703. Expression: WaitForSingleObject(m, FailTimeoutMilliseconds) == WAIT_ABANDONED_0
'paltest_namedmutex_test1' failed at line 875. Expression: AbandonTests_Parent()
FAILED: threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1
...
...
Finished running PAL tests.
The following test(s) failed:
threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1
PAL Test Results:
Passed: 807
Failed: 1 |
Are you talking about the MutualExclusionTests of the NamedMutex PAL tests? |
@richfelker, here is a test case:

#include <sys/mman.h>
#include <sys/time.h>
#include <time.h> // clock_gettime, struct timespec
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <new>
using namespace std;
struct Shm
{
pthread_mutex_t syncMutex;
pthread_cond_t syncCondition;
pthread_mutex_t robustMutex;
int conditionValue;
Shm() : conditionValue(0)
{
}
} *shm;
int GetFailTimeoutTime(struct timespec *timeoutTimeRef)
{
int getTimeResult = clock_gettime(CLOCK_REALTIME, timeoutTimeRef);
if (getTimeResult != 0)
{
struct timeval tv;
getTimeResult = gettimeofday(&tv, NULL);
if (getTimeResult != 0)
return __LINE__;
timeoutTimeRef->tv_sec = tv.tv_sec;
timeoutTimeRef->tv_nsec = tv.tv_usec * 1000;
}
timeoutTimeRef->tv_sec += 30;
return 0;
}
int WaitForConditionValue(int desiredConditionValue)
{
struct timespec timeoutTime;
int errorLine = GetFailTimeoutTime(&timeoutTime);
if (errorLine != 0)
return errorLine;
if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0)
return __LINE__;
if (shm->conditionValue != desiredConditionValue)
{
errorLine = GetFailTimeoutTime(&timeoutTime);
if (errorLine != 0)
return errorLine;
if (pthread_cond_timedwait(&shm->syncCondition, &shm->syncMutex, &timeoutTime) != 0)
return __LINE__;
if (shm->conditionValue != desiredConditionValue)
return __LINE__;
}
if (pthread_mutex_unlock(&shm->syncMutex) != 0)
return __LINE__;
return 0;
}
int SetConditionValue(int newConditionValue)
{
struct timespec timeoutTime;
int errorLine = GetFailTimeoutTime(&timeoutTime);
if (errorLine != 0)
return __LINE__;
if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0)
return __LINE__;
shm->conditionValue = newConditionValue;
if (pthread_cond_signal(&shm->syncCondition) != 0)
return __LINE__;
if (pthread_mutex_unlock(&shm->syncMutex) != 0)
return __LINE__;
return 0;
}
void DoTest_Child();
int DoTest()
{
// Map some shared memory
void *shmBuffer = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
if (shmBuffer == MAP_FAILED)
return __LINE__;
shm = new(shmBuffer) Shm;
// Create sync mutex
pthread_mutexattr_t syncMutexAttributes;
if (pthread_mutexattr_init(&syncMutexAttributes) != 0)
return __LINE__;
if (pthread_mutexattr_setpshared(&syncMutexAttributes, PTHREAD_PROCESS_SHARED) != 0)
return __LINE__;
if (pthread_mutex_init(&shm->syncMutex, &syncMutexAttributes) != 0)
return __LINE__;
if (pthread_mutexattr_destroy(&syncMutexAttributes) != 0)
return __LINE__;
// Create sync condition
pthread_condattr_t syncConditionAttributes;
if (pthread_condattr_init(&syncConditionAttributes) != 0)
return __LINE__;
if (pthread_condattr_setpshared(&syncConditionAttributes, PTHREAD_PROCESS_SHARED) != 0)
return __LINE__;
if (pthread_cond_init(&shm->syncCondition, &syncConditionAttributes) != 0)
return __LINE__;
if (pthread_condattr_destroy(&syncConditionAttributes) != 0)
return __LINE__;
// Create the robust mutex that will be tested
pthread_mutexattr_t robustMutexAttributes;
if (pthread_mutexattr_init(&robustMutexAttributes) != 0)
return __LINE__;
if (pthread_mutexattr_setpshared(&robustMutexAttributes, PTHREAD_PROCESS_SHARED) != 0)
return __LINE__;
if (pthread_mutexattr_setrobust(&robustMutexAttributes, PTHREAD_MUTEX_ROBUST) != 0)
return __LINE__;
if (pthread_mutex_init(&shm->robustMutex, &robustMutexAttributes) != 0)
return __LINE__;
if (pthread_mutexattr_destroy(&robustMutexAttributes) != 0)
return __LINE__;
// Start child test process
int error = fork();
if (error == -1)
return __LINE__;
if (error == 0)
{
DoTest_Child();
return -1;
}
// Wait for child to take a lock
WaitForConditionValue(1);
// Wait to try to take a lock. Meanwhile, child abandons the robust mutex.
struct timespec timeoutTime;
int errorLine = GetFailTimeoutTime(&timeoutTime);
if (errorLine != 0)
return errorLine;
error = pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime);
if (error != EOWNERDEAD) // expect to be notified that the robust mutex was abandoned
{
printf("pthread_mutex_timedlock error: %d\n", error);
printf("ENOTRECOVERABLE: %d\n", ENOTRECOVERABLE);
return __LINE__;
}
if (pthread_mutex_consistent(&shm->robustMutex) != 0)
return __LINE__;
if (pthread_mutex_unlock(&shm->robustMutex) != 0)
return __LINE__;
if (pthread_mutex_destroy(&shm->robustMutex) != 0)
return __LINE__;
return 0;
}
void DoTest_Child()
{
int errorLine;
do
{
// Lock the robust mutex
struct timespec timeoutTime;
errorLine = GetFailTimeoutTime(&timeoutTime);
if (errorLine != 0)
break;
if (pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime) != 0)
{
errorLine = __LINE__;
break;
}
// Notify parent that robust mutex is locked
errorLine = SetConditionValue(1);
if (errorLine != 0)
break;
// Wait a short period to let the parent block on waiting for a lock
sleep(1);
// Abandon the mutex by exiting the thread while holding the lock. Parent's wait should be released by EOWNERDEAD.
errorLine = 0;
} while (false);
printf("child: %d\n", errorLine);
}
int main()
{
int result = DoTest();
if (result >= 0)
printf("parent: %d\n", result);
return 0;
}

On Ubuntu x64 I'm seeing the expected output:

On Alpine Linux x64, I'm seeing the following:

This indicates that pthread_mutex_timedlock is returning ENOTRECOVERABLE without first returning EOWNERDEAD with lock ownership. It seems to work fine cross-thread, but not cross-process. |
Both of the attempts to lock the mutex in the parent process should fail with a timeout, because the child process owns the lock:

TestAssert(WaitForSingleObject(m, 0) == WAIT_TIMEOUT); // try to lock the mutex without waiting
TestAssert(WaitForSingleObject(m, g_expectedTimeoutMilliseconds) == WAIT_TIMEOUT); // try to lock the mutex with a timeout
If the above is working as expected, then it should follow this path to completion:

// currently, child is blocked here:
YieldToParent(parentEvents, childEvents, ei); // parent attempts to lock/release, and fails
// parent:
...
TestAssert(YieldToChild(parentEvents, childEvents, ei)); // child releases the lock
// child:
TestAssert(m.Release()); // release the lock
UninitializeChild(childRunningEvent, parentEvents, childEvents);
return 0;
// parent:
TestAssert(m.Release());
UninitializeParent(testName, parentEvents);
return true;
|
@kouvel |
Ah ok, yea that test was passing on Alpine Linux, seems like a different issue from the one you saw. |
…e more cases

Workaround for #5456:
- Sometimes, a timed wait operation is not getting released, causing a hang
- Due to the hang, it is not possible to detect this issue with code
- Temporarily disabled the use of pthread process-shared mutexes on ARM/ARM64. File locks will be used instead.

Workaround for #5872:
- On Alpine Linux, a pthread process-shared robust mutex is detecting the case where a process abandons the mutex when it exits while holding the lock, but is putting the mutex into an unrecoverable state (ENOTRECOVERABLE) instead of assigning lock ownership to the next thread that is released from a wait for a lock and notifying of abandonment (EOWNERDEAD).
- Added a test case to detect this issue, to have it use file locks instead

Close #5456
Fixed upstream in musl: https://git.musl-libc.org/cgit/musl/commit/?id=384d103d94dba0472a587861f67d7ed6e8955f86 |
Great, thanks @richfelker |
@jasonwilliams200OK FYI, @ncopa cherry-picked this fix to Alpine Edge so test should pass with up-to-date packages. |
Thanks a lot for the fix in musl-libc, @richfelker, and @ncopa for quickly providing the patch in edge/main! 🎉 👍 ⭐ 🎆 I think @kouvel's workaround would also be useful if someone is building with an older version of musl-libc (but we should probably discourage that and warn the dev to update to musl-1.1.14-r11+). @barthalion, @kouvel, I have updated musl to musl-1.1.14-r11 by uncommenting the

Finished running PAL tests.
PAL Test Results:
Passed: 808
Failed: 0

This issue is resolved. Now that the PAL test suite, mostly focused on the platform C runtime, passes, I will run the full managed .NET (Base Class Library) test suite next on Alpine Linux. |
Full report: https://gist.github.com/jasonwilliams200OK/b7872e0f45f2f74830dd95729bfad40b

Finished running PAL tests.
The following test(s) failed:
c_runtime/sscanf/test14/paltest_sscanf_test14. Exit code: 1
c_runtime/sscanf/test15/paltest_sscanf_test15. Exit code: 1
filemapping_memmgt/VirtualAlloc/test15/paltest_virtualalloc_test15. Exit code: 1
filemapping_memmgt/VirtualAlloc/test16/paltest_virtualalloc_test16. Exit code: 1
filemapping_memmgt/VirtualAlloc/test17/paltest_virtualalloc_test17. Exit code: 1
filemapping_memmgt/VirtualAlloc/test3/paltest_virtualalloc_test3. Exit code: 1
filemapping_memmgt/VirtualAlloc/test4/paltest_virtualalloc_test4. Exit code: 1
filemapping_memmgt/VirtualAlloc/test5/paltest_virtualalloc_test5. Exit code: 1
filemapping_memmgt/VirtualProtect/test2/paltest_virtualprotect_test2. Exit code: 1
filemapping_memmgt/VirtualProtect/test3/paltest_virtualprotect_test3. Exit code: 1
filemapping_memmgt/VirtualProtect/test4/paltest_virtualprotect_test4. Exit code: 1
threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1
PAL Test Results:
Passed: 796
Failed: 12