Alpine Linux PAL test suite -> 12 failures #6177

Closed
ghost opened this issue Jun 19, 2016 · 25 comments

Comments

@ghost

ghost commented Jun 19, 2016

Full report: https://gist.github.com/jasonwilliams200OK/b7872e0f45f2f74830dd95729bfad40b

Finished running PAL tests.

The following test(s) failed:
c_runtime/sscanf/test14/paltest_sscanf_test14. Exit code: 1
c_runtime/sscanf/test15/paltest_sscanf_test15. Exit code: 1
filemapping_memmgt/VirtualAlloc/test15/paltest_virtualalloc_test15. Exit code: 1
filemapping_memmgt/VirtualAlloc/test16/paltest_virtualalloc_test16. Exit code: 1
filemapping_memmgt/VirtualAlloc/test17/paltest_virtualalloc_test17. Exit code: 1
filemapping_memmgt/VirtualAlloc/test3/paltest_virtualalloc_test3. Exit code: 1
filemapping_memmgt/VirtualAlloc/test4/paltest_virtualalloc_test4. Exit code: 1
filemapping_memmgt/VirtualAlloc/test5/paltest_virtualalloc_test5. Exit code: 1
filemapping_memmgt/VirtualProtect/test2/paltest_virtualprotect_test2. Exit code: 1
filemapping_memmgt/VirtualProtect/test3/paltest_virtualprotect_test3. Exit code: 1
filemapping_memmgt/VirtualProtect/test4/paltest_virtualprotect_test4. Exit code: 1
threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1

PAL Test Results:
  Passed: 796
  Failed: 12
@ghost

ghost commented Jun 19, 2016

@jkotas, @janvorli, I did some research, and the #musl folks on IRC pointed out that the first two sscanf failures are due to a corner-case bug in glibc. musl libc conforms to the C99 and POSIX standards here; the glibc bug was discovered late and won't be fixed because of compatibility issues:

https://sourceware.org/bugzilla/show_bug.cgi?id=1765
https://sourceware.org/bugzilla/show_bug.cgi?id=12701

Self-contained example of the problem: https://ideone.com/ReFzeO

To fix this, I can think of two options:

  • Remove those two corner cases from test14 and test15 files:
    • DoFloatTest("1234567890.0123456789E", "%e", 1234567936); &
    • DoFloatTest("1234567890.0123456789e", "%E", 1234567936);
  • Special case for musl #if PAL_CLIB_MUSL after defining it in cmake

While this will fix the test bug .. is there anything that needs to be fixed in the src in order to have consistent behavior across the various libc implementations for the end-user?
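
For readers without access to the linked example, here is a minimal sketch of the corner case in plain C. It is not the code behind the ideone link; the function name `scan_dangling_exponent` is made up for illustration. It feeds the two inputs from the removed `DoFloatTest` calls to `sscanf` and returns the conversion count, so the same binary can be run against glibc and musl to compare results.

```c
#include <stdio.h>

/*
 * Scans one of the two inputs from the removed test cases: a float whose
 * exponent marker ("E" or "e") is not followed by any digits.
 * Returns sscanf's result: 1 if a value was stored in *out, otherwise a
 * value <= 0 (nothing was converted).
 */
int scan_dangling_exponent(int upper_case_format, float *out)
{
    *out = 0.0f;
    if (upper_case_format)
        return sscanf("1234567890.0123456789e", "%E", out);
    return sscanf("1234567890.0123456789E", "%e", out);
}
```

Per the discussion above, a lenient libc is expected to return 1 and store the value the old test looked for (1234567936), while a strict libc converts nothing. A test that wanted to tolerate both, instead of adding a compile-time switch such as `PAL_CLIB_MUSL`, could accept either outcome.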

@jkotas
Member

jkotas commented Jun 19, 2016

It should be ok to remove the tests for these two corner cases - it does not look like we depend on them anywhere.

@ghost

ghost commented Jun 19, 2016

Thanks @jkotas. dotnet/coreclr#5873 presents the fix.

Regarding VirtualAlloc: Alpine Linux's musl-based CRT has the same kind of memory protection as iOS, Windows Phone, etc., in that a process can either generate code or execute it, but not both. To circumvent the protection at test time, we need to PaX-mark the binaries or patch the kernel memory pages:

Syntax: (install paxctl first: apk add paxctl)

paxctl -c test_binary
paxctl -psm test_binary

All commands:

paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test2/paltest_virtualprotect_test2
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test3/paltest_virtualprotect_test3
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test4/paltest_virtualprotect_test4

paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test3/paltest_virtualalloc_test3
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test4/paltest_virtualalloc_test4
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test5/paltest_virtualalloc_test5
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test15/paltest_virtualalloc_test15
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test16/paltest_virtualalloc_test16
paxctl -c ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test17/paltest_virtualalloc_test17

paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test2/paltest_virtualprotect_test2
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test3/paltest_virtualprotect_test3
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualProtect/test4/paltest_virtualprotect_test4

paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test3/paltest_virtualalloc_test3
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test4/paltest_virtualalloc_test4
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test5/paltest_virtualalloc_test5
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test15/paltest_virtualalloc_test15
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test16/paltest_virtualalloc_test16
paxctl -psm ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/filemapping_memmgt/VirtualAlloc/test17/paltest_virtualalloc_test17

@barthalion, my understanding is that if we paxmark the binaries in the aports package, we won't have to do this manually, and it becomes a package-time concern (the build machine would need paxctl installed). Is this correct?

@jkotas, in the final product would we only need to paxmark one executable file: corerun? Or are there other executables that need the memory protection turned off?
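
The "generate code or execute it, but not both" restriction can be probed from user space. The sketch below is only an illustration: the function name is hypothetical, and it assumes the failing tests boil down to asking for memory that is first writable and later executable. It writes to an anonymous page and then asks `mprotect` to make that page executable.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a page that has been written to can later be made
 * executable, 0 if the kernel refuses the transition, and -1 if the
 * probe could not be set up.
 */
int can_make_written_page_executable(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;

    void *page = mmap(NULL, (size_t)page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* "Generate code": store a byte into the page (it is never executed). */
    ((unsigned char *)page)[0] = 0xC3;

    /* Ask for the page to become executable instead of writable. */
    int rc = mprotect(page, (size_t)page_size, PROT_READ | PROT_EXEC);

    munmap(page, (size_t)page_size);
    return rc == 0 ? 1 : 0;
}
```

On a system without such a restriction the probe should return 1. On a kernel that enforces it, the probe would be expected to return 0 for an unmarked binary and 1 once the binary has been marked with the paxctl commands above.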

@ghost

ghost commented Jun 19, 2016

With the PaX matter sorted out, we are left with one failure in PAL suite:

The following test(s) failed:
threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1

PAL Test Results:
  Passed: 807
  Failed: 1

Probing further on the named mutex yielded:

alp:~/coreclr$ ./bin/obj/Linux.x64.Debug/src/pal/tests/palsuite/threading/NamedMutex/test1/paltest_namedmutex_test1
'paltest_namedmutex_test1' failed at line 703. Expression: WaitForSingleObject(m, FailTimeoutMilliseconds) == WAIT_ABANDONED_0
'paltest_namedmutex_test1' failed at line 875. Expression: AbandonTests_Parent()

@jkotas
Member

jkotas commented Jun 19, 2016

in the final product would we only need to paxmark one executable file: corerun?

Any process that hosts CoreCLR would need this, e.g. dotnet host (https://github.com/schellap/core-setup/tree/master/src/corehost) would need this too.

@ghost

ghost commented Jun 19, 2016

@jkotas, noted, thanks!
Is paltest_namedmutex_test1 disabled on other Linux distributions such as CentOS?
I didn't find it running in CI logs of CentOS: http://dotnet-ci.cloudapp.net/job/dotnet_coreclr/job/master/job/debug_centos7.1_prtest/2509/consoleFull

@kouvel
Member

kouvel commented Jun 20, 2016

I have seen intermittent failures of paltest_namedmutex_test1 on CentOS before in the CI, so it must be running. The PAL test runner doesn't output the name of every test that is run.

This looks like a different issue from the failure on ARM. Based on the line numbers, this is the scenario:

  • Parent process creates a named mutex
  • Child process opens the mutex
  • Child process locks the mutex
  • Parent waits to lock the mutex
  • Child abandons the mutex by exiting the process without closing the mutex
  • This should trigger the pthread robust mutex's abandon detection and release the parent process' wait with EOWNERDEAD. That part doesn't seem to be happening, hence the parent process times out waiting and fails the test.

@ghost

ghost commented Jun 20, 2016

Thank you very much, @kouvel, for giving me the rundown! I am not familiar with the deep internals of multithreading on Alpine; however, I can dig in and try to spot the issue (by debugging the test case, etc.). @jyoungyun mentioned on #6014 that setting PTHREAD_PRIO_INHERIT fixed the issue for her on an ARM device. I searched for that in the Alpine Linux context and found this recent pull request by @bodgit: https://github.com/unbit/uwsgi/pull/1210/files. Does this change give any relevant pointers for our scenario? If not, I can dive deeper and find out more about what's going on.

Failing test:
https://github.com/dotnet/coreclr/blob/ee68078/src/pal/tests/palsuite/threading/NamedMutex/test1/namedmutex.cpp

How to install Alpine, build CoreCLR on it and get to the point to repro this issue:
https://gist.github.com/jasonwilliams200OK/7d6f5594d3bf697a27c9c1036d349fce

@ghost

ghost commented Jun 23, 2016

@kouvel, replied to you on gist (in case GitHub hasn't sent notification for gist comment).

@kouvel
Member

kouvel commented Jun 24, 2016

Thanks @jasonwilliams200OK, that worked well. It looks like in these cases:

  • process A has the mutex locked
  • process B waits on the mutex
  • process A exits cleanly or abruptly (with kill for instance) without releasing the lock

Then process B's wait is being released with ENOTRECOVERABLE instead of EOWNERDEAD. The former is a terminal situation so it wouldn't work. I'll see if I can write a test to detect this and switch to file locks.
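
As a rough illustration of why file locks tolerate this scenario, the sketch below has a child process take an exclusive `flock` lock and exit without releasing it; the parent then tries to take the same lock. This is not the PAL's implementation (the thread does not say which file-locking API it uses), and the function name and lock file path are made up for the example.

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdlib.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Returns 1 if the parent can take the lock after the child exited while
 * holding it, 0 if the lock is still held, -1 if the setup failed.
 */
int lock_survives_abandonment(void)
{
    char path[] = "/tmp/abandoned_lock_XXXXXX"; /* hypothetical lock file */
    int tmp = mkstemp(path);
    if (tmp == -1)
        return -1;
    close(tmp);

    pid_t child = fork();
    if (child == -1) {
        unlink(path);
        return -1;
    }
    if (child == 0) {
        /* The child opens the file itself so the lock belongs to it alone. */
        int fd = open(path, O_RDWR);
        if (fd == -1 || flock(fd, LOCK_EX) != 0)
            _exit(2);
        _exit(0); /* exit while still holding the lock */
    }

    int status = 0;
    int result = -1;
    if (waitpid(child, &status, 0) == child &&
        WIFEXITED(status) && WEXITSTATUS(status) == 0) {
        int fd = open(path, O_RDWR);
        if (fd != -1) {
            /* Non-blocking attempt: succeeds only if the lock was released. */
            result = flock(fd, LOCK_EX | LOCK_NB) == 0 ? 1 : 0;
            close(fd);
        }
    }
    unlink(path);
    return result;
}
```

Because the kernel drops the lock when the child's last descriptor for the file is closed at exit, the parent is not left in a terminal state. What a plain file lock does not provide by itself is the abandonment notification that EOWNERDEAD gives.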

@ghost

ghost commented Jun 24, 2016

Thanks for looking into it, @kouvel! Yesterday I captured the strace log for the IRC conversation: https://gist.github.com/jasonwilliams200OK/eaa719ab096bba3e7b6d4bc29c5c8213 (forgot to post it here)

@kouvel
Member

kouvel commented Jun 24, 2016

The log says the futex wait in process B was released successfully, but I didn't find anything about the mutex state changing at the time process A exits or at the time process B is resumed from the wait.

@richfelker

ENOTRECOVERABLE should only happen if, after process A exits, another thread takes the mutex (getting EOWNERDEAD) and unlocks it without calling pthread_mutex_consistent on it. Perhaps there's a bug in the test where this is happening or a bug in musl causing it. I'll check to see if there are any obvious bugs on our side (musl).
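
The state transitions described here can be sketched within a single process. The example below is illustrative only (the function names are hypothetical, and it uses a thread rather than a second process as the dying owner): a robust mutex is abandoned, relocked, optionally marked consistent, unlocked, and locked once more.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Thread body: take the lock and return without releasing it. */
static void *lock_and_exit(void *arg)
{
    pthread_mutex_lock((pthread_mutex_t *)arg);
    return NULL;
}

/*
 * Abandons a robust mutex, relocks it, optionally calls
 * pthread_mutex_consistent, unlocks it, and locks it again.
 * The results of the two lock calls are stored in *first and *second.
 * Returns 0 if the setup itself succeeded, -1 otherwise.
 */
int relock_after_abandon(int mark_consistent, int *first, int *second)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    pthread_t owner;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_setrobust(&attr, PTHREAD_MUTEX_ROBUST) != 0 ||
        pthread_mutex_init(&mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    /* The owner thread ends while holding the mutex. */
    if (pthread_create(&owner, NULL, lock_and_exit, &mutex) != 0 ||
        pthread_join(owner, NULL) != 0)
        return -1;

    *first = pthread_mutex_lock(&mutex);   /* next owner after abandonment */
    if (mark_consistent)
        pthread_mutex_consistent(&mutex);  /* repair the mutex state */
    pthread_mutex_unlock(&mutex);

    *second = pthread_mutex_lock(&mutex);  /* any later locker */
    if (*second == 0)
        pthread_mutex_unlock(&mutex);
    pthread_mutex_destroy(&mutex);
    return 0;
}
```

With `mark_consistent` set to 0, the first lock should report EOWNERDEAD and the second ENOTRECOVERABLE; with it set to 1, the second lock should succeed. Seeing ENOTRECOVERABLE on the very first lock after the owner dies, as in the failing test, does not fit either path.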

@kouvel
Member

kouvel commented Jun 24, 2016

In this test, process B is the only other thread/process waiting on the mutex at the time process A exits, and it seems to be getting ENOTRECOVERABLE right away. I'm working on making a small repro for this, will post it here once I have it.

@richfelker

If it looks like a bug on our side and you have a small (plain C) test case to reproduce it, that would be great and I can probably find and fix the issue right away.

@ghost

ghost commented Jun 27, 2016

Update: Void Linux has two variants: glibc and musl-libc. After getting the last missing package, lttng-ust, ported (https://github.com/voidlinux/void-packages/issues/4377), I installed the musl x86_64 variant, successfully built CoreCLR master (without any patches) and ran the PAL tests. It yields the same results:

[root@void coreclr]# src/pal/tests/palsuite/runpaltests.sh $(pwd)/bin/obj/Linux.x64.Debug
...
...
...'paltest_namedmutex_test1' failed at line 703. Expression: WaitForSingleObject(m, FailTimeoutMilliseconds) == WAIT_ABANDONED_0
'paltest_namedmutex_test1' failed at line 875. Expression: AbandonTests_Parent()

FAILED: threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1
...
...
Finished running PAL tests.

The following test(s) failed:
threading/NamedMutex/test1/paltest_namedmutex_test1. Exit code: 1

PAL Test Results:
  Passed: 807
  Failed: 1

@jyoungyun
Contributor

@kouvel

process A has the mutex locked
process B waits on the mutex
process A exits cleanly or abruptly (with kill for instance) without releasing the lock

Are you talking about the MutualExclusionTests in the named mutex PAL tests?
In MutualExclusionTests, the child process locks the shared mutex and the parent process attempts to lock it, first with a zero timeout and then with a specific timeout. The test case fails at 'WaitForSingleObject(m, g_expectedTimeoutMilliseconds) == WAIT_TIMEOUT'. But even when that works properly, the parent process does not exit because it waits on the childEvents[1] mutex.
Am I missing something here?

@kouvel
Member

kouvel commented Jun 27, 2016

@richfelker, here is a test case:

#include <sys/mman.h>
#include <sys/time.h>

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#include <new>
using namespace std;

struct Shm
{
    pthread_mutex_t syncMutex;
    pthread_cond_t syncCondition;
    pthread_mutex_t robustMutex;
    int conditionValue;

    Shm() : conditionValue(0)
    {
    }
} *shm;

int GetFailTimeoutTime(struct timespec *timeoutTimeRef)
{
    int getTimeResult = clock_gettime(CLOCK_REALTIME, timeoutTimeRef);
    if (getTimeResult != 0)
    {
        struct timeval tv;
        getTimeResult = gettimeofday(&tv, NULL);
        if (getTimeResult != 0)
            return __LINE__;
        timeoutTimeRef->tv_sec = tv.tv_sec;
        timeoutTimeRef->tv_nsec = tv.tv_usec * 1000;
    }
    timeoutTimeRef->tv_sec += 30;
    return 0;
}

int WaitForConditionValue(int desiredConditionValue)
{
    struct timespec timeoutTime;
    int errorLine = GetFailTimeoutTime(&timeoutTime);
    if (errorLine != 0)
        return errorLine;
    if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0)
        return __LINE__;

    if (shm->conditionValue != desiredConditionValue)
    {
        errorLine = GetFailTimeoutTime(&timeoutTime);
        if (errorLine != 0)
            return errorLine;
        if (pthread_cond_timedwait(&shm->syncCondition, &shm->syncMutex, &timeoutTime) != 0)
            return __LINE__;
        if (shm->conditionValue != desiredConditionValue)
            return __LINE__;
    }

    if (pthread_mutex_unlock(&shm->syncMutex) != 0)
        return __LINE__;
    return 0;
}

int SetConditionValue(int newConditionValue)
{
    struct timespec timeoutTime;
    int errorLine = GetFailTimeoutTime(&timeoutTime);
    if (errorLine != 0)
        return __LINE__;
    if (pthread_mutex_timedlock(&shm->syncMutex, &timeoutTime) != 0)
        return __LINE__;

    shm->conditionValue = newConditionValue;
    if (pthread_cond_signal(&shm->syncCondition) != 0)
        return __LINE__;

    if (pthread_mutex_unlock(&shm->syncMutex) != 0)
        return __LINE__;
    return 0;
}

void DoTest_Child();

int DoTest()
{
    // Map some shared memory
    void *shmBuffer = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_SHARED, -1, 0);
    if (shmBuffer == MAP_FAILED)
        return __LINE__;
    shm = new(shmBuffer) Shm;

    // Create sync mutex
    pthread_mutexattr_t syncMutexAttributes;
    if (pthread_mutexattr_init(&syncMutexAttributes) != 0)
        return __LINE__;
    if (pthread_mutexattr_setpshared(&syncMutexAttributes, PTHREAD_PROCESS_SHARED) != 0)
        return __LINE__;
    if (pthread_mutex_init(&shm->syncMutex, &syncMutexAttributes) != 0)
        return __LINE__;
    if (pthread_mutexattr_destroy(&syncMutexAttributes) != 0)
        return __LINE__;

    // Create sync condition
    pthread_condattr_t syncConditionAttributes;
    if (pthread_condattr_init(&syncConditionAttributes) != 0)
        return __LINE__;
    if (pthread_condattr_setpshared(&syncConditionAttributes, PTHREAD_PROCESS_SHARED) != 0)
        return __LINE__;
    if (pthread_cond_init(&shm->syncCondition, &syncConditionAttributes) != 0)
        return __LINE__;
    if (pthread_condattr_destroy(&syncConditionAttributes) != 0)
        return __LINE__;

    // Create the robust mutex that will be tested
    pthread_mutexattr_t robustMutexAttributes;
    if (pthread_mutexattr_init(&robustMutexAttributes) != 0)
        return __LINE__;
    if (pthread_mutexattr_setpshared(&robustMutexAttributes, PTHREAD_PROCESS_SHARED) != 0)
        return __LINE__;
    if (pthread_mutexattr_setrobust(&robustMutexAttributes, PTHREAD_MUTEX_ROBUST) != 0)
        return __LINE__;
    if (pthread_mutex_init(&shm->robustMutex, &robustMutexAttributes) != 0)
        return __LINE__;
    if (pthread_mutexattr_destroy(&robustMutexAttributes) != 0)
        return __LINE__;

    // Start child test process
    int error = fork();
    if (error == -1)
        return __LINE__;
    if (error == 0)
    {
        DoTest_Child();
        return -1;
    }

    // Wait for child to take a lock
    WaitForConditionValue(1);

    // Wait to try to take a lock. Meanwhile, child abandons the robust mutex.
    struct timespec timeoutTime;
    int errorLine = GetFailTimeoutTime(&timeoutTime);
    if (errorLine != 0)
        return errorLine;
    error = pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime);
    if (error != EOWNERDEAD) // expect to be notified that the robust mutex was abandoned
    {
        printf("pthread_mutex_timedlock error: %d\n", error);
        printf("ENOTRECOVERABLE: %d\n", ENOTRECOVERABLE);
        return __LINE__;
    }
    if (pthread_mutex_consistent(&shm->robustMutex) != 0)
        return __LINE__;

    if (pthread_mutex_unlock(&shm->robustMutex) != 0)
        return __LINE__;
    if (pthread_mutex_destroy(&shm->robustMutex) != 0)
        return __LINE__;
    return 0;
}

void DoTest_Child()
{
    int errorLine;

    do
    {
        // Lock the robust mutex
        struct timespec timeoutTime;
        errorLine = GetFailTimeoutTime(&timeoutTime);
        if (errorLine != 0)
            break;
        if (pthread_mutex_timedlock(&shm->robustMutex, &timeoutTime) != 0)
        {
            errorLine = __LINE__;
            break;
        }

        // Notify parent that robust mutex is locked
        errorLine = SetConditionValue(1);
        if (errorLine != 0)
            break;

        // Wait a short period to let the parent block on waiting for a lock
        sleep(1);

        // Abandon the mutex by exiting the thread while holding the lock. Parent's wait should be released by EOWNERDEAD.
        errorLine = 0;
    } while (false);

    printf("child: %d\n", errorLine);
}

int main()
{
    int result = DoTest();
    if (result >= 0)
        printf("parent: %d\n", result);
    return 0;
}

On Ubuntu x64 I'm seeing the expected output:

child: 0
parent: 0

On Alpine Linux x64, I'm seeing the following:

child: 0
pthread_mutex_timedlock error: 131
ENOTRECOVERABLE: 131
parent: <nonzero>

This indicates that pthread_mutex_timedlock is returning ENOTRECOVERABLE without first returning EOWNERDEAD with lock ownership. It seems to work fine cross-thread, but not cross-process.

@kouvel
Member

kouvel commented Jun 27, 2016

@jyoungyun:

In MutualExclusionTests, the child process locks the shared mutex and the parent process attempts to lock it, first with a zero timeout and then with a specific timeout. The test case fails at 'WaitForSingleObject(m, g_expectedTimeoutMilliseconds) == WAIT_TIMEOUT'.

Both of the attempts to lock the mutex in the parent process should fail with timeout because the child process owns the lock:

    TestAssert(WaitForSingleObject(m, 0) == WAIT_TIMEOUT); // try to lock the mutex without waiting
    TestAssert(WaitForSingleObject(m, g_expectedTimeoutMilliseconds) == WAIT_TIMEOUT); // try to lock the mutex with a timeout

But even when that works properly, the parent process does not exit because it waits on the childEvents[1] mutex.

If the above is working as expected, then it should follow this path to completion:

// currently, child is blocked here:
        YieldToParent(parentEvents, childEvents, ei); // parent attempts to lock/release, and fails
// parent:
    ...
    TestAssert(YieldToChild(parentEvents, childEvents, ei)); // child releases the lock
// child:
        TestAssert(m.Release()); // release the lock
    UninitializeChild(childRunningEvent, parentEvents, childEvents);
    return 0;
// parent:
    TestAssert(m.Release());
    UninitializeParent(testName, parentEvents);
    return true;

UninitializeChild(...) should release both of the childEvents, and allow the parent process to continue.

@jyoungyun
Contributor

jyoungyun commented Jun 27, 2016

@kouvel
What you said is the same as what I meant. :)
It's clear to me now after reviewing all the comments on this issue. The Alpine Linux x64 behavior is different from the (current) Linux/ARM32 behavior. The Raspberry target hit a deadlock rather than a test failure, and after fixing the glibc part, everything passed. I was confused because the comments and results here were about something other than what I was wondering about. Thank you for the comments.

@kouvel
Member

kouvel commented Jun 27, 2016

Ah ok, yea that test was passing on Alpine Linux, seems like a different issue from the one you saw.

kouvel referenced this issue in kouvel/coreclr Jun 27, 2016
…e more cases

Workaround for #5456:
- Sometimes, a timed wait operation is not getting released, causing a hang
- Due to the hang, it is not possible to detect this issue with code
- Temporarily disabled the use of pthread process-shared mutexes on ARM/ARM64. File locks will be used instead.

Workaround for #5872:
- On Alpine Linux, a pthread process-shared robust mutex is detecting the case where a process abandons the mutex when it exits while holding the lock, but is putting the mutex into an unrecoverable state (ENOTRECOVERABLE) instead of assigning lock ownership to the next thread that is released from a wait for a lock and notifying of abandonment (EOWNERDEAD).
- Added a test case to detect this issue, to have it use file locks instead

Close #5456
@richfelker

@kouvel
Member

kouvel commented Jun 27, 2016

Great, thanks @richfelker

@barthalion

@jasonwilliams200OK FYI, @ncopa cherry-picked this fix to Alpine Edge so test should pass with up-to-date packages.

@ghost

ghost commented Jun 27, 2016

Thanks a lot for the fix in musl-libc @richfelker and @ncopa for quickly providing the patch in edge/main ! 🎉 👍 ⭐ 🎆

I think @kouvel's workaround would also be useful if someone is building with an older version of musl-libc (but we should probably discourage that, or warn the dev to update to musl-1.1.14-r11+)?

@barthalion, @kouvel, I have updated musl to musl-1.1.14-r11 by uncommenting the /edge/main in /etc/apk/repositories and running apk fetch && apk update && apk upgrade. Then built coreclr master and ran the tests (remember to run those paxctl commands: https://gist.github.com/jasonwilliams200OK/7d6f5594d3bf697a27c9c1036d349fce#how-to-run-coreclr-pal-tests). All pal tests passed on Alpine Linux!

Finished running PAL tests.

PAL Test Results:
  Passed: 808
  Failed: 0

This issue is resolved.

Now that the PAL tests, which mostly focus on the platform C runtime, pass, I will next run the full managed .NET assemblies (Base Class Library) test suite on Alpine Linux.

@ghost ghost closed this as completed Jun 27, 2016
kouvel referenced this issue in kouvel/coreclr Jun 28, 2016
kouvel referenced this issue in kouvel/coreclr Jul 5, 2016
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 30, 2020