Tests failing with OCRv1.2 on x86-mpi #10

Open
DaoWen opened this issue Jul 5, 2017 · 0 comments

DaoWen commented Jul 5, 2017

The CnC-OCR tests almost always fail with a segfault when run with OCRv1.2 on x86-mpi as the backend runtime. Rolling back to OCRv1.1 fixes the issue.

This appears to be a bug introduced into OCR somewhere between these two releases. The segfault happens during the CnC distributed context setup on x86-mpi. My initial impression is that a race was introduced between the creation of a remote datablock (using affinity hints) and the creation of an EDT that depends on that datablock: the segfault appears to always happen when resolving an EDT's dependence on a remotely created datablock.
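To make the suspected ordering problem concrete, here is a toy Python model of it. This is only a sketch of the hypothesis: it does not use the OCR or CnC-OCR APIs, it is not a reproduction of the bug, and every name in it (`ToyNode`, `create_datablock`, `resolve_dependence`, `run`) is hypothetical. It shows that the same two operations succeed in one order and fail in the other, which is the kind of behavior a creation/dependence race would produce.

```python
# Toy model only: none of these names are OCR or CnC-OCR APIs.
class ToyNode:
    """One rank's local table mapping datablock IDs to payloads."""

    def __init__(self):
        self.datablocks = {}

    def create_datablock(self, guid, payload):
        # Stand-in for a datablock being created on the remote rank.
        self.datablocks[guid] = payload

    def resolve_dependence(self, guid):
        # Stand-in for the runtime resolving an EDT's dependence.
        # A missing entry models the invalid access behind a segfault.
        if guid not in self.datablocks:
            raise LookupError(f"datablock {guid} not created yet")
        return self.datablocks[guid]


def run(order):
    """Apply the two operations in the given order and report the outcome."""
    remote = ToyNode()
    try:
        for step in order:
            if step == "create":
                remote.create_datablock("db0", b"context")
            elif step == "resolve":
                remote.resolve_dependence("db0")
        return "ok"
    except LookupError:
        return "crash"


if __name__ == "__main__":
    for order in [("create", "resolve"), ("resolve", "create")]:
        print(order, "->", run(order))
```

If the real failure follows this pattern, it would explain why the tests fail "almost always" rather than always: the outcome would depend on which of the two operations reaches the remote rank first.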

For now, the workaround is to use OCRv1.1 as the OCR backend. I'm going to distill the issue as it presents in CnC-OCR down to a minimal OCR example and open a ticket in the OCR issue tracker. Hopefully it will be resolved in the next OCR release (assuming this really is an OCR issue and not a bug in the CnC layer over OCR).
