Travis fixes #2

Merged
merged 23 commits into from
Jun 29, 2020
Conversation

Matthew-Whitlock
Owner

No description provided.

@Matthew-Whitlock Matthew-Whitlock merged commit c352182 into master Jun 29, 2020
Matthew-Whitlock added a commit that referenced this pull request Apr 26, 2022
* Travis fixes (sandialabs#55)

Fix some travis/testing issues.

Travis now pulls from ULFM master branch when it needs to rebuild ULFM.
Travis has an environment variable enabling oversubscription during the tests, instead of having that on all platforms when running make test
Tests that involve failure have their timeouts individually set to 1, so tests don't take 10+ seconds each w/ the default timeout of 10s
Simplified travis scripts (no more .travis_helpers directory)

* Revert "Travis fixes (sandialabs#55)" (sandialabs#56)

Reverting unreviewed PR; it was meant to be in my fork.

This reverts commit a41fd3b.

* Update README.md

* Merge updates for HCLIB (sandialabs#57)

* Add ability to query which processes failed

* Add support for MPI_Test

* Add support for testing pre-failure requests

* Fix bug when ERR_PROC_FAILED/ERR_REVOKED discovered in MPI_Test

* Fix MPI_Wait w/ cancelled requests

* Add missing file to commit

* Fix bug with MPI_STATUS_IGNORE

* Fix another bug with MPI_Test

* Add no-jump recovery option

* Travis fixes (#2)

Fix some travis/testing issues.

Travis now pulls from ULFM master branch when it needs to rebuild ULFM.
Travis has an environment variable enabling oversubscription during the tests, instead of having that on all platforms when running make test
Tests that involve failure have their timeouts individually set to 1, so tests don't take 10+ seconds each w/ the default timeout of 10s
Simplified travis scripts (no more .travis_helpers directory)

* First pass at removing the request store

New function, "Fenix_test_cancelled" for checking if pre-failure requests completed or were cancelled.

One open problem still needs a solution: if a failure is found during an MPI_Test, that request has
already been removed from MPI internals and replaced w/ MPI_REQUEST_NULL, so Fenix_test_cancelled will
report that this req was completed.
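
The ambiguity described above can be illustrated with a small model in plain C. This is only a sketch and does not use MPI or Fenix: `mock_request`, `MOCK_REQUEST_NULL`, `naive_classify`, and the `tracked_request` wrapper are all hypothetical names invented for illustration, and the side-channel flag is just one conceivable workaround, not the approach taken in this repository.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real MPI or Fenix types. */
typedef int mock_request;
#define MOCK_REQUEST_NULL (-1)

typedef enum { REQ_COMPLETED, REQ_CANCELLED } req_outcome;

/* Models a test call that completes normally: the handle is nulled. */
void mock_test_with_success(mock_request *req) {
    assert(req != NULL);
    *req = MOCK_REQUEST_NULL;
}

/* Models a test call that discovers a failure: the handle is ALSO nulled,
 * even though the operation did not complete. */
void mock_test_with_failure(mock_request *req) {
    assert(req != NULL);
    *req = MOCK_REQUEST_NULL;
}

/* Simplified classifier that inspects only the handle.
 * A null handle is indistinguishable from a completed request. */
req_outcome naive_classify(mock_request req) {
    return req == MOCK_REQUEST_NULL ? REQ_COMPLETED : REQ_CANCELLED;
}

/* Assumed workaround: keep a flag next to the handle that records
 * whether the failure was seen while testing this request. */
typedef struct {
    mock_request handle;
    bool failed_in_test;
} tracked_request;

void tracked_test_with_failure(tracked_request *t) {
    assert(t != NULL);
    t->failed_in_test = true;           /* remember before the handle is lost */
    mock_test_with_failure(&t->handle);
}

req_outcome tracked_classify(const tracked_request *t) {
    assert(t != NULL);
    if (t->failed_in_test) {
        return REQ_CANCELLED;
    }
    return naive_classify(t->handle);
}
```

In this model, a request that hit a failure during its test call is classified as completed by `naive_classify`, which mirrors the problem noted above, while `tracked_classify` reports it as cancelled because the flag survives the handle being overwritten.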

* Implement custom errhandler

This includes removing the option for comm_replace - users now must provide
a comm pointer to fenix_init and cannot rely on fenix to automatically replace
their input comm with the resilient comm.

* Fenix comms are stack-allocated now, instead of malloced

* Cleanup redundant set_errhandler calls

* Fix data recovery bug

* Add usage instructions to all examples/tests

* Add support for MPI_Issend and MPI_Ssend (#3)

Merge in Issend test

Co-authored-by: mwhitlo@sandia.gov <mwhitlo@sandia.gov>
Co-authored-by: sriraj <srirajpaul@gmail.com>

Co-authored-by: Keita Teranishi <knteran@sandia.gov>
Co-authored-by: mwhitlo@sandia.gov <mwhitlo@sandia.gov>
Co-authored-by: sriraj <srirajpaul@gmail.com>
Matthew-Whitlock added a commit that referenced this pull request Apr 26, 2022
* Add ability to query which processes failed

* Add support for MPI_Test

* Add support for testing pre-failure requests

* Fix bug when ERR_PROC_FAILED/ERR_REVOKED discovered in MPI_Test

* Fix MPI_Wait w/ cancelled requests

* Add missing file to commit

* Fix bug with MPI_STATUS_IGNORE

* Fix another bug with MPI_Test

* Add no-jump recovery option

* Travis fixes (#2)

Fix some travis/testing issues.

Travis now pulls from ULFM master branch when it needs to rebuild ULFM.
Travis has an environment variable enabling oversubscription during the tests, instead of having that on all platforms when running make test
Tests that involve failure have their timeouts individually set to 1, so tests don't take 10+ seconds each w/ the default timeout of 10s
Simplified travis scripts (no more .travis_helpers directory)

* First pass at removing the request store

New function, "Fenix_test_cancelled" for checking if pre-failure requests completed or were cancelled.

One open problem still needs a solution: if a failure is found during an MPI_Test, that request has
already been removed from MPI internals and replaced w/ MPI_REQUEST_NULL, so Fenix_test_cancelled will
report that this req was completed.

* Implement custom errhandler

This includes removing the option for comm_replace - users now must provide
a comm pointer to fenix_init and cannot rely on fenix to automatically replace
their input comm with the resilient comm.

* Fenix comms are stack-allocated now, instead of malloced

* Cleanup redundant set_errhandler calls

* Fix data recovery bug

* Add usage instructions to all examples/tests

* Add support for MPI_Issend and MPI_Ssend (#3)

Merge in Issend test

Co-authored-by: mwhitlo@sandia.gov <mwhitlo@sandia.gov>
Co-authored-by: sriraj <srirajpaul@gmail.com>