Travis fixes #2

Matthew-Whitlock · 2020-06-29T17:03:48Z

No description provided.

* Travis fixes (sandialabs#55)
  * Fix some travis/testing issues.
  * Travis now pulls from ULFM master branch when it needs to rebuild ULFM.
  * Travis has an environment variable enabling oversubscription during the tests, instead of having that on all platforms when running make test.
  * Tests that involve failure have their timeouts individually set to 1, so tests don't take 10+ seconds each w/ the default timeout of 10s.
  * Simplified travis scripts (no more .travis_helpers directory).
* Revert "Travis fixes (sandialabs#55)" (sandialabs#56)
  * Reverting unreviewed PR, it was meant to be in my fork. This reverts commit a41fd3b.
* Update README.md
* Merge updates for HCLIB (sandialabs#57)
  * Add ability to query which processes failed
  * Add support for MPI_Test
  * Add support for testing pre-failure requests
  * Fix bug when ERR_PROC_FAILED/ERR_REVOKED discovered in MPI_Test
  * Fix MPI_Wait w/ cancelled requests
  * Add missing file to commit
  * Fix bug with MPI_STATUS_IGNORE
  * Fix another bug with MPI_Test
  * Add no-jump recovery option
  * Travis fixes (#2)
    * Fix some travis/testing issues.
    * Travis now pulls from ULFM master branch when it needs to rebuild ULFM.
    * Travis has an environment variable enabling oversubscription during the tests, instead of having that on all platforms when running make test.
    * Tests that involve failure have their timeouts individually set to 1, so tests don't take 10+ seconds each w/ the default timeout of 10s.
    * Simplified travis scripts (no more .travis_helpers directory).
  * First pass at removing the request store
    * New function, "Fenix_test_cancelled", for checking if pre-failure requests completed or were cancelled.
    * One thing to try finding a solution for: if a failure was found during an MPI_Test, that request has already been removed from MPI internals and replaced w/ MPI_REQUEST_NULL. Fenix_test_cancelled will report that this req was completed.
  * Implement custom errhandler
    * This includes removing the option for comm_replace - users now must provide a comm pointer to fenix_init and cannot rely on fenix to automatically replace their input comm with the resilient comm.
  * Fenix comms are stack-allocated now, instead of malloced
  * Cleanup redundant set_errhandler calls
  * Fix data recovery bug
  * Add usage instructions to all examples/tests
  * Add support for MPI_Issend and MPI_Ssend (#3)
    * Merge in Issend test

Co-authored-by: mwhitlo@sandia.gov <mwhitlo@sandia.gov>
Co-authored-by: sriraj <srirajpaul@gmail.com>
Co-authored-by: Keita Teranishi <knteran@sandia.gov>
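
The environment-variable approach to oversubscription described in the "Travis fixes" entry could look roughly like the sketch below. It is only an illustration: the variable name `FENIX_TEST_OVERSUBSCRIBE` and the helper `build_test_cmd` are hypothetical and not taken from the repository. The only assumption about the MPI launcher is that it accepts `-np` and `--oversubscribe`, as Open MPI's `mpirun` does.

```shell
# Hypothetical sketch: FENIX_TEST_OVERSUBSCRIBE and build_test_cmd are
# illustrative names, not the ones used by the actual Travis scripts.

# Print the mpirun command line for one test.
#   $1 = number of ranks
#   $2 = test executable
build_test_cmd() {
  ranks="$1"
  exe="$2"
  flags=""
  # Add --oversubscribe only when the environment (e.g. the CI config)
  # asks for it, instead of hard-coding it for every platform.
  if [ "${FENIX_TEST_OVERSUBSCRIBE:-0}" = "1" ]; then
    flags=" --oversubscribe"
  fi
  printf 'mpirun -np %s%s %s\n' "$ranks" "$flags" "$exe"
}
```

With the variable unset, `build_test_cmd 4 ./some_test` prints a plain `mpirun -np 4 ./some_test`; a CI job would set the variable to `1` before running the tests so that more ranks than available cores are allowed.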


Matthew-Whitlock added 23 commits June 29, 2020 06:45

* small test (59eec3c)
* trying to simplify (ce88c10)
* small fix (428a4d8)
* Fix oversubscribe export (1c24eed)
* Test working dir change requirements (52f380b)
* test failure (2ea13a7)
* Finalize script for now (0911534)
* Print test logs on success, for verifying (8e74165)
* Switch to pulling from ULFM master branch when rebuilding ULFM (3b586ae)
* print ULFM install info if building fails (59c05f0)
* Fix brackets in build (5002bed)
* small test (bced404)
* Try another fix (c581110)
* Trying a new OOP fix (1a43a8d)
* another attempt (9b32ada)
* Add semicolon to {} sections (df19c0c)
* Braces must be separate from commands by spaces (7410b9a)
* Try to fix travis not showing logs (b096f63)
* Fix autogen.pl output to file (74a18bd)
* Fix ulfm install log (2f5829b)
* Fix trailing i from vim (37ea254)
* Update tests for speed, and remove hard-set --oversubscribe (f84d62d)
* Simplify travis a bit more (d55030b)
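
Several of the commits above ("Add semicolon to {} sections", "Braces must be separate from commands by spaces", "print ULFM install info if building fails") touch on shell command grouping. As a minimal sketch of the syntax rule involved, and not the script actually used in this PR, the hypothetical helper `run_step` below sends a command's output to a log file and prints that log only if the command fails. In a `{ ...; }` group the opening brace must be followed by whitespace and the last command must end with a semicolon or newline before the closing brace.

```shell
# Hypothetical helper, not taken from the repository.
# Run a command with its output redirected to a log file; if it fails,
# print the log and return non-zero.
#   $1 = log file path, remaining arguments = command to run
run_step() {
  log="$1"
  shift
  # Note the space after "{" and the ";" before "}": both are required
  # when the group is written on one line.
  "$@" > "$log" 2>&1 || { echo "step failed, log follows:"; cat "$log"; return 1; }
}
```

A call such as `run_step build.log make` stays quiet on success and dumps `build.log` on failure, which keeps CI output short unless something goes wrong.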

Matthew-Whitlock merged commit c352182 into master Jun 29, 2020
