Add deterministic label to Nexus tests #3100
Conversation
Unknown error from nexus_simulation_module, but everything else is working now. @jtkrogel Any ideas? |
Strange. All that is happening here is Does this error happen only in the CI, or can it be reproduced by using ctest on the command line? |
This is a CI thing. So perhaps specific to the docker image and its software? @williamfgc can you/we get any more info? Do you have instructions for running the CI docker image to test these sorts of things? |
@prckent @jtkrogel to run the docker container locally (on an Ubuntu system):
2 Should download the container from DockerHub and launch an interactive bash console. The container has QMCPACK dependencies, so clone QMCPACK and checkout your branch after that. Hope it helps. Let me know if you have any questions, we can rerun the workflows to make sure |
FYI, I'm able to reproduce locally on my Ubuntu 20.04 box outside of the docker container. |
I have reproduced the problem in the docker container as well. |
However, fixing this is beyond me :-( |
Is there a way I can get access to an environment where the problem is demonstrable? I am willing to track it down if so. |
@jtkrogel I was able to follow William's docker instructions, git clone the repo and run the tests on my laptop. |
OK, will try. |
Reproduced in ctest within Docker. Also reproduced within Nexus test system in Docker. Tracking now. |
The problem centers around execution of this command: This command works fine from the command line in the docker image:
But when run with the subprocess module (Python standard library), a return code of 1 is given (the command fails) and thus I'm not really sure why this is. |
Confusingly, the same command gives a return code of 0 when run via the subprocess module in a Python interactive session in the Docker image... |
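For anyone retracing this step, here is a minimal sketch of how a command's return code and output streams can be inspected through the subprocess module. The command discussed in this thread is not reproduced here: the child command below is a stand-in, and `run_and_report` is a hypothetical helper name, not part of Nexus. It assumes Python 3.7 or later for `capture_output`.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command (list of arguments) and collect its exit status and output.

    Hypothetical helper for illustration only; not part of Nexus.
    """
    completed = subprocess.run(command, capture_output=True, text=True)
    return {
        "returncode": completed.returncode,
        "stdout": completed.stdout,
        "stderr": completed.stderr,
    }


if __name__ == "__main__":
    # Stand-in child command: a Python one-liner instead of the real command.
    report = run_and_report([sys.executable, "-c", "print('hello')"])
    print(report["returncode"], repr(report["stdout"]), repr(report["stderr"]))
```

Capturing stderr alongside the return code is the useful part here, since a bare return code of 1 says nothing about why the child process failed.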
Perhaps try with the full path to echo? |
It turns out that It is |
Also note William's comment above: "I'm able to reproduce locally on my Ubuntu 20.04 box outside of the docker container." |
I have a fix and a plausible explanation: https://stackoverflow.com/questions/60060142/strange-interaction-between-h5py-subprocess-and-mpirun It looks like some module imports can pollute the process environment. Alternatively, OS environ needs to be explicitly loaded for the subprocess. See also #3108 |
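As a rough illustration of the second option (explicitly handing the OS environment to the subprocess), a sketch might look like the following. This is not necessarily what #3108 does; `run_with_explicit_env` and the `NEXUS_DEMO_VAR` variable are hypothetical names used only for this example.

```python
import os
import subprocess
import sys


def run_with_explicit_env(command, extra_env=None):
    """Run a command with an explicit copy of the current process environment.

    Hypothetical helper for illustration only; not the actual Nexus fix.
    """
    # Copy so that later changes in the parent do not affect the mapping
    # handed to the child, and vice versa.
    env = os.environ.copy()
    if extra_env:
        env.update(extra_env)
    completed = subprocess.run(command, env=env, capture_output=True, text=True)
    return completed.returncode, completed.stdout, completed.stderr


if __name__ == "__main__":
    # Stand-in child command that echoes back a variable from its environment.
    child = [sys.executable, "-c", "import os; print(os.environ['NEXUS_DEMO_VAR'])"]
    code, out, err = run_with_explicit_env(child, {"NEXUS_DEMO_VAR": "42"})
    print(code, out.strip())
```

Passing `env=` makes the environment seen by the child an explicit input rather than whatever the process-level environment happens to be after other modules have been imported.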
Thanks. So we review+merge that one, update the branch here, and hopefully it will pass. |
Nexus tests are now running correctly. Thanks Jaron. |
Proposed changes
Retry adding the deterministic label, which will activate the Nexus tests in CI.
Getting this working is useful for catching unintentional Nexus breakage.
What type(s) of changes does this code introduce?
Does this introduce a breaking change?
What systems has this change been tested on?
None. Needs CI.
Checklist