Add deterministic label to Nexus tests #3100

Merged
3 commits merged on Apr 16, 2021


prckent
Contributor

@prckent prckent commented Apr 14, 2021

Proposed changes

Retry adding the deterministic label, which will activate the Nexus tests in CI.

Getting this working will help avoid unintentional Nexus breakage.

What type(s) of changes does this code introduce?

  • Testing changes (e.g. new unit/integration/performance tests)

Does this introduce a breaking change?

  • No

What systems has this change been tested on?

None. Needs CI.

Checklist

  • Yes. This PR is up to date with the current state of 'develop'
  • NA. Code added or changed in the PR has been clang-formatted
  • NA. This PR adds tests to cover any new code, or to catch a bug that is being fixed
  • NA. Documentation has been added (if appropriate)

@prckent
Contributor Author

prckent commented Apr 14, 2021

Unknown error from nexus_simulation_module, but everything else is working now. @jtkrogel Any ideas?

        Start 1645: ntest_nexus_simulation_module
715/759 Test #1645: ntest_nexus_simulation_module .................................................................***Failed    1.20 sec
Test name     : simulation_module
Test sublabel : test_execute
Test exception: "AssertionError: "
Test backtrace:
  File "/__w/qmcpack/qmcpack/nexus/bin/nxs-test", line 478, in run
    self.operation()
  File "/__w/qmcpack/qmcpack/nexus/bin/nxs-test", line 1064, in simulation
    nunit('execute')
  File "/__w/qmcpack/qmcpack/nexus/bin/nxs-test", line 349, in nunit
    run_external_unit_test(test_name,unit_test)
  File "/__w/qmcpack/qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
    unit_test()
  File "/__w/qmcpack/qmcpack/nexus/tests/unit/test_simulation_module.py", line 2386, in test_execute
    assert(open(outfile,'r').read().strip()=='run')

Test status: fail

@jtkrogel
Contributor

jtkrogel commented Apr 14, 2021

Strange. All that is happening here is echo run being executed in a new shell with stdout and stderr being piped to files and the file contents checked. The file existence check passed but not the content check.

Does this error happen only in the CI, or can it be reproduced by using ctest on the command line?
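A minimal sketch of the kind of check described above, assuming a helper named run_and_capture and placeholder file names (both hypothetical, not the actual Nexus code): a command runs in a new shell, stdout and stderr go to files, and the output file is then checked for existence and content.

```python
import os
import subprocess
import tempfile


def run_and_capture(command, workdir):
    """Run `command` in a new shell, writing stdout/stderr to files in `workdir`.

    Hypothetical helper for illustration; the file names are placeholders.
    """
    outfile = os.path.join(workdir, "sim.out")
    errfile = os.path.join(workdir, "sim.err")
    with open(outfile, "w") as out, open(errfile, "w") as err:
        returncode = subprocess.call(command, shell=True, stdout=out, stderr=err)
    return returncode, outfile, errfile


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        returncode, outfile, errfile = run_and_capture("echo run", workdir)
        # The output file is created before the command starts, so this
        # existence check holds even when the command itself fails.
        assert os.path.exists(outfile)
        with open(outfile) as f:
            print(returncode, repr(f.read().strip()))
```

Because the output file is opened before the command starts, it exists even if the command exits with an error. That would be consistent with the existence check passing while the content check fails.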

@prckent
Contributor Author

prckent commented Apr 14, 2021

This is a CI thing. So perhaps specific to the docker image and its software?

@williamfgc can you/we get any more info? Do you have instructions for running the CI docker image to test these sorts of things?

@williamfgc
Contributor

williamfgc commented Apr 14, 2021

@prckent @jtkrogel to run the docker container locally (on an Ubuntu system):

  1. Install Docker Engine following the Ubuntu instructions. I just use the apt-get repos.
  2. Run docker run -it williamfgc/qmcpack-ci:ubuntu20-openmpi /bin/bash

Step 2 should download the container from DockerHub and launch an interactive bash console. The container has the QMCPACK dependencies, so clone QMCPACK and check out your branch after that.
See the run script for the GitHub Actions steps.

Hope it helps. Let me know if you have any questions. We can rerun the workflows to make sure it's not a glitch (it's not).

@williamfgc
Contributor

williamfgc commented Apr 14, 2021

FYI, I'm able to reproduce locally on my Ubuntu 20.04 box outside of the docker container.
The file qmcpack/nexus/tests/unit/test_simulation_output/test_execute/runs/test_sim66.out is empty, but it is expected to contain "run".

@prckent
Contributor Author

prckent commented Apr 14, 2021

I have reproduced the problem in the Docker container as well:

user@b5e4b3c9c18c:~/build_pk$ ../qmcpack/nexus/bin/nxs-test -R simulation_module --verbose

 1/1 simulation_module.......................   Failed  0.47 sec
       subtest: test_import
       subtest: test_simulation_input
       subtest: test_simulation_analyzer
       subtest: test_simulation_input_template
       subtest: test_simulation_input_multi_template
       subtest: test_code_name
       subtest: test_init
       subtest: test_virtuals
       subtest: test_reset_indicators
       subtest: test_indicator_checks
       subtest: test_create_directories
       subtest: test_file_text
       subtest: test_depends
       subtest: test_undo_depends
       subtest: test_has_generic_input
       subtest: test_check_dependencies
       subtest: test_get_dependencies
       subtest: test_downstream_simids
       subtest: test_copy_file
       subtest: test_save_load_image
       subtest: test_load_analyzer_image
       subtest: test_save_attempt
       subtest: test_write_inputs
       subtest: test_send_files
       subtest: test_submit
       subtest: test_update_process_id
       subtest: test_check_status
       subtest: test_get_output
       subtest: test_analyze
       subtest: test_progress
       subtest: test_execute
Test name     : simulation_module
Test sublabel : test_execute
Test exception: "AssertionError: "
Test backtrace:
  File "../qmcpack/nexus/bin/nxs-test", line 478, in run
    self.operation()
  File "../qmcpack/nexus/bin/nxs-test", line 1064, in simulation
    nunit('execute')
  File "../qmcpack/nexus/bin/nxs-test", line 349, in nunit
    run_external_unit_test(test_name,unit_test)
  File "../qmcpack/nexus/bin/nxs-test", line 388, in run_external_unit_test
    unit_test()
  File "/home/user/qmcpack/nexus/tests/unit/test_simulation_module.py", line 2386, in test_execute
    assert(open(outfile,'r').read().strip()=='run')


0% tests passed, 1 tests failed out of 1

Total test time = 0.48 sec

@prckent
Contributor Author

prckent commented Apr 14, 2021

However, fixing this is beyond me :-(

@jtkrogel
Contributor

Is there a way I can get access to an environment where the problem is demonstrable? I am willing to track it down if so.

@prckent
Contributor Author

prckent commented Apr 14, 2021

@jtkrogel I was able to follow William's docker instructions, git clone the repo and run the tests on my laptop.

  1. docker run -it williamfgc/qmcpack-ci:ubuntu20-openmpi /bin/bash
  2. Then inside the new docker shell:
git clone https://github.com/QMCPACK/qmcpack.git
mkdir build_pk
cd build_pk
cmake -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DQMC_MPI=0 ../qmcpack/
ctest -R nexus

@jtkrogel
Contributor

OK, will try.

@jtkrogel
Contributor

Reproduced in ctest within Docker. Also reproduced within Nexus test system in Docker. Tracking now.

@jtkrogel
Contributor

jtkrogel commented Apr 14, 2021

The problem centers around execution of this command: mpirun -np 1 echo run.

This command works fine from the command line in the docker image:

user@14b66c49bcab:~/qmcpack/nexus/bin$ mpirun -np 1 echo run
run

But when run with the subprocess module (Python standard library), a return code of 1 is given (the command fails) and thus run does not appear in the stdout file of the command.

I'm not really sure why this is.
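One way to narrow down a failure like this is to capture the return code together with stderr rather than only the stdout file. The following is a sketch under assumptions: diagnose is a hypothetical helper name, and the demo at the bottom assumes mpirun is on the PATH, as it is in the Docker image.

```python
import subprocess


def diagnose(command):
    """Run `command` in a shell and return (return code, stdout, stderr)."""
    result = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the captured bytes to str
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Assumes `mpirun` is available; if it is not, the shell itself
    # reports the error on stderr with a nonzero return code.
    code, out, err = diagnose("mpirun -np 1 echo run")
    print("return code:", code)
    print("stdout:", out.strip())
    print("stderr:", err.strip())
```

Printing stderr next to the return code may show what mpirun itself complains about when it is launched from Python.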

@jtkrogel
Contributor

Confusingly, the same command gives a return code of 0 when run via the subprocess module in a Python interactive session in the Docker image...

@prckent
Contributor Author

prckent commented Apr 15, 2021

Perhaps try with the full path to echo?

@jtkrogel
Contributor

It turns out that echo is working (with or without the full path) when executed within the Python subprocess.

It is mpirun that is the issue. It works fine in the Docker image from the command line, but not within a Python subprocess (with or without the full system path to the mpirun executable).

@prckent
Contributor Author

prckent commented Apr 15, 2021

Also notice William's comment above: "I'm able to reproduce locally on my Ubuntu 20.04 box outside of the docker container."

@jtkrogel
Contributor

I have a fix and a plausible explanation: https://stackoverflow.com/questions/60060142/strange-interaction-between-h5py-subprocess-and-mpirun

It looks like some module imports can pollute the process environment. Alternatively, OS environ needs to be explicitly loaded for the subprocess.

See also #3108
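To illustrate the general idea (this is not the actual change in #3108), here is a sketch that builds the child environment explicitly before launching the command. The function name run_with_explicit_env, the drop_prefixes parameter, and the DEMO_POLLUTED variable are all hypothetical, chosen only for the example.

```python
import os
import subprocess


def run_with_explicit_env(command, drop_prefixes=()):
    """Run `command` in a shell with an explicitly constructed environment.

    `drop_prefixes` is a hypothetical knob: variables whose names start with
    any of these prefixes are removed before the child process is launched.
    """
    env = {
        name: value
        for name, value in os.environ.items()
        if not any(name.startswith(prefix) for prefix in drop_prefixes)
    }
    result = subprocess.run(
        command,
        shell=True,
        env=env,  # pass the environment explicitly instead of inheriting it
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.returncode, result.stdout, result.stderr


if __name__ == "__main__":
    # Simulate a variable added to the process environment by an import.
    os.environ["DEMO_POLLUTED"] = "1"
    show = 'echo "${DEMO_POLLUTED:-unset}"'
    print(run_with_explicit_env(show)[1].strip())
    print(run_with_explicit_env(show, drop_prefixes=("DEMO_",))[1].strip())
```

Which variables, if any, need to be removed or restored for mpirun is not stated in this thread, so the filter is left as a parameter.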

@prckent
Contributor Author

prckent commented Apr 15, 2021

Thanks. So we review+merge that one, update the branch here, and hopefully it will pass.

@prckent prckent changed the title from "[WIP] Add deterministic label to Nexus tests" to "Add deterministic label to Nexus tests" Apr 16, 2021
@prckent
Contributor Author

prckent commented Apr 16, 2021

Nexus tests are now running correctly. Thanks Jaron.
Removed the WIP, OK to review+merge.

@prckent prckent merged commit 0dee521 into QMCPACK:develop Apr 16, 2021
@prckent prckent deleted the nexusci branch September 19, 2024 21:44

4 participants