Zoltan2: Several tests fail with 64-bit builds of Scotch and ParMETIS #476

Closed
bartlettroscoe opened this issue Jun 30, 2016 · 6 comments

@bartlettroscoe
Member

bartlettroscoe commented Jun 30, 2016

Next Action Status:

64-bit Scotch and ParMETIS not enabled for Zoltan2 yet. Next: Zoltan2 team to fix failing tests then enable ...

CC: @trilinos/zoltan2

Description:

As @kddevin predicted in this #158 comment, several of the Scotch and ParMETIS tests fail when using 64-bit builds of Scotch and ParMETIS. These are the only builds of these TPLs available with the SEMS Dev Env (see the lengthy discussion in #158).

In particular, the following Zoltan2 tests failed with the 64-bit builds of Scotch and ParMETIS:

    123 - Zoltan2_Partitioning1_MPI_4 (Failed)
    124 - Zoltan2_Partitioning1_OneProc_MPI_4 (Failed)
    125 - Zoltan2_Partitioning1_VWeights_MPI_4 (Failed)
    126 - Zoltan2_Partitioning1_OneProc_VWeights_MPI_4 (Failed)
    127 - Zoltan2_Partitioning1_EWeights_MPI_4 (Failed)
    128 - Zoltan2_Partitioning1_OneProc_EWeights_MPI_4 (Failed)

Interestingly, however, several Zoltan2 "scotch" and "parmetis" tests also passed:

$ grep " Test " ctest.out | grep "Passed" | grep "Zoltan2_" | grep -i "\(parmetis\|scotch\)"
115/221 Test #133: Zoltan2_Partitioning1_ParMETIS_EWeights_MPI_4 ...........   Passed    0.39 sec
118/221 Test #131: Zoltan2_Partitioning1_ParMETIS_VWeights_MPI_4 ...........   Passed    0.41 sec
120/221 Test #132: Zoltan2_Partitioning1_ParMETIS_OneProc_VWeights_MPI_4 ...   Passed    0.40 sec
121/221 Test #134: Zoltan2_Partitioning1_ParMETIS_OneProc_EWeights_MPI_4 ...   Passed    0.40 sec
122/221 Test #130: Zoltan2_Partitioning1_ParMETIS_OneProc_MPI_4 ............   Passed    0.40 sec
149/221 Test #129: Zoltan2_Partitioning1_ParMETIS_MPI_4 ....................   Passed    0.39 sec
217/221 Test #141: Zoltan2_OrderingScotch_MPI_4 ............................   Passed    0.38 sec
218/221 Test #167: Zoltan2_parmetis_example_MPI_4 ..........................   Passed    2.97 sec
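The complementary list (the Zoltan2 tests that did not pass) can be pulled from the same saved ctest.out by inverting the "Passed" filter. A minimal sketch, assuming ctest.out holds ctest console output in the format shown above; the helper name failed_zoltan2_tests is hypothetical:

```bash
# Hypothetical helper: print the names of Zoltan2 tests in a saved ctest log
# whose result line is NOT "Passed" (e.g. "***Failed").
# Assumes lines look like: "115/221 Test #133: <name> ....   Passed    0.39 sec"
failed_zoltan2_tests() {
  grep " Test " "$1" | grep -v "Passed" | grep "Zoltan2_" | awk '{print $4}'
}

# Example usage:
#   failed_zoltan2_tests ctest.out
```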

I can think of several possible options for addressing these failing tests:

  1. Disable only the currently failing tests for just the SEMS Dev Env build: This could be done by setting cache vars <test_name>_DISABLE=TRUE in the SEMSDevEnv.cmake file.
    • Pro: Easy to implement by non-Zoltan2 developers
    • Pro: Still enables Scotch and ParMETIS TPLs and gets at least some tests run using these
    • Con: Does not exercise some functionality of Zoltan2 for Scotch and ParMETIS
    • Con: As Zoltan2 tests using Scotch and ParMETIS are changed but only tested with 32-bit builds of Scotch and ParMETIS, there is greater risk that updated tests which currently pass on the SEMS Dev Env will then fail with the 64-bit builds of these TPLs.
    • Summary: Easy short-term solution that yields all passing CI tests with Zoltan2
  2. Disable Scotch and ParMETIS TPL support for Zoltan2: This could be done by setting Zoltan2_ENABLE_Scotch=OFF and Zoltan2_ENABLE_ParMETIS=OFF in the SEMSDevEnv.cmake file.
    • Pro: Easy to implement by non-Zoltan2 developers
    • Pro: There would never be a Scotch or ParMETIS related test failure on the SEMS Dev Env.
    • Con: The build and usage of Zoltan2 with Scotch and ParMETIS would not be getting tested on the SEMS Dev Env.
    • Summary: Easy short-term solution that yields all passing CI tests with Zoltan2
  3. Update the Zoltan2 test suite to work with 64-bit Scotch and ParMETIS: This would require Zoltan2 developers to do the updates.
    • Pro: Would allow the full Zoltan2 test suite to be run on the SEMS Dev Env.
    • Pro: Strengthens the Zoltan2 test suite
    • Con: Requires Zoltan2 developers to update the Zoltan2 test suite
    • Summary: Best long-term solution but requires work from the Zoltan2 developers
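Option 1 boils down to setting one cache variable per failing test. A minimal sketch, assuming the build honors the <test_name>_DISABLE=TRUE cache vars described in option 1 and that the full names as listed above (including the _MPI_4 suffix) are what those vars expect; the helper name disable_args is hypothetical. It prints -D arguments that could be passed to cmake, for example to try out the disables before editing SEMSDevEnv.cmake:

```bash
# Hypothetical helper: print one "-D<test_name>_DISABLE=TRUE" argument per name.
# Assumes the build honors <test_name>_DISABLE cache vars, as option 1 describes.
disable_args() {
  for name in "$@"; do
    printf '%s\n' "-D${name}_DISABLE=TRUE"
  done
}

# Example usage from a build directory (the unquoted $(...) splits on newlines):
#   cmake $(disable_args Zoltan2_Partitioning1_MPI_4 \
#                        Zoltan2_Partitioning1_OneProc_MPI_4) ..
```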

I will provide detailed reproducibility instructions in a later comment.

Definition of Done:

  • No failing Zoltan2 tests in pre-push CI testing with the SEMS Dev Env
  • Zoltan2 developers decide on best approach to dealing with these failing tests.

Tasks:

???

@bartlettroscoe
Member Author

As with the similar failing Zoltan tests in #475, I think the best short-term solution is option 1, "Disable only the currently failing tests for just the SEMS Dev Env build".

NOTE: These tests will only be disabled for the SEMS Dev Env build and no other builds.

I will provide reproducibility instructions once I push the commit to the SEMSDevEnv.cmake file that disables these tests.

@trilinos/framework,

FYI: we need all Zoltan2 tests passing cleanly in pre-push CI testing.

@bartlettroscoe
Member Author

Note that the situation for Zoltan2 differs from Zoltan with respect to enabling 64-bit versions of Scotch and ParMETIS. In the case of Zoltan (shown in #475), enabling Scotch and ParMETIS only enables several new tests (about half of which actually pass). So there is no downside to enabling the 64-bit versions of Scotch and ParMETIS and then (temporarily) disabling the ones that don't pass, because all of the tests that run without Scotch and ParMETIS enabled still pass.

But for Zoltan2, several tests that pass with Scotch and ParMETIS disabled fail when the 64-bit versions of these TPLs are enabled. As shown here, these tests pass when Scotch and ParMETIS are disabled:

http://testing.sandia.gov/cdash/viewTest.php?onlypassed&buildid=2487199

(e.g. see passing tests Zoltan2_Partitioning1_MPI_4, Zoltan2_Partitioning1_OneProc_MPI_4, Zoltan2_Partitioning1_VWeights_MPI_4, etc.)

Even more interesting, the above CDash page shows several Zoltan2 tests that have "ParMETIS" in their names as passing, as this query shows:

http://testing.sandia.gov/cdash/viewTest.php?onlypassed&buildid=2487199&filtercount=1&showfilters=1&field1=testname&compare1=63&value1=parmetis

but ParMETIS is clearly not enabled if you look at the configure output for this build here:

http://testing.sandia.gov/cdash/viewConfigure.php?buildid=2487199

which shows:

Final set of non-enabled TPLs: ... Scotch ... ParMETIS ... 93

How can ParMETIS tests pass if ParMETIS is not even enabled? The test output suggests that these tests actually fail but are reported as passing because they print "PASS". Take, for example, the test Zoltan2_Partitioning1_ParMETIS_OneProc_EWeights_MPI_4 with output shown here:

http://testing.sandia.gov/cdash/testDetails.php?test=34046494&build=2487199

which shows the output:

Calling solve() 
Runtime exception returned from solve(): BUILD ERROR:  ParMETIS requested but not compiled into Zoltan2.
Please set CMake flag Zoltan2_ENABLE_ParMETIS:BOOL=ON. PASS
Runtime exception returned from solve(): BUILD ERROR:  ParMETIS requested but not compiled into Zoltan2.
Runtime exception returned from solve(): BUILD ERROR:  ParMETIS requested but not compiled into Zoltan2.
Runtime exception returned from solve(): BUILD ERROR:  ParMETIS requested but not compiled into Zoltan2.
Please set CMake flag Zoltan2_ENABLE_ParMETIS:BOOL=ON. PASS
Please set CMake flag Zoltan2_ENABLE_ParMETIS:BOOL=ON. PASS
Please set CMake flag Zoltan2_ENABLE_ParMETIS:BOOL=ON. PASS

One might argue that it is better to not enable a test than to have it pass in a trivial way.

Anyway, given all of this, I think it is best not to enable Scotch and ParMETIS testing for Zoltan2 yet, until the tests can be fixed for the 64-bit versions of these libraries. Again, as I noted in #475, there do not appear to be any current automated builds of Trilinos that enable Scotch or ParMETIS, so Zoltan2 will not be losing any testing.

bartlettroscoe added a commit that referenced this issue Jul 1, 2016
Currently if you enable 64-bit Scotch and ParMETIS with Zoltan2 several tests
fail.  This is a known issue and these tests will be fixed soon.  After that,
this commit can be reverted.

Also, you need to disable ParMETIS in ShyLU as well.
bartlettroscoe added a commit that referenced this issue Jul 1, 2016
This enables ParMETIS with passing tests for Amesos, Amesos2, ML, and SEACAS
(see Trilinos #158).

Currently these 64-bit TPLs are not enabled for Zoltan and Zoltan2 in this
build because the Zoltan and Zoltan2 test suites don't currently work with
64-bit libraries (see Trilinos #475 and #476).  Also, ShyLU support for
ParMETIS is not enabled because it needs ParMETIS support from Zoltan2,
which is not enabled.  Once the Zoltan and Zoltan2 test suites using 64-bit
TPLs are fixed, then these TPLs can be enabled.  Note that the ShyLU test for
ParMETIS passes for the 64-bit ParMETIS, so nothing in ShyLU needs to be fixed.

Note that SEMS only provides MPI builds of these TPLs so they are disabled for
serial builds.

Build/Test Cases Summary
Enabled Packages:
Disabled Packages: PyTrilinos,Pliris,Claps,TriKota
Enabled all Packages
0) MPI_DEBUG => Test case MPI_DEBUG was not run! => Does not affect push readiness! (-1.00 min)
1) SERIAL_RELEASE => Test case SERIAL_RELEASE was not run! => Does not affect push readiness! (-1.00 min)
2) MPI_RELEASE_DEBUG_ST => passed: passed=2346,notpassed=0 (341.29 min)
3) SERIAL_RELEASE_ST => passed: passed=2163,notpassed=0 (172.52 min)
Other local commits for this build/test group: 92a1d8d, f2b3c92
@bartlettroscoe
Member Author

bartlettroscoe commented Jul 1, 2016

Zoltan2 Developers,

To reproduce the failing Zoltan2 tests built against the 64-bit ParMETIS and Scotch TPLs in the SEMS Dev Env (in order to fix them), you just need to be on a machine that provides the SEMS Dev Env and then do something like the following:

$ cd Trilinos/  # Make sure you are on the 'develop' tracking branch
$ git pull   # from origin/develop
$ mkdir BUILD/
$ echo /BUILD/ >> .git/info/exclude
$ cd BUILD/
$ source ../cmake/load_ci_sems_dev_env.sh
$ cmake \
  -DCMAKE_BUILD_TYPE=RELEASE \
  -DTrilinos_ENABLE_DEBUG=ON \
  -DTPL_ENABLE_MPI=ON \
  -DTrilinos_ENABLE_TESTS=ON \
  -DTrilinos_ENABLE_Zoltan2=ON \
  -DZoltan2_ENABLE_Scotch=ON \
  -DZoltan2_ENABLE_ParMETIS=ON \
  ..
$ make -j16
$ ctest -j16

If that does not work to reproduce the failing tests shown above, please let me know.
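When iterating on a fix, it can help to rerun only the failing tests and see their output. A minimal sketch (the helper name tests_regex is hypothetical) that builds an anchored regular expression for ctest's -R option from a list of test names; --output-on-failure is a standard ctest option that prints the output of failing tests:

```bash
# Hypothetical helper: join test names into an anchored regex like ^(A|B)$
# for "ctest -R", so only those exact tests are selected.
tests_regex() {
  regex=""
  for t in "$@"; do
    regex="${regex:+$regex|}$t"
  done
  printf '^(%s)$\n' "$regex"
}

# Example usage from BUILD/:
#   ctest -R "$(tests_regex Zoltan2_Partitioning1_MPI_4 \
#                           Zoltan2_Partitioning1_OneProc_MPI_4)" --output-on-failure
```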

Once all of the Zoltan2 tests are passing, commit 92a1d8d just needs to be reverted using:

$ cd Trilinos/
$ git revert 92a1d8d 

Then, if desired, a Zoltan2 developer could test and push these changes on a machine with the SEMS Dev Env with:

$ cd Trilinos/
$ mkdir CHECKIN/
$ cd CHECKIN/
$ ln -s ../cmake/std/sems/checkin-test-sems.sh .
$ ./checkin-test-sems.sh --enable-all-packages=off --no-enable-fwd-packages \
   --enable-packages=Zoltan2 --do-all --push

Thanks,

-Ross

@kddevin
Contributor

kddevin commented Jul 3, 2016

Zoltan2 is working fine with ParMETIS. The tests that are failing above are Scotch tests.
The issue appears to be due to a bad installation of Scotch in SEMS. We receive the following error from Scotch:
(0): ERROR: SCOTCH_dgraphInit: Scotch compiled with SCOTCH_PTHREAD and program not launched with MPI_THREAD_MULTIPLE
I doubt that anyone who uses Scotch from the SEMS repo wants to use it with pthreads.
( @srajama1 Is that correct?)
I am happy to share my Scotch configuration settings with the SEMS team to allow a working installation. If the pthreads implementation is really needed, we'll need to find another solution.
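The error above names the SCOTCH_PTHREAD compile-time flag, so an existing Scotch build can be checked for it before use. A minimal sketch, assuming Scotch was built with its make-based build and that the src/Makefile.inc used for that build (where such -D flags normally go in CFLAGS) is available; the helper name scotch_uses_pthread is hypothetical:

```bash
# Hypothetical helper: succeed if the given Scotch Makefile.inc defines
# SCOTCH_PTHREAD (the flag named in the SCOTCH_dgraphInit error above).
scotch_uses_pthread() {
  grep -Eq -e '-DSCOTCH_PTHREAD([[:space:]]|$)' "$1"
}

# Example usage:
#   if scotch_uses_pthread scotch/src/Makefile.inc; then
#     echo "Scotch built with SCOTCH_PTHREAD; rebuild without it for Zoltan2"
#   fi
```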

I verified that Zoltan2 works with proper installations of ParMETIS and Scotch (32-bit or 64-bit). Thus, I will close this issue and re-enable the tests.

@kddevin kddevin closed this as completed Jul 3, 2016
@srajama1
Contributor

srajama1 commented Jul 4, 2016

Scotch should be compiled without pthreads support for all our purposes.

bartlettroscoe added a commit that referenced this issue Jul 5, 2016
…476)

Now that Zoltan2 should be compatible with 64-bit ParMETIS, we should be able
to enable ParMETIS with Zoltan2 and ShyLU.

Note that the 32-bit Scotch is still disabled globally (see Trilinos #158 and
#476 for details).

Build/Test Cases Summary
Enabled Packages: Zoltan2, ShyLUCore
Disabled Packages: PyTrilinos,Pliris,Claps,TriKota
Enabled all Forward Packages
0) MPI_DEBUG => Test case MPI_DEBUG was not run! => Does not affect push readiness! (-1.00 min)
1) SERIAL_RELEASE => Test case SERIAL_RELEASE was not run! => Does not affect push readiness! (-1.00 min)
2) MPI_RELEASE_DEBUG_SHARED_ST => passed: passed=696,notpassed=0 (85.66 min)
3) SERIAL_RELEASE_SHARED_ST => passed: passed=548,notpassed=0 (11.34 min)
4) MPI_RELEASE_DEBUG_STATIC_ST => passed: passed=696,notpassed=0 (49.19 min)
5) SERIAL_RELEASE_STATIC_ST => passed: passed=548,notpassed=0 (10.31 min)
@bartlettroscoe
Member Author

NOTE: With the commit d960992, now ParMETIS is enabled for Zoltan2 and ShyLU for the SEMS Dev Env builds. This enables and runs the tests:

511/696 Test  #50: Zoltan2_Partitioning1_ParMETIS_VWeights_MPI_4 .....................   Passed    1.52 sec
517/696 Test  #48: Zoltan2_Partitioning1_ParMETIS_MPI_4 ..............................   Passed    1.73 sec
526/696 Test  #52: Zoltan2_Partitioning1_ParMETIS_EWeights_MPI_4 .....................   Passed    1.44 sec
527/696 Test  #51: Zoltan2_Partitioning1_ParMETIS_OneProc_VWeights_MPI_4 .............   Passed    1.42 sec
528/696 Test  #49: Zoltan2_Partitioning1_ParMETIS_OneProc_MPI_4 ......................   Passed    1.49 sec
534/696 Test  #53: Zoltan2_Partitioning1_ParMETIS_OneProc_EWeights_MPI_4 .............   Passed    1.41 sec
695/696 Test  #89: ShyLUCore_iqr_driver_with_parmetis_MPI_4 ..........................   Passed    1.86 sec
696/696 Test  #88: ShyLUCore_epetra_interface_with_parmetis_MPI_4 ....................   Passed    1.96 sec
