
Set up new Intel 17.x build to use as auto PR build #2463

Closed
bartlettroscoe opened this issue Mar 27, 2018 · 14 comments
Labels
client: ATDM (Any issue primarily impacting the ATDM project)
type: enhancement (Issue is an enhancement, not a bug)

Comments

@bartlettroscoe
Member

bartlettroscoe commented Mar 27, 2018

CC: @trilinos/framework, @mhoemmen, @rppawlo, @ibaned, @crtrott

Next Action Status

Intel 17.0.1 PR builds running since 6/1/2018

Description

This Issue is to scope out and track efforts to create an Intel 17.x build that matches the auto PR build described in #2317 (comment).

The settings for this build are:

  • Intel 17.x with GCC 4.9.x standard C++ headers using the SEMS env
  • TPL_ENABLE_MPI=ON (OpenMPI 2.x)
  • Primary Tested Packages
  • Primary Tested TPLs
  • BUILD_SHARED_LIBS=ON
  • CMAKE_BUILD_TYPE=RELEASE
  • Trilinos_ENABLE_DEBUG=OFF
  • Trilinos_ENABLE_EXPLICIT_TEMPLATE_INSTANTIATION=ON
  • Xpetra_ENABLE_Experimental=ON
  • MueLu_ENABLE_Experimental=ON
  • Trilinos_TRACE_ADD_TEST=ON
  • Trilinos_TEST_CATEGORIES=BASIC

The existing GCC 4.8.4 CI build (shown here), which has been running for 1.5+ years, may be a good foundation for this build since it already sets most of these options and the Trilinos/cmake/load_sems_env.sh script already allows selecting different compilers.

Tasks:

  1. Select the version of Intel and OpenMPI from the SEMS env:
    a. NOTE: SEMS only provides sems-intel/17.0.1.
    b. NOTE: The highest version of OpenMPI provided by SEMS is sems-openmpi/1.10.1.
  2. Set up a trial build using these settings and test locally ...
  3. Set up a Nightly Jenkins build submitting to the "Specialized" CDash Track/Group ...
  4. Clean up all failures in the new build ...
  5. ???

Related Issues:

@bartlettroscoe
Member Author

Found a problem with this plan for the Intel 17.x build. It looks like the SEMS env does not provide any builds of OpenMPI 2.x. All that seems to be available is:

sems-openmpi/1.10.1
sems-openmpi/1.6.5
sems-openmpi/1.8.7

Now they do have:

sems-mpich/3.2

It seems like it would be a good idea to use a different MPI implementation for one of the builds. I know that in CASL we switched to MPICH because it caught more errors than OpenMPI at the time. Is this still true?

As for Intel 17.x, it looks like the only version supported by SEMS is:

sems-intel/17.0.1

Is it a problem then if we go with Intel 17.0.1 + OpenMPI 1.8.7? Is there really any value in going with OpenMPI 1.10.1? I can try it if people think that is useful. Otherwise, should we ask SEMS to install a software stack with OpenMPI 2.x? That could take a while and it seems like they need to retire an OpenMPI version (like 1.6.5) before they add another OpenMPI version.

@nmhamster
Contributor

@bartlettroscoe - we have generally found OpenMPI to be every bit as good as MPICH. More importantly, we use OpenMPI on the testbeds and CTS, and it will underpin IBM Spectrum MPI on ATS-2. In an ideal world we would test both MPI variants, but if we have to pick one I would select OpenMPI because of the CTS use.

@bartlettroscoe
Member Author

I just looked and it seems that openmpi/1.10.4 is being used for the ATDM builds on 'white' and 'ride' and openmpi/2.1.1 is being used on 'hansen' and 'shiller'. It would be nice to be able to test with OpenMPI 2.x in this Intel build, but it is not available. In that case, it seems we should use OpenMPI 1.10.1, which is provided by the SEMS env, for this Intel 17.x build.

@bartlettroscoe
Member Author

we have generally found OpenMPI to be every bit as good as MPICH.

@nmhamster, how do you define "good"? In the CASL case, we found that MPICH caught errors in the usage of MPI that OpenMPI did not. I would have to dig up which versions those were where that was our experience. We did not care whether OpenMPI ran faster than MPICH or vice versa because this was just for our test env. If I remember right (since it was many years ago), there was a defect in Tpetra's MPI usage that OpenMPI let pass, but when we ran CASL VERA on another machine, it bombed. It took a long time to debug and find the issue.

Anyway, given that OpenMPI is the target for ATS-2, it seems like a good choice for our testing.

@mhoemmen
Contributor

mhoemmen commented Mar 28, 2018

OpenMPI 1.10.x implements the bits of MPI 3 that Tpetra optionally uses (with macros for the MPI version). For GPU builds, it's better to use newer versions of OpenMPI, but for CPU builds, I'm less worried about that for now.
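As a rough sketch (not Tpetra's actual code, and the helper name below is made up), an MPI-3 call can be guarded behind the standard MPI_VERSION macro from mpi.h so the same source still compiles against a pre-MPI-3 implementation:

#include <mpi.h>

// Sum 'count' doubles across all ranks of 'comm' into 'result'.
void sumAll (const double* localVals, double* result, int count, MPI_Comm comm)
{
#if MPI_VERSION >= 3
  // MPI-3 path (available in OpenMPI 1.10.x): nonblocking all-reduce.
  MPI_Request req;
  MPI_Iallreduce (localVals, result, count, MPI_DOUBLE, MPI_SUM, comm, &req);
  MPI_Wait (&req, MPI_STATUS_IGNORE);
#else
  // Pre-MPI-3 fallback: blocking all-reduce (MPI-2 declares the send buffer non-const).
  MPI_Allreduce (const_cast<double*> (localVals), result, count, MPI_DOUBLE, MPI_SUM, comm);
#endif
}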

@rppawlo
Contributor

rppawlo commented Mar 28, 2018

In the CASL case, we found that MPICH found errors in the usage of MPI that OpenMPI did not.

I can't remember all of the CASL errors, but one of the easier ones to diagnose was that OpenMPI allowed aliasing of send and receive arrays, which is technically not allowed by the MPI standard, while MPICH automatically flagged those uses in the CASL code.
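For reference, here is a minimal sketch (illustrative only, not the actual CASL code) of the kind of aliasing in question: passing the same buffer as both the send and receive arguments of a collective is not allowed by the MPI standard, and MPICH flags it at run time, while the OpenMPI versions used at the time accepted it. MPI_IN_PLACE is the conforming way to reduce in place:

#include <mpi.h>

void reduceInPlace (double* buf, int count, MPI_Comm comm)
{
  // Non-conforming: the send and receive buffers alias each other.
  //   MPI_Allreduce (buf, buf, count, MPI_DOUBLE, MPI_SUM, comm);

  // Conforming in-place reduction:
  MPI_Allreduce (MPI_IN_PLACE, buf, count, MPI_DOUBLE, MPI_SUM, comm);
}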

@mhoemmen
Contributor

@rppawlo That's a good point -- it would be helpful to have an extra Dashboard test for other MPI implementations.

@bartlettroscoe
Member Author

That's a good point -- it would be helpful to have an extra Dashboard test for other MPI implementations.

So should we try MPICH for this Intel 17.0.1 build or the GCC 4.8.4 build? Note that OpenMPI 1.8.7 causes 30 test timeouts with the GCC 4.8.4 build as described in #2462 (comment). I am currently testing OpenMPI 1.10.1 with that GCC 4.8.4 build.

@mhoemmen
Contributor

@bartlettroscoe I very deliberately said "Dashboard" not necessarily PR ;-) . I would welcome more MPI options for PR testing, but I would rather have mandatory PR testing sooner than have multiple MPIs in PR testing later :-) .

I would say, OpenMPI 1.10.x w/ GCC 4.8.4, and MPICH w/ Intel 17.0.1.

@bartlettroscoe added the type: enhancement label Apr 3, 2018
@prwolfe
Contributor

prwolfe commented Apr 17, 2018

Working on this now - notes

Matches our setup:

Intel 17.x with GCC 4.9.x standard C++ headers using the SEMS env
TPL_ENABLE_MPI=ON (mpich 3.2)
Primary Tested Packages
Primary Tested TPLs
BUILD_SHARED_LIBS=ON
CMAKE_BUILD_TYPE=RELEASE
Trilinos_ENABLE_DEBUG=_ON_
Trilinos_ENABLE_EXPLICIT_TEMPLATE_INSTANTIATION=ON (This is actually Trilinos_ENABLE_EXPLICIT_INSTANTIATION=ON)
Trilinos_TRACE_ADD_TEST=ON
Trilinos_TEST_CATEGORIES=BASIC

New stuff to look at. Why experimental code in PR instead of specialized or experimental tracks?

Xpetra_ENABLE_Experimental=ON
MueLu_ENABLE_Experimental=ON

I will take a few days to get this much set up and working, as I want to refactor the existing driver script.

Paul

@mhoemmen
Contributor

@prwolfe I don't think we actually need the MueLu and Xpetra "experimental" options enabled:

#2317 (comment)

but I'm not sure if the ATDM builds have disabled these options (yet).

@prwolfe
Contributor

prwolfe commented Apr 19, 2018

Thanks for the reference to that discussion @mhoemmen. That matches my instincts as well!

@prwolfe self-assigned this Apr 19, 2018
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 21, 2018
This could be used, for example, for the Intel 17 build in trilinos#2463.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 23, 2018
Kokkos is not using Pthread so don't name the build 'Pthread'.  The Pthread
TPL is enabled to allow other testing.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 23, 2018
This could be used, for example, for the Intel 17 build in trilinos#2463.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 24, 2018
It was requested that we use GCC 4.9.3 headers with Intel 17.0.1 builds of
Trilinos (see trilinos#2317 and trilinos#2463).
@bartlettroscoe
Member Author

One option for this Intel build is to use the SEMS Dev Env setup documented in:

You basically just source:

$ source <trilinos-base-dir>/cmake/load_sems_dev_env.sh sems-intel/17.0.1

and then configure Trilinos using the option:

  -C <trilinos-base-dir>/Trilinos/cmake/std/MpiReleaseSharedPtSerial.cmake \

Using the new aggregate file MpiReleaseSharedPtSerial.cmake in PR #2609, you just configure with:

$ source <trilinos-base-dir>/cmake/load_sems_dev_env.sh sems-intel/17.0.1

$ cmake \
  -C <trilinos-base-dir>/cmake/std/MpiReleaseSharedPtSerial.cmake \
  -DTrilinos_ENABLE_TESTS=ON -DTrilinos_ENABLE_<PKG0>=ON -DTrilinos_ENABLE_<PKG1>=ON ... \
  -DTrilinos_ENABLE_ALL_FORWARD_DEP_PACKAGES=ON \
  <trilinos-base-dir>

and that is it!

If we want to allow for tweaks (like some specific tests that need to be temporarily disabled), then we might create a new file called something like Intel-17.0.1-PrBuild.cmake that contains the same includes as MpiReleaseSharedPtSerial.cmake.

Yesterday I tested the full set of Primary Tested packages and TPLs on my machine crf450 with an experimental all-at-once build, test, and submit (with CMake 3.10.1) and it submitted to:

This showed 7 failing tests for the packages:

  • MueLu: 4
  • Kokkos: 1
  • Zoltan: 1
  • Zoltan2: 1

That build took:

  • Configure: 2m 41s
  • Build: 3h 27m 19s
  • Test: 22m 5s

That is nearly 4 hours to run. Is that too long for an auto PR build?

That ran the NIGHTLY test category. The Zoltan tests alone took a "Proc Time" of 1h 38m. Should we run only the BASIC test category instead? That would cut down on the time a little bit. But perhaps saving that little bit of time is not worth it since we are building from scratch every time?

See details below.

We could set up a "Specialized" build for this that runs nightly and then get this cleaned up.

DETAILS:
$ cd ~/Trilinos.base/BUILDS/INTEL-17.0.1/MPI_RELEASE_DEBUG_SHARED_PT/

$ rm -r CMake*

$ source ~/Trilinos.base/Trilinos/cmake/load_sems_dev_env.sh sems-intel/17.0.1
        WARNING: sems-gcc dependency already found but does not match listed dependency sems-gcc/4.7.2
        I will use the sems-gcc you have loaded but correct behavior is not guaranteed

$ export PATH=/home/vera_env/common_tools/cmake-3.10.1/bin:$PATH

$ which cmake
/home/vera_env/common_tools/cmake-3.10.1/bin/cmake

$ cmake --version
cmake version 3.10.1

$ time cmake \
  -C ../../../Trilinos/cmake/std/MpiReleaseDebugSharedPtSerial.cmake \
  -DTrilinos_CTEST_DO_ALL_AT_ONCE=ON \
  -DTrilinos_CTEST_USE_NEW_AAO_FEATURES=ON \
  -DCTEST_BUILD_FLAGS=-j16 \
  -DCTEST_PARALLEL_LEVEL=16 \
  -DTrilinos_ENABLE_ALL_PACKAGES=ON \
  ../../../Trilinos \
  &> configure.out

real    5m30.441s
user    0m22.995s
sys     0m17.851s

$ time make dashboard &> make.dashboard.out

real    234m34.838s
user    2752m21.065s
sys     143m9.997s

That submitted to:

@bartlettroscoe
Member Author

This has been done since about 6/1/2018 as shown in this query run just now.

Closing as complete.
