
Select set of builds for initial mandatory auto PR testing process #2317

Closed
bartlettroscoe opened this issue Mar 1, 2018 · 65 comments
Labels
  • client: ATDM (Any issue primarily impacting the ATDM project)
  • Framework tasks (used internally by Framework team)
  • type: enhancement (Issue is an enhancement, not a bug)

Comments

@bartlettroscoe
Member

bartlettroscoe commented Mar 1, 2018

CC: @trilinos/framework

Next Action Status

This ship has sailed on "Initial" a long time ago. The only remaining build is the CUDA PR build, and that is being tracked in #2464.

Description

This story is for interested members of the Trilinos team to collaborate on selecting the best build configurations so that the new Trilinos auto PR testing and merging tool and process (#1155) can become the mandatory way to test and push to the main Trilinos ‘develop’ branch (#2312). This selection must account for the limited computing resources (on the Jenkins build farms) available to run automated builds. So while we would like to run many different useful builds as part of pre-push automated PR testing, we have to be strategic about which builds we run where so as not to overwhelm current capacity. As more build machines are added to the Jenkins build farm, more builds can be added to the auto PR testing process.

This story is a follow-on from the action item:

  • ACTION (Ross): Set up a separate meeting to discuss what that build / those builds (if more than one) should be

in the 2018-02-26 Trilinos Planning Meeting.

The set of Trilinos team members interested in being part of this discussion (and meetings) includes:

Since the selection of these builds will impact every Trilinos developer and every close customer and collaborator of Trilinos, it is important that we get input from many different people in making this selection.

Definition of Done

  • Document conversation between Trilinos developers on the selection of these builds
  • New build configurations selected (with actual Trilinos configuration files)
  • Trial builds of Trilinos posted to CDash for the chosen configurations

Related Issues

Task

  1. Find initial selection of team members interested to discuss this topic [DONE]
  2. Set up and have meeting with working group [DONE]
  3. Create initial set of builds in meeting [DONE]
  4. Create concrete *.cmake files for each proposed configuration and set up nightly builds submitting to the "Specialized" CDash Track/Group (a minimal sketch of what such a fragment could look like is shown after this list):
    a. GCC 4.8.4: See Modify existing GCC 4.8.4 CI build to match selected auto PR build #2462
    b. Intel 17.x: See Set up new Intel 17.x build to use as auto PR build #2463
    c. CUDA: See Set up a CUDA build for an auto PR build #2464
  5. ???
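
For orientation only, a per-build *.cmake fragment for the PR system is just a collection of CMake cache settings. The file name and option set below are hypothetical placeholders, not the agreed-upon configurations; the real files were worked out in #2462, #2463, and #2464:

```cmake
# Hypothetical sketch of one PR build settings fragment; the compiler and
# option choices here are illustrative only.
set(CMAKE_CXX_COMPILER g++ CACHE FILEPATH "")
set(TPL_ENABLE_MPI ON CACHE BOOL "")
set(Trilinos_ENABLE_OpenMP ON CACHE BOOL "")
set(Trilinos_ENABLE_ALL_OPTIONAL_PACKAGES ON CACHE BOOL "")
```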
@bartlettroscoe
Member Author

bartlettroscoe commented Mar 1, 2018

I sent the following email. I will give it until the week of 3/12/2018, after the SIAM PP conference, to have this meeting. That should be urgent enough.


To: sandia-trilinos-developers@software.sandia.gov
Cc: trilinos-framework@software.sandia.gov
Subject: [Trilinos-Framework] Interested in selection of builds used in mandatory auto PR testing and merge process?

Hello Sandia Trilinos Developers,

Following up on the action item I was given at the Trilinos Developers Planning meeting on Monday with notes listed at:

https://docs.google.com/document/d/1JClLSR3n79XJT_yLPPoQv_e3rJToHxl8fUzRYojtY5Y

I am going to organize a discussion on the selection of the set of builds the auto PR testing process should be using before it becomes the required way to test and push to the Trilinos ‘develop’ branch.

Therefore, if you are interested in being part of this discussion, please add your name in the first comment in the existing list under the “Description” section in the new issue:

#2317

and then I will try to organize a meeting of all the interested parties.

Since this selection of builds will impact every Trilinos developer and every close customer and collaborator of Trilinos, it is important that we get input from many different people in making this selection.

Thanks,

-Ross

@csiefer2
Member

csiefer2 commented Mar 2, 2018

Any machine used for PR testing must be accessible (either directly or by an identical system) by all developers so they can fix issues that PR testing exposes.

@mhoemmen
Contributor

mhoemmen commented Mar 3, 2018

I'm with @csiefer2 -- for example, if we intend to support Windows builds, the Windows Dashboard build should be a VM that we can access (either log into, or download and run on our own) and use for testing.

@bartlettroscoe
Member Author

I don't think there is any way that we would make a Windows build an auto PR build or even a CI build anytime soon. There are so many other builds that are more important than that.

But I think the rule should be that before any build can enter the "Nightly", "Clean", or "Auto PR" category, it must be easy for any Sandia staff member to access a machine where the build can be exactly reproduced. I agree with @csiefer2 and @mhoemmen on this point.

@jwillenbring
Member

@bartlettroscoe Thank you for the reminder about the MueLu, Ifpack2, and Anasazi tests. These packages had test failures a while back and were disabled for that reason (needed 100% passing). Some of these issues I believe have been resolved. I am testing that now.

@bartlettroscoe
Member Author

Thank you for the reminder about the MueLu, Ifpack2, and Anasazi tests. These packages had test failures a while back and were disabled for that reason (needed 100% passing). Some of these issues I believe have been resolved. I am testing that now.

@jwillenbring,

Just the failing tests should be disabled, not all the tests for a package. You write a GitHub issue for the failing test(s), then disable just those tests. Generally one wants to disable with a scalpel, not a machete.

@jwillenbring
Member

Just the failing tests should be disabled, not all the tests for a package. You write a GitHub issue for the failing test(s), then disable just those tests. Generally one wants to disable with a scalpel, not a machete.

Agreed. At the time I didn't want the issues to get lost, with no one noticing that the tests were disabled. Much of this has to do with ETI.

@bartlettroscoe
Member Author

At the time I didn't want the issues to be lost and not notice that the tests were disabled. Much of this has to do with ETI.

If the developers of those packages don't care to fix the broken tests tracked in the GitHub issues for their packages, then that is on them. And you only disable the tests for that particular build, not for all builds. Look at the git log for the file cmake/std/BasicCiTestingSettings.cmake to see how to do this. These tests will still show up failing in the "Clean", "Nightly", and other builds, but at least they will not mess up other developers' PR testing. That way, no one will forget these failed tests. But if they let the tests fail for more than a day or two, then these tests need to be disabled in "Clean" and "Nightly" as well.

It is important to fix this quickly because developers might be thinking that the auto PR builds are testing Ifpack2, Anasazi, and MueLu when they are not. This means there is an even higher chance of PR merges breaking tests in these packages.
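
For reference, a minimal sketch of what such targeted, build-specific disables can look like in a settings file like cmake/std/BasicCiTestingSettings.cmake. The test names below are made up for illustration; the mechanism assumed here is TriBITS skipping any test whose <fullTestName>_DISABLE cache variable is set:

```cmake
# Disable only the individual failing tests (each tracked in its own GitHub
# issue), not the whole package.  Test names here are hypothetical examples.
set(Ifpack2_SomeFailingUnitTest_MPI_4_DISABLE ON CACHE BOOL
  "Disabled only for CI/PR builds; see the tracking GitHub issue")
set(MueLu_SomeFailingUnitTest_MPI_1_DISABLE ON CACHE BOOL
  "Disabled only for CI/PR builds; see the tracking GitHub issue")
```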

@mhoemmen
Contributor

mhoemmen commented Mar 7, 2018

Wait, what?!? Ifpack2 and MueLu tests were totally disabled?!?

@bartlettroscoe
Member Author

Hello, it is after SIAM PP. Can we set up a meeting on this?

@bartlettroscoe
Member Author

@jwillenbring and/or @bmpersc,

Do you guys want to be included in this meeting? If so, please add your GitHub IDs to the list in the above Description field. I will then try to set up a meeting using people's SNL calendars to select a time that works for everyone. (Or if I can't find a time, I will set up a dreaded doodle.com poll.)

@bartlettroscoe
Member Author

One issue to consider is that according to:

GCC 4.8.4 should implement OpenMP 3.1 while GCC 4.9.3 should implement OpenMP 4.0.

Is that an issue for testing OpenMP code with Trilinos? How much difference is there in the Kokkos OpenMP threading implementation between OpenMP 3.1 and OpenMP 4.0? Or is this not a concern because updates to Kokkos are tested with many different compilers before merging to Trilinos 'develop'?

If we are not worried about the OpenMP 3.1 vs. OpenMP 4.0 issue, then a single GCC 4.8.4 build for auto PR testing would seem to be okay.

@nmhamster
Contributor

@bartlettroscoe - OpenMP 3.1 would be good for now because we have a range of platforms where it is the most up-to-date fully supported standard.

@bartlettroscoe
Member Author

@trilinos/shylu developers,

Why are there no tests being run for any of the ShyLU packages in the existing auto PR and CI builds:

?

These packages are listed as Primary Tested (PT) packages, but they don't have any tests?

@ndellingwood
Contributor

ndellingwood commented Mar 13, 2018

@bartlettroscoe each package in each of ShyLU's subpackages is marked either ST or EX:

shylu_dd

shylu_node

I'm not sure why they are being listed as PT packages; the use of

TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
  SUBPACKAGES_DIRS_CLASSIFICATIONS_OPTREQS
...

in the shylu_node and shylu_dd subpackages seems to fit the example from the TriBITS developers guide. Does anything stand out that clearly needs to be changed?

Opening a new issue, #2375, after speaking with @srajama1 to get the packages migrated to PT testing.

Edit: Added reference to ShyLU issue.
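
For context, the test-group classification (PT/ST/EX) of each subpackage lives in the third column of that macro's subpackage rows. A generic sketch with placeholder subpackage names and directories, patterned after the TriBITS developers guide:

```cmake
TRIBITS_PACKAGE_DEFINE_DEPENDENCIES(
  SUBPACKAGES_DIRS_CLASSIFICATIONS_OPTREQS
    # <name>     <dir>   <classification>  <required/optional>
    CoreStuff    core    PT                REQUIRED
    ExtraStuff   extra   EX                OPTIONAL
  )
```

Only subpackages classified PT here get enabled in PT-only builds (like the auto PR and CI builds), which is consistent with ShyLU showing up with no tests when all of its subpackages are marked ST or EX.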

@bartlettroscoe
Member Author

NOTE: Defects like #2391 show why we need to be enabling OpenMP in our auto PR build.

@bartlettroscoe
Member Author

bartlettroscoe commented Mar 16, 2018

CC: @trilinos/zoltan2

I was just reminded today by #2397 that the auto PR builds based on SEMS should also enable the Scotch TPL. That CI build (and the builds derived from it) are the only automated builds of Trilinos that enable Scotch and therefore are the only builds that run these tests that depend on Scotch. See #2065.
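
For concreteness, enabling that TPL in the SEMS-based PR/CI configuration amounts to a few cache settings like the following. The environment variables used for the include/library locations are an assumption about the SEMS modules, not taken verbatim from the PR configuration:

```cmake
# Hedged sketch: enable the Scotch TPL so the tests that depend on it
# (e.g. in Zoltan2) actually get exercised in the PR/CI builds.
set(TPL_ENABLE_Scotch ON CACHE BOOL "")
# Locations would come from the SEMS environment modules; the variable
# names below are assumptions for illustration only.
set(Scotch_INCLUDE_DIRS "$ENV{SEMS_SCOTCH_INCLUDE_PATH}" CACHE PATH "")
set(Scotch_LIBRARY_DIRS "$ENV{SEMS_SCOTCH_LIBRARY_PATH}" CACHE PATH "")
```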

@bartlettroscoe
Member Author

@mhoemmen said:

The Pthread TPL gets automatically detected and enabled by default. (There are reasons to enable it that have nothing to do with thread parallelism.) Users found it unpleasantly surprising to get thread parallelism when they didn't ask for it explicitly. Thus, Kokkos and Tpetra historically did not enable the Pthreads back-end unless explicitly requested. Tpetra treats Kokkos::Threads as a last resort in terms of defaults.

That makes sense. I just remembered that GCC's C++11 threads library requires you to link in the pthread lib in order to work. This is needed, for example, for the threaded tests of the new thread-safe mode of the Teuchos Memory Management classes.

What is the impact then of enabling the Pthread TPL and OpenMP as long as Kokkos just uses OpenMP?
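
As a stand-alone illustration of the pthread point (plain CMake, not the Trilinos/TriBITS TPL machinery): with GCC on Linux, an executable that uses std::thread or other C++11 threading facilities needs the pthread library linked in, which is what find_package(Threads) handles:

```cmake
# Minimal self-contained sketch showing why std::thread-based tests need
# pthread linked in with GCC on Linux (independent of any Kokkos backend).
cmake_minimum_required(VERSION 3.10)
project(Cxx11ThreadDemo CXX)
set(CMAKE_CXX_STANDARD 11)

find_package(Threads REQUIRED)                 # resolves to -pthread / libpthread
add_executable(thread_test thread_test.cpp)    # hypothetical test source
target_link_libraries(thread_test Threads::Threads)
```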

@ibaned
Contributor

ibaned commented Apr 23, 2018

@bartlettroscoe enabling the Pthread TPL and (only) the Kokkos OpenMP backend should be just fine; I think that's what currently happens in most Trilinos builds that enable OpenMP.

Edit: the most common configuration enables the Kokkos OpenMP and Serial backends.

@bartlettroscoe
Member Author

Edit: the most common configuration enables the Kokkos OpenMP and Serial backends.

@ibaned, does this mean that we should or should not allow the Pthread TPL to be enabled when we configure Trilinos with the OpenMP or Serial Kokkos backends?

@mhoemmen, other than some tests that need std C++11 threading to run, what other reasons are there for enabling the Pthread TPL when configuring Trilinos, even if Kokkos will use a different (or no) threading backend?

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue Apr 24, 2018
It was requested that we use GCC 4.9.3 headers with Intel 17.0.1 builds of
Trilinos (see trilinos#2317 and trilinos#2463).
@ibaned
Contributor

ibaned commented Apr 24, 2018

The Pthread TPL is fine to enable regardless of Kokkos backends. It doesn't cause any issues with OpenMP or otherwise. It is also fine to disable, unless the Kokkos Threads backend is used.

@mhoemmen
Contributor

mhoemmen commented Apr 24, 2018

@bartlettroscoe wrote:

... what other reasons are there for enabling the Pthread TPL when configuring Trilinos, even if Kokkos will use a different (or no) threading backend?

I actually searched Trilinos for pthread_ and found nothing. Teuchos requires C++11 so we should all use the C++ Standard Library stuff like std::call_once and std::mutex if we need coarse-grained thread synchronization.

@mhoemmen
Contributor

@bartlettroscoe My above comments suggest that maybe we don't need the Pthread TPL at all. I've been fine with it enabled but perhaps it's not necessary. On the other hand, sometimes OpenMP implementations or other TPLs need it, so I'm not sure we should just turn it off by default.

@bartlettroscoe
Member Author

My above comments suggest that maybe we don't need the Pthread TPL at all. I've been fine with it enabled but perhaps it's not necessary. On the other hand, sometimes OpenMP implementations or other TPLs need it, so I'm not sure we should just turn it off by default.

@mhoemmen, the experience with the thread-safety work on the Teuchos MM classes was that you need to explicitly link in the -lpthread lib if you want to use C++11 threading support. This is needed to run the multi-threaded unit tests.

bartlettroscoe added a commit that referenced this issue Apr 24, 2018
@mhoemmen
Contributor

@bartlettroscoe good point; not sure if actual C++11 implementations are supposed to require that (vs. the -std=gnu-c++0x stuff) but it's worth checking

@bartlettroscoe
Member Author

FYI: As pointed out by @etphipp in #2628 (comment), setting:

 -D MPI_EXEC_PRE_NUMPROCS_FLAGS="--bind-to;none"

seems to fix the problem of OpenMP threads all binding to the same core on a RHEL6 machine. Perhaps this will let us enable OpenMP for the GCC 4.8.4 auto PR build being set up in #2462?
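
A hedged sketch of how that setting would sit alongside the OpenMP enable in a PR build configuration fragment (all other options omitted):

```cmake
# Enable OpenMP and keep OpenMPI from binding every rank's threads to the
# same core on the RHEL6 test machines (see the discussion in #2628).
set(Trilinos_ENABLE_OpenMP ON CACHE BOOL "")
set(MPI_EXEC_PRE_NUMPROCS_FLAGS "--bind-to;none" CACHE STRING "")
```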

bartlettroscoe added the "Framework tasks" and "client: ATDM" labels May 3, 2018
@bartlettroscoe
Member Author

FYI: As mentioned in the new issue #2674, EMPIRE is now enabling the options:

-D MueLu_ENABLE_Kokkos_Refactor:BOOL=ON \
-D Xpetra_ENABLE_Kokkos_Refactor:BOOL=ON \
-D MueLu_ENABLE_Kokkos_Refactor_Use_By_Default:BOOL=ON \

Therefore, the PR builds should enable these as well. This is the same argument for why we agreed to add the "Experimental" enables for Xpetra and MueLu described above.

bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue May 4, 2018
This was the agreement as part of trilinos#2317.

NOTE: This is using 'mpiexec --bind-to none ...' to avoid pinning the threads
in different MPI ranks to the same cores.  See trilinos#2422.
bartlettroscoe added a commit to bartlettroscoe/Trilinos that referenced this issue May 4, 2018
This was the agreement as part of trilinos#2317.

NOTE: This is using 'mpiexec --bind-to none ...' to avoid pinning the threads
in different MPI ranks to the same cores.  See trilinos#2422.
bartlettroscoe added a commit that referenced this issue May 8, 2018
….1-and-ninja

Use atdm-cmake/3.11.1 module and Ninja for GCC 4.8.4 + OpenMPI 1.10.1 + OpenMP build.  This should be the build that satisfies the GCC auto PR build in #2317 and #2462.
bartlettroscoe added the "type: enhancement" label May 22, 2018
@jwillenbring
Member

@bartlettroscoe Since the PR builds have been running for some time I propose we close this ticket and deal with additional issues in other tickets.

@bartlettroscoe
Member Author

@bartlettroscoe Since the PR builds have been running for some time I propose we close this ticket and deal with additional issues in other tickets.

@jwillenbring, sure, the only remaining build is the CUDA build in #2464 and we have a plan for that (see the issue).

Closing.
