Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GitHub Actions as CI provider #775

Merged
merged 9 commits into from
Jun 26, 2020
Merged

Use GitHub Actions as CI provider #775

merged 9 commits into from
Jun 26, 2020

Conversation

bryanwweber
Copy link
Member

@bryanwweber bryanwweber commented Dec 19, 2019

Thanks for contributing code! Please include a description of your change and
check your PR against the list below (for further questions, refer to the
contributing guide).

  • There is a clear use-case for this code change
  • The commit message has a short title & references relevant issues
  • Build passes (scons build & scons test) and unit tests address code coverage

Changes proposed in this pull request

  • Move our CI to GitHub Actions as a single platform

Motivation

As described in Cantera/enhancements#37, we have had a number of issues with our CI services in the past. Briefly, the biggest problem with Travis is that Miniconda must be downloaded and installed for every macOS build, and installing alternate Pythons on Ubuntu is not all that easy. With Appveyor, we are limited to one concurrent build across all of the Cantera repositories (at least, the last I checked...).

This PR proposes a switch of our CI service to GitHub Actions.

Advantages of GH Actions:

  • Builds are easier to configure than either Travis or Appveyor because they use composable "actions" specified as steps in a YAML file. Travis or Appveyor require writing shell scripts entirely.
  • The use of YAML configuration makes specifying a build matrix easier, especially when the matrix should vary on something other than just the operating system (e.g., Python version, VS version, etc.)
  • The use of the YAML configuration file makes adding new jobs much easier as well (e.g., a new dependency version). Each job can be encapsulated in its own element within the overall jobs sequence. A new job is totally independent from any other job that is running by default. Try doing that with a shell script... (I have, it's a lot of if...elseif...fi 😂)
  • GitHub Actions offers 60 builds in parallel, due to our non-profit status with GitHub. This is significantly more than any other hosted CI service offers.
  • I'm pretty sure that GH Actions allows jobs to run in a custom container, meaning we could create a container with other OS that we want to test (e.g., Fedora, Gentoo, etc.).

Disadvantages of GH Actions:

  • Yet another CI system to learn and learn how to debug (although I propose removing Appveyor and Travis CI).
  • We still cannot test with Intel compilers, or build/test the MATLAB package. This was true on the other public CI services as well, and remains true with GH Actions. A buildbot instance that we control remains the only way for us to test those proprietary components.
  • We are further tied to GitHub as a platform... if we ever need to leave, recreating this infrastructure will be challenging.
  • There is no way to specify jobs that are "allowed to fail"... for instance, a new version of SUNDIALS, which we know will cause failures, but we don't want to put a red X on the PR when we already know it's a problem. There is an option to mark particular CI jobs as "Required" in the interface, though, which may serve just fine if we want a feature like this.

Summary

Overall, in my opinion, the advantages here (particularly the number of builds in parallel) outweigh the negatives. The extra builds in parallel allow us to test a much broader set of our dependencies, as well as many other jobs/regression tests that would be really nice such as the examples.

@bryanwweber
Copy link
Member Author

Just to write this down somewhere, GitHub now supports "self-hosted runners", machines that the user owns that can run GitHub actions. This would be very useful for us to be able to run MATLAB. However, they recommend that public repositories DO NOT use self-hosted runners due to the possibility of arbitrary code execution: https://help.github.com/en/actions/automating-your-workflow-with-github-actions/adding-self-hosted-runners

@bryanwweber
Copy link
Member Author

So GitHub actions does not yet let us mark a job as "allowed to fail", e.g., for new versions of SUNDIALS that we haven't gotten working yet but we want to run the test anyways. So I think I'll propose that those "allowed to fail" jobs move to Travis (which has this feature) and the "core" builds move to GitHub actions which allows 20 concurrent jobs (if I'm reading the documentation correctly).

@codecov
Copy link

codecov bot commented Feb 10, 2020

Codecov Report

Merging #775 into master will decrease coverage by 0.04%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #775      +/-   ##
==========================================
- Coverage   70.54%   70.49%   -0.05%     
==========================================
  Files         375      375              
  Lines       45349    45649     +300     
==========================================
+ Hits        31993    32182     +189     
- Misses      13356    13467     +111     
Impacted Files Coverage Δ
src/thermo/IonsFromNeutralVPSSTP.cpp 40.11% <0.00%> (-3.17%) ⬇️
src/thermo/IdealSolnGasVPSS.cpp 68.71% <0.00%> (-2.63%) ⬇️
include/cantera/thermo/ConstDensityThermo.h 7.50% <0.00%> (-2.26%) ⬇️
src/numerics/CVodesIntegrator.cpp 75.00% <0.00%> (-2.06%) ⬇️
src/equil/vcs_rxnadj.cpp 78.57% <0.00%> (-1.54%) ⬇️
src/thermo/BinarySolutionTabulatedThermo.cpp 83.87% <0.00%> (-1.38%) ⬇️
src/tpx/CarbonDioxide.cpp 93.87% <0.00%> (-1.25%) ⬇️
src/base/ct2ctml.cpp 77.16% <0.00%> (-1.20%) ⬇️
test_problems/cxx_ex/transport_example1.cpp 82.35% <0.00%> (-0.99%) ⬇️
src/thermo/MultiSpeciesThermo.cpp 75.23% <0.00%> (-0.96%) ⬇️
... and 141 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d321abe...2a06aec. Read the comment docs.

@bryanwweber
Copy link
Member Author

bryanwweber commented Feb 12, 2020

Tracking the progress of the test jobs being added. To be updated.

Existing Travis Tests:

  • Coverage
  • Docs + upload to website (could be in a separate workflow to limit times that it runs)
  • Link with LAPACK

Existing Appveyor Tests:

  • MSVC 14.0, 14.1, and 14.2
  • ...

New tests to add:
- [ ] Most recent versions of all dependencies
- [ ] Tests of the installer
- [ ] Tests on Fedora using a Docker container (not sure if this is possible on GH Actions)
- [ ] mingw32 compiler on Windows

For scope reasons, I think it'd be better to add further new tests in a new PR. See Cantera/enhancements#37 (comment)

This PR should finish up adding the existing tests from Travis and Appveyor before we turn them off.

GitHub has a method to mark certain jobs as "Required" in the settings for the repo, maybe we could use that instead of relying on a second or third CI service for the "optional" jobs as I mentioned before. Or we could just have no optional jobs, and everything is required.

@bryanwweber bryanwweber force-pushed the add-github-workflows branch 8 times, most recently from 60b355f to 6d4cba5 Compare May 27, 2020 13:34
@bryanwweber bryanwweber changed the title [WIP] First try at GitHub actions Use GitHub Actions as CI provider Jun 9, 2020
@bryanwweber
Copy link
Member Author

@speth @decaluwe @ischoegl I'm interested in your thoughts on this change, now that it's at an almost-complete stage. I still need to fix the coverage (not sure why it changed like that) and add back docs uploading. However, before I do what will undoubtedly be 10% of the work and 90% of the time, I'd like to have a discussion about whether this is a worthwhile direction to pursue any further for the main repo.

As a side note, I think using GH actions is even more valuable for the conda recipe, since it allows many more builds in parallel than Appveyor for the Windows packages, so I'm definitely planning to pursue using GH actions over there unless there are strong objections.

@ischoegl
Copy link
Member

@bryanwweber ... not sure that I can provide much meaningful input, outside of commenting that this looks useful. I'm doing most of my own CI over at GitLab, and my (private) projects aren't nearly as large (just running unit tests and building sphinx); I am using their CI/CD pipelines though.

Perhaps the only comment I have is that I have faced issues that forced me to build cantera on an older ubuntu version before (I'm usually using 18.04, but had to switch to 16.04). While the issue was caused by Python 3.5, it is conceivable that dependencies outside of Python could cause trouble. So checking oldest supported OS + oldest python (to ensure compatibility) and newest supported OS + newest python (to catch deprecations) may be an easy tweak here. It's obviously not efficient to check all permutations (CI minutes would go overboard). Creating runners for checking of examples and sphinx documentation are other topics, but they're likely better handled separately.

@bryanwweber
Copy link
Member Author

@ischoegl Thanks for the feedback! Whether or not this seems useful is what I was looking for 😉 To your point about oldest/newest dependencies, there is a check running on Ubuntu 16.04 which builds Cantera for Python 3.5 (also using the built-in Python 2 to run SCons). The point is well taken though about not blowing up the test matrix unnecessarily. Thank you!

Copy link
Member

@speth speth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are enough advantages to GitHub actions to make it worth migrating from Travis and AppVeyor. I like the idea of using the self-hosted runners, and I think if we use some combination of running within dedicated VMs and/or Docker containers it wouldn't present much risk.

With the resolution of PR #803, all of the commits except for the last one (Switch CI to GitHub Actions) are already merged, aren't they? Can you rebase this to acknowledge that?

One question I have is whether we can shorten the case names enough to actually see more of the result message than "Su...". For starters, can the top-level title just be renamed "CI" instead of "Continuous Integration"?

I don't quite understand what's happening with Codecov, but I definitely would like to get that resolved. It seems to be reporting almost all lines of code that are actually covered by the test suite as "partially covered" which usually only shows up if the code only follows one branch of a conditional, and makes no sense for lines that aren't branch points. Maybe this is something to do with the use of gcovr instead of plain old gcov?

Running on all Python versions on all OS variants does seems like it might be a bit excessive. I know we're not at our limit of simultaneous runs yet, but I'd rather use some of that allotment to test things like additional versions of other dependencies like yaml-cpp, libfmt, and Eigen.

.github/workflows/main.yml Show resolved Hide resolved
@bryanwweber
Copy link
Member Author

@speth

With the resolution of PR #803, all of the commits except for the last one (Switch CI to GitHub Actions) are already merged, aren't they? Can you rebase this to acknowledge that?

Done.

One question I have is whether we can shorten the case names enough to actually see more of the result message than "Su...". For starters, can the top-level title just be renamed "CI" instead of "Continuous Integration"?

Done with the starters. I can see the full "Successful" status message on my 1920x1080 monitor even with the full "Continuous Integration" 😜

Maybe this is something to do with the use of gcovr instead of plain old gcov?

I was using gcovr to be able to use the Codecov action to upload. I'll try using the old "pipe to bash" method and see if it works.

Running on all Python versions on all OS variants does seems like it might be a bit excessive. I know we're not at our limit of simultaneous runs yet, but I'd rather use some of that allotment to test things like additional versions of other dependencies like yaml-cpp, libfmt, and Eigen.

I'd prefer to leave adding those other tests and/or cleaning up the tests that are here for another pull request, if that's alright with you.

@ischoegl
Copy link
Member

[...] I can see the full "Successful" status message on my 1920x1080 monitor even with the full "Continuous Integration"

Don't see it on either iPad nor Firefox @ Fedora 32 with a 3840x2160 panel (GitHub only uses a fraction of horizontal space, at least if you're using defaults) ... so 👍 on shortening of titles

@bryanwweber
Copy link
Member Author

bryanwweber commented Jun 11, 2020

Ok coverage is now at -2%, so that seems much more manageable to troubleshoot. I wonder if it has to do with using the wrong base commit since I've rebased this branch so many times? Will try to investigate...

@speth
Copy link
Member

speth commented Jun 12, 2020

Well, there are definitely some oddities in the claimed change in coverage here, but what it reports as the coverage for the current commit seems sane. One difference I'm seeing is that this build runs the "samples", but those are built without coverage information and so don't get counted as having any coverage.

I would be inclined to not run the samples when collecting coverage data, since I'm more interested in getting a coverage report for the test suite, and what code happens to get hit by the samples doesn't really matter as much.

@speth
Copy link
Member

speth commented Jun 12, 2020

I'd prefer to leave adding those other tests and/or cleaning up the tests that are here for another pull request, if that's alright with you.

Yes, we can certainly continue adding more tests after merging this.

@bryanwweber
Copy link
Member Author

Removing the samples has taken the coverage to -1%, so that's good. I'll work on the docs uploading ASAP

@ischoegl
Copy link
Member

ischoegl commented Jun 21, 2020

@bryanwweber ... as far as I can tell, your current CI update will build C++/Fortran samples, which is great (however, I do not think that they are run, despite the names of your CI actions). Just building examples wouldn't necessarily catch issues (see #774).

Given the numerous broken Python examples I found, I believe it may make sense to actually run examples during CI (although I'm not sure that this should be part of this PR). See #821, #872, #873, #875, and #877 (one additional example, onedim/ion_burner_flame.py shows extremely poor convergence, where I'm unsure that it will actually run through). These issues are just for the onedim and reactor folders, but there may be others.

PS: The same rationale holds for jupyter notebooks which are currently not distributed in the main repo but managed separately (i.e. Cantera /cantera-jupyter). It is easily conceivable to test them if they are included as a git submodule (see thoughts in Cantera/enhancements#51).

@bryanwweber
Copy link
Member Author

@ischoegl

as far as I can tell, your current CI update will build C++/Fortran samples,

Yeah, this is correct. This replicates the existing behavior of the Travis CI scripts. Unfortunately, since the programs that are built are not necessarily the same name as the folder they're in, it's a little tricky to actually run the samples, and I gave up when I tried adding it to the Travis builds.

I believe it may make sense to actually run examples during CI (although I'm not sure that this should be part of this PR)

Yes, this is 100% the intention (along with the Jupyter examples) but as you note is out of scope for this PR.

run: python3 `which scons` build f90_interface=n python_package='full' -j2
- name: Test Cantera
run: python3 `which scons` test
- name: Run Samples
Copy link
Member

@ischoegl ischoegl Jun 23, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment ... this is ‘Build samples’, right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I fixed this

@ischoegl
Copy link
Member

I believe it may make sense to actually run examples during CI (although I'm not sure that this should be part of this PR)

Yes, this is 100% the intention (along with the Jupyter examples) but as you note is out of scope for this PR.

Excellent!

GitHub Actions offers a number of benefits over TravisCI and Appveyor.
Related to Cantera/enhancements#37
Remove samples from coverage build. The tests are what provide the
coverage of the code.
The KaTeX version is no longer configurable in the sphinxcontrib-katex
extension. There is now a Python script in the website repository
api-docs folder that can bump the KaTeX version in the built
documentation to match the version supplied by sphinxcontrib-katex.
bryanwweber and others added 6 commits June 26, 2020 16:28
Uses the SSH action to add the deploy account SSH key to the
configuration and add the Cantera server to the known_hosts.
The shared library should be linked with the Fortran linker, not the C++
linker. On macOS, the C++ stdlib also has to be linked to the shared
library to resolve all the symbols at link time.
@bryanwweber
Copy link
Member Author

@speth this is done from my end now 👍

@speth speth merged commit f4da0bd into master Jun 26, 2020
@speth speth deleted the add-github-workflows branch June 26, 2020 22:00
@ischoegl
Copy link
Member

There still may be a left-over glitch, see #882

bryanwweber added a commit to Cantera/conda-recipes that referenced this pull request Jul 1, 2020
For many of the same reasons as the change at the main repo (see
Cantera/cantera#775), this commit changes the CI to use GitHub actions
instead of Travis CI and Appveyor.
bryanwweber added a commit to Cantera/conda-recipes that referenced this pull request Jul 1, 2020
For many of the same reasons as the change at the main repo (see
Cantera/cantera#775), this commit changes the CI to use GitHub actions
instead of Travis CI and Appveyor.
bryanwweber added a commit to Cantera/conda-recipes that referenced this pull request Jul 1, 2020
For many of the same reasons as the change at the main repo (see
Cantera/cantera#775), this commit changes the CI to use GitHub actions
instead of Travis CI and Appveyor.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants