
Stabilize simulations to avoid differences between CI and local runs #2007

Merged — 27 commits merged on Sep 20, 2024

Conversation

bennibolm
Contributor

@bennibolm bennibolm commented Jul 9, 2024

There were always problems with failing tests (mostly the subcell blast wave simulations) in different setups (e.g., locally, on CI, and with different Julia versions: 1.9 vs. 1.10 vs. 1.11).
Moreover, simulations on macOS were completely different from those on Windows and Linux (see trixi-framework/Trixi2Vtk.jl#67 (comment)).

I realized that most of the time it's the same elixirs with local limiting (see trixi-framework/Trixi2Vtk.jl#67 (comment)).
Looking into those simulations showed that they also exhibit deviations from the calculated subcell bounds.
I adapted parameters to hopefully stabilize the simulations, and hope that this also fixes the different results between setups.

Updates are following...

Contributor

github-actions bot commented Jul 9, 2024

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.


codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.32%. Comparing base (cde00a8) to head (4b24b74).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2007      +/-   ##
==========================================
- Coverage   96.32%   96.32%   -0.00%     
==========================================
  Files         470      470              
  Lines       37485    37483       -2     
==========================================
- Hits        36106    36104       -2     
  Misses       1379     1379              
Flag Coverage Δ
unittests 96.32% <ø> (-<0.01%) ⬇️


@bennibolm
Contributor Author

After c1d0c5e, which used the locally calculated errors, only the sedov blast elixir fails on CI. With 1.11, all tests work locally.

@bennibolm
Contributor Author

bennibolm commented Jul 10, 2024

In c47e09a, I added CI tests on macOS, Windows, and with Julia 1.11.
Open problems:

  • elixir_euler_sedov_blast_wave_sc_subcell.jl, which was initialized with numbers from a local run (and works on roci), fails everywhere (Linux, macOS, Windows) within the CI run using Julia 1.10 (with errors of about 1e-5). The CI run with Julia 1.11 on Linux works 😮
    Ideas:

  • Just use the Linux CI numbers and see whether the tests pass on macOS and Windows. But they will probably fail with Julia 1.11 and need to be adapted again in the future.

  • ❌ Adding positivity limiting of rho; maybe it stabilizes the simulation further, although it should already be stable enough with deviations of only 3.55e-15 for entropy --> still failing 😞 (see here)

  • elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl and structured/elixir_euler_sedov_blast_wave_sc_subcell.jl (local bounds) are partly failing on macOS with large differences (e.g., here).
    Even for this run on macOS, there are no large subcell bound deviations (~1e-17) that would indicate instability for these simulations.

  • Good point: the instability(?)/differences for elixir_euler_blast_wave_sc_subcell_nonperiodic.jl were removed in this PR. Therefore, the test in Trixi2Vtk should run through 💪

  • Not so good point: still large differences between different machines.

@bennibolm
Contributor Author

In c3e6469, I added tests which use the pure FV version of the code. These tests work on all machines (e.g., here).

@bennibolm
Contributor Author

Removing the use of @muladd in subcell_limiters_2d.jl does not change the result. See d5ec1f6.

@bennibolm
Contributor Author

bennibolm commented Jul 15, 2024

I checked whether the maximum number of Newton iterations (max_iterations_newton) is reached. Elixir: elixir_euler_sedov_blast_wave_sc_subcell. All of these setups have different results within the CI run. I will add the results here (each with tspan=(0.0, 1.0), initial_refinement_level=4, CFL=0.4):

  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=60: maxIterations not reached; MaxDeviation (4.4e-15 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-14), max_iterations=60: maxIterations not reached; MaxDeviation (3.3e-14 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=50: maxIterations reached (2 times); MaxDeviation (5.3e-15 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=30: maxIterations reached (3 times); MaxDeviation (3.5e-15 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-14), max_iterations=30: maxIterations not reached; MaxDeviation (3.3e-14 Entr)

And a CFL test (not checked whether the CI test fails):

  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=60, CFL=0.04: maxIterations reached (2x); MaxDeviation (1.0e-14 Entr)
  • sedov blast, tols=(1.0e-16, 1.0e-16), max_iterations=600, CFL=0.04: maxIterations reached (Often); MaxDeviation (3.5e-15 Entr)

Why are there still deviations at all?
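One generic source of such tiny deviations is that floating-point addition is not associative, so different compilers, SIMD widths, or fused multiply-add usage (cf. the @muladd experiment above) can legitimately change the last bits of a result across platforms. A minimal illustration (plain Python, not Trixi.jl code):

```python
# Floating-point addition is not associative: regrouping the same three
# summands changes the result in the last bit.  Reordering or fusing
# operations (SIMD, FMA) has the same class of effect across platforms.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)       # False: the two groupings differ
print(abs(a - b))   # difference on the order of machine epsilon
```

Differences of this size (~1e-16) are harmless per operation, but they can be amplified by sharp nonlinearities such as shocks and limiters, which is consistent with the run-to-run deviations observed here.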

@bennibolm
Contributor Author

bennibolm commented Jul 17, 2024

I checked the results of

```julia
trixi_include("examples/structured_2d_dgsem/elixir_euler_sedov_blast_wave_sc_subcell.jl",
              tspan=(0.0, 0.5), max_iterations_newton=40,
              newton_tolerances=(1.0e-13, 1.0e-15), cfl=0.6, interval=5)
```

on a local macOS machine (arm64-apple-darwin22.4.0 with a 10 × Apple M2 Pro CPU). The results are identical (up to the 8th digit) to my local results (Ubuntu) and to the Ubuntu CI results, and therefore different from the macOS CI results.
That's a good sign, I think.

@bennibolm bennibolm closed this Jul 19, 2024
@bennibolm bennibolm reopened this Jul 19, 2024
@bennibolm
Contributor Author

bennibolm commented Jul 22, 2024

Because I received the "correct" (= same as on Ubuntu) results on a local macOS machine (see comment) for the structured Sedov blast, I added a macOS CI test using aarch64 instead of x64 in ed77e0f.

At first, the tree_part2 job just didn't start (this seems to happen elsewhere as well; see here).
Then it started and ran through (same expected failing tests as for Ubuntu, Windows, and locally 💪).
The same holds for the structured test case, which failed with x64 but is working now.

@bennibolm
Contributor Author

bennibolm commented Jul 22, 2024

So, as a conclusion:

  • There is only one test case left where the errors differ between the CI run and my local runs (differences of about 1e-5).
    Of course, I can (and probably will) just adapt the numbers and use the ones from the CI, although that is a potential cause of failing tests in the future (e.g., when updating to Julia 1.11).

  • The differences between macOS and all other runs only occur on the one CI macOS infrastructure. Since we don't normally even test these things on macOS, I'd just accept it. Maybe change the infrastructure tested with in Trixi2Vtk?

@bennibolm
Contributor Author

bennibolm commented Aug 19, 2024

@sloede In my opinion, this process is finished.
As I wrote above, only one test (elixir_euler_sedov_blast_wave_sc_subcell) is left, which seems to be susceptible to small errors: there are different results in my local runs and on the CI, and adapting parameters didn't fix those differences. I will just accept them and add the CI numbers to the tests, if that's okay with you.

  1. The differences for the macOS CI runs (for elixir_euler_sedov_blast_wave_sc_subcell.jl (local bounds) and elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl) only appear for the x64 infrastructure and don't appear for, e.g., aarch64 or a local macOS machine.
    Since those elixirs are not even tested in the Trixi2Vtk PR, which was the reason I started this investigation in the first place, those tests should run through now.
    However, it is still possible to change the infrastructure there if wanted.

I still have to remove many tests from the CI runs and do some cleanup, so the PR is still in draft mode. Anyway, getting a review from you would still help a lot @sloede. Thank you!

@bennibolm bennibolm requested a review from sloede August 19, 2024 08:45
Member

@sloede sloede left a comment


Many tests are still disabled and should be re-enabled before merging, but the changes to the elixirs seem to be reasonable.

However, does this also mean that the subcell limiting methods as implemented here is not unconditionally safe and that one has to choose "correct" parameters? Or was this already known and just surfaced here in an unsuspecting way?

@bennibolm
Contributor Author

Many tests are still disabled and should be re-enabled before merging, but the changes to the elixirs seem to be reasonable.

Yes of course, I just wanted to keep all the different variants until I decided which one to keep.

However, does this also mean that the subcell limiting methods as implemented here is not unconditionally safe and that one has to choose "correct" parameters? Or was this already known and just surfaced here in an unsuspecting way?

Since we use an iterative method to find a stable amount of limiting for nonlinear variables, you could say that, yes.
But it's like with the CFL number: from some point on, all parameters are "correct".
To find this point, we implemented the local bounds checking routines, where you can see whether the limiting is working as expected.
Or what do you say @amrueda?

If one only has to look at the bounds checking to see whether the limiting works properly, why did we have these issues here? Basically because I didn't pay much attention to this before. And of course, when changing setups within the tests, this has to be checked again.
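To illustrate what such a bounds check computes, here is a minimal, hypothetical sketch (plain Python, not Trixi.jl code; the function name is made up): it reports the worst violation of the calculated subcell bounds over all entries.

```python
def max_bound_deviation(u, lower, upper):
    """Largest violation of the elementwise bounds lower[i] <= u[i] <= upper[i].

    Hypothetical illustration: returns 0.0 if all bounds are satisfied,
    otherwise the size of the worst violation (the kind of number reported
    above as, e.g., 3.5e-15 for entropy).
    """
    dev = 0.0
    for ui, lo, hi in zip(u, lower, upper):
        # lo - ui and ui - hi are positive only if a bound is violated
        dev = max(dev, lo - ui, ui - hi)
    return dev
```

A deviation at machine precision (~1e-15) indicates the limiter is working as intended; a deviation on the order of 1e-5 would point to a real instability.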

@amrueda
Contributor

amrueda commented Aug 28, 2024

However, does this also mean that the subcell limiting methods as implemented here is not unconditionally safe and that one has to choose "correct" parameters? Or was this already known and just surfaced here in an unsuspecting way?

Since we use an iterative method to find a stable amount of limiting for nonlinear variables, you can say that, yes.
But it's like with the cfl number. From some point, all parameters are "correct".
To find this point, we implemented the local bounds checking routines, where you can see whether a limiting is working as expected.
Or what do you say @amrueda?

I agree. It seems that the major issue was the choice of the Newton method parameters and the CFL. The subcell-limiting methods are safe when the CFL is below a threshold and when the bounds are selected properly for the equation being solved. For instance, positivity of density and pressure is "safe" for the Euler equations. In the case of non-linear constraints, such as pressure, the subcell-limited method converges to the right/safe scheme only if the non-linear solver manages to converge (related to the Newton parameters).

@sloede
Member

sloede commented Aug 29, 2024

OK, thanks for the clarification! If not already in there, I think it would be good if you could add such an (extended) description to the docs for future reference.

@bennibolm
Contributor Author

Again, I tested different setups for the Sedov blast elixir with TreeMesh:
[Screenshot from 2024-08-29 14-53-00]
where the numbers are (max_iterations_newton, newton_tolerances).

Since all the tests seem to have different results in the CI run and on my local machine, I decided on the standard parameters from the elixir (60, 1.0e-13, 1.0e-15).

Therefore, I will reset all temporary changes to make this PR mergeable.

@bennibolm bennibolm closed this Aug 29, 2024
@bennibolm bennibolm reopened this Aug 29, 2024
@bennibolm
Contributor Author

bennibolm commented Aug 30, 2024

So, a final conclusion before I remove all additional tests.
The simulations with local limiting are, as expected, more susceptible to small differences: especially the elixirs elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl, basically all elixir_euler_sedov_blast_wave_sc_subcell.jl, and sometimes also elixir_euler_blast_wave_sc_subcell_nonperiodic.jl.

All these elixirs are susceptible to differences with local limiting (the bounds checking functionality only shows errors at machine precision). That includes the failing CI tests for macOS x64, but also possible differences between runs on different systems/architectures/Julia versions, etc.

@bennibolm bennibolm marked this pull request as ready for review August 30, 2024 11:47
@bennibolm bennibolm requested a review from sloede August 30, 2024 12:07
Member

@sloede sloede left a comment


Nearly done 👍 Just one small comment

@sloede sloede enabled auto-merge (squash) September 20, 2024 04:33
Member

@sloede sloede left a comment


LGTM!

@sloede sloede merged commit 1d410ca into main Sep 20, 2024
38 checks passed
@sloede sloede deleted the bb/stable-subcell-tests branch September 20, 2024 04:34