
Stabilize simulations to avoid differences between CI and local runs #2007

Merged — 27 commits merged on Sep 20, 2024

Conversation

bennibolm
Contributor

@bennibolm bennibolm commented Jul 9, 2024

There were always problems with failing tests (mostly the subcell blast wave simulations) in different setups (e.g., locally, on CI, and with different Julia versions: 1.9 vs. 1.10 vs. 1.11).
Moreover, simulations on macOS were completely different from those on Windows and Linux (see trixi-framework/Trixi2Vtk.jl#67 (comment)).

I realized that most of the time it's the same elixirs with local limiting (see trixi-framework/Trixi2Vtk.jl#67 (comment)).
Looking into those simulations showed that they also exhibit deviations from the calculated subcell bounds.
I adapted parameters to hopefully stabilize the simulations, and hope that this also fixes the different results between setups.

Updates are following...

Contributor

github-actions bot commented Jul 9, 2024

Review checklist

This checklist is meant to assist creators of PRs (to let them know what reviewers will typically look for) and reviewers (to guide them in a structured review process). Items do not need to be checked explicitly for a PR to be eligible for merging.

Purpose and scope

  • The PR has a single goal that is clear from the PR title and/or description.
  • All code changes represent a single set of modifications that logically belong together.
  • No more than 500 lines of code are changed or there is no obvious way to split the PR into multiple PRs.

Code quality

  • The code can be understood easily.
  • Newly introduced names for variables etc. are self-descriptive and consistent with existing naming conventions.
  • There are no redundancies that can be removed by simple modularization/refactoring.
  • There are no leftover debug statements or commented code sections.
  • The code adheres to our conventions and style guide, and to the Julia guidelines.

Documentation

  • New functions and types are documented with a docstring or top-level comment.
  • Relevant publications are referenced in docstrings (see example for formatting).
  • Inline comments are used to document longer or unusual code sections.
  • Comments describe intent ("why?") and not just functionality ("what?").
  • If the PR introduces a significant change or new feature, it is documented in NEWS.md with its PR number.

Testing

  • The PR passes all tests.
  • New or modified lines of code are covered by tests.
  • New or modified tests run in less than 10 seconds.

Performance

  • There are no type instabilities or memory allocations in performance-critical parts.
  • If the PR intent is to improve performance, before/after time measurements are posted in the PR.

Verification

  • The correctness of the code was verified using appropriate tests.
  • If new equations/methods are added, a convergence test has been run and the results
    are posted in the PR.

Created with ❤️ by the Trixi.jl community.


codecov bot commented Jul 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.32%. Comparing base (cde00a8) to head (4b24b74).
Report is 1 commit behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2007      +/-   ##
==========================================
- Coverage   96.32%   96.32%   -0.00%     
==========================================
  Files         470      470              
  Lines       37485    37483       -2     
==========================================
- Hits        36106    36104       -2     
  Misses       1379     1379              
Flag Coverage Δ
unittests 96.32% <ø> (-<0.01%) ⬇️


@bennibolm
Contributor Author

After c1d0c5e, which used the locally calculated errors, only the sedov blast elixir fails on CI. With 1.11, all tests work locally.

@bennibolm
Contributor Author

bennibolm commented Jul 10, 2024

In c47e09a, I added CI tests on macOS, Windows, and with Julia 1.11.
Open problems:

  • elixir_euler_sedov_blast_wave_sc_subcell.jl, which was initialized with numbers from a local run (and works on roci), fails everywhere (Linux, macOS, Windows) within the CI run using Julia 1.10 (with errors of about 1e-5). The CI run with Julia 1.11 on Linux works 😮
    Ideas:

  • Just use the Linux CI numbers and see whether the tests pass on macOS and Windows. But they will probably fail with Julia 1.11 and need to be adapted again in the future.

  • ❌ Adding positivity limiting of rho; maybe it stabilizes the simulation further, although it should already be stable enough with deviations of only 3.55e-15 for entropy --> still failing 😞 (see here)

  • elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl and structured/elixir_euler_sedov_blast_wave_sc_subcell.jl (local bounds) are partly failing on macOS with large differences (e.g., here).
    Even for this run on macOS, there are no large subcell bound deviations (~1e-17) that would indicate instability for these simulations.

  • Good point: the instability(?)/differences for elixir_euler_blast_wave_sc_subcell_nonperiodic.jl were removed in this PR. Therefore, the test in Trixi2Vtk should run through 💪

  • Not so good point: still large differences between different machines.

@bennibolm
Contributor Author

In c3e6469, I added tests which use the pure FV version of the code. These tests work on all machines (e.g., here).

@bennibolm
Contributor Author

Removing the use of @muladd in subcell_limiters_2d.jl does not change the result. See d5ec1f6.

@bennibolm
Contributor Author

bennibolm commented Jul 15, 2024

I checked whether the maximum number of Newton iterations (max_iterations_newton) is reached. Elixir: elixir_euler_sedov_blast_wave_sc_subcell. All of these setups have different results within the CI run. I will add the results here (each with tspan=(0.0, 1.0), initial_refinement_level=4, CFL=0.4):

  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=60: maxIterations not reached; MaxDeviation (4.4e-15 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-14), max_iterations=60: maxIterations not reached; MaxDeviation (3.3e-14 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=50: maxIterations reached (2 times); MaxDeviation (5.3e-15 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=30: maxIterations reached (3 times); MaxDeviation (3.5e-15 Entr)
  • sedov blast, tols=(1.0e-13, 1.0e-14), max_iterations=30: maxIterations not reached; MaxDeviation (3.3e-14 Entr)

And a CFL test (not checked whether the CI test fails):

  • sedov blast, tols=(1.0e-13, 1.0e-15), max_iterations=60, CFL=0.04: maxIterations reached (2x); MaxDeviation (1.0e-14 Entr)
  • sedov blast, tols=(1.0e-16, 1.0e-16), max_iterations=600, CFL=0.04: maxIterations reached (Often); MaxDeviation (3.5e-15 Entr)

Why are there still deviations at all?
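One generic source of such tiny deviations is that floating-point addition is not associative, so different compilers, SIMD widths, or fused multiply-add usage (cf. the @muladd experiment above) can legitimately change the last bits of a result across platforms. A minimal illustration (plain Python, not Trixi.jl code):

```python
# Floating-point addition is not associative: regrouping the same three
# summands changes the result in the last bit.  Reordering or fusing
# operations (SIMD, FMA) has the same class of effect across platforms.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)

print(a == b)       # False: the two groupings differ
print(abs(a - b))   # difference on the order of machine epsilon
```

Differences of this size (~1e-16) are harmless per operation, but they can be amplified by sharp nonlinearities such as shocks and limiters, which is consistent with the run-to-run deviations observed here.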

@bennibolm
Contributor Author

bennibolm commented Jul 17, 2024

I checked the results of

```julia
trixi_include("examples/structured_2d_dgsem/elixir_euler_sedov_blast_wave_sc_subcell.jl",
              tspan=(0.0, 0.5), max_iterations_newton=40,
              newton_tolerances=(1.0e-13, 1.0e-15), cfl=0.6, interval=5)
```

on a local macOS machine (arm64-apple-darwin22.4.0 with a 10 × Apple M2 Pro CPU). The results are identical (up to the 8th digit) to my local results (Ubuntu) and to the Ubuntu CI results, and therefore different from the macOS CI results.
That's a good sign, I think.

@bennibolm bennibolm closed this Jul 19, 2024
@bennibolm bennibolm reopened this Jul 19, 2024
@bennibolm
Contributor Author

bennibolm commented Jul 22, 2024

Because I received the "correct" (= same as on Ubuntu) results on a local macOS machine (see comment) for the structured Sedov blast, I added a macOS CI test using aarch64 instead of x64 in ed77e0f.

At first, the tree_part2 job just didn't start (this seems to happen elsewhere as well; see here).
Then it started and ran through (same expected failing tests as for Ubuntu, Windows, and locally 💪).
The same holds for the structured test case, which failed with x64 but is working now.

@bennibolm
Contributor Author

bennibolm commented Jul 22, 2024

So, as a conclusion:

  • There is only one test case left where the errors differ between the CI run and my local runs (differences of about 1e-5).
    Of course, I can (and probably will) just adapt the numbers and use the ones from the CI, although that is a potential cause of failing tests in the future (e.g., when updating to Julia 1.11).

  • The differences between macOS and all other runs only occur on the one CI macOS infrastructure. Since we don't normally even test these things on macOS, I'd just accept it. Maybe change the infrastructure tested with in Trixi2Vtk?

@bennibolm
Contributor Author

bennibolm commented Aug 19, 2024

@sloede In my opinion, this process is finished.
As I wrote above, only one test (elixir_euler_sedov_blast_wave_sc_subcell) is left, which seems to be susceptible to small errors: there are different results in my local runs and on the CI, and adapting parameters didn't fix those differences. I will just accept them and add the CI numbers to the tests, if that's okay with you.

  1. The differences for the macOS CI runs (for elixir_euler_sedov_blast_wave_sc_subcell.jl (local bounds) and elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl) only appear for the x64 infrastructure and don't appear for, e.g., aarch64 or a local macOS machine.
    Since those elixirs are not even tested in the Trixi2Vtk PR, which was the reason I started this investigation in the first place, those tests should run through now.
    However, it is still possible to change the infrastructure there if wanted.

I still have to remove many tests from the CI runs and do some cleanup, so the PR is still in draft mode. Anyway, getting a review from you would still help a lot @sloede. Thank you!

@bennibolm bennibolm requested a review from sloede August 19, 2024 08:45
Member

@sloede sloede left a comment


Many tests are still disabled and should be re-enabled before merging, but the changes to the elixirs seem to be reasonable.

However, does this also mean that the subcell limiting methods as implemented here is not unconditionally safe and that one has to choose "correct" parameters? Or was this already known and just surfaced here in an unsuspecting way?

@bennibolm
Contributor Author

Many tests are still disabled and should be re-enabled before merging, but the changes to the elixirs seem to be reasonable.

Yes of course, I just wanted to keep all the different variants until I decided which one to keep.

However, does this also mean that the subcell limiting methods as implemented here is not unconditionally safe and that one has to choose "correct" parameters? Or was this already known and just surfaced here in an unsuspecting way?

Since we use an iterative method to find a stable amount of limiting for nonlinear variables, you could say that, yes.
But it's like with the CFL number: from some point on, all parameters are "correct".
To find this point, we implemented the local bounds checking routines, where you can see whether the limiting is working as expected.
Or what do you say @amrueda?

If one only has to look at the bounds checking to see whether the limiting works properly, why did we have these issues here? Basically because I didn't pay much attention to this before. And of course, when changing setups within the tests, this has to be checked again.
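To illustrate what such a bounds check computes, here is a minimal, hypothetical sketch (plain Python, not Trixi.jl code; the function name is made up): it reports the worst violation of the calculated subcell bounds over all entries.

```python
def max_bound_deviation(u, lower, upper):
    """Largest violation of the elementwise bounds lower[i] <= u[i] <= upper[i].

    Hypothetical illustration: returns 0.0 if all bounds are satisfied,
    otherwise the size of the worst violation (the kind of number reported
    above as, e.g., 3.5e-15 for entropy).
    """
    dev = 0.0
    for ui, lo, hi in zip(u, lower, upper):
        # lo - ui and ui - hi are positive only if a bound is violated
        dev = max(dev, lo - ui, ui - hi)
    return dev
```

A deviation at machine precision (~1e-15) indicates the limiter is working as intended; a deviation on the order of 1e-5 would point to a real instability.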

@amrueda
Contributor

amrueda commented Aug 28, 2024

However, does this also mean that the subcell limiting methods as implemented here is not unconditionally safe and that one has to choose "correct" parameters? Or was this already known and just surfaced here in an unsuspecting way?

Since we use an iterative method to find a stable amount of limiting for nonlinear variables, you can say that, yes.
But it's like with the cfl number. From some point, all parameters are "correct".
To find this point, we implemented the local bounds checking routines, where you can see whether a limiting is working as expected.
Or what do you say @amrueda?

I agree. It seems that the major issue was the choice of the Newton method parameters and the CFL. The subcell-limiting methods are safe when the CFL is below a threshold and when the bounds are selected properly for the equation being solved. For instance, positivity of density and pressure is "safe" for the Euler equations. In the case of non-linear constraints, such as pressure, the subcell-limited method converges to the right/safe scheme only if the non-linear solver manages to converge (related to the Newton parameters).

@sloede
Member

sloede commented Aug 29, 2024

OK, thanks for the clarification! If not already in there, I think it would be good if you could add such an (extended) description to the docs for future reference.

@bennibolm
Contributor Author

Again, I tested different setups for the Sedov blast elixir with TreeMesh:
[Screenshot from 2024-08-29 14-53-00]
where the numbers are (max_iterations_newton, newton_tolerances).

Since all the tests seem to have different results in the CI run and on my local machine, I decided on the standard parameters from the elixir (60, 1.0e-13, 1.0e-15).

Therefore, I will reset all temporary changes to make this PR mergeable.

@bennibolm bennibolm closed this Aug 29, 2024
@bennibolm bennibolm reopened this Aug 29, 2024
@bennibolm
Contributor Author

bennibolm commented Aug 30, 2024

So, a final conclusion before I remove all additional tests.
The simulations with local limiting are, as expected, more susceptible to small differences: especially the elixirs elixir_eulermulti_shock_bubble_shockcapturing_subcell_minmax.jl, basically all elixir_euler_sedov_blast_wave_sc_subcell.jl, and sometimes also elixir_euler_blast_wave_sc_subcell_nonperiodic.jl.

All these elixirs are susceptible to differences with local limiting (the bounds checking functionality only shows errors at machine precision). That includes the failing CI tests for macOS x64, but also possible differences between runs on different systems/architectures/Julia versions, etc.

@bennibolm bennibolm marked this pull request as ready for review August 30, 2024 11:47
@bennibolm bennibolm requested a review from sloede August 30, 2024 12:07
Member

@sloede sloede left a comment


Nearly done 👍 Just one small comment

@sloede sloede enabled auto-merge (squash) September 20, 2024 04:33
Member

@sloede sloede left a comment


LGTM!

@sloede sloede merged commit 1d410ca into main Sep 20, 2024
38 checks passed
@sloede sloede deleted the bb/stable-subcell-tests branch September 20, 2024 04:34