
Implement Cholesky for covariance handling #408

Merged

Conversation

@veni-vidi-vici-dormivi (Collaborator) commented Feb 28, 2024

Closes #402 and #168

@veni-vidi-vici-dormivi changed the title from "Test if drwaing realizations is close" to "Test if drawing realizations is close" on Feb 28, 2024

codecov bot commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 90.90909%, with 1 line in your changes missing coverage. Please review.

Project coverage is 88.01%. Comparing base (9b0b76b) to head (c5fb678).
Report is 17 commits behind head on main.

❗ The current head c5fb678 differs from the pull request's most recent head a486a8e. Consider uploading reports for commit a486a8e to get more accurate results.

Files | Patch % | Lines
mesmer/stats/_auto_regression.py | 90.90% | 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #408      +/-   ##
==========================================
+ Coverage   87.90%   88.01%   +0.10%     
==========================================
  Files          40       40              
  Lines        1745     1760      +15     
==========================================
+ Hits         1534     1549      +15     
  Misses        211      211              
Flag | Coverage Δ
unittests | 88.01% <90.90%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.

@mathause mentioned this pull request on Feb 28, 2024
@mathause (Member)

Interesting 😉 Can you merge main (after #409)? If you add ${{ matrix.os }} to the name on L16, we'll directly see which OS fails.

@veni-vidi-vici-dormivi (Collaborator, Author)

I wanted to test with a new dataset but can't save a new one, see pydata/xarray#8743.

@veni-vidi-vici-dormivi (Collaborator, Author) commented Mar 1, 2024

Well, now I am really confused 😅
Edit: For context, the tests on the dataset I created on macOS failed on both Ubuntu and macOS but succeeded on Windows.

@veni-vidi-vici-dormivi (Collaborator, Author)

macOS-latest is macOS 12, and I created the dataset on macOS 14, so let's try that.

@veni-vidi-vici-dormivi (Collaborator, Author)

Well, would you look at that. The dataset from macOS 14 succeeds on macOS 14 and windows-latest.

@mathause (Member) commented Mar 4, 2024

Well, would you look at that. The dataset from macOS 14 succeeds on macOS 14 and windows-latest.

Interesting that macOS-latest does not seem to be the latest macOS 🤷‍♂️

@veni-vidi-vici-dormivi (Collaborator, Author)

Yes, but they are open about it:
From https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources:
Note: The -latest runner images are the latest stable images that GitHub provides, and might not be the most recent version of the operating system available from the operating system vendor.

@veni-vidi-vici-dormivi (Collaborator, Author)

It bugs me that I cannot reproduce the large difference between the datasets we get on Ubuntu and macOS. The test passes locally when I test the macOS dataset on atmos (openSUSE). GitHub Actions doesn't provide openSUSE, but one can set up self-hosted runners, see here. However, this is not recommended for public repositories for obvious reasons. Do we have an Ubuntu machine where we could test where things start to diverge?

Here are some observations I have made:

  • The test passes on Mac and atmos locally with data generated on Mac
  • Between atmos and Mac I find differences for all emulation steps:
    • maximum differences in the global variability emulations are on the order of 1e-16
    • maximum differences in the local trend response are on the order of 1e-15
    • maximum differences in the local variability are on the order of 1e-13
    • maximum differences in the resulting emulations are on the order of 1e-13
  • Looking at the difference between the inverted matrices on the two systems (using numpy.linalg.inv and scipy.linalg.inv), I find maximum differences of 1e-14. The maximum difference between the numpy and scipy methods is also 1e-14 (see the sketch below).
  • The params of all emulation steps are equal
  • The fact that we already see small differences for the global variability suggests that it is also something about the random number generator, not just the matrix inversion. Interestingly, however, the differences in the global variability (and subsequently the local trend emulations) only happen sometimes; the majority of the array is exactly equal. For the inverted matrices, the local variability emulations, and the result, the whole array is slightly off.

All of this did not get me much closer to understanding why the difference between the results on Ubuntu and Mac is up to 0.8.
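
For reference, the inverse comparison mentioned in the list above can be reproduced with a few lines like the following; the matrix here is a random positive-definite stand-in, not MESMER's actual localized covariance:

```python
import numpy as np
import scipy.linalg

rng = np.random.default_rng(0)

# random symmetric positive-definite stand-in for the localized covariance
B = rng.normal(size=(50, 50))
localized_covariance = B @ B.T + 50 * np.eye(50)

inv_np = np.linalg.inv(localized_covariance)
inv_sp = scipy.linalg.inv(localized_covariance)

# maximum absolute difference between the two inversion routines
print(np.abs(inv_np - inv_sp).max())
```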

@mathause (Member) commented Mar 4, 2024

I tried some more stuff:

  • on cfc and on my laptop I get large differences
  • on argon I do not
  • on argon and cfc I use the same environment

I checked np.show_config(). What I saw is that argon has more SIMD extensions, which are maybe used for the matrix inversions. But this is a wild guess at the moment.

The additional SIMD extensions on argon:

  • AVX512F
  • AVX512CD
  • AVX512_SKX
  • AVX512_CLX
  • AVX512_CNL
  • AVX512_ICL
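
For reference, this is the kind of generic inspection that can be compared between machines (not code from the PR); recent NumPy versions also list the SIMD extensions that were detected:

```python
import platform

import numpy as np

print(platform.platform())  # OS and architecture of the machine
print(np.__version__)

# prints the BLAS/LAPACK backend and, on recent NumPy versions,
# the baseline and detected SIMD extensions
np.show_config()
```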





@veni-vidi-vici-dormivi (Collaborator, Author)

Okay but that's something. Then I'll try to pin it down with output from cfc. Thanks!

@mathause (Member) commented Mar 4, 2024

In #168 (comment) I found that small differences (1e-15) can lead to one large difference in the "inverted" matrices (0.1), which leads to differences in the samples.

Yes, using cfc can definitely get you faster feedback.
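
As a toy illustration of that sensitivity (a made-up, nearly singular 2x2 matrix, not the PR's data), a perturbation on the order of machine precision in an ill-conditioned covariance shifts its inverse by many orders of magnitude more:

```python
import numpy as np

eps = 5e-8  # makes the matrix nearly singular (condition number ≈ 2 / eps)
cov = np.array([[1.0, 1.0 - eps], [1.0 - eps, 1.0]])

perturbed = cov.copy()
perturbed[0, 1] += 1e-15  # perturbation on the order of machine precision

# the entries of the inverse are ~1e7, so the tiny perturbation is
# amplified to a difference on the order of 0.1
diff = np.linalg.inv(cov) - np.linalg.inv(perturbed)
print(np.abs(diff).max())
```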

@veni-vidi-vici-dormivi (Collaborator, Author)

Okay, saving the factor A that is used to draw from the multivariate normal makes the emulations pass on different OSes. However, this just shifts our problem to testing the calibration, where we save A. That will not be the same between different OSes, but one could argue this is less important to the user, since the calibration can always be exported.

A second issue here is that I was not actually able to reproduce the behavior of numpy.random.multivariate_normal, even though I think I copied the exact code they use, but maybe I am wrong?
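
For context, a hedged sketch of the general idea (an illustration, not the MESMER implementation): once a factor A with A @ A.T equal to the covariance is computed and saved at calibration time, drawing realizations only needs a matrix product, so the platform-dependent factorization is not redone at draw time:

```python
import numpy as np

rng = np.random.default_rng(42)

cov = np.array([[2.0, 0.3], [0.3, 1.0]])
mean = np.zeros(2)

# factor computed once (e.g. during calibration) and saved;
# the lower-triangular Cholesky factor is one possible choice of A
A = np.linalg.cholesky(cov)

# drawing then only involves standard normal draws and a matrix product
z = rng.standard_normal(size=(10_000, 2))
samples = mean + z @ A.T

print(np.cov(samples, rowvar=False))  # close to `cov`
```

For what it's worth, numpy.random.Generator.multivariate_normal factorizes the covariance with an SVD by default (method="svd", with "eigh" and "cholesky" as alternatives), so the exact draws depend on which factor is used.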

@veni-vidi-vici-dormivi (Collaborator, Author)

Uff, sorry, the commits are a mess here; "fix dimensions of innovations" actually also includes switching from the numpy multivariate normal to the scipy multivariate normal.

@veni-vidi-vici-dormivi (Collaborator, Author)

So this looks pretty good! I see two things to do now:

  1. test this approach with actual emulations and see if they are plausible
  2. also implement Cholesky in the localized covariance module, but maybe in another PR? But then we would want to check the emulations again...

@mathause (Member) left a comment


Very nice! Can you

  • rename the PR
  • open an issue to re-consider the try ... except np.linalg.LinAlgError approach? (potentially require method="eigh"; see the sketch below)

  1. test this approach with actual emulations and see if they are plausible
  2. also implement Cholesky in the localized covariance module, but maybe in another PR? But then we would want to check the emulations again...

Yes and yes. Maybe do 1. after 2.?
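
For illustration, a minimal sketch of the try ... except np.linalg.LinAlgError pattern mentioned above (an assumption about the idea, not the merged MESMER code): attempt a Cholesky factorization and fall back to a symmetric eigendecomposition when the matrix is not numerically positive definite:

```python
import numpy as np


def factorize_covariance(cov):
    """Return a factor A such that A @ A.T reproduces cov."""
    try:
        # fast path: requires cov to be (numerically) positive definite
        return np.linalg.cholesky(cov)
    except np.linalg.LinAlgError:
        # fallback: symmetric eigendecomposition, clipping small negative
        # eigenvalues that can arise from round-off
        eigvals, eigvecs = np.linalg.eigh(cov)
        eigvals = np.clip(eigvals, 0.0, None)
        return eigvecs * np.sqrt(eigvals)
```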

Review comments on:

  • .github/workflows/ci-workflow.yml (outdated, resolved)
  • mesmer/stats/_auto_regression.py (outdated, resolved)
  • tests/integration/test_draw_realisations_from_bundle.py (outdated, resolved)
  • mesmer/stats/_auto_regression.py (resolved)
Co-authored-by: Mathias Hauser <mathause@users.noreply.github.com>
@veni-vidi-vici-dormivi changed the title from "Test if drawing realizations is close" to "Stabilize drawing emulations" on Mar 8, 2024
@mathause (Member) left a comment


I'll assume the changelog entry is for the coming PR?

edit: sorry my bad, all good

@veni-vidi-vici-dormivi (Collaborator, Author)

Happy to announce that the emulations look good and are approximately 10 times faster (for my small profiling script)! 😊😊

@mathause (Member)

Hurray 😄 Did you test the training and the drawing?

@veni-vidi-vici-dormivi (Collaborator, Author)

Did you test the training and the drawing?

Yes, so calibrating + emulating together is 10 times faster. Drawing from the autocorrelated process is 10 times faster and finding the localized covariance is 22 times faster. find_localized_empirical_covariance takes the most time in both cases.

@mathause (Member)

awesome!

@veni-vidi-vici-dormivi changed the title from "Stabilize drawing emulations" to "Implement Cholesky for covariance handling" on Mar 12, 2024
@veni-vidi-vici-dormivi merged commit 79c8ee0 into MESMER-group:main on Mar 12, 2024
8 checks passed
@veni-vidi-vici-dormivi deleted the test_draw_close branch on March 12, 2024 at 14:55
Successfully merging this pull request may close these issues.

why are emulations not stable?
2 participants