Refactor data step, inference step, Jupyter notebooks #97

raehik · 2023-10-17T11:37:00Z

This PR covers multiple "feature" changes.

- Refactor data step: replace CLI, rewrite some library internals, add more documentation
  - See "refactor data step script into library (API) and consumer (CLI)" #85 for further details on the work completed here.
- Refactor train step: replace CLI, refactor & document various data processes
- Add new infer step: load prepped data (low-res ocean velocities) and a pre-trained model, then predict forcings using the model
- Update Jupyter notebooks to use the new interfaces (code de-duplication, documentation)

Extra:

- Document parts of the model and training (mirroring some explanations from the paper)

Important to-dos:

- Re-add the "testing" script cli/testing.py (needs a better name! check?)
  - Currently broken due to MLflow param loading. It is needed to reproduce the paper figures, which use predictions on the 5% of input data we skip during training. You can still produce those figures using predictions from cli/infer.py, but you would have to split the data again yourself to get the same behaviour. We could rewrite the dataset splitting used in cli/train.py so that it is easy to obtain the correct data to use in cli/infer.py.
  - It also does fine-tuning. In previous discussions Arthur was OK with removing this, which was convenient for removing the MLflow dependency. If we re-added fine-tuning to cli/infer.py too, we could remove cli/testing.py.
- Discuss making predictions (using a pre-trained model)
  - Somewhat avoided this: pointed to the paper for discussion on model usage
- General code & user documentation on forcing dataset shape (co-ord, variable, dimension names)
- Document how to use the new inference mode: download the pre-trained model from HuggingFace, load in a dataset, run. Link from readme
- Finish the last checked-over Jupyter notebook (test_global_control)
  - This notebook needs two datasets:
    - Computed forcings from the CM2.6 dataset which are then coarsened
    - Output of the inference (testing) step
  - These are then compared
  - ~~[ ] Provide option to use the ML flow style, or to load from file (e.g., xarray.open_zarr on the forcings and output of inference step).~~
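
The first to-do above suggests rewriting the dataset splitting so that cli/infer.py can obtain the same held-out data that cli/train.py skipped. One way this might look is a small shared function that both CLIs call. The sketch below is only an illustration: the names split_time_indices and TimeSplit are hypothetical, and it assumes the split is a single contiguous cut along the time axis, which may not match the actual splitting code.

```python
from typing import NamedTuple


class TimeSplit(NamedTuple):
    """Half-open index ranges along the time axis (hypothetical helper type)."""

    train: range
    held_out: range


def split_time_indices(n_samples: int, train_fraction: float) -> TimeSplit:
    """Split n_samples time steps into a training range and a held-out range.

    train_fraction is the share of samples, counted from the start of the
    time axis, that training may use. Everything after it is held out.
    """
    if n_samples < 0:
        raise ValueError("n_samples must be non-negative")
    if not 0.0 <= train_fraction <= 1.0:
        raise ValueError("train_fraction must lie in [0, 1]")
    # Truncate so the boundary is always a valid positional index.
    boundary = int(n_samples * train_fraction)
    return TimeSplit(train=range(0, boundary), held_out=range(boundary, n_samples))


# Placeholder numbers, not the values used in the paper.
split = split_time_indices(n_samples=200, train_fraction=0.75)
```

If both the train and infer steps computed their ranges through one function like this, with the fraction read from shared configuration, the infer step could select exactly the samples that training left out without the user re-deriving the split by hand.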

Closes #87, #90, #98, #4.

Known bugs:

- Training fails with IndexError when performing train/test dataset splitting #104
  - When training: IndexError: index x is out of bounds for axis 0 with size y, where x > y
  - Somehow, the dataloaders are being asked for samples past their end?
  - Perhaps the custom indexing over xarray is problematic for short arrays?
  - Perhaps I broke the indexing, and it's using a coord for the index? (The size is correct at least, but the index seems to be ~0-310.)
- Forcing generation is too memory hungry again #107
- While training, I sometimes get a PyTorch RuntimeError: input type (double) and bias type (float) should be the same. It seems random, and I can't tell what triggers it. Model parameter and training data types seem to match just fine; I don't know where the double is coming from.
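
On the double/float mismatch in the last item above: one common source of stray doubles, offered here as a guess and not as a diagnosis of this bug, is that NumPy arrays default to float64 while PyTorch layer parameters default to float32, and torch.from_numpy keeps the dtype of the array it is given. A single float64 operand anywhere in the preprocessing is then enough to promote the result. The sketch below uses NumPy only; the variable names and the as_float32 helper are hypothetical.

```python
import numpy as np

# NumPy creates float64 ("double") arrays by default.
velocities = np.linspace(0.0, 1.0, 5)

# Mixing a float32 array with a float64 array promotes the result to float64,
# even though one of the inputs was explicitly single precision.
weights32 = np.ones(5, dtype=np.float32)
mixed = weights32 * velocities


def as_float32(arr):
    """Cast to float32 at the boundary, just before tensors are built."""
    return np.asarray(arr, dtype=np.float32)


batch = as_float32(mixed)
print(velocities.dtype, mixed.dtype, batch.dtype)
```

Casting once at the point where arrays are handed to the model, instead of relying on every upstream step to preserve float32, would make an intermittent promotion like this harmless, assuming this is indeed where the double originates.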

dorchard · 2023-11-30T14:13:18Z

Can I request that the modules in the cli directory have their _cli_desc put as the first definition after the imports, with a comment saying # Description of this module or something similar? They provide a good explanation of what each file is about.

dorchard · 2023-11-30T14:17:54Z

The section marked ## Data on HuggingFace probably needs to say something about what kind of data and trained model it covers (i.e., that at the moment it is low-res); we can update this later if we can get the high-res version.

src/gz21_ocean_momentum/cli/train.py

MarionBWeinzierl

We discussed the changes previously, just a couple of tidy-up comments/questions below.

docs/2021-paper-reproduction.md

resources/cli-configs/train-subdomains-paper.yaml

src/gz21_ocean_momentum/train/base.py

src/gz21_ocean_momentum/lib/data.py

MarionBWeinzierl

This is ok to be merged in

raehik force-pushed the data-step-refactor branch from b483fd0 to 87a0cae Compare October 17, 2023 11:41

raehik force-pushed the data-step-refactor branch 3 times, most recently from d5755c3 to 60813a6 Compare October 18, 2023 14:29

This was referenced Oct 25, 2023

Add more type annotations #57

Closed

Training step refactor #95

Closed

cmip26.py script name is mistaken (CM2.6, not CMIP) #87

Closed

raehik changed the title ~~Data step refactor~~ Refactor data step, inference step, Jupyter notebooks Oct 30, 2023

raehik mentioned this pull request Nov 8, 2023

Restructure from scripts #8

Closed

raehik force-pushed the data-step-refactor branch 2 times, most recently from 0c93ea2 to ebfcd52 Compare November 9, 2023 13:59

raehik mentioned this pull request Nov 22, 2023

Training fails with IndexError when performing train/test dataset splitting #104

Open

raehik mentioned this pull request Nov 29, 2023

Updates to test_global_control notebook #103

Closed

7 tasks

raehik force-pushed the data-step-refactor branch from 1a5daf0 to 998cf51 Compare November 30, 2023 12:02

raehik added 13 commits December 4, 2023 16:11

data step: refactor

4d56047

Rewrite as a library (set of functions) and a CLI.

use new code paths in training step; fix MLproject

21e0add

Cleaner subdomain configuration.

data: enable selecting Pangeo intake catalog

8cd24a4

Also locks intake catalog to current HEAD.

MLproject: main->data, rename project

8f35643

tweak MLproject, readme, data step CLI help

f9dec0a

cli/data: re-add logging

fdb9fa0

cli/data: +log bounding operation

5f9dab7

step/data: simplify gaussian_filter call

0748393

No need to repeat sigma according to docs.

cli/data: fix forcing compute call arg order

0346a4e

step/data: remove code copied from unused debug

c46a6e0

Also does more operations up front in the CLI for testing purposes.

step/data: fix coarsening scale args

eb8586a

step/data/coarsen: cleaning

ab1880d

step/data: simplify ufunc call

228ca93

raehik linked an issue Dec 5, 2023 that may be closed by this pull request

Inference step may not enable general "inference mode" usage #98

Closed

cli/data: match previous code closer

6d5d651

raehik mentioned this pull request Dec 6, 2023

Implemented some TODOs #89

Closed

raehik linked an issue Dec 6, 2023 that may be closed by this pull request

Remove MLFlow framework #4

Closed

This was referenced Dec 6, 2023

Fix and document remaining notebooks #93

Open

Remove MLFlow framework #4

Closed

raehik added 4 commits December 6, 2023 15:19

update Nix flake

0582c9f

forcing generation: return to map_blocks

caf9a7d

Seems that Dask needs the explicit map_blocks to schedule properly. Otherwise memory usage balloons.

tests: fix import error

7bddf76

cli/data: allow specifying Dask num_workers

5095ce7

raehik marked this pull request as ready for review December 6, 2023 16:05

mondus reviewed Dec 6, 2023

src/gz21_ocean_momentum/cli/train.py (outdated, resolved)

dorchard and others added 7 commits December 6, 2023 16:14

flag _cli_desc with comments pointing out that this describes the module

5a26e1b

additional comment

71f1fe9

notebooks: tweak docs

f3279c9

notebooks: tweak paths

63900af

cli/train: clean up

99baa8d

cli/test: re-add (WIP)

301f1bb

readme: +note on inference script

fe047da

raehik requested a review from MarionBWeinzierl December 8, 2023 14:36

MarionBWeinzierl reviewed Dec 8, 2023

raehik added 4 commits December 8, 2023 14:53

readme: +note on Hugging Face data res

a2dcdfd

docs/2021-paper-repro: describe training, predicting

8b4a3ba

resources/cli-configs/train-paper: note where from

25c834c

train/base: remove debug prints

68b4151

MarionBWeinzierl approved these changes Dec 8, 2023

raehik added 2 commits December 8, 2023 16:21

lib/data: clarify rechunking comment

56b2ee0

lib/data: remove commented-out temperature interp

e426891

Commented out since first commit. Uses the wrong variable for temperature in CM2.6 (surface_temp, not surface_temperature).

raehik merged commit 414aa8e into main Dec 11, 2023
4 of 6 checks passed

raehik deleted the data-step-refactor branch December 19, 2023 16:20
