Vector Fields Notebook #341

Merged

Conversation

ivanshalashilin

Type of changes

  • Documentation / docstrings

Checklist

  • I've formatted the new code by running poetry run pre-commit run --all-files --show-diff-on-failure before committing.

Description

A notebook recreating the results of Berlinghieri et al., along with four CSV files containing the required data. The Helmholtz GP is consistent with the literature, but the velocity GP disagrees with their results. The implementation only works with stationary kernels.

Issue Number: N/A

@github-actions bot left a comment

Thank you for opening your first PR into GPJax!

If you have not heard from us in a while, please feel free to ping @gpjax/developers or anyone who has commented on the PR. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on Slack for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide.

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@henrymoss self-requested a review on August 3, 2023, 07:07
Contributor

@henrymoss left a comment

Fantastic first notebook, @ivanshalashilin. I've left a few comments.

It would be great if we could get a real(ish) dataset in here instead of, or as well as, the synthetic one.

docs/examples/vectorfields.py Outdated Show resolved Hide resolved
Collaborator

@thomaspinder left a comment

Very nice work @ivanshalashilin! I've left some comments that I'd like to see resolved before we can merge - please ask if there's anything that's unclear.

docs/examples/vectorfields.py Outdated Show resolved Hide resolved
@thomaspinder added the documentation label (Improvements or additions to documentation) on Aug 4, 2023
@thomaspinder added this to the v1.0.0 milestone on Aug 4, 2023
docs/examples/vectorfields.py Outdated Show resolved Hide resolved
@ivanshalashilin
Author

Sorry for changing the filename; I misunderstood Henry's first comment. I have made quite a lot of changes: I moved a lot of the code into functions and moved most of the data into CSV files that are read in via pandas. Currently, the NLPD metric used to compare the two models says that the velocity GP is better than the Helmholtz GP, which is because the optimiser is moving the mean away from 0.

Member

@daniel-dodd left a comment

Nice work @ivanshalashilin. Just completed a first round of checking the text - noticed some typos. [EDIT: These can just be committed here on GitHub via the Sign off and commit suggestion - and let me know if you think I have made a mistake! :) ]

Will proceed to check the maths and then review your code in a subsequent review.

docs/examples/oceanmodelling.py Outdated Show resolved Hide resolved
Comment on lines 6 to 7
# application to real world ocean surface velocity data, collected via surface drifters.
#
Member

Suggested change
# application to real world ocean surface velocity data, collected via surface drifters.
#
# application to real-world ocean surface velocity data, collected via surface drifters.
#

# # Gaussian Processes for Vector Fields and Ocean Current Modelling
#
# In this notebook, we use Gaussian processes to learn vector valued functions. We will be
# recreating the results by [Berlinghieri et. al, (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
Member

Suggested change
# recreating the results by [Berlinghieri et. al, (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
# recreating the results by [Berlinghieri et al., (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an

# recreating the results by [Berlinghieri et. al, (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
# application to real world ocean surface velocity data, collected via surface drifters.
#
# Surface drifters are measurement devices that measure the dynamics and circulation patterns of the world's oceans. Studying and predicting ocean currents are important to climate research, for example forecasting and predicting oil spills, oceanographic surveying of eddies and upwelling, or providing information on the distribution of biomass in ecosystems. We will be using the [Gulf Drifters Open dataset](https://zenodo.org/record/4421585), which contains all publicly available surface drifter trajectories from the Gulf of Mexico spanning 28 years.
Member

Suggested change
# Surface drifters are measurement devices that measure the dynamics and circulation patterns of the world's oceans. Studying and predicting ocean currents are important to climate research, for example forecasting and predicting oil spills, oceanographic surveying of eddies and upwelling, or providing information on the distribution of biomass in ecosystems. We will be using the [Gulf Drifters Open dataset](https://zenodo.org/record/4421585), which contains all publicly available surface drifter trajectories from the Gulf of Mexico spanning 28 years.
# Surface drifters are measurement devices that measure the dynamics and circulation patterns of the world's oceans. Studying and predicting ocean currents are important to climate research, for example, forecasting and predicting oil spills, oceanographic surveying of eddies and upwelling, or providing information on the distribution of biomass in ecosystems. We will be using the [Gulf Drifters Open dataset](https://zenodo.org/record/4421585), which contains all publicly available surface drifter trajectories from the Gulf of Mexico spanning 28 years.

Comment on lines 108 to 109
# Our aim is to obtain estimates for $\mathbf{F}$ at the set of points $\left\{ \mathbf{x}_{0,i} \right\}_{i=1}^N$ using Gaussian processes, followed by a comparison of the latent model to the ground truth $D_0$. Note that $D_0$ is not passed into any functions used by GPJax, and is only used to compare against the two GP models at the end of the notebook.
#
Member

Suggested change
# Our aim is to obtain estimates for $\mathbf{F}$ at the set of points $\left\{ \mathbf{x}_{0,i} \right\}_{i=1}^N$ using Gaussian processes, followed by a comparison of the latent model to the ground truth $D_0$. Note that $D_0$ is not passed into any functions used by GPJax, and is only used to compare against the two GP models at the end of the notebook.
#
# We aim to obtain estimates for $\mathbf{F}$ at the set of points $\left\{ \mathbf{x}_{0,i} \right\}_{i=1}^N$ using Gaussian processes, followed by a comparison of the latent model to the ground truth $D_0$. Note that $D_0$ is not passed into any functions used by GPJax, and is only used to compare against the two GP models at the end of the notebook.
#

Comment on lines 448 to 449
# We repeat the exact same steps as with the velocity GP model, but replacing `VelocityKernel` with `HelmholtzKernel`.

Member

Suggested change
# We repeat the exact same steps as with the velocity GP model, but replacing `VelocityKernel` with `HelmholtzKernel`.
# We repeat the same steps as with the velocity GP model, replacing `VelocityKernel` with `HelmholtzKernel`.

Comment on lines 472 to 473
# Visually, the Helmholtz model performs better than the velocity model, preserving the local structure of the $\mathbf{F}$. Since we placed priors on $\Phi$ and $\Psi$, the construction of $\mathbf{F}$ allows for correlations between the dimensions (non-zero off diagonal elements in the Gram matrix populated by $k_\text{Helm}\left(\mathbf{X},\mathbf{X}^{\prime}\right)$ ).

Member

Suggested change
# Visually, the Helmholtz model performs better than the velocity model, preserving the local structure of the $\mathbf{F}$. Since we placed priors on $\Phi$ and $\Psi$, the construction of $\mathbf{F}$ allows for correlations between the dimensions (non-zero off diagonal elements in the Gram matrix populated by $k_\text{Helm}\left(\mathbf{X},\mathbf{X}^{\prime}\right)$ ).
# Visually, the Helmholtz model performs better than the velocity model, preserving the local structure of the $\mathbf{F}$. Since we placed priors on $\Phi$ and $\Psi$, the construction of $\mathbf{F}$ allows for correlations between the dimensions (non-zero off-diagonal elements in the Gram matrix populated by $k_\text{Helm}\left(\mathbf{X},\mathbf{X}^{\prime}\right)$ ).
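
To illustrate the off-diagonal coupling described in this passage, here is a minimal sketch. It assumes the decomposition $\mathbf{F} = \nabla\Phi + \operatorname{rot}\Psi$ with $\operatorname{rot}\Psi = (\partial\Psi/\partial x^{(1)}, -\partial\Psi/\partial x^{(0)})$ and independent RBF priors on $\Phi$ and $\Psi$. The function names and hyperparameter values are hypothetical and this is not the notebook's `HelmholtzKernel`, which may differ in detail.

```python
import math


def rbf_cross_hessian(r, variance, lengthscale):
    """2x2 matrix of d^2 k / (dx_i dx'_j) for an RBF kernel, with r = x - x'."""
    l2 = lengthscale ** 2
    k = variance * math.exp(-(r[0] ** 2 + r[1] ** 2) / (2.0 * l2))
    return [
        [((1.0 if i == j else 0.0) / l2 - r[i] * r[j] / l2 ** 2) * k for j in range(2)]
        for i in range(2)
    ]


def helmholtz_block(x, x_prime, phi_params, psi_params):
    """2x2 covariance between F(x) and F(x') under the assumed decomposition.

    phi_params and psi_params are (variance, lengthscale) pairs (assumed values).
    """
    r = (x[0] - x_prime[0], x[1] - x_prime[1])
    h_phi = rbf_cross_hessian(r, *phi_params)
    h_psi = rbf_cross_hessian(r, *psi_params)
    block = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            # The rot term swaps the derivative directions and flips the sign
            # of the mixed entries.
            sign = -1.0 if (i + j) % 2 else 1.0
            block[i][j] = h_phi[i][j] + sign * h_psi[1 - i][1 - j]
    return block


# Different lengthscales for the two potentials (illustrative values only).
block = helmholtz_block((0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (1.0, 2.0))
same_point = helmholtz_block((0.0, 0.0), (0.0, 0.0), (1.0, 1.0), (1.0, 2.0))
```

In this sketch the off-diagonal entries of `block` are non-zero, so the two velocity components are correlated. If both potentials share identical RBF hyperparameters, the two mixed terms cancel and the block becomes diagonal, so the coupling depends on the potentials having different hyperparameters.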

Comment on lines 477 to 478
# Lastly, we directly compare the velocity and Hemlholtz models by computing the [negative log predictive densities](https://en.wikipedia.org/wiki/Negative_log_predictive_density) for each model. This is a quantitative metric that measures the probability of the ground truth given the data.
#
Member

Suggested change
# Lastly, we directly compare the velocity and Hemlholtz models by computing the [negative log predictive densities](https://en.wikipedia.org/wiki/Negative_log_predictive_density) for each model. This is a quantitative metric that measures the probability of the ground truth given the data.
#
# Lastly, we directly compare the velocity and Helmholtz models by computing the [negative log predictive densities](https://en.wikipedia.org/wiki/Negative_log_predictive_density) for each model. This is a quantitative metric that measures the probability of the ground truth given the data.
#
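
For reference, a minimal sketch of the NLPD metric discussed here. It assumes independent Gaussian predictive marginals at each test point, which is a simplification: the notebook may instead evaluate the joint predictive density. The function name and the numbers are hypothetical.

```python
import math


def gaussian_nlpd(y_true, pred_mean, pred_var):
    """Negative log predictive density, summed over test points.

    Assumes each prediction is an independent Gaussian N(pred_mean[i], pred_var[i]).
    Lower values mean the ground truth is more probable under the model.
    """
    total = 0.0
    for y, m, v in zip(y_true, pred_mean, pred_var):
        total += 0.5 * (math.log(2.0 * math.pi * v) + (y - m) ** 2 / v)
    return total


# Illustrative values only: the same ground truth scored under a centred
# prediction and under one whose mean has drifted away from it.
y_true = [0.1, -0.2, 0.05]
centred = gaussian_nlpd(y_true, [0.0, 0.0, 0.0], [0.04, 0.04, 0.04])
shifted = gaussian_nlpd(y_true, [0.5, 0.5, 0.5], [0.04, 0.04, 0.04])
```

With equal predictive variances, the prediction whose mean sits further from the ground truth gets the larger NLPD, which is why a drifting mean can change the ranking of two models.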

docs/examples/oceanmodelling.py Show resolved Hide resolved
# %% [markdown]
# <span id="fn1"></span>
# ## Footnote
# Kernels for vector valued functions have been studied in the literature, see [Alvarez et. al, (2012)](https://doi.org/10.48550/arXiv.1106.6251)
Member

Suggested change
# Kernels for vector valued functions have been studied in the literature, see [Alvarez et. al, (2012)](https://doi.org/10.48550/arXiv.1106.6251)
# Kernels for vector-valued functions have been studied in the literature see e.g., [Alvarez et al., (2012)](https://doi.org/10.48550/arXiv.1106.6251).

Comment on lines 70 to 75
gulf_data_train = pd.read_csv(
    "https://raw.githubusercontent.com/JaxGaussianProcesses/gpjaxstatic/main/data/gulfdata_train.csv"
)
gulf_data_test = pd.read_csv(
    "https://raw.githubusercontent.com/JaxGaussianProcesses/gpjaxstatic/main/data/gulfdata_test.csv"
)
Member

@ivanshalashilin - we have an issue with the links here. (The docs build fails, and I can't load this on my local machine.)

# $$
# where $\mathbf{x} = (x^{(0)}$,$x^{(1)})^\text{T}$, with a vector basis in the standard Cartesian directions (dimensions will be indicated by superscripts).
#
# We shall label the ground truth $D_0=\left\{ \left(\mathbf{x}_{0,i} , \mathbf{y}_{0,i} \right)\right\}_{i=1}^N$, where $\mathbf{y}_i$ is the 2 dimensional velocity vector at the $i$th location, $\mathbf{x}_i$. The training dataset contains simulated measurements from ocean drifters $D_T=\left\{\left(\mathbf{x}_{T,i}, \mathbf{y}_{T,i} \right)\right\}_{i=1}^{N_T}$, $N_T = 20$ in this case (the subscripts indicate the ground truth and the simulated measurements respectively).
Member

@daniel-dodd, Sep 7, 2023

This will not render otherwise!

Suggested change
# We shall label the ground truth $D_0=\left\{ \left(\mathbf{x}_{0,i} , \mathbf{y}_{0,i} \right)\right\}_{i=1}^N$, where $\mathbf{y}_i$ is the 2 dimensional velocity vector at the $i$th location, $\mathbf{x}_i$. The training dataset contains simulated measurements from ocean drifters $D_T=\left\{\left(\mathbf{x}_{T,i}, \mathbf{y}_{T,i} \right)\right\}_{i=1}^{N_T}$, $N_T = 20$ in this case (the subscripts indicate the ground truth and the simulated measurements respectively).
# We shall label the ground truth $D_0=\left\{ \left(\mathbf{x}_{0,i} , \mathbf{y}_{0,i} \right)\right\}_{i=1}^N$, where $\mathbf{y}_i$ is the 2 dimensional velocity vector at the $i$-th location, $\mathbf{x}_i$. The training dataset contains simulated measurements from ocean drifters $D_T=\left\{\left(\mathbf{x}_{T,i}, \mathbf{y}_{T,i} \right)\right\}_{i=1}^{N_T}$, $N_T = 20$ in this case (the subscripts indicate the ground truth and the simulated measurements respectively).

Member

@daniel-dodd left a comment

Thanks @ivanshalashilin, this is outstanding work. :)

Please feel free to merge once the documentation tests pass.

@daniel-dodd merged commit c927ce5 into JaxGaussianProcesses:main on Sep 7, 2023
5 participants