Vector Fields Notebook #341
Thank you for opening your first PR into GPJax!
If you have not heard from us in a while, please feel free to ping @gpjax/developers
or anyone who has commented on the PR. Most of our reviewers are volunteers and sometimes things fall through the cracks.
You can also join us on Slack for real-time discussion.
For details on testing, writing docs, and our review process, please see the developer guide
We strive to be a welcoming and open project. Please follow our Code of Conduct.
Fantastic first notebook, @ivanshalashilin. I've left a few comments.
It would be great if we could get a real(ish) dataset in here instead/as well as the synthetic one.
Very nice work @ivanshalashilin! I've left some comments that I'd like to see resolved before we can merge - please ask if there's anything that's unclear.
Sorry for changing the filename; I misunderstood Henry's first comment. I have made quite a lot of changes: I placed a lot of code in functions and moved most of the data into CSV files that are read in via pandas. Currently the NLPD metric for comparing the two models says that the velocity GP is better than the Helmholtz GP, which is because the optimiser is varying the mean away from 0.
Nice work @ivanshalashilin. I've just completed a first round of checking the text and noticed some typos. [EDIT: These can just be committed here on GitHub via the "Sign off and commit suggestion" option, and let me know if you think I have made a mistake! :)]
Will proceed to check the maths and then review your code in a subsequent review.
docs/examples/oceanmodelling.py
Outdated
# application to real world ocean surface velocity data, collected via surface drifters.
#
- # application to real world ocean surface velocity data, collected via surface drifters.
- #
+ # application to real-world ocean surface velocity data, collected via surface drifters.
+ #
docs/examples/oceanmodelling.py
Outdated
# # Gaussian Processes for Vector Fields and Ocean Current Modelling
#
# In this notebook, we use Gaussian processes to learn vector valued functions. We will be
# recreating the results by [Berlinghieri et. al, (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
- # recreating the results by [Berlinghieri et. al, (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
+ # recreating the results by [Berlinghieri et al., (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
docs/examples/oceanmodelling.py
Outdated
# recreating the results by [Berlinghieri et. al, (2023)](https://arxiv.org/pdf/2302.10364.pdf) by an
# application to real world ocean surface velocity data, collected via surface drifters.
#
# Surface drifters are measurement devices that measure the dynamics and circulation patterns of the world's oceans. Studying and predicting ocean currents are important to climate research, for example forecasting and predicting oil spills, oceanographic surveying of eddies and upwelling, or providing information on the distribution of biomass in ecosystems. We will be using the [Gulf Drifters Open dataset](https://zenodo.org/record/4421585), which contains all publicly available surface drifter trajectories from the Gulf of Mexico spanning 28 years.
- # Surface drifters are measurement devices that measure the dynamics and circulation patterns of the world's oceans. Studying and predicting ocean currents are important to climate research, for example forecasting and predicting oil spills, oceanographic surveying of eddies and upwelling, or providing information on the distribution of biomass in ecosystems. We will be using the [Gulf Drifters Open dataset](https://zenodo.org/record/4421585), which contains all publicly available surface drifter trajectories from the Gulf of Mexico spanning 28 years.
+ # Surface drifters are measurement devices that measure the dynamics and circulation patterns of the world's oceans. Studying and predicting ocean currents are important to climate research, for example, forecasting and predicting oil spills, oceanographic surveying of eddies and upwelling, or providing information on the distribution of biomass in ecosystems. We will be using the [Gulf Drifters Open dataset](https://zenodo.org/record/4421585), which contains all publicly available surface drifter trajectories from the Gulf of Mexico spanning 28 years.
docs/examples/oceanmodelling.py
Outdated
# Our aim is to obtain estimates for $\mathbf{F}$ at the set of points $\left\{ \mathbf{x}_{0,i} \right\}_{i=1}^N$ using Gaussian processes, followed by a comparison of the latent model to the ground truth $D_0$. Note that $D_0$ is not passed into any functions used by GPJax, and is only used to compare against the two GP models at the end of the notebook.
#
- # Our aim is to obtain estimates for $\mathbf{F}$ at the set of points $\left\{ \mathbf{x}_{0,i} \right\}_{i=1}^N$ using Gaussian processes, followed by a comparison of the latent model to the ground truth $D_0$. Note that $D_0$ is not passed into any functions used by GPJax, and is only used to compare against the two GP models at the end of the notebook.
- #
+ # We aim to obtain estimates for $\mathbf{F}$ at the set of points $\left\{ \mathbf{x}_{0,i} \right\}_{i=1}^N$ using Gaussian processes, followed by a comparison of the latent model to the ground truth $D_0$. Note that $D_0$ is not passed into any functions used by GPJax, and is only used to compare against the two GP models at the end of the notebook.
+ #
docs/examples/oceanmodelling.py
Outdated
# We repeat the exact same steps as with the velocity GP model, but replacing `VelocityKernel` with `HelmholtzKernel`.
- # We repeat the exact same steps as with the velocity GP model, but replacing `VelocityKernel` with `HelmholtzKernel`.
+ # We repeat the same steps as with the velocity GP model, replacing `VelocityKernel` with `HelmholtzKernel`.
docs/examples/oceanmodelling.py
Outdated
# Visually, the Helmholtz model performs better than the velocity model, preserving the local structure of the $\mathbf{F}$. Since we placed priors on $\Phi$ and $\Psi$, the construction of $\mathbf{F}$ allows for correlations between the dimensions (non-zero off diagonal elements in the Gram matrix populated by $k_\text{Helm}\left(\mathbf{X},\mathbf{X}^{\prime}\right)$ ).
- # Visually, the Helmholtz model performs better than the velocity model, preserving the local structure of the $\mathbf{F}$. Since we placed priors on $\Phi$ and $\Psi$, the construction of $\mathbf{F}$ allows for correlations between the dimensions (non-zero off diagonal elements in the Gram matrix populated by $k_\text{Helm}\left(\mathbf{X},\mathbf{X}^{\prime}\right)$ ).
+ # Visually, the Helmholtz model performs better than the velocity model, preserving the local structure of the $\mathbf{F}$. Since we placed priors on $\Phi$ and $\Psi$, the construction of $\mathbf{F}$ allows for correlations between the dimensions (non-zero off-diagonal elements in the Gram matrix populated by $k_\text{Helm}\left(\mathbf{X},\mathbf{X}^{\prime}\right)$ ).
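The off-diagonal remark in the text above can be checked numerically. The following is a minimal pure-Python sketch, not the notebook's `VelocityKernel` or `HelmholtzKernel`: it assumes RBF priors on $\Phi$ and $\Psi$, the convention $\text{rot}\,\Psi = (\partial\Psi/\partial x^{(1)}, -\partial\Psi/\partial x^{(0)})$, and illustrative lengthscales. All function names here are hypothetical.

```python
import math


def rbf(x, xp, lengthscale=1.0):
    """Squared-exponential kernel between two 2D points."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-0.5 * sq_dist / lengthscale**2)


def rbf_cross_derivative(x, xp, i, j, lengthscale=1.0):
    """Closed form of d^2 k / (dx^(i) dx'^(j)) for the RBF kernel."""
    r_i = x[i] - xp[i]
    r_j = x[j] - xp[j]
    delta = 1.0 if i == j else 0.0
    return rbf(x, xp, lengthscale) * (delta / lengthscale**2 - r_i * r_j / lengthscale**4)


def velocity_block(x, xp, lengthscales=(1.0, 2.0)):
    """2x2 cross-covariance of F(x) and F(x') with independent priors per component."""
    return [
        [rbf(x, xp, lengthscales[0]), 0.0],
        [0.0, rbf(x, xp, lengthscales[1])],
    ]


def helmholtz_block(x, xp, ell_phi=1.0, ell_psi=2.0):
    """2x2 cross-covariance of F(x) and F(x') when F = grad(Phi) + rot(Psi)."""
    block = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            potential = rbf_cross_derivative(x, xp, i, j, ell_phi)
            # rot(Psi) swaps the two axes and flips one sign (assumed convention).
            stream = (-1) ** (i + j) * rbf_cross_derivative(x, xp, 1 - i, 1 - j, ell_psi)
            block[i][j] = potential + stream
    return block


x, xp = (0.0, 0.0), (0.5, 0.5)
vel = velocity_block(x, xp)
helm = helmholtz_block(x, xp)
print("velocity off-diagonal:", vel[0][1])
print("Helmholtz off-diagonal:", helm[0][1])
```

The two lengthscales are deliberately different: with identical RBF priors on $\Phi$ and $\Psi$, the cross terms in this sketch cancel exactly, so the off-diagonal entries would vanish.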
docs/examples/oceanmodelling.py
Outdated
# Lastly, we directly compare the velocity and Hemlholtz models by computing the [negative log predictive densities](https://en.wikipedia.org/wiki/Negative_log_predictive_density) for each model. This is a quantitative metric that measures the probability of the ground truth given the data.
#
- # Lastly, we directly compare the velocity and Hemlholtz models by computing the [negative log predictive densities](https://en.wikipedia.org/wiki/Negative_log_predictive_density) for each model. This is a quantitative metric that measures the probability of the ground truth given the data.
- #
+ # Lastly, we directly compare the velocity and Helmholtz models by computing the [negative log predictive densities](https://en.wikipedia.org/wiki/Negative_log_predictive_density) for each model. This is a quantitative metric that measures the probability of the ground truth given the data.
+ #
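For readers unfamiliar with the metric, here is a minimal sketch of an NLPD computation. It assumes independent Gaussian predictive marginals, which may differ from how the notebook evaluates the predictive density; the function name and the numbers are made up for illustration.

```python
import math


def gaussian_nlpd(y_true, pred_mean, pred_std):
    """Negative log predictive density, summed over test points.

    Assumes independent Gaussian predictive marginals (an assumption of
    this sketch, not necessarily what the notebook does).
    """
    total = 0.0
    for y, m, s in zip(y_true, pred_mean, pred_std):
        log_density = -0.5 * math.log(2.0 * math.pi * s**2) - 0.5 * ((y - m) / s) ** 2
        total -= log_density
    return total


# Hypothetical held-out velocities and two made-up sets of predictions.
y_test = [0.10, -0.20, 0.05]
nlpd_calibrated = gaussian_nlpd(y_test, [0.12, -0.18, 0.00], [0.1, 0.1, 0.1])
nlpd_biased = gaussian_nlpd(y_test, [0.50, 0.30, 0.40], [0.1, 0.1, 0.1])
print(nlpd_calibrated < nlpd_biased)  # lower NLPD means a better fit
```

Because the squared residual is scaled by the predictive variance, a shifted predictive mean with a confident variance is penalised heavily, which is why the metric is sensitive to how the mean is handled.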
docs/examples/oceanmodelling.py
Outdated
# %% [markdown]
# <span id="fn1"></span>
# ## Footnote
# Kernels for vector valued functions have been studied in the literature, see [Alvarez et. al, (2012)](https://doi.org/10.48550/arXiv.1106.6251)
- # Kernels for vector valued functions have been studied in the literature, see [Alvarez et. al, (2012)](https://doi.org/10.48550/arXiv.1106.6251)
+ # Kernels for vector-valued functions have been studied in the literature see e.g., [Alvarez et al., (2012)](https://doi.org/10.48550/arXiv.1106.6251).
docs/examples/oceanmodelling.py
Outdated
gulf_data_train = pd.read_csv(
    "https://raw.githubusercontent.com/JaxGaussianProcesses/gpjaxstatic/main/data/gulfdata_train.csv"
)
gulf_data_test = pd.read_csv(
    "https://raw.githubusercontent.com/JaxGaussianProcesses/gpjaxstatic/main/data/gulfdata_test.csv"
)
@ivanshalashilin - we have an issue with the links here. (The docs build fails and I can't load these on my local machine.)
docs/examples/oceanmodelling.py
Outdated
# $$
# where $\mathbf{x} = (x^{(0)}$,$x^{(1)})^\text{T}$, with a vector basis in the standard Cartesian directions (dimensions will be indicated by superscripts).
#
# We shall label the ground truth $D_0=\left\{ \left(\mathbf{x}_{0,i} , \mathbf{y}_{0,i} \right)\right\}_{i=1}^N$, where $\mathbf{y}_i$ is the 2 dimensional velocity vector at the $i$th location, $\mathbf{x}_i$. The training dataset contains simulated measurements from ocean drifters $D_T=\left\{\left(\mathbf{x}_{T,i}, \mathbf{y}_{T,i} \right)\right\}_{i=1}^{N_T}$, $N_T = 20$ in this case (the subscripts indicate the ground truth and the simulated measurements respectively).
This will not render otherwise!
- # We shall label the ground truth $D_0=\left\{ \left(\mathbf{x}_{0,i} , \mathbf{y}_{0,i} \right)\right\}_{i=1}^N$, where $\mathbf{y}_i$ is the 2 dimensional velocity vector at the $i$th location, $\mathbf{x}_i$. The training dataset contains simulated measurements from ocean drifters $D_T=\left\{\left(\mathbf{x}_{T,i}, \mathbf{y}_{T,i} \right)\right\}_{i=1}^{N_T}$, $N_T = 20$ in this case (the subscripts indicate the ground truth and the simulated measurements respectively).
+ # We shall label the ground truth $D_0=\left\{ \left(\mathbf{x}_{0,i} , \mathbf{y}_{0,i} \right)\right\}_{i=1}^N$, where $\mathbf{y}_i$ is the 2 dimensional velocity vector at the $i$-th location, $\mathbf{x}_i$. The training dataset contains simulated measurements from ocean drifters $D_T=\left\{\left(\mathbf{x}_{T,i}, \mathbf{y}_{T,i} \right)\right\}_{i=1}^{N_T}$, $N_T = 20$ in this case (the subscripts indicate the ground truth and the simulated measurements respectively).
c5483c9 to b73e373
Thanks @ivanshalashilin, this is outstanding work. :)
Please feel free to merge once the documentation tests pass.
a2bf888 to ae86257
Type of changes
Checklist
poetry run pre-commit run --all-files --show-diff-on-failure
before committing.
Description
A notebook recreating the results of Berlinghieri et al., along with four CSV files containing the required data. The Helmholtz GP is consistent with the literature, but the velocity GP disagrees with their results. The implementation only works with stationary kernels.
Issue Number: N/A
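One possible reading of the stationarity restriction in the description, assuming (this is not stated in the PR) that the mixed kernel derivatives are obtained by differentiating with respect to a single argument: for a stationary kernel the chain rule turns the cross derivative into a sign-flipped Hessian in one argument, and that shortcut is not available for a non-stationary kernel.

```latex
k(\mathbf{x}, \mathbf{x}') = \kappa(\mathbf{r}), \quad \mathbf{r} = \mathbf{x} - \mathbf{x}'
\;\Longrightarrow\;
\frac{\partial^2 k}{\partial x^{(i)} \, \partial {x'}^{(j)}}
= -\frac{\partial^2 \kappa}{\partial r^{(i)} \, \partial r^{(j)}}
= -\frac{\partial^2 k}{\partial x^{(i)} \, \partial x^{(j)}}
```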