docs: big update! (#66)
maawoo authored Oct 24, 2024
1 parent e1607ee commit 72d33bf
Showing 21 changed files with 13,359 additions and 12,799 deletions.
34 changes: 20 additions & 14 deletions docs/_toc.yml
@@ -1,19 +1,25 @@
 format: jb-book
 root: index
 parts:
 - caption: Introduction
   chapters:
-  - file: content/01_Introduction/01_00_Installation
-  - file: content/01_Introduction/02_00_Overview
-- caption: Getting Started
-  chapters:
-  - file: content/02_Getting_Started/01_00_Data_Access
-    sections:
-    - file: content/02_Getting_Started/01_01_Sentinel1
-    - file: content/02_Getting_Started/01_02_Sentinel2
-    - file: content/02_Getting_Started/01_03_SANLC
-    - file: content/02_Getting_Started/01_04_MSWEP
-    - file: content/02_Getting_Started/01_05_S1_SurfMI
-    - file: content/02_Getting_Started/01_06_S1_Coherence
-    - file: content/02_Getting_Started/01_07_Copernicus_DEM
-  - file: content/02_Getting_Started/02_00_How_to
+  - file: content/01/01_00_Installation
+  - file: content/01/02_00_Introduction
+  - file: content/01/03_00_Resources
+- caption: Data Products
+  chapters:
+  - file: content/02/01_00_Sentinel2
+  - file: content/02/02_00_SANLC
+  - file: content/02/03_00_MSWEP
+  - file: content/02/04_00_Sentinel1
+  - file: content/02/05_00_S1_SurfMI
+  - file: content/02/06_00_S1_Coherence
+  - file: content/02/07_00_Copernicus_DEM
+- caption: How to...
+  chapters:
+  - file: content/03/01_00_Override_Params
+  - file: content/03/02_00_Dask_Dashboard
+  - file: content/03/03_00_Clip_to_vec
+  - file: content/03/04_00_Spyndex
+  - file: content/03/05_00_Count_valid
+  - file: content/03/06_00_STAC_Data
@@ -1,7 +1,7 @@
 # Installation
 
 Provided that a Conda-based package manager (e.g.
-[Micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html))
+[Micromamba](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html))
 is installed on your system, the most up-to-date version of the `sdc-tools`
 package can be installed using the following steps:
 
@@ -29,9 +29,9 @@ specifying the version tag. It is important to specify the same version tag for
 both the environment and the package installation.
 
 ```bash
-micromamba create --file https://raw.githubusercontent.com/Jena-Earth-Observation-School/sdc-tools/v0.2.0/environment.yml
+micromamba create --file https://raw.githubusercontent.com/Jena-Earth-Observation-School/sdc-tools/v0.6.0/environment.yml
 micromamba activate sdc_env
-pip install git+https://github.com/Jena-Earth-Observation-School/sdc-tools.git@v0.2.0
+pip install git+https://github.com/Jena-Earth-Observation-School/sdc-tools.git@v0.6.0
 ```
 
 See the [releases page](https://github.com/Jena-Earth-Observation-School/sdc-tools/releases)
71 changes: 71 additions & 0 deletions docs/content/01/02_00_Introduction.md
@@ -0,0 +1,71 @@
(load_product-intro)=
# Using this package

Before continuing with the notebooks in the "Data Products" section, it is important
to have a basic understanding of how to use the `sdc-tools` package. This section
introduces the `load_product` function, which is the recommended main entry point
for working with `sdc-tools`. It is a wrapper around various product-specific
functions, with the goal of providing a unified, easy-to-use interface for loading
data from the SDC.

A lot happens in the background, and certain parameters are set to default
values so that the function can be used with minimal effort. Most importantly,
all data products are loaded in the coordinate reference system (CRS)
[EPSG:4326](https://epsg.io/4326), and the pixel spacing is set to 0.0002°, which
corresponds to approximately 20 x 20 m at the equator.
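
As a quick sanity check (not part of the package), the quoted pixel size can be
estimated from the approximate length of one degree at the equator:

```python
# Rough size of a 0.0002° pixel at the equator. One degree of longitude at the
# equator spans roughly 111,320 m (an approximation; the Earth is not a sphere,
# and the east-west extent shrinks towards the poles).
pixel_deg = 0.0002
metres_per_degree = 111_320
pixel_m = pixel_deg * metres_per_degree
print(round(pixel_m, 1))  # ~22 m, i.e. roughly the quoted 20 x 20 m
```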

The following basic example shows how to load Sentinel-2 L2A data for the year
2020 over an area of interest that has been saved locally as a vector file:

```{code-block} python
from sdc.load import load_product

s2_data = load_product(product="s2_l2a",
                       vec="/path/to/my_area_of_interest.geojson",
                       time_range=("2020-01-01", "2021-01-01"))
```

The basic usage is to specify the following parameters:

- `product`: The name of the data product to load. The following strings are
supported at the moment:
- _"s1_rtc"_: Sentinel-1 Radiometric Terrain Corrected (RTC)
- _"s1_surfmi"_: Sentinel-1 Surface Moisture Index (SurfMI)
- _"s1_coh"_: Sentinel-1 Coherence (VV-pol, ascending)
- _"s2_l2a"_: Sentinel-2 Level 2A (L2A)
- _"sanlc"_: South African National Land Cover (SANLC)
- _"mswep"_: Multi-Source Weighted-Ensemble Precipitation (MSWEP) daily
- _"cop_dem"_: Copernicus Digital Elevation Model GLO-30
- `vec`: Filter the returned data spatially by either providing the name of a
SALDi site in the format _"siteXX"_, where XX is the site number (e.g.
_"site06"_), or a path to a vector file (any format [`GeoPandas`](https://geopandas.org/en/stable/index.html)
can handle, e.g. GeoJSON, GeoPackage or ESRI Shapefile) that defines an area of
interest as a subset of a SALDi site. Providing a vector file outside the
spatial extent of the SALDi sites will result in an empty dataset. Please note
that the bounding box of the provided geometry will be used to load the
data (see {ref}`clip_to_vec` for how to clip to the exact geometry).
- `time_range`: Filter the returned data temporally by providing a tuple of
strings in the format _("YYYY-MM-dd", "YYYY-MM-dd")_, or _None_ to return all
available data. If you want to use a different date format, you can also provide
the parameter `time_pattern` with a string that specifies the format of the
provided time strings.
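
For illustration, the default `time_range` strings are ISO-formatted dates. The
standard library's `datetime` shows the idea behind describing other date layouts
with a format string (the exact pattern syntax `time_pattern` accepts is an
assumption here; check the package documentation):

```python
from datetime import datetime

# The default time_range strings follow the "YYYY-MM-dd" pattern ...
start, end = "2020-01-01", "2021-01-01"
assert datetime.strptime(start, "%Y-%m-%d") < datetime.strptime(end, "%Y-%m-%d")

# ... while differently formatted dates need a matching format description,
# which is the same idea behind the `time_pattern` parameter.
other = datetime.strptime("01.06.2020", "%d.%m.%Y")
print(other.date())  # 2020-06-01
```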

The following additional parameters are product-specific, as indicated by their
prefix (e.g. _s2_ for Sentinel-2 L2A):

- `s2_apply_mask`: Apply a quality and cloud mask to the Sentinel-2 L2A product by using
its Scene Classification Layer (SCL) band. The default value is _True_.
- `sanlc_year`: Select a specific year of the SANLC product by providing an
integer in the format _YYYY_. The default value is _None_, which will return the
product for all available years: 2018 & 2020.
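
As a sketch (untested here; parameter names as listed above, file path is a
placeholder), the product-specific parameters are passed alongside the basic ones:

```python
from sdc.load import load_product

# Sentinel-2 L2A without the default SCL-based masking (the default is
# s2_apply_mask=True), and the SANLC product restricted to its 2020 layer.
s2_unmasked = load_product(product="s2_l2a",
                           vec="/path/to/my_area_of_interest.geojson",
                           time_range=("2020-01-01", "2021-01-01"),
                           s2_apply_mask=False)
sanlc_2020 = load_product(product="sanlc",
                          vec="/path/to/my_area_of_interest.geojson",
                          sanlc_year=2020)
```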

```{warning}
While it is possible to load data for an entire SALDi site by providing the site
name (e.g. _"site06"_), be aware that this results in a very large dataset and
will likely cause performance issues if your workflow is not optimized.
It is therefore recommended to load only a subset by providing a vector file
that defines an area of interest (e.g. created with https://geojson.io/). Develop
your workflow on a small subset of the data before scaling up!
```
74 changes: 74 additions & 0 deletions docs/content/01/03_00_Resources.md
@@ -0,0 +1,74 @@
# Additional Resources

## Python and Jupyter Notebooks

If you want to get an introduction to [Python](https://www.python.org/) and/or
[Jupyter](https://jupyter.org/) Notebooks, I recommend the following resources from
Project Pythia:
- [Quickstart: Zero to Python](https://foundations.projectpythia.org/foundations/quickstart.html)
- [Getting Started with Jupyter](https://foundations.projectpythia.org/foundations/getting-started-jupyter.html)

[Project Pythia Foundations](https://foundations.projectpythia.org/landing-page.html)
also provides tutorials on various core scientific Python packages, such as NumPy,
Matplotlib and Pandas, which you will likely encounter at some point.

(xarray-dask-intro)=
## Xarray, Dask and lazy loading

The `load_product`-function returns an `xarray.Dataset` object, which is a
powerful data structure for working with multidimensional data. [Xarray](https://xarray.dev/)
is a Python library that _"[...] introduces labels in the form of dimensions,
coordinates and attributes on top of raw NumPy-like arrays, which allows for more
intuitive, more concise, and less error-prone user experience."_.

See the following resources for more information:
- [Overview: Why Xarray?](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)
- [Tutorial: Xarray in 45 minutes](https://tutorial.xarray.dev/overview/xarray-in-45-min.html)
- [Xarray Documentation](https://docs.xarray.dev/en/latest/index.html) (Very important resource! 😉)

Xarray closely integrates with the [Dask](https://dask.org/) library, which is a
_"[...] flexible library for parallel computing in Python."_ and allows for
datasets to be loaded lazily, meaning that the data is not loaded into memory
until it is actually needed. This is especially useful when working with large
datasets that might not fit into the available memory. These large datasets are split
into smaller chunks that can then be efficiently processed in parallel.

Most of this is happening in the background, so you don't have to worry too much about
it. However, it is important to be aware of it, as it affects the way you need to
work with the data. For example, you need to be careful when applying certain
Xarray operations, such as calling [`.values`](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.values.html#xarray.DataArray.values),
as they might trigger the entire dataset to be loaded into memory and can result in
performance issues if the data has not been [aggregated](https://docs.xarray.dev/en/latest/api.html#aggregation)
or [indexed](https://docs.xarray.dev/en/latest/user-guide/indexing.html) beforehand.
Furthermore, you might reach a point where you need to use advanced techniques
to optimize your workflow, such as re-orienting the chunks or [persisting](https://docs.dask.org/en/latest/best-practices.html#persist-when-you-can)
intermediate results in memory. For now, just keep all of this in mind and reach
out to me if you have any questions or need help with optimizing your workflow.
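
A minimal, package-independent sketch of this behaviour (assuming `xarray` and
`dask` are installed, as they are in the `sdc-tools` environment):

```python
import dask.array as da
import xarray as xr

# A lazily-evaluated DataArray backed by Dask: nothing is computed yet,
# only the chunk layout is recorded.
arr = xr.DataArray(da.zeros((10, 100, 100), chunks=(5, 50, 50)),
                   dims=("time", "y", "x"))
print(arr.chunks)  # ((5, 5), (50, 50), (50, 50))

# Aggregating over "time" first shrinks the result, so materialising it with
# `.values` is cheap; calling `.values` on `arr` directly would instead load
# the full 10 x 100 x 100 array into memory.
mean_map = arr.mean(dim="time").values
print(mean_map.shape)  # (100, 100)
```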

The following resources provide more information:
- [User Guide: Using Dask with xarray](https://docs.xarray.dev/en/latest/user-guide/dask.html#using-dask-with-xarray)
- [Tutorial: Parallel computing with Dask](https://tutorial.xarray.dev/intermediate/xarray_and_dask.html#parallel-computing-with-dask)

## Digital Earth Africa

### Tutorials

The two main data products of the SDC, Sentinel-1 RTC and Sentinel-2 L2A, are direct
copies of the open and free "Analysis Ready Data" products provided by [Digital Earth Africa (DE Africa)](https://www.digitalearthafrica.org/).

The team of DE Africa provides a lot of very helpful tutorials as Jupyter Notebooks.
Some of these tutorials cover more advanced and analysis-specific topics to address
real-world problems. While the loading of the data differs between these tutorials and
the SDC, most of the analysis techniques can be directly applied to the SDC data
products as well. It is therefore highly recommended to have a look at the tutorials in
the course of your work with the SDC data products:
- [DE Africa Real World Examples](https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Real_world_examples/index.html)

### `deafrica-tools` package

Some of these tutorials use a package called `deafrica-tools`, which includes
useful functions and utilities, e.g. for the calculation of [vegetation phenology statistics](https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Real_world_examples/Phenology_optical.html). You can find the package on GitHub:
- [Digital Earth Africa Tools Package](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks/tree/main/Tools)

If you want to use any functions of `deafrica-tools` and need assistance with the
installation or usage of the package, please let me know!
18 changes: 0 additions & 18 deletions docs/content/01_Introduction/02_00_Overview.md

This file was deleted.

