docs: big update! (#66)
maawoo authored Oct 24, 2024
1 parent e1607ee commit 72d33bf
Showing 21 changed files with 13,359 additions and 12,799 deletions.
34 changes: 20 additions & 14 deletions docs/_toc.yml
@@ -1,19 +1,25 @@
 format: jb-book
 root: index
 parts:
 - caption: Introduction
   chapters:
-  - file: content/01_Introduction/01_00_Installation
-  - file: content/01_Introduction/02_00_Overview
-- caption: Getting Started
-  chapters:
-  - file: content/02_Getting_Started/01_00_Data_Access
-    sections:
-    - file: content/02_Getting_Started/01_01_Sentinel1
-    - file: content/02_Getting_Started/01_02_Sentinel2
-    - file: content/02_Getting_Started/01_03_SANLC
-    - file: content/02_Getting_Started/01_04_MSWEP
-    - file: content/02_Getting_Started/01_05_S1_SurfMI
-    - file: content/02_Getting_Started/01_06_S1_Coherence
-    - file: content/02_Getting_Started/01_07_Copernicus_DEM
-  - file: content/02_Getting_Started/02_00_How_to
+  - file: content/01/01_00_Installation
+  - file: content/01/02_00_Introduction
+  - file: content/01/03_00_Resources
+- caption: Data Products
+  chapters:
+  - file: content/02/01_00_Sentinel2
+  - file: content/02/02_00_SANLC
+  - file: content/02/03_00_MSWEP
+  - file: content/02/04_00_Sentinel1
+  - file: content/02/05_00_S1_SurfMI
+  - file: content/02/06_00_S1_Coherence
+  - file: content/02/07_00_Copernicus_DEM
+- caption: How to...
+  chapters:
+  - file: content/03/01_00_Override_Params
+  - file: content/03/02_00_Dask_Dashboard
+  - file: content/03/03_00_Clip_to_vec
+  - file: content/03/04_00_Spyndex
+  - file: content/03/05_00_Count_valid
+  - file: content/03/06_00_STAC_Data
@@ -1,7 +1,7 @@
 # Installation
 
 Provided that a Conda-based package manager (e.g.
-[Micromamba](https://mamba.readthedocs.io/en/latest/user_guide/micromamba.html))
+[Micromamba](https://mamba.readthedocs.io/en/latest/installation/micromamba-installation.html))
 is installed on your system, the most up-to-date version of the `sdc-tools`
 package can be installed using the following steps:
 
@@ -29,9 +29,9 @@ specifying the version tag. It is important to specify the same version tag for
 both the environment and the package installation.
 
 ```bash
-micromamba create --file https://raw.githubusercontent.com/Jena-Earth-Observation-School/sdc-tools/v0.2.0/environment.yml
+micromamba create --file https://raw.githubusercontent.com/Jena-Earth-Observation-School/sdc-tools/v0.6.0/environment.yml
 micromamba activate sdc_env
-pip install git+https://github.com/Jena-Earth-Observation-School/sdc-tools.git@v0.2.0
+pip install git+https://github.com/Jena-Earth-Observation-School/sdc-tools.git@v0.6.0
 ```
 
 See the [releases page](https://github.com/Jena-Earth-Observation-School/sdc-tools/releases)
71 changes: 71 additions & 0 deletions docs/content/01/02_00_Introduction.md
@@ -0,0 +1,71 @@
(load_product-intro)=
# Using this package

Before continuing with the notebooks in the "Data Products" section, it is important
to have a basic understanding of how to use the `sdc-tools` package. This section
introduces the `load_product` function, which is the recommended main entry point
for working with `sdc-tools`. It is a wrapper around various product-specific
functions, with the goal of providing a unified, easy-to-use interface for loading
data from the SDC.

A lot happens in the background, and certain parameters are set to default
values so that the function can be used with minimal effort. Most importantly,
all data products are loaded in the coordinate reference system (CRS)
[EPSG:4326](https://epsg.io/4326), and the pixel spacing is set to 0.0002°, which
corresponds to approximately 20 x 20 m at the equator.
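
As a quick sanity check (not part of the package), the quoted pixel size can be
estimated from the approximate length of one degree at the equator:

```python
# Rough size of a 0.0002° pixel at the equator. One degree of longitude at the
# equator spans roughly 111,320 m (an approximation; the Earth is not a sphere,
# and the east-west extent shrinks towards the poles).
pixel_deg = 0.0002
metres_per_degree = 111_320
pixel_m = pixel_deg * metres_per_degree
print(round(pixel_m, 1))  # ~22 m, i.e. roughly the quoted 20 x 20 m
```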

The following basic example shows how to load Sentinel-2 L2A data for the year
2020 over an area of interest that has been saved locally as a vector file:

```{code-block} python
from sdc.load import load_product

s2_data = load_product(product="s2_l2a",
                       vec="/path/to/my_area_of_interest.geojson",
                       time_range=("2020-01-01", "2021-01-01"))
```

The basic usage is to specify the following parameters:

- `product`: The name of the data product to load. The following strings are
supported at the moment:
- _"s1_rtc"_: Sentinel-1 Radiometric Terrain Corrected (RTC)
- _"s1_surfmi"_: Sentinel-1 Surface Moisture Index (SurfMI)
- _"s1_coh"_: Sentinel-1 Coherence (VV-pol, ascending)
- _"s2_l2a"_: Sentinel-2 Level 2A (L2A)
- _"sanlc"_: South African National Land Cover (SANLC)
- _"mswep"_: Multi-Source Weighted-Ensemble Precipitation (MSWEP) daily
- _"cop_dem"_: Copernicus Digital Elevation Model GLO-30
- `vec`: Filter the returned data spatially by either providing the name of a
SALDi site in the format _"siteXX"_, where XX is the site number (e.g.
_"site06"_), or a path to a vector file (any format [`GeoPandas`](https://geopandas.org/en/stable/index.html)
can handle, e.g. GeoJSON, GeoPackage or ESRI Shapefile) that defines an area of
interest as a subset of a SALDi site. Providing a vector file outside the
spatial extent of the SALDi sites will result in an empty dataset. Please note
that the bounding box of the provided geometry will be used to load the
data (see {ref}`clip_to_vec` for how to clip to the exact geometry).
- `time_range`: Filter the returned data temporally by providing a tuple of
strings in the format _("YYYY-MM-dd", "YYYY-MM-dd")_, or _None_ to return all
available data. If you want to use a different date format, you can also provide
the parameter `time_pattern` with a string that specifies the format of the
provided time strings.
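
For illustration, the default `time_range` strings are ISO-formatted dates. The
standard library's `datetime` shows the idea behind describing other date layouts
with a format string (the exact pattern syntax `time_pattern` accepts is an
assumption here; check the package documentation):

```python
from datetime import datetime

# The default time_range strings follow the "YYYY-MM-dd" pattern ...
start, end = "2020-01-01", "2021-01-01"
assert datetime.strptime(start, "%Y-%m-%d") < datetime.strptime(end, "%Y-%m-%d")

# ... while differently formatted dates need a matching format description,
# which is the same idea behind the `time_pattern` parameter.
other = datetime.strptime("01.06.2020", "%d.%m.%Y")
print(other.date())  # 2020-06-01
```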

The following additional parameters are product-specific, as indicated by their
prefix (e.g. _s2_ for Sentinel-2 L2A):

- `s2_apply_mask`: Apply a quality and cloud mask to the Sentinel-2 L2A product by using
its Scene Classification Layer (SCL) band. The default value is _True_.
- `sanlc_year`: Select a specific year of the SANLC product by providing an
integer in the format _YYYY_. The default value is _None_, which will return the
product for all available years: 2018 & 2020.
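
As a sketch (untested here; parameter names as listed above, file path is a
placeholder), the product-specific parameters are passed alongside the basic ones:

```python
from sdc.load import load_product

# Sentinel-2 L2A without the default SCL-based masking (the default is
# s2_apply_mask=True), and the SANLC product restricted to its 2020 layer.
s2_unmasked = load_product(product="s2_l2a",
                           vec="/path/to/my_area_of_interest.geojson",
                           time_range=("2020-01-01", "2021-01-01"),
                           s2_apply_mask=False)
sanlc_2020 = load_product(product="sanlc",
                          vec="/path/to/my_area_of_interest.geojson",
                          sanlc_year=2020)
```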

```{warning}
While it is possible to load data for an entire SALDi site by providing the site
name (e.g. _"site06"_), be aware that this results in a very large dataset and
will likely cause performance issues if your workflow is not optimized.
It is therefore recommended to load only a subset by providing a vector file
that defines an area of interest (e.g. created with https://geojson.io/). Develop
your workflow on a small subset of the data before scaling up!
```
74 changes: 74 additions & 0 deletions docs/content/01/03_00_Resources.md
@@ -0,0 +1,74 @@
# Additional Resources

## Python and Jupyter Notebooks

If you want to get an introduction to [Python](https://www.python.org/) and/or
[Jupyter](https://jupyter.org/) Notebooks, I recommend the following resources from
Project Pythia:
- [Quickstart: Zero to Python](https://foundations.projectpythia.org/foundations/quickstart.html)
- [Getting Started with Jupyter](https://foundations.projectpythia.org/foundations/getting-started-jupyter.html)

[Project Pythia Foundations](https://foundations.projectpythia.org/landing-page.html)
also provides tutorials on various core scientific Python packages, such as NumPy,
Matplotlib and Pandas, which you will likely encounter at some point.

(xarray-dask-intro)=
## Xarray, Dask and lazy loading

The `load_product`-function returns an `xarray.Dataset` object, which is a
powerful data structure for working with multidimensional data. [Xarray](https://xarray.dev/)
is a Python library that _"[...] introduces labels in the form of dimensions,
coordinates and attributes on top of raw NumPy-like arrays, which allows for more
intuitive, more concise, and less error-prone user experience."_.

See the following resources for more information:
- [Overview: Why Xarray?](https://docs.xarray.dev/en/latest/getting-started-guide/why-xarray.html)
- [Tutorial: Xarray in 45 minutes](https://tutorial.xarray.dev/overview/xarray-in-45-min.html)
- [Xarray Documentation](https://docs.xarray.dev/en/latest/index.html) (Very important resource! 😉)

Xarray closely integrates with the [Dask](https://dask.org/) library, which is a
_"[...] flexible library for parallel computing in Python."_ and allows for
datasets to be loaded lazily, meaning that the data is not loaded into memory
until it is actually needed. This is especially useful when working with large
datasets that might not fit into the available memory. These large datasets are split
into smaller chunks that can then be efficiently processed in parallel.

Most of this is happening in the background, so you don't have to worry too much about
it. However, it is important to be aware of it, as it affects the way you need to
work with the data. For example, you need to be careful when applying certain
Xarray operations, such as calling [`.values`](https://docs.xarray.dev/en/latest/generated/xarray.DataArray.values.html#xarray.DataArray.values),
as they might trigger the entire dataset to be loaded into memory and can result in
performance issues if the data has not been [aggregated](https://docs.xarray.dev/en/latest/api.html#aggregation)
or [indexed](https://docs.xarray.dev/en/latest/user-guide/indexing.html) beforehand.
Furthermore, you might reach a point where you need to use advanced techniques
to optimize your workflow, such as re-orienting the chunks or [persisting](https://docs.dask.org/en/latest/best-practices.html#persist-when-you-can)
intermediate results in memory. For now, just keep all of this in mind and reach
out to me if you have any questions or need help with optimizing your workflow.
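
A minimal, package-independent sketch of this behaviour (assuming `xarray` and
`dask` are installed, as they are in the `sdc-tools` environment):

```python
import dask.array as da
import xarray as xr

# A lazily-evaluated DataArray backed by Dask: nothing is computed yet,
# only the chunk layout is recorded.
arr = xr.DataArray(da.zeros((10, 100, 100), chunks=(5, 50, 50)),
                   dims=("time", "y", "x"))
print(arr.chunks)  # ((5, 5), (50, 50), (50, 50))

# Aggregating over "time" first shrinks the result, so materialising it with
# `.values` is cheap; calling `.values` on `arr` directly would instead load
# the full 10 x 100 x 100 array into memory.
mean_map = arr.mean(dim="time").values
print(mean_map.shape)  # (100, 100)
```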

The following resources provide more information:
- [User Guide: Using Dask with xarray](https://docs.xarray.dev/en/latest/user-guide/dask.html#using-dask-with-xarray)
- [Tutorial: Parallel computing with Dask](https://tutorial.xarray.dev/intermediate/xarray_and_dask.html#parallel-computing-with-dask)

## Digital Earth Africa

### Tutorials

The two main data products of the SDC, Sentinel-1 RTC and Sentinel-2 L2A, are direct
copies of the open and free "Analysis Ready Data" products provided by [Digital Earth Africa (DE Africa)](https://www.digitalearthafrica.org/).

The team of DE Africa provides a lot of very helpful tutorials as Jupyter Notebooks.
Some of these tutorials cover more advanced and analysis-specific topics to address
real-world problems. While the loading of the data differs between these tutorials and
the SDC, most of the analysis techniques can be directly applied to the SDC data
products as well. It is therefore highly recommended to have a look at the tutorials in
the course of your work with the SDC data products:
- [DE Africa Real World Examples](https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Real_world_examples/index.html)

### `deafrica-tools` package

Some of these tutorials use a package called `deafrica-tools`, which includes
useful functions and utilities, e.g. for the calculation of [vegetation phenology statistics](https://docs.digitalearthafrica.org/en/latest/sandbox/notebooks/Real_world_examples/Phenology_optical.html). You can find the package on GitHub:
- [Digital Earth Africa Tools Package](https://github.com/digitalearthafrica/deafrica-sandbox-notebooks/tree/main/Tools)

If you want to use any functions of `deafrica-tools` and need assistance with the
installation or usage of the package, please let me know!
18 changes: 0 additions & 18 deletions docs/content/01_Introduction/02_00_Overview.md

This file was deleted.

