Deltares · savente93 · Jul 18, 2023 · Jan 13, 2022 · Jan 13, 2022 · Apr 18, 2023
diff --git a/docs/api.rst b/docs/api.rst
@@ -786,9 +786,9 @@ PCRaster I/O
 
 .. _statistics:
 
-==========
-Statistics
-==========
+=====================================
+Statistics and Extreme Value Analysis
+=====================================
 
 Statistics and performance metrics
 ==================================
@@ -810,6 +810,25 @@ Statistics and performance metrics
    stats.skills.mse
    stats.skills.rmse
 
+Extreme Value Analysis
+=======================
+.. autosummary::
+   :toctree: _generated
+
+   stats.extremes.get_peaks
+   stats.extremes.fit_extremes
+   stats.extremes.get_return_value
+   stats.extremes.eva
+
+=============
+Design Events
+=============
+.. autosummary::
+   :toctree: _generated
+
+   stats.design_events.get_peak_hydrographs
+
+
 =========
 Utilities
 =========

diff --git a/docs/changelog.rst b/docs/changelog.rst
@@ -16,6 +16,8 @@ Added
 - New generic methods for ``GridModel``: ``setup_grid``, ``setup_grid_from_constant``, ``setup_grid_from_rasterdataset``, ``setup_grid_from_raster_reclass``, ``setup_grid_from_geodataframe``. PR #333
 - New ``grid`` workflow methods to support the setup methods in ``GridModel``: ``grid_from_constant``, ``grid_from_rasterdataset``, ``grid_from_raster_reclass``, ``grid_from_geodataframe``. PR #333
 - New raster method ``rasterize_geometry``.
+- New extreme valua analysis and design event (creation hydrographs) methods in stats submodule.
+  Note that these methods are experimental and may be moved elsewhere / change in signature. PR #85
 
 Changed
 -------

diff --git a/examples/data/discharge.nc b/examples/data/discharge.nc
diff --git a/examples/doing_extreme_value_analysis.ipynb b/examples/doing_extreme_value_analysis.ipynb
@@ -0,0 +1,268 @@
+{
+ "cells": [
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "87e8269e",
+   "metadata": {},
+   "source": [
+    "## Example: Doing Extreme Value Analysis (EVA) for time series "
+   ]
+  },
+  {
+   "attachments": {},
+   "cell_type": "markdown",
+   "id": "29a06345",
+   "metadata": {},
+   "source": [
+    "This example illustrates how to use to perform Extreme Value Analysis (EVA) based on time series stored in a netcdf, for example, this could be performed on a xarray Dataset. \n",
+    "Typical steps needed to perform EVA are:\n",
+    "- extract peaks (high extremes) from a continuous time series\n",
+    "- fit a EV distribution on these peaks \n",
+    "- obtain return values for given return periods based on the distribution fitted and its parameters\n",
+    "- plot the distribution and empirical data \n",
+    "\n",
+    "All these steps are also combined in the function `eva` (see also the end of the example)\n",
+    "\n",
+    "We cover these steps in this example using randomly generated data. More details on the functions used can be found in the API reference description of the [Statistics and Extreme Value Analysis](https://deltares.github.io/hydromt/latest/api.html#statistics)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "fe559e80",
+   "metadata": {},
+   "source": [
+    "Note that here are also other packages for extreme valua analysis. \n",
+    "\n",
+    "- [scipy.stats](https://docs.scipy.org/doc/scipy/tutorial/stats.html) Library of many statistical distributions including extreme value distributions.\n",
+    "- [pyextremes](https://georgebv.github.io/pyextremes/) Very rich package for univariate extreme value analysis with many different fitting methods, builds on scipy.stats.\n",
+    "\n",
+    "Compared to these packages The HydroMT methods add the following features, but is generally much less feature rich:\n",
+    "- support for performing extreme valua analysis on ND Xarray Datasets\n",
+    "- support for lmoments fitting"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "ee2e0013",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# import hydromt and functions needed for EVA\n",
+    "from hydromt.stats import extremes\n",
+    "import xarray as xr\n",
+    "import pandas as pd\n",
+    "import numpy as np\n",
+    "import matplotlib.pyplot as plt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a75ad9b4",
+   "metadata": {},
+   "source": [
+    "We load a random discharge time seriesaas a xarray DataArray `da` for two stations (stations 1 and 2). "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dbd6017b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We create some random continuous time series with some extremes\n",
+    "da = xr.open_dataarray(r\"./data/discharge.nc\")\n",
+    "da"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6eac1dee",
+   "metadata": {},
+   "source": [
+    "Extreme value distribution fit is based either on block maxima (BM) or peaks-over-threshold (PoT). \n",
+    "\n",
+    "- If sampling of high extremes is done according to BM then, distributions to be fit are either Gumbel (later referred as `gumb`) or the Generalized Extreme Value (GEV) distribution (later referred as `gev`). Note that a GEV distribution with a shape paramater of 0 is equivalent to the Gumbel distribution.  \n",
+    "\n",
+    "- If sampling of high extremes is done according to PoT then, distributions to be fit are either exponential (later referred as `exp`) or the Generalized Pareto (GP) distribution (later referred as `gpd`). \n",
+    "\n",
+    "For this example, we will fit a GEV distribution based on annual maxima. "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "46f295ba",
+   "metadata": {},
+   "source": [
+    "### Step 1: Extracting peaks from continuous time series"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "65f73f48",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We use the get_peaks function\n",
+    "bm_peaks = extremes.get_peaks(da, ev_type=\"BM\", period=\"365.25D\")\n",
+    "\n",
+    "fig, ax = plt.subplots(figsize=(10, 3))\n",
+    "da.to_pandas().plot(\n",
+    "    ax=ax, xlabel=\"time\", ylabel=\"discharge [m3/s]\", color=[\"orange\", \"green\"]\n",
+    ")\n",
+    "bm_peaks.to_pandas().plot(\n",
+    "    ax=ax,\n",
+    "    marker=\"o\",\n",
+    "    linestyle=\"none\",\n",
+    "    legend=False,\n",
+    "    color=[\"darkorange\", \"darkgreen\"],\n",
+    "    markersize=4,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "bee648b3",
+   "metadata": {},
+   "source": [
+    "The function extracts peaks, here, annual maxima, and set the rest of the time series to NaN. Here, the `extremes_rate` is equal to 1 as we have sampled the highest value per year."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "20de0a80",
+   "metadata": {},
+   "source": [
+    "### Step 2: fit a EV distribution on these peaks "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0fdbefdb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "da_params = extremes.fit_extremes(bm_peaks, ev_type=\"BM\", distribution=\"gev\")\n",
+    "da_params.load()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9cf19ef8",
+   "metadata": {},
+   "source": [
+    "### Step 3: obtain return values for given return periods based on the distribution fitted and its parameters"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "76825d82",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We define the return periods for which we would like to know the return values\n",
+    "rps = np.array([2, 5, 25, 100, 500])\n",
+    "da_rps = extremes.get_return_value(da_params, rps=rps).load()\n",
+    "da_rps.to_pandas()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f0edc84e",
+   "metadata": {},
+   "source": [
+    "### Step 4: plot the distribution and empirical data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6ff014a0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# We plot the fit obtained using the function plot_return_values!\n",
+    "\n",
+    "fig, axes = plt.subplots(2, 1, figsize=(5, 7), sharex=True)\n",
+    "\n",
+    "for i, ax, color in zip(da.stations.values, axes, [\"orange\", \"green\"]):\n",
+    "    extremes.plot_return_values(\n",
+    "        bm_peaks.sel(stations=i),\n",
+    "        da_params.sel(stations=i),\n",
+    "        \"gev\",\n",
+    "        color=color,\n",
+    "        nsample=1000,  # number of samples for confidence interval\n",
+    "        rps=rps,\n",
+    "        extremes_rate=1.0,\n",
+    "        ax=ax,\n",
+    "    )\n",
+    "    ax.set_title(f\"Station {i}\")\n",
+    "    ax.set_ylabel(\"discharge [m3/s]\")\n",
+    "    if i == da.stations.values[-1]:\n",
+    "        ax.set_xlabel(\"return period [years]\")\n",
+    "    else:\n",
+    "        ax.set_xlabel(\"\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "a862c05d",
+   "metadata": {},
+   "source": [
+    "## TL;DR\n",
+    "Note that the EV fit can be done within one function ([stats.extremes.eva]) that subsequently calls the functions used here ([stats.extremes.get_peaks], [stats.extremes.fit_extremes] and [stats.extremes.get_return_value])\n",
+    "\n",
+    "Steps 1 to 3 above could have been performed using [stats.extremes.eva] as follow. In this case, it will return an xarray.Dataset:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "bcc9be10",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "da_bm_eva = extremes.eva(\n",
+    "    da, ev_type=\"BM\", period=\"365.25D\", distribution=\"gev\", rps=rps\n",
+    ").load()\n",
+    "da_bm_eva\n",
+    "\n",
+    "# Note that for this example:\n",
+    "# da_bm_eva['peaks'] is the same as bm_peaks\n",
+    "# da_bm_eva['parameters'] is the same as da_params\n",
+    "# da_bm_eva['return_values'] is the same as da_rps"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "hydromt-dev",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.11.4"
+  },
+  "vscode": {
+   "interpreter": {
+    "hash": "8138b72b0e304ba55d14e4dcf8d650296065d1ee95f3f67a239b6fbf5f7328dd"
+   }
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/hydromt/stats/__init__.py b/hydromt/stats/__init__.py
@@ -1,28 +1,6 @@
 # -*- coding: utf-8 -*-
 """HydroMT statistics."""
 
-from .skills import (
-    bias,
-    kge,
-    kge_2012,
-    kge_non_parametric,
-    kge_non_parametric_flood,
-    pearson_correlation,
-    percentual_bias,
-    rmse,
-    rsquared,
-    spearman_rank_correlation,
-)
-
-__all__ = [
-    "bias",
-    "kge",
-    "kge_2012",
-    "kge_non_parametric",
-    "kge_non_parametric_flood",
-    "pearson_correlation",
-    "percentual_bias",
-    "rmse",
-    "rsquared",
-    "spearman_rank_correlation",
-]
+from .design_events import *
+from .extremes import *
+from .skills import *