---
title: "NL_Spe GPP"

author:
  - Egor Prikaziuk ^[e.prikaziuk@utwente.nl]
  - Christiaan van der Tol

output:
  html_notebook:
    theme: united
    toc: yes
  html_document:
    df_print: paged
    toc: yes
    
abstract: | 
  This code was created for educational purposes. It shows how eddy covariance data should be prepared for flux partitioning, that is, the separation of net ecosystem exchange (NEE) into gross primary productivity (GPP) and ecosystem respiration (Reco), and how to run the flux partitioning itself with the [REddyProc R package](https://www.bgc-jena.mpg.de/bgi/index.php/Services/REddyProcWebRPackage) [(Wutzler et al., 2018)](https://doi.org/10.5194/bg-15-5015-2018).
  

params:
  fluxes_path: 'W:\\Siteswrs\\Speuld\\data\\2_processed\\fluxes'
  flux_computed: 'Spe_2019_2020.rds'
  rad_path: 'W:\\Siteswrs\\Speuld\\data\\2_processed\\2020_radiation45.csv'
  rad_computed: 'Spe_2019_2020_rad_hh.rds'
  year: 2020
---
<center>
![](itc_ut.png){#id .class width=50% height=50%}
</center>

```{r setup, include = FALSE}

# install.packages('knitr')
# install.packages('REddyProc')
# install.packages('tidyverse')
# install.packages('tsibble')
# install.packages('lubridate')

required_packages <- c('knitr', 'REddyProc', 'tidyverse', 'tsibble', 'lubridate')
to_install <- required_packages[!(required_packages %in% installed.packages()[,"Package"])]
if(length(to_install)) install.packages(to_install)


knitr::opts_chunk$set(echo=TRUE, include = TRUE, warning=FALSE, message=FALSE)
library(REddyProc) 
library(tidyverse)
library(tsibble)
library(lubridate)
```


# Fluxes

## '_full_output_'

This chunk takes about 3 minutes to run.
It combines all .csv files with ``_full_output_`` in their names, drops duplicated rows and returns a data frame.

```{r read_full_output, eval=FALSE}

df_full <- c()
for (f in list.files(params$fluxes_path)){
    if (grepl('_full_output_', f)) {
        print(f)
        tmp <- read_csv(file.path(params$fluxes_path, f), skip=1) %>%
            slice(-1) %>%
            filter(filename != 'not_enough_data') %>%
            mutate(
                dtime = paste(date, time)  # fragile: relies on type_convert() to parse the datetime, because as.POSIXct() refused
            ) %>%
            select(dtime, everything()) %>%
            type_convert() %>%
            mutate_if(is.numeric, ~ifelse(.==-9999, NA, .))  # na=c('-9999') did not work
        df_full <- bind_rows(df_full, tmp)
        
    }
}

df <- distinct(df_full)

sprintf('%d duplicated lines were dropped', dim(df_full)[1] - dim(df)[1])
# saveRDS(df, 'Spe_2020.rds')
```
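
The cleaning steps of the loop above (stacking files, dropping repeated rows, turning the ``-9999`` sentinel into ``NA``) can be tried on toy data first. The sketch below is only an illustration: ``part1``, ``part2`` and ``toy`` are made-up objects, and ``across()`` with ``na_if()`` is swapped in for the ``mutate_if()`` call used above.

```{r sketch_sentinel, eval=FALSE}
library(dplyr)

# Two toy "files" that overlap in one row; -9999 marks a missing value
part1 <- tibble(dtime = c("2020-01-01 00:30", "2020-01-01 01:00"),
                co2_flux = c(-1.2, -9999))
part2 <- tibble(dtime = c("2020-01-01 01:00", "2020-01-01 01:30"),
                co2_flux = c(-9999, 0.4))

toy <- bind_rows(part1, part2) %>%
    distinct() %>%                                         # drops the repeated 01:00 row
    mutate(across(where(is.numeric), ~ na_if(.x, -9999)))  # -9999 -> NA

toy  # 3 rows, co2_flux is NA at 01:00
```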

### alternative to save time

To save time, we have already joined the ``_full_output_`` files for 2019 and 2020 with ``bind_rows(df2019, df2020)`` and saved the joined data frame in ``params$flux_computed``: `r params$flux_computed`

```{r read_full_output_local}

# bind_rows(df2019, df2020) %>%
#   distinct() %>%
#   saveRDS(params$flux_computed)

df <- read_rds(params$flux_computed)

## csv alternative (the pattern is anchored so that only the file extension is replaced)
# write_csv(df, sub('\\.rds$', '.csv', params$flux_computed))

# df <- read_csv(sub('\\.rds$', '.csv', params$flux_computed))

```


## making tsibble

``tsibble`` is a data frame designed for time series. It knows how to:

* fill missing timestamps with ``fill_gaps()``
    - very relevant for us, as we have to make an evenly spaced data frame
    - sets all columns except the timestamp to NA in the added rows
* aggregate to various time steps (half-hourly, daily) with ``index_by() %>% summarize()``
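
A minimal sketch of the gap filling on toy data, assuming a half-hourly series with two missing timestamps. The object names ``toy`` and ``toy_filled`` are made up for this illustration.

```{r sketch_fill_gaps, eval=FALSE}
library(dplyr)
library(tsibble)

# Half-hourly toy series: 00:30, 01:00, then a jump to 02:30 (01:30 and 02:00 are missing)
toy <- tibble(
    dtime = as.POSIXct("2020-01-01 00:30", tz = "UTC") + c(0, 1, 4) * 1800,
    co2_flux = c(-1.2, -0.8, 0.3)
) %>%
    as_tsibble(index = dtime)

toy_filled <- fill_gaps(toy)  # adds the two missing timestamps, co2_flux is NA there

nrow(toy_filled) - nrow(toy)  # number of added rows
```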

The code below is deliberately verbose so that it can report what it does.

```{r tsibble_from_flux_verbose}

# tsibble does not like repeated dates
dup_i <- duplicated(df$dtime)
print(sprintf('%d still duplicated and will be dropped', sum(dup_i)))
print(df$dtime[dup_i])

df_ts <- df[!dup_i,] %>%
    as_tsibble(index=dtime)

nrow_before = nrow(df_ts)
df_ts <- fill_gaps(df_ts)

filled <- nrow(df_ts) - nrow_before
print(sprintf('%d HH steps (~%.1f DD) were filled with NAs', filled, filled/48))
```

A concise, non-reporting alternative to the code above:

```{r tsibble_from_flux, eval=FALSE}
df_ts <- df %>%
  filter(!duplicated(dtime)) %>%
  as_tsibble(index = dtime) %>%  # add regular=FALSE if fails
  fill_gaps()
```



### DD

Demonstration of aggregation to a daily time step.

```{r dd}

df_dd <- df_ts %>%
    index_by(day = date(lubridate::floor_date(dtime, "day"))) %>%  # round_date() and ceiling_date() also exist
    summarise_all(mean, na.rm=TRUE)

df_dd %>%
    ggplot(aes(x=day, y=co2_flux)) + 
        geom_point(aes(color=as.factor(round(qc_co2_flux))))
```
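
Daily means computed with ``na.rm=TRUE`` hide how many half-hours actually contributed to each day. The sketch below, on made-up data, keeps a count of valid values next to the mean. The names ``toy``, ``toy_dd``, ``co2_flux_mean`` and ``n_valid`` are only illustrative.

```{r sketch_dd_coverage, eval=FALSE}
library(dplyr)
library(tsibble)

# Four half-hours that span midnight, one of them missing
toy <- tibble(
    dtime = as.POSIXct("2020-01-01 23:00", tz = "UTC") + (0:3) * 1800,
    co2_flux = c(-1, NA, 2, 4)
) %>%
    as_tsibble(index = dtime)

toy_dd <- toy %>%
    index_by(day = lubridate::date(dtime)) %>%
    summarise(
        co2_flux_mean = mean(co2_flux, na.rm = TRUE),  # mean of the valid half-hours
        n_valid = sum(!is.na(co2_flux))                # how many half-hours were valid
    )

toy_dd  # 2020-01-01: mean -1 from 1 value; 2020-01-02: mean 3 from 2 values
```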

# Radiation

Reading takes very long because the file is 150 MB and is read over the network.

```{r read_rad, eval=FALSE}

rad <- read_csv(params$rad_path, col_names = c('ts', 'Rsi', 'Rso', 'Rli', 'Rlo')) %>%
    mutate(
        datetime = as.POSIXct((ts - 1) * (60*60*24), origin=paste0(params$year, '-01-01'), tz='UTC')
    ) %>%
    distinct()

# size = nrow(rad)
# rad = distinct(rad)
# print(sprintf('dropped %d ts', size - nrow(rad)))

```
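
The conversion in ``mutate()`` treats ``ts`` as a fractional day of year in which 1.0 is 1 January 00:00. That reading is an assumption taken from the formula itself. A short worked example with made-up values:

```{r sketch_doy_to_datetime, eval=FALSE}
year <- 2020
ts <- c(1, 1.5, 32.25)  # toy fractional days of year

# (ts - 1) days after 1 January, expressed in seconds
datetime <- as.POSIXct((ts - 1) * (60*60*24), origin = paste0(year, '-01-01'), tz = 'UTC')

doy_labels <- format(datetime, '%Y-%m-%d %H:%M')
doy_labels  # "2020-01-01 00:00" "2020-01-01 12:00" "2020-02-01 06:00"
```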


## Making HH

This step is quick to compute, but because the original radiation files are large it is skipped as well.

```{r hh, eval=FALSE}
rad_hh <- rad %>%
    as_tsibble(index=datetime, regular=FALSE) %>%  # index named explicitly instead of guessed
    index_by(dtime = lubridate::round_date(datetime, "30 mins")) %>%  # or floor_date()?
    summarise_all(mean, na.rm=TRUE)

# saveRDS(rad_hh, 'Spe2020_rad_hh.rds')

```
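
The choice between rounding functions decides which half-hour a raw measurement is assigned to. The sketch below compares the three lubridate options on made-up timestamps. Which one matches the timestamp convention of the flux files is not settled here.

```{r sketch_rounding, eval=FALSE}
library(lubridate)

x <- as.POSIXct(c("2020-06-01 10:10", "2020-06-01 10:20", "2020-06-01 10:40"), tz = "UTC")

binned <- data.frame(
    original = format(x, "%H:%M"),
    round    = format(round_date(x, "30 mins"), "%H:%M"),    # nearest half-hour
    floor    = format(floor_date(x, "30 mins"), "%H:%M"),    # start of the half-hour
    ceiling  = format(ceiling_date(x, "30 mins"), "%H:%M"),  # end of the half-hour
    stringsAsFactors = FALSE
)

binned
#   original round floor ceiling
# 1    10:10 10:00 10:00   10:30
# 2    10:20 10:30 10:00   10:30
# 3    10:40 10:30 10:30   11:00
```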

### alternative to save time

Again, we use the precomputed version.

```{r read_rad_local}

# saveRDS(rad_hh, params$rad_computed)

rad_hh <- read_rds(params$rad_computed)

## csv alternative
# write_csv(rad_hh, sub('\\.rds$', '.csv', params$rad_computed))

# rad_hh <- read_csv(sub('\\.rds$', '.csv', params$rad_computed)) %>%
#     as_tsibble()
```


#### half-hourly (HH) radiation plot

```{r plot_rad}
rad_hh %>%
    select(-ts) %>%
    gather() %>%
    ggplot(aes(x=dtime, y=value)) + 
        geom_point() + 
        facet_wrap(.~key, scales = 'free') + 
        theme_bw() + 
        labs(
            title='Spe HH radiation 45m'
        ) + 
        theme(text=element_text(size=20))
```

# REddyProc

The code below is a modified version of the official vignette: https://cran.r-project.org/web/packages/REddyProc/vignettes/useCase.html

Details are available at https://www.bgc-jena.mpg.de/bgi/index.php/Services/REddyProcWeb.

## Joining for REddyProc

```{r joining}
df_joined <- df_ts %>% 
    distinct() %>%
    full_join(rad_hh, by=c('dtime')) %>%
    as_tsibble(index='dtime') %>%
    fill_gaps()
```
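
``full_join()`` keeps every timestamp that occurs in either table and sets the columns of the other table to ``NA``. A toy illustration with plain tibbles, where ``flux``, ``rad`` and ``joined`` are made-up names:

```{r sketch_full_join, eval=FALSE}
library(dplyr)

t0 <- as.POSIXct("2020-01-01 00:30", tz = "UTC")
flux <- tibble(dtime = t0 + c(0, 1) * 1800, co2_flux = c(-1.2, -0.8))  # 00:30, 01:00
rad  <- tibble(dtime = t0 + c(1, 2) * 1800, Rsi = c(0, 5))             # 01:00, 01:30

joined <- full_join(flux, rad, by = "dtime") %>%
    arrange(dtime)

joined  # 3 rows: Rsi is NA at 00:30, co2_flux is NA at 01:30
```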


## Preparing input for REddyProc

It does not matter how you prepare it, but the data frame ``df_joined`` must have:

1. First row at 00:30
2. Last row at 00:00
3. A monotonically increasing, evenly spaced DateTime column without gaps

REddyProc reports invalid values with warnings:

1. NEE < -50 or NEE > 100
2. Rg < 0
3. VPD > 50
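
The three timestamp requirements can be checked before the data are handed to REddyProc. The helper below is a hypothetical sketch (``check_input()`` is not part of REddyProc) and assumes half-hourly data in UTC.

```{r sketch_check_input, eval=FALSE}
# Hypothetical helper: returns one TRUE/FALSE per requirement
check_input <- function(DateTime) {
    steps <- diff(as.numeric(DateTime))  # time steps in seconds
    c(
        starts_0030   = format(DateTime[1], "%H:%M", tz = "UTC") == "00:30",
        ends_0000     = format(DateTime[length(DateTime)], "%H:%M", tz = "UTC") == "00:00",
        evenly_spaced = all(steps == 1800)  # 30 min, increasing, no gaps
    )
}

# Toy timestamps covering two full days
toy_time <- seq(
    as.POSIXct("2020-01-01 00:30", tz = "UTC"),
    as.POSIXct("2020-01-03 00:00", tz = "UTC"),
    by = "30 min"
)

check_input(toy_time)        # all TRUE
check_input(toy_time[-10])   # evenly_spaced is FALSE after dropping one row
```

With the real data the call would be ``check_input(df_e$DateTime)`` once ``df_e`` exists.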

```{r prepare_input}
df_e <- df_joined %>%
    select(DateTime=dtime, NEE=co2_flux, VPD, Tair=air_temperature, Ustar=`u*`, Rg=Rsi) %>%
    filter(DateTime > as.POSIXct('2019-01-01 00:00:00', tz='UTC'),  # first row is 00:30
           DateTime <= as.POSIXct(paste0(Sys.Date() - 1, ' 00:00:00'), tz='UTC')  # last row is 00:00
           ) %>%
    mutate(
        Tair = Tair - 273.15,  # K -> C
        VPD = VPD / 100,       # Pa -> hPa (mbar)
        NEE = ifelse(NEE < -50, NA, NEE),
        NEE = ifelse(NEE > 100, NA, NEE),
        Rg = ifelse(Rg < 0, 0, Rg)
    ) %>%
    filterLongRuns("NEE")
```

## Initializing class

```{r class, warning=TRUE}
Spe <- sEddyProc$new('Spe', as.data.frame(df_e), c('NEE','Tair', 'Rg', 'VPD', 'Ustar'))
```

This class has useful methods, for example for plotting.

### NEE initial state (with gaps)

```{r plot_nee, fig.width = 10, fig.height = 7}
par(mfrow=1:2)
Spe$sPlotFingerprintY('NEE', Year = params$year)
Spe$sPlotFingerprintY('NEE', Year = params$year, onlyLegend = TRUE)
```


## Working

All the rest is done by REddyProc in 5-10 minutes.

To see more details run the code in the console (copy-paste), rather than in the chunk.

### U* distribution

```{r u_star_sampling}
Spe$sEstimateUstarScenarios(nSample = 100L, probs = c(0.05, 0.5, 0.95))  # gets seasons to results
# Spe$sGetEstimatedUstarThresholdDistribution()
# Spe$sGetUstarScenarios()
```

### Gap filling

This is the longest step.

```{r gap_filling}
Spe$sMDSGapFillUStarScens('NEE')  # adds NEE, Ustar

Spe$sSetLocationInfo(LatDeg = 52.251185, LongDeg = 5.690051, TimeZoneHour = 0)  # 45 cols 
Spe$sMDSGapFill('Tair', FillAll = FALSE,  minNWarnRunLength = NA)     # 54 cols
Spe$sMDSGapFill('VPD', FillAll = FALSE,  minNWarnRunLength = NA)  # 63 col
```

#### NEE gap-filled plot

```{r plot_nee_f, fig.width = 10, fig.height = 7}
par(mfrow=1:2)
Spe$sPlotFingerprintY('NEE_U50_f', Year = params$year)
Spe$sPlotFingerprintY('NEE_U50_f', Year = params$year, onlyLegend = TRUE)
```

### Partitioning of NEE to GPP and Reco

```{r partitioning}
Spe$sMRFluxPartitionUStarScens()  # 95 col
```
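
For orientation, and as far as we understand the method, ``sMRFluxPartitionUStarScens()`` applies the nighttime-based partitioning to each u* scenario: a temperature response of respiration is fitted to nighttime NEE, extrapolated to daytime, and GPP follows from the sign convention. In the usual notation (the symbols are ours, not REddyProc column names):

$$
\mathrm{GPP} = R_\mathrm{eco} - \mathrm{NEE}, \qquad
R_\mathrm{eco}(T) = R_\mathrm{ref} \exp\left[ E_0 \left( \frac{1}{T_\mathrm{ref} - T_0} - \frac{1}{T - T_0} \right) \right]
$$

Here $T$ is the air temperature, $R_\mathrm{ref}$ the respiration at the reference temperature $T_\mathrm{ref}$ (commonly 15 °C), $E_0$ the temperature sensitivity and $T_0$ a constant (commonly -46.02 °C). See Wutzler et al. (2018) for the exact procedure.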

#### GPP plot

```{r plot_gpp_f, fig.width = 10, fig.height = 7}
par(mfrow=1:2)
Spe$sPlotFingerprintY('GPP_U50_f', Year = params$year)
Spe$sPlotFingerprintY('GPP_U50_f', Year = params$year, onlyLegend = TRUE)
```

## Results

```{r saving_results}

## saving as a class with all methods
# saveRDS(Spe, 'Spe_2019_2020_REddyProc_class.rds')

## loading as a class
# Spe <- read_rds('Spe_2019_2020_REddyProc_class.rds')

df_res <- bind_cols(df_e, Spe$sExportResults())  # bind_cols(Spe$sDATA, Spe$sExportResults())

# df_res %>%
#   write_csv('Spe_2019_2020_results.csv')

# df_res <- read_csv('Spe_2019_2020_results.csv') %>%
#     as_tsibble()

```


### Daily GPP

```{r plot_dd_gpp}
ma_size = 8

df_res %>%
    index_by(day = date(lubridate::floor_date(DateTime, "day"))) %>%
    summarize(
        NEE = mean(NEE, na.rm=TRUE),
        GPP_U50_f = mean(GPP_U50_f, na.rm=TRUE)
    ) %>%
    mutate(
      GPP_ma = slide_dbl(GPP_U50_f, mean, .size = ma_size),
      GPP_ma = lead(GPP_ma, ma_size/2)  # align center did not work
    ) %>%
    ggplot(aes(x=day)) + 
        # geom_line(aes(y=-NEE, color='-NEE')) +
        geom_line(aes(y=GPP_U50_f), size=0.5) + 
        geom_point(aes(y=GPP_U50_f)) + 
        geom_line(aes(y=GPP_ma, color=sprintf('GPP_ma%dd', ma_size)), size=1.5) +
        guides(color=guide_legend(title='color')) + 
        scale_x_date(minor_breaks = "1 month") +
        labs(
            title = 'NL-Spe DD GPP',
            x = 'date'
        ) + 
        theme_bw() + 
        theme(text = element_text(size=20))

```
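
The ``lead()`` call in the plot above shifts a trailing moving average forward by half a window so that it is roughly centred. The sketch below reproduces that idea on made-up numbers. ``stats::filter()`` is swapped in for ``slide_dbl()`` so that the example needs base R only, and all object names are illustrative.

```{r sketch_centred_ma, eval=FALSE}
x <- 1:8   # toy daily values
k <- 4     # window size (even, as ma_size above)

# Trailing mean: position i holds mean(x[(i - k + 1):i]), the first k - 1 values are NA
trailing <- as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))

# Shift forward by k / 2 positions, the same effect as lead(trailing, k / 2)
shift <- k / 2
centred <- c(trailing[-seq_len(shift)], rep(NA, shift))

trailing  # NA NA NA 2.5 3.5 4.5 5.5 6.5
centred   # NA 2.5 3.5 4.5 5.5 6.5 NA NA
```

With an even window the result cannot be exactly centred: after the shift, position ``i`` holds the mean of ``x[(i - 1):(i + 2)]``.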