-
Notifications
You must be signed in to change notification settings - Fork 1
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
add scallop mod example vignette back into package (#66)
- Loading branch information
1 parent
c6b992f
commit fd63967
Showing
1 changed file
with
319 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,319 @@ | ||
--- | ||
title: "Scallop Conditional Logit Model Example" | ||
author: "Bryce McManus" | ||
date: "`r Sys.Date()`" | ||
output: | ||
rmarkdown::html_vignette: | ||
fig_width: 6 | ||
fig_height: 4 | ||
vignette: > | ||
%\VignetteIndexEntry{Scallop Conditional Logit Model Example} | ||
%\VignetteEngine{knitr::rmarkdown} | ||
%\VignetteEncoding{UTF-8} | ||
editor_options: | ||
markdown: | ||
wrap: 72 | ||
--- | ||
|
||
```{r setup, include=FALSE} | ||
knitr::opts_chunk$set(echo = TRUE) | ||
``` | ||
|
||
## Introduction | ||
|
||
This is an example of a conditional logit model using the `scallop` data | ||
from the `FishSET` package. | ||
|
||
### Packages | ||
|
||
```{r} | ||
library(FishSET) | ||
``` | ||
|
||
```{r eval=TRUE, echo=FALSE} | ||
# this chunk is for vignette version only | ||
folderpath <- tempdir() | ||
``` | ||
|
||
|
||
<br> | ||
<br> | ||
|
||
### Load Data | ||
|
||
This analysis uses three of `FishSET`'s example datasets: the `scallop` | ||
dataframe which contains anonymized scallop data from the Northeast, | ||
`scallop_ports` which is a table of ports and their location, and | ||
`tenMNSQR` which is a spatial dataframe of Northeastern ten minute | ||
squares. | ||
|
||
The `scallop` dataframe contains a random sample of 10,000 trips | ||
from vessels in the Limited Access Days-at-Sea fleet when they | ||
are declared into either an Access Area or Open area Days-at-Sea | ||
fishing trip. | ||
|
||
Noise was added to fishing locations, landing quantities, and the | ||
value of catch. Permit, Operator, and trip identifiers were also anonymized. | ||
|
||
|
||
|
||
```{r} | ||
data("scallop") | ||
data('scallop_ports') | ||
data("tenMNSQR") | ||
scallop$LANDED_thousands<-scallop$LANDED_OBSCURED/1000 | ||
scallop$DOLLAR_2020_thousands<-scallop$DOLLAR_2020_OBSCURED/1000 | ||
vars_to_keep <- c('TRIPID', 'PERMIT.y', 'DATE_TRIP', 'DDLON', 'DDLAT', 'ZoneID', | ||
'LANDED_thousands', 'DOLLAR_2020_thousands', 'port_lon', 'port_lat', 'previous_namelsad', | ||
'previous_port_lon', 'previous_port_lat') | ||
scallop <- scallop[vars_to_keep] | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
Load each dataset into FishSET. Rescale the Landed weight to thousands of meat | ||
pounds and the Dollar value to thousands of Real dollars. Note: when running | ||
this chunk you may get a pop-up asking to identify the location for a new | ||
FishSET folder or an existing folder. | ||
|
||
```{r} | ||
load_maindata(scallop, project = "scallopMod", over_write = TRUE) | ||
load_port(dat = scallop_ports, port_name = "port_name", project = "scallopMod") | ||
load_spatial(spat = tenMNSQR, name = "TenMnSqr", project = "scallopMod") | ||
scallopModTenMnSqrSpatTable <- table_view('scallopModTenMnSqrSpatTable', "scallopMod") | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
### QAQC | ||
|
||
`summary_stats()` is a useful way to find `NA`s in the data. | ||
|
||
```{r} | ||
summary_stats(scallopModMainDataTable, project = "scallopMod") | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
Remove `NA`s by calling `na_filter()`. In this case, three variables | ||
which are important for modeling are being filtered: `ZoneID`, | ||
`previous_port_lon`, and `previous_port_lat`. | ||
|
||
```{r} | ||
scallopModMainDataTable <- | ||
na_filter(scallopModMainDataTable, | ||
project = "scallopMod", | ||
x = c("ZoneID", "previous_port_lon", "previous_port_lat"), | ||
remove = TRUE) | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
Plot the number of observations by zone. | ||
|
||
```{r} | ||
zone_summary(dat = scallopModMainDataTable, | ||
spat = scallopModTenMnSqrSpatTable, | ||
project = "scallopMod", | ||
zone.dat = "ZoneID", | ||
zone.spat = "TEN_ID", | ||
output = "tab_plot") | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
Check for sparsity. | ||
|
||
```{r} | ||
sparsetable(scallopModMainDataTable, 'scallopMod', | ||
timevar = 'DATE_TRIP', | ||
zonevar = 'ZoneID', | ||
var = 'LANDED_thousands') | ||
sparsplot('scallopMod') | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
### Create Centroids | ||
|
||
A centroid table is needed to create the distance matrix. It can be used | ||
as the choice occasion or the alternative choice. The simplest way to | ||
create a zonal centroid table is by passing it to `create_centroid()`, | ||
which saves the centroids to the FishSET database. | ||
|
||
```{r} | ||
create_centroid(spat = scallopModTenMnSqrSpatTable, | ||
project = "scallopMod", | ||
spatID = "TEN_ID", | ||
type = "zonal centroid", | ||
output = "centroid table") | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
### Alternative Choice | ||
|
||
For this example, the alternative choice list will use the longitude and | ||
latitude of the disembarking port (`previous_port_lon` and | ||
`previous_port_lat`) as the choice occasion and the zonal centroid of | ||
the fishing areas as the alternative. The minimum haul haul requirement | ||
is set to 90. | ||
|
||
```{r} | ||
create_alternative_choice(dat = scallopModMainDataTable, | ||
project = "scallopMod", | ||
occasion = "lon-lat", | ||
occasion_var = c("previous_port_lon", "previous_port_lat"), | ||
alt_var = "zonal centroid", | ||
zoneID = "ZoneID", | ||
zone.cent.name = "scallopModZoneCentroid", | ||
min.haul = 90 | ||
) | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
The plot below visualizes zone frequency after accounting for the | ||
minimum haul requirement from the alternative choice list. | ||
|
||
```{r} | ||
z_ind <- which(alt_choice_list('scallopMod')$dataZoneTrue == 1) | ||
zOut <- | ||
zone_summary(scallopModMainDataTable[z_ind, ], | ||
spat = scallopModTenMnSqrSpatTable, | ||
project = "scallopMod", | ||
zone.dat = "ZoneID", | ||
zone.spat = "TEN_ID", | ||
output = "tab_plot") | ||
pretty_tab(zOut$tab) | ||
zOut$plot | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
### Expected Catch | ||
|
||
This code chunk creates two different expected catch matrices: one using | ||
a window of seven days, lag of one and a window of 14 days, lag of two. They | ||
will be named `user1` and `user2` respectively. | ||
|
||
```{r} | ||
# user1 expected catch matrix | ||
create_expectations(dat = scallopModMainDataTable, | ||
project = "scallopMod", | ||
catch = "LANDED_thousands", | ||
temp.var = "DATE_TRIP", | ||
temp.window = 7, | ||
temp.lag = 1, | ||
year.lag = 0, | ||
temporal = 'daily', | ||
empty.catch = NA, | ||
empty.expectation = 1e-04, | ||
default.exp = FALSE, | ||
replace.output = TRUE) | ||
# user2 expected catch matrix | ||
create_expectations(dat = scallopModMainDataTable, | ||
project = "scallopMod", | ||
catch = "LANDED_thousands", | ||
temp.var = "DATE_TRIP", | ||
temporal = "daily", | ||
temp.window = 14, | ||
temp.lag = 2, | ||
empty.catch = NA, | ||
empty.expectation = 1e-04, | ||
default.exp = FALSE, | ||
replace.output = FALSE) | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
The data must be checked for common data quality issues before it can be | ||
used in the modeling functions (i.e. `make_model_design()` and | ||
`discretefish_subroutine()`). `check_model_data()` saves a new version | ||
of the primary data with the suffix `_final` added to indicate that the | ||
table is in its "final" state and ready to be used for modeling. | ||
|
||
```{r} | ||
check_model_data(scallopModMainDataTable, | ||
project = "scallopMod", | ||
uniqueID = "TRIPID", | ||
latlon = c("DDLON","DDLAT")) | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
### Model Design | ||
|
||
The model design file below will run two conditional logit models, each | ||
using one of the expected catch matrices created earlier (this is | ||
specified by using `'individual'` in the `expectcatchmodels` argument). | ||
|
||
```{r} | ||
make_model_design(project = "scallopMod", | ||
catchID = "LANDED_thousands", | ||
likelihood = "logit_c", | ||
initparams = c(0, 0), | ||
vars1 = NULL, | ||
vars2 = NULL, | ||
mod.name = 'lz', | ||
expectcatchmodels = list('individual') | ||
) | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
### Run Models | ||
|
||
Use `discretefish_subroutine()` to run all models in the model design | ||
file. | ||
|
||
```{r} | ||
discretefish_subroutine(project = "scallopMod", explorestarts = FALSE) | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
Use `model_params()` to see the model output. `user1` and `user2` are the | ||
expected catch parameters and `V1` the travel distance parameter. A reasonably | ||
specified model should find positive coefficients for `user1` and `user2` and | ||
negative coefficiencts for `V1`. | ||
|
||
```{r} | ||
model_params("scallopMod", output = 'print') | ||
``` | ||
|
||
<br> | ||
<br> | ||
|
||
Compare model fit. | ||
|
||
```{r} | ||
model_fit_summary("scallopMod") | ||
``` | ||
|
||
```{r eval=TRUE, echo=FALSE} | ||
# for vignette version only | ||
unlink(folderpath) | ||
``` |