-
Notifications
You must be signed in to change notification settings - Fork 3
/
08-preparea_augmented_data.Rmd
135 lines (114 loc) · 5.81 KB
/
08-preparea_augmented_data.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
# Prepare the vis-NIR augmented dataset
------------------------------------------------------------------------
Here we'll create the data that will be used for Spatial modelling. This dataset will contain
* Nr: The arbitrary sample number.
* ID: The `factor` indicating the sample IDs.
* POINT_X: The X (geographical) coordinate.
* POINT_Y: The Y (geographical) coordinate.
* layer: A `factor` indicating the depth layer at which the sample was collected (A: 0-20 cm and B: 80-100 cm).
* set: A `factor` indicating whether the sample was used for vis-NIR calibrations (`train`), for vis-NIR predictions (`prediction`) or if it belongs to model's validation (`validation`). The samples labeled as validation are the same samples initially labeled as validation in the original dataset.
* Ca: The exchangeable Calcium content in the sample ($mmol_{c}$ $kg^{−1}$, measured by conventional laboratory methods)
* Clay: The percentage of clay contnet in the soil sample (measured by conventional laboratory methods).
* Silt: The percentage of silt contnet in the soil sample (measured by conventional laboratory methods).
* Sand: The percentage of sand contnet in the soil sample (measured by conventional laboratory methods).
* alr_Clay: The additive log-ratio transformed clay contnets (measured by conventional laboratory methods).
* alr_Silt: The additive log-ratio transformed silt contnets (measured by conventional laboratory methods).
* Ca_spec: This is the vis-NIR augmented exchangeable Ca^2+^ contents.
* alr_Clay_spec: This is the vis-NIR augmented additive log-ratio transformed clay contnets.
* alr_Silt_spec: This is the vis-NIR augmented additive log-ratio transformed silt contnets.
For the vis-NIR augmented variables (alr_Clay_spc, alr_Silt_spc and Ca_spec) there are three classes of values:
* The values of the samples that are labeled as `train` come from the conventional laboratory methods (e.g. for Ca_spec the values of these samples for this variable are identical to the corresponding values in the variable Ca).
* The values of the samples that are labeled as `prediction` come from the predictions done with the respective vis-NIR model.
* The values of the samples that are labeled as `validation` are treated as missing (i.e. `NA`s).
```{r, eval = FALSE}
## samples for the set "prediction"
vnirpredictions
## samples for the set "train"
vnirtrain <- train[ ,c("ID","POINT_X","POINT_Y","set", "Ca", "Clay", "Silt", "Sand", "alr_Clay","alr_Silt")]
vnirtrain$set <- factor("train")
vnirtrain$Ca_spec <- vnirtrain$Ca
vnirtrain$alr_Clay_spec <- vnirtrain$alr_Clay
vnirtrain$alr_Silt_spec <- vnirtrain$alr_Silt
## samples for the set "validation"
vnirvalidation <- valida[ ,c("ID","POINT_X","POINT_Y","set", "Ca","Clay", "Silt", "Sand", "alr_Clay","alr_Silt")]
vnirvalidation$set <- factor(vnirvalidation$set)
vnirvalidation$Ca_spec <- NA
vnirvalidation$alr_Clay_spec <- NA
vnirvalidation$alr_Silt_spec <- NA
```
Now create a single `data.frame` containing the three data sets...
```{r, eval = FALSE}
vniraugmented <- rbind(vnirtrain,
vnirpredictions,
vnirvalidation)
vniraugmented$layer <- factor(substr(vniraugmented$ID, 1, 1))
## Reorganize the variables
vniraugmented <- vniraugmented[,c("ID",
"POINT_X",
"POINT_Y",
"layer",
"set",
"Ca",
"Clay",
"Silt",
"Sand",
"alr_Clay",
"alr_Silt",
"Ca_spec",
"alr_Clay_spec",
"alr_Silt_spec")]
```
Compute some statistics for the final data set...
```{r, eval = FALSE}
## Names of the properties
props <- c("Ca",
"Clay",
"Silt",
"Sand",
"alr_Clay",
"alr_Silt",
"Ca_spec",
"alr_Clay_spec",
"alr_Silt_spec")
## Compute the statistics: mean, standard deviation and
## the quantiles ("0%", "25%", "50%", "75%" and"100%")
statsprops <- aggregate(vniraugmented[, props],
by = list(set = vniraugmented$set,
layer = vniraugmented$layer),
FUN = function(x){
c(mean = mean(as.matrix(x), na.rm = TRUE),
sd = sd(as.matrix(x), na.rm = TRUE),
quantile(x, na.rm = TRUE))
})
## Reorganize the object containing the results of the statistics
statsprops <- lapply(props,
FUN = function(x, object, ids){
object <- cbind(object[,keep],
as.data.frame(statsquant[[x]]))
},
object = statsprops,
ids = c("set", "layer"))
names(statsprops) <- props
statsprops <- do.call("rbind", statsprops)
statsprops$property <- gsub(".[0-9]", "", rownames(statsprops))
statsprops[is.na(statsprops)] <- NA
## Reorganize the order of the variables
statsprops <- statsprops[, c("set",
"layer",
"property",
"mean",
"sd",
"0%",
"25%",
"50%",
"75%",
"100%")]
statsprops
```
Optionally, save this data in your working directory
```{r, eval = FALSE}
write.table(x = vniraugmented,
file = "vniraugmented.txt",
sep = "\t",
row.names = FALSE)
```