Each of the previous three metrics explicitly used p, the number of parameters, in their calculations. Thus, they all explicitly limit the size of models chosen when used to compare models. The calculations rely on an assumption of ”true” underlying model
Often times models are needed for out-of-sample prediction: we need to quantify the possible error.
LOOCV gives a measure of the possible out-of-sample prediction error
-
Fit any model to the dataset available without one observation
$i$ -
Predict
$\hat{y_{[i]}}$ :$y_{i}$ for the model in which$(x_{i}, y_{i})$ was not included in the model. - Now
$e_{[i]} = y_{i} -\hat{y_{[i]}}$ is the residual for the ith observation, when that observation is not used to fit the model:- Namely, evaluating how well the model estimates it
- Repeat the fitting of the model n times, once with each possible observation removed.
We can define the LOOCV RMSE to be:
However, for leave-one-out cross-validation and linear models, the equation can be rewritten as:
Where:
-
$h_{ii}$ are the leverages, the diagonal elements of the projection matrix:$H = X(X^{T}X)^{-1}X^{T}$ - They tell us something about the impact of each single value on the general value of the estimate. We will see how to quantify it.
- The Hat/Projection Matrix can help us going from the observed y to the
$\hat{y}$ , we project the values on a smaller dimensional space.
-
$e_{i}$ are the residuals
This way let us obtain the
In practice 5 or 10 fold cross-validation are much more popular. For example, in 5-fold cross-validation, the model is fit 5 times, each time leaving out a fifth of the data, then predicting on those values.
calc_loocv_rmse = function(model) {
sqrt(mean((resid(model) / (1 - hatvalues(model))) ˆ 2))
}
# no variable
calc_loocv_rmse( lm(target ~ 1, data = dataset))
# some variables
calc_loocv_rmse( lm(target ~ x1 + x2, data = dataset))
# model with all variables
calc_loocv_rmse( lm(target ~ x1 + x2 + x3 + x4, data = dataset))
The model selected by stepwise selection also has the lowest