Skip to content
This repository has been archived by the owner on Mar 2, 2024. It is now read-only.

Latest commit

 

History

History
47 lines (34 loc) · 2.37 KB

4.2 - VBS.md

File metadata and controls

47 lines (34 loc) · 2.37 KB

Validation Based Selection

Each of the previous three metrics explicitly used p, the number of parameters, in their calculations. Thus, they all explicitly limit the size of models chosen when used to compare models. The calculations rely on an assumption of ”true” underlying model

Often times models are needed for out-of-sample prediction: we need to quantify the possible error.

Leave-One-Out-Cross-Validation LOOCV

LOOCV gives a measure of the possible out-of-sample prediction error

  1. Fit any model to the dataset available without one observation $i$
  2. Predict $\hat{y_{[i]}}$: $y_{i}$ for the model in which $(x_{i}, y_{i})$ was not included in the model.
  3. Now $e_{[i]} = y_{i} -\hat{y_{[i]}}$ is the residual for the ith observation, when that observation is not used to fit the model:
    1. Namely, evaluating how well the model estimates it
  4. Repeat the fitting of the model n times, once with each possible observation removed.

We can define the LOOCV RMSE to be:

$$RMSE_{LOOCV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} e_{[i]}^{2}}$$

However, for leave-one-out cross-validation and linear models, the equation can be rewritten as:

$$RMSE_{LOOCV} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (\frac{e_{[i]}}{1-h_{ii}})^{2}}$$

Where:

  • $h_{ii}$ are the leverages, the diagonal elements of the projection matrix: $H = X(X^{T}X)^{-1}X^{T}$
    • They tell us something about the impact of each single value on the general value of the estimate. We will see how to quantify it.
    • The Hat/Projection Matrix can help us going from the observed y to the $\hat{y}$, we project the values on a smaller dimensional space.
  • $e_{i}$ are the residuals

This way let us obtain the $RMSE_{LOOCV}$ by fitting only one model and avoid expensive computations!

In practice 5 or 10 fold cross-validation are much more popular. For example, in 5-fold cross-validation, the model is fit 5 times, each time leaving out a fifth of the data, then predicting on those values.

calc_loocv_rmse = function(model) {
sqrt(mean((resid(model) / (1 - hatvalues(model))) ˆ 2))
}

# no variable 
calc_loocv_rmse( lm(target ~ 1, data = dataset))
# some variables
calc_loocv_rmse( lm(target ~ x1 + x2, data = dataset))
# model with all variables
calc_loocv_rmse( lm(target ~ x1 + x2 + x3 + x4, data = dataset)) 

The model selected by stepwise selection also has the lowest $RMSE_{LOOCV}$ (this is not always the case).