Sometimes we can focus our attention on the $x$: the linear model is called linear not because the regression curve is a line or a plane, but because it is linear in the parameters.
Rather than working with the sample
$$(x_{1}, y_{1}),...,(x_{n}, y_{n})$$
we consider the transformed sample
$$(\tilde{x}_{1}, y_{1}),...,(\tilde{x}_{n}, y_{n})$$
In practice this only changes the columns of the design matrix; it does not change the fact that the model is still linear in the parameters.
Furthermore, the interpretation does not change much.
For example, consider these linear models and the corresponding transformed samples:
- $Y = \beta_{0} + \beta_{1}x^{2} + \varepsilon$, where we can work with $\tilde{x}_{i} = x_{i}^{2}$
- $Y = \beta_{0} + \beta_{1}\log(x) + \varepsilon$, where we can work with $\tilde{x}_{i} = \log(x_{i})$
- $Y = \beta_{0} + \beta_{1}(x^{3}-\log(|x|)+ 2^{x}) + \varepsilon$, where we can work with $\tilde{x}_{i} = x_{i}^{3}-\log(|x_{i}|)+ 2^{x_{i}}$
This means we keep the model linear as far as the estimates of the beta parameters are concerned, while the x can enter in any form (quadratic, polynomial, etc.), because the transformation only affects the design matrix.
Sometimes these transformations can help with violation of model assumptions and other times they can be used to simply fit a more flexible model!
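As a minimal sketch of the idea above, the following R snippet fits a model on a log-transformed predictor in two equivalent ways. The data are simulated, and the names `x`, `y`, `x_tilde` and the chosen coefficients are assumptions made only for illustration.

```r
# Simulated data; x, y and the true coefficients are illustrative assumptions.
set.seed(1)
x <- runif(50, min = 1, max = 10)
y <- 2 + 3 * log(x) + rnorm(50, sd = 0.2)

# Option 1: transform the predictor by hand, then fit an ordinary linear model.
x_tilde <- log(x)
fit_manual <- lm(y ~ x_tilde)

# Option 2: let the formula apply the transformation.
fit_formula <- lm(y ~ log(x))

# The design matrix holds an intercept column and the transformed column.
X <- model.matrix(fit_formula)
head(X, 3)

# Same estimates either way (names differ, values agree).
all.equal(unname(coef(fit_manual)), unname(coef(fit_formula)))
```

In both cases the estimation step is ordinary least squares on the transformed column, which is why the model remains linear.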
Let's use the car dataset to model mpg as a function of hp.
We first attempt a SLR, but we see a rather obvious pattern in the fitted versus residuals plot, which includes increasing variance.
We attempt a log transform of the response (rough approximation).
After performing the log transform of the response, we still have some of the same issues with the fitted versus residuals plot. We also try log transforming the predictor.
Here, our fitted versus residuals plot looks good.
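The three attempts described above can be sketched as follows. The notes do not say which car dataset is used, so this sketch assumes the built-in `mtcars` data as a stand-in; the results will not match the plots discussed in the text exactly.

```r
# mtcars ships with R; using it here is an assumption, not the notes' dataset.
fits <- list(
  linear  = lm(mpg ~ hp, data = mtcars),
  log_y   = lm(log(mpg) ~ hp, data = mtcars),
  log_log = lm(log(mpg) ~ log(hp), data = mtcars)
)

# Estimated coefficients of each model.
lapply(fits, coef)

# Fitted-versus-residuals plots, drawn only in an interactive session.
if (interactive()) {
  op <- par(mfrow = c(1, 3))
  for (name in names(fits)) {
    plot(fitted(fits[[name]]), resid(fits[[name]]), main = name,
         xlab = "Fitted", ylab = "Residuals", pch = 20)
    abline(h = 0, lty = 2)
  }
  par(op)
}
```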
- If you apply a nonlinear transformation, namely $f()$, and fit the linear model $Y = \beta_{0} + \beta_{1}f(x) + \varepsilon$, then there is no point in also fitting the model resulting from the negative transformation $-f()$.
  - The model with $-f()$ is exactly the same as the one with $f()$ but with the sign of $\beta_{1}$ flipped!
- As a rule of thumb, use the next figure with the transformations to compare it with the data pattern, then choose the most similar curve, and finally apply the corresponding function with positive sign.
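The claim about $-f()$ can be checked numerically. This is a small sketch on simulated data (the variables and coefficients are made up for illustration), comparing a fit on $\log(x)$ with a fit on $-\log(x)$.

```r
# Simulated data; the true coefficients are illustrative assumptions.
set.seed(2)
x <- runif(40, min = 1, max = 10)
y <- 1 + 2 * log(x) + rnorm(40, sd = 0.3)

fit_pos <- lm(y ~ log(x))      # transformation f
fit_neg <- lm(y ~ I(-log(x)))  # transformation -f

coef(fit_pos)
coef(fit_neg)  # same intercept, slope with the opposite sign

# The fitted values are identical, so the two models are the same model.
all.equal(fitted(fit_pos), fitted(fit_neg))
```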
A common “transformation” of a predictor variable is the polynomial transformation. Polynomials are very useful as they allow for more flexible models, but do not change the units of the variables.
Consider the non-linear transformation, namely $f(x)$.
We make no global assumptions about the function $f$.
This is related to Taylor's theorem, which says that a sufficiently smooth function can be approximated locally by a polynomial.
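A quick numerical illustration of this idea, using the exponential function as an example of a smooth function (the choice of function, expansion point 0 and evaluation point 0.5 are assumptions for illustration only):

```r
# Taylor polynomial of exp around x0 = 0, up to degree K.
taylor_exp <- function(x, K) sum(x^(0:K) / factorial(0:K))

x1 <- 0.5
degrees <- 1:5

# Absolute approximation error at x1 for each degree.
errors <- sapply(degrees, function(K) abs(exp(x1) - taylor_exp(x1, K)))
data.frame(degree = degrees, error = errors)
```

The error shrinks as the degree grows, which is the behaviour polynomial regression relies on within the range of the data.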
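Taylor's theorem. Suppose $f$ is a real function on $[a,b]$, $f^{(K-1)}$ is continuous on $[a,b]$, and $f^{(K)}(x)$ is bounded for $x\in(a,b)$. Then for any distinct points $x_{0}< x_{1}$ in $[a,b]$ there exists a point $\tilde{x}$ with $x_{0}< \tilde{x} < x_{1}$ such that
$$f(x_{1}) = f(x_{0}) + \sum_{k=1}^{K-1} \frac{f^{(k)}(x_{0})}{k!}(x_{1}-x_{0})^{k} + \frac{f^{(K)}(\tilde{x})}{K!}(x_{1}-x_{0})^{K}$$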
Notice: if we view
$$\mathcal{P}_{K+1} = \{ f(x) = a_{0} + a_{1}x + ... + a_{K}x^{K},\ (a_{0},...,a_{K})^{\prime} \in \mathbb{R}^{K+1} \}$$
Using polynomials to approximate the unknown function can be useful when we are not in a high-dimensional space. The main problem comes with extrapolation, since the estimate becomes more variable outside the range of the data.
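The extrapolation issue can be seen in the standard error of the fitted mean. The sketch below uses simulated data (the sine curve, noise level and polynomial degree are arbitrary assumptions) and compares predictions inside and outside the observed range of `x`.

```r
# Simulated data observed on [0, 10]; the true curve is an assumption.
set.seed(3)
x <- seq(0, 10, length.out = 40)
y <- sin(x / 2) + rnorm(40, sd = 0.1)

fit <- lm(y ~ poly(x, 4))

# Points 5 and 10 are inside the data range, 12 and 14 are outside.
new_x <- data.frame(x = c(5, 10, 12, 14))
pred <- predict(fit, newdata = new_x, se.fit = TRUE)

# Standard error of the estimated mean grows quickly beyond the data.
cbind(new_x, fit = pred$fit, se = pred$se.fit)
```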
We could fit a polynomial of an arbitrary order $K$,
$$Y = \beta_{0} + \beta_{1}x + \beta_{2}x^{2} + ... + \beta_{K}x^{K} + \varepsilon$$
and we can think of the polynomial model as the Taylor series expansion of the unknown function.
Suppose you work for an automobile manufacturer which makes a large luxury sedan. You would like to know how the car performs from a fuel efficiency standpoint when it is driven at various speeds.
Instead of testing the car at every conceivable speed (which would be impossible), you create an experiment where the car is driven at speeds of interest in increments of 5 miles per hour (response surface designs).
Our goal, then, is to fit a model to this data in order to be able to predict fuel efficiency when driving at certain speeds.
We see a pattern, but it is not a linear pattern! Let's say we are very stubborn and wish to fit a SLR to this data.
econ <- read.csv("data/fuel-econ.csv")
fit1 <- lm(mpg ~ mph, data = econ)
summary(fit1)
Call:
lm(formula = mpg ~ mph, data = econ)
Residuals:
Min 1Q Median 3Q Max
-8.337 -4.895 -1.007 4.914 9.191
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 22.74637 2.49877 9.103 1.45e-09 ***
mph 0.03908 0.05312 0.736 0.469
...
Residual standard error: 5.666 on 26 degrees of freedom
Multiple R-squared: 0.02039, Adjusted R-squared: -0.01729
F-statistic: 0.5411 on 1 and 26 DF, p-value: 0.4686
Pretty clearly we can do better.
- Yes fuel efficiency does increase as speed increases, but only up to a certain point.
- We need to include the quadratic trend of the data.
We will now add polynomial terms until we find a suitable fit.
To add the second order term we need to use the I() function in the model specification.
Inside a formula, operators such as ^ have a special meaning. I() acts as an identity function which tells R “leave this alone”, so the expression is evaluated as ordinary arithmetic. It basically adds to the model matrix a new column with the transformation specified inside the brackets.
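To see what I() changes, one can inspect the model matrix directly. The small data frame `d` below is hypothetical and only serves to show the columns that R builds.

```r
# Hypothetical speeds, used only to inspect the model matrix.
d <- data.frame(mph = c(10, 20, 30, 40, 50))

# Without I(): ^ is a formula operator, so mph^2 collapses to mph.
colnames(model.matrix(~ mph + mph^2, data = d))

# With I(): ^ is ordinary arithmetic and a squared column is added.
X <- model.matrix(~ mph + I(mph^2), data = d)
colnames(X)
X[, "I(mph^2)"]
```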
fit2 <- lm(mpg ~ mph + I(mph^2), data = econ)
summary(fit2)
Call:
lm(formula = mpg ~ mph + I(mph^2), data = econ)
Residuals:
Min 1Q Median 3Q Max
-2.8411 -0.9694 0.0017 1.0181 3.3900
Coefficients:
Estimate Std.Error t value Pr(>|t|)
(Intercept) 2.4444505 1.4241091 1.716 0.0984 .
mph 1.2716937 0.0757321 16.792 3.99e-15 ***
I(mph^2) -0.0145014 0.0008719 -16.633 4.97e-15 ***
---
...
Residual standard error: 1.663 on 25 degrees of freedom
Multiple R-squared: 0.9188, Adjusted R-squared: 0.9123
F-statistic: 141.5 on 2 and 25 DF, p-value: 2.338e-14
The model is now highly significant, and the residuals are roughly centered around 0!
While this model clearly fits much better, and the second order term is significant, we still see a pattern in the fitted versus residuals plot which suggests higher order terms will help.
Also, we would expect the curve to flatten as speed increases or decreases, not go sharply downward as we see here. If we extrapolate, the curve keeps going down, because the fitted quadratic is a downward-opening parabola.
Two degrees of freedom and a quadratic polynomial are not enough!
- This is the case where there is a real (unknown) function dictated by the physics of the engines, but in our case we are just approximating it.
- We could use a polynomial of third degree, but it would not help here, since we are focusing on polynomials of even degree. In fact, we want to change how the curve behaves at the extremes; we do not want to change its direction.
Let's try a polynomial of fourth degree.
fit4 <- lm(formula = mpg ~ mph + I(mph^2) + I(mph^3) + I(mph^4), data = econ)
# Omitting data on purpose
The fourth order term is significant with the other terms in the model.
Also we are starting to see what we expected for low and high speed.
However, there still seems to be a bit of a pattern in the residuals, so we will again try more higher order terms.
We will add the fifth and sixth together, since adding the fifth will be similar to adding the third.
fit6 <- lm(formula = mpg ~ mph + I(mph^2) + I(mph^3) + I(mph^4) + I(mph^5) +
I(mph^6), data = econ)
# Omitting data on purpose
Again the sixth order term is significant with the other terms in the model, and here we see less pattern in the residuals plot.
Let’s now test for which of the previous two models we prefer. We will test:
$$H_{0}: \beta_{5} = \beta_{6} = 0$$
anova(fit4, fit6)
Analysis of Variance Table
Model 1: mpg ~ mph + I(mph^2) + I(mph^3) + I(mph^4)
Model 2: mpg ~ mph + I(mph^2) + I(mph^3) + I(mph^4) + I(mph^5) + I(mph^6)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 23 19.922
2 21 15.739 2 4.1828 2.7905 0.0842 .
This test does not reject the null hypothesis at a level of significance of $\alpha = 0.05$.
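The F statistic in the table can be reproduced by hand from the residual sums of squares and degrees of freedom. This sketch only uses the numbers printed in the table above; the variable names are chosen here for illustration.

```r
# Values copied from the analysis of variance table.
rss4 <- 19.922; df4 <- 23  # fourth-order model
rss6 <- 15.739; df6 <- 21  # sixth-order model

# F = (drop in RSS per extra parameter) / (residual variance of larger model).
f_stat <- ((rss4 - rss6) / (df4 - df6)) / (rss6 / df6)
p_val  <- pf(f_stat, df4 - df6, df6, lower.tail = FALSE)

c(F = f_stat, p = p_val)
```

Up to rounding of the printed RSS values, this matches the F value and p-value reported in the table.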
We could repeat this process one more time with fit8 (you know how to proceed by now).
anova(fit6, fit8)
Analysis of Variance Table
Model 1: mpg ~ mph + I(mph^2) + I(mph^3) + I(mph^4) + I(mph^5) + I(mph^6)
Model 2: mpg ~ mph + I(mph^2) + I(mph^3) + I(mph^4) + I(mph^5) + I(mph^6) +
I(mph^7) + I(mph^8)
Res.Df RSS Df Sum of Sq F Pr(>F)
1 21 15.739
2 19 15.506 2 0.2324 0.1424 0.8682
The eighth order term is not significant with the other terms in the model and the F-test does not reject.
There is a quicker way to specify a model with many higher order terms. The method produces the same fitted values ...
fit6_alt <- lm(mpg ~ poly(mph, 6), data = econ)
all.equal(fitted(fit6), fitted(fit6_alt))
[1] TRUE
... but the estimated coefficients are different because poly() uses orthogonal polynomials.
These orthogonal polynomials are useful because they avoid numerical problems in the matrix operations: raw powers of a predictor are strongly correlated with each other, while the orthogonal columns are not.
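A small sketch of the difference between raw and orthogonal polynomial columns. The speed grid `mph` below is hypothetical (it is not the fuel economy data) and degree 3 is chosen only to keep the output short.

```r
# Hypothetical speed grid, used only to compare the two bases.
mph <- seq(10, 75, by = 5)

raw  <- poly(mph, 3, raw = TRUE)  # columns mph, mph^2, mph^3
orth <- poly(mph, 3)              # orthogonal polynomial columns

round(cor(raw), 3)         # raw powers are strongly correlated
round(crossprod(orth), 3)  # orthogonal columns: the identity matrix
```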
coef(fit6)
(Intercept) mph I(mph^2)
-4.206224e+00 4.203382e+00 -3.521452e-01
I(mph^3) I(mph^4)
1.579340e-02 -3.472665e-04
I(mph^5) I(mph^6)
3.585201e-06 -1.401995e-08
coef(fit6_alt)
(Intercept) poly(mph, 6)1 poly(mph, 6)2
24.40714286 4.16769628 -27.66685755
poly(mph, 6)3 poly(mph, 6)4
0.13446747 7.01671480
poly(mph, 6)5 poly(mph, 6)6
0.09288754 -2.04307796
To use poly() to obtain the same results as using I() repeatedly, we would need to set raw = TRUE.
fit6_alt2 <- lm(mpg ~ poly(mph, 6, raw = TRUE), data = econ)
coef(fit6_alt2)
(Intercept) poly(mph, 6, raw = TRUE)1 poly(mph, 6, raw = TRUE)2
-4.206224e+00 4.203382e+00 -3.521452e-01
poly(mph, 6, raw = TRUE)3 poly(mph, 6, raw = TRUE)4
1.579340e-02 -3.472665e-04
poly(mph, 6, raw = TRUE)5 poly(mph, 6, raw = TRUE)6
3.585201e-06 -1.401995e-08
Ta daaaan! This is still a linear model: it is a polynomial in mph, but linear in the coefficients.
Melting Arctic sea ice is monitored and used as an indicator for the impacts of climate change.
The summer ice in the Arctic Ocean reflects sunlight. As the ice melts, the much darker sea water absorbs sunlight. This feedback mechanism is understood as an important driver of climate change throughout geologic history.
Other effects: changes in ocean currents and atmospheric weather patterns as well as the possibility of releasing further greenhouse gases by accelerating the melting of Arctic permafrost on land and on the East Siberian Arctic Shelf.
September is the month when the ice stops melting each summer and reaches its minimum extent.
The data we will analyze is a time series of September Arctic sea ice extent from 1979 until 2012.
Let's model the relationship Extent-Time as:
Let's assume
Scientific question:
- “Is the September Arctic sea ice extent decreasing with time?”
- Mathematically speaking, the equivalent question is: is $\beta_{1} < 0$?
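One way to address the question $\beta_{1} < 0$ is a one-sided t-test on the slope. The sketch below runs on a simulated series, not the real sea ice data: the names `year_sim` and `extent_sim`, the trend and the noise level are all assumptions made for illustration.

```r
# Simulated series with a downward trend (NOT the real sea ice data).
set.seed(4)
year_sim   <- 1979:2012
extent_sim <- 7.5 - 0.09 * (year_sim - 1979) + rnorm(length(year_sim), sd = 0.5)

fit_sim <- lm(extent_sim ~ year_sim)
est <- coef(summary(fit_sim))["year_sim", ]
est

# One-sided p-value for H0: beta1 >= 0 versus H1: beta1 < 0.
p_one_sided <- unname(pt(est["t value"], df = df.residual(fit_sim)))
p_one_sided
```

A small one-sided p-value is evidence in favour of a negative slope; summary() itself reports the two-sided p-value.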
As always the model:
where
fit <- lm(extent ~ year)
plot(year, extent, ylab = "1,000,000 km^2",
     main = "Evolution of sea ice extent", pch = 20, cex.lab = 1.05)
abline(fit, lwd = 2, col = "red")
The fit is not too bad, except for a few points...
Let's check the residuals.
The plot of residuals versus time reinforces this point. The residuals from the linear regression tend to be
- Negative in early and late years
- Positive in the middle years
Look back at the data: the decrease is larger in the later years.
In fact, we notice a steeper slope as we proceed with time.
Intuitively we can pass to the supposition that
We then build a simple model that allows for the slope to change at a constant rate:
A model with a quadratic term corresponds to a model in which the slope is allowed to change as a function of the predictor.
Now we are really facing a quadratic model, where we allow the slope (the angular coefficient) to change!
In a linear model the first derivative is constant; with quadratic terms the first derivative varies. Similar concepts apply to higher order polynomials.
Obviously, this is not a definitive model for the extent of the ice; we are merely approximating it with the 30 observations we are given.
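To make the point about derivatives concrete, here is a short derivation. It writes $t$ for the predictor (time) and $\beta_{2}$ for the coefficient of the quadratic term; both symbols are notation assumed here for illustration.

$$E[Y \mid t] = \beta_{0} + \beta_{1}t \quad\Rightarrow\quad \frac{d}{dt}E[Y \mid t] = \beta_{1}$$

$$E[Y \mid t] = \beta_{0} + \beta_{1}t + \beta_{2}t^{2} \quad\Rightarrow\quad \frac{d}{dt}E[Y \mid t] = \beta_{1} + 2\beta_{2}t$$

So under the quadratic model the slope changes by $2\beta_{2}$ per unit of time, which is a constant rate.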