---
title: "Linear Regression"
author: "Ghinwa Moujaes"
date: "22/03/2022"
output: html_document
---
```{r}
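# Start from a clean workspace
rm(list = ls())
library(tidyverse)  # includes dplyr and readr (read_csv)
library(texreg)     # screenreg() for side-by-side regression tables
df <- read_csv("../data/Titanic_Data/titanic.csv")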
```
```{r}
# Dummy variable: 1 for male, 0 for female, NA for any other value of Sex
df <- df %>%
  mutate(male = case_when(Sex == "male" ~ 1, Sex == "female" ~ 0))
```
Estimate a linear model for Fare using Age as a continuous predictor (Model 1)
```{r}
model_v1 <- lm(Fare ~ Age, data = df)
summary(model_v1)
```
Estimate a linear model including only Sex as a categorical predictor (dummy variable) (Model 2)
```{r}
model_v2 <- lm(Fare ~ male, data = df)
summary(model_v2)
```
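As a quick check on how to read the coefficient on a 0/1 dummy, the sketch below uses simulated data (not the Titanic file; `sim`, `fare`, and the numbers in it are invented for illustration). It shows that with a dummy as the only predictor, the intercept is the mean of the reference group and the slope is the difference in group means.

```r
# Illustration on simulated data; these values are invented, not Titanic results.
set.seed(1)
n <- 200
sim <- data.frame(male = rbinom(n, 1, 0.6))
sim$fare <- 40 - 15 * sim$male + rnorm(n, sd = 10)

fit <- lm(fare ~ male, data = sim)

# Group means computed directly from the data
mean_female <- mean(sim$fare[sim$male == 0])
mean_male   <- mean(sim$fare[sim$male == 1])

# Intercept = mean of the reference group (male == 0);
# slope = difference in means (male minus female)
all.equal(unname(coef(fit)["(Intercept)"]), mean_female)
all.equal(unname(coef(fit)["male"]), mean_male - mean_female)
```

The same reading should apply to the `male` coefficient in Model 2: it is the average male fare minus the average female fare.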
Estimate a linear model including both predictors simultaneously (Model 3)
```{r}
model_v3 <- lm(Fare ~ male + Age, data = df)
summary(model_v3)
```
What do the slope coefficients mean?
1. On average, male fares were 20 units cheaper than female fares.
2. On average, each additional year of a passenger's age is associated with a fare that is 0.45 units higher.
Compare slope coefficients across the models: Do they change? How?
- In the full model (Model 3), both coefficients change slightly in value (the Age coefficient increases, the male coefficient decreases), but their statistical significance stays the same.
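One way to see why the slopes move between the single-predictor models and the full model is the omitted-variable identity for OLS. The sketch below checks it on simulated data (all names and numbers are invented for illustration, not taken from the Titanic file): the Age slope in the Age-only model equals the Age slope in the full model plus the male coefficient times the slope from regressing male on Age.

```r
# Simulated data for illustration only; names and numbers are invented.
set.seed(2)
n <- 300
male <- rbinom(n, 1, 0.6)
age  <- 28 + 3 * male + rnorm(n, sd = 12)   # age is correlated with the dummy
fare <- 30 - 20 * male + 0.4 * age + rnorm(n, sd = 15)
sim  <- data.frame(fare, male, age)

short <- lm(fare ~ age, data = sim)          # age only
long  <- lm(fare ~ male + age, data = sim)   # both predictors
aux   <- lm(male ~ age, data = sim)          # dummy regressed on age

# Short-regression slope = long-regression slope
#   + (coefficient on the omitted variable) * (slope of omitted on included)
implied <- coef(long)["age"] + coef(long)["male"] * coef(aux)["age"]
all.equal(unname(coef(short)["age"]), unname(implied))
```

If the two predictors were uncorrelated in the sample, the auxiliary slope would be zero and the coefficients would not change at all.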
Regress the continuous predictor (Age) on the categorical predictor (the male dummy) (Model 4)
```{r}
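# na.exclude pads residuals with NA for rows dropped due to missing values,
# so resid() stays aligned with the rows of df
model_v4 <- lm(Age ~ male, data = df, na.action = na.exclude)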
summary(model_v4)
```
Calculate and save the residuals from Model 4. Use these residuals as a predictor of the dependent variable Fare (Model 5).
```{r}
df$residual_age_male <- resid(model_v4)
model_v5 <- lm(Fare ~ residual_age_male, data = df)
summary(model_v5)
```
Regress the dummy predictor (male) on the continuous predictor (Age) (Model 6)
```{r}
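# na.exclude pads residuals with NA for rows dropped due to missing values,
# so resid() stays aligned with the rows of df
model_v6 <- lm(male ~ Age, data = df, na.action = na.exclude)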
summary(model_v6)
```
Calculate and save the residuals from Model 6. Use these residuals as a predictor of the dependent variable Fare (Model 7).
```{r}
df$residual_male_age <- resid(model_v6)
model_v7 <- lm(Fare ~ residual_male_age, data = df)
summary(model_v7)
```
```{r}
# texreg expects a list of models; c() would flatten the lm objects
screenreg(list(model_v3, model_v5, model_v7))
```
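The comparison above is an instance of the Frisch-Waugh-Lovell result: the slope on a residualized predictor reproduces that predictor's coefficient in the multiple regression. A minimal sketch on simulated data (names and numbers are invented for illustration, not Titanic results) that checks this directly:

```r
# Simulated data for illustration only; names and numbers are invented.
set.seed(3)
n <- 300
male <- rbinom(n, 1, 0.6)
age  <- 28 + 3 * male + rnorm(n, sd = 12)
fare <- 30 - 20 * male + 0.4 * age + rnorm(n, sd = 15)
sim  <- data.frame(fare, male, age)

full <- lm(fare ~ male + age, data = sim)

# Partial the dummy out of age, then regress fare on what is left of age
sim$age_resid <- resid(lm(age ~ male, data = sim))
fit_age <- lm(fare ~ age_resid, data = sim)

# Partial age out of the dummy, then regress fare on what is left of the dummy
sim$male_resid <- resid(lm(male ~ age, data = sim))
fit_male <- lm(fare ~ male_resid, data = sim)

# The residualized slopes reproduce the multiple-regression coefficients
all.equal(unname(coef(fit_age)["age_resid"]),   unname(coef(full)["age"]))
all.equal(unname(coef(fit_male)["male_resid"]), unname(coef(full)["male"]))
```

Only the slopes match. Standard errors and R-squared will generally differ, because the outcome itself was not residualized and the single-predictor models leave more unexplained variance.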