-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathgss-colrac.Rmd
67 lines (50 loc) · 1.74 KB
/
gss-colrac.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
---
title: "Predicting Attitudes Towards Racist College Professors"
author: "Gustavo Arruda"
date: "`r lubridate::today()`"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```
## Load necessary libraries
```{r packages}
library(tidyverse)
library(rcfss)
library(randomForest)
library(knitr)
library(caret)
library(partykit)
theme_set(theme_minimal())
set.seed(1234)
```
```{r}
gss_colrac <- gss_colrac
gss_colrac_transformed <- gss_colrac %>%
mutate(colrac = if_else(colrac == FALSE, "NO",
if_else(colrac == TRUE, "YES", NA_character_))) %>%
mutate_if(is.character, as.factor)
train_control <- trainControl(method = "oob")
# Train the model
random_forest_model <- train(colrac ~., data = gss_colrac_transformed,
method = "rf",
ntree = 200,
trControl = train_control,
na.action = na.omit
)
random_forest_model$finalModel
varImpPlot(random_forest_model$finalModel)
colrac_selected <- ctree(colrac ~ tolerance + age + egalit_scale + wordsum + authoritarianism, data = gss_colrac_transformed)
plot(colrac_selected,
ip_args = list(
pval = TRUE,
id = FALSE),
tp_args = list(
id = FALSE)
)
```
I choose a random forest algorithm to build this model. Such algorithm iteratively select the best variables to construct a forest tree model, which does a good work to elicit relationships between categorical variables in a data set. The random_forest_model$finalModel chart shows that 'tolerance', 'age', 'egalit_scale', 'wordsum' and 'authoritarianism' are the most important predicting variables.
## Session info
```{r}
devtools::session_info()
```