11-example_autism.Rmd

# MLM, Longitudinal: Autism

```{r, message=FALSE, error=FALSE}
library(tidyverse)    # all things tidy
library(pander)       # nice looking general tabulations
library(furniture)    # nice table1() descriptive
library(texreg)       # Convert Regression Output to LaTeX or HTML Tables
library(psych)        # contains some useful functions, like headTail
library(performance)
library(sjstats)      # ICC calculations

library(sjPlot)       # Quick predictive and diagnostic plots 
library(effects)      # Estimated Marginal Means

library(VIM)          # Visualization and Imputation of Missing Values 
library(naniar)       # Summaries and Visualizations for Missing Data

library(lme4)         # Linear, generalized linear, & nonlinear mixed models
library(optimx)
library(interactions)
library(ggResidpanel)

library(HLMdiag)      # package with the dataset
```

## Background

The source: <http://www-personal.umich.edu/~kwelch/>

This data was collected by researchers at the University of Michigan [@anderson2007, [@anderson2009]] as part of a prospective longitudinal study of 214 children. The children were divided into three diagnostic groups (`bestest2`) when they were 2 years old: Autism (`autism`), Pervasive Developmental Disorder (`pdd`), and non-spectrum children (none in this sample). The study was designed to collect information on each child at approximately 2, 3, 5, 9, and 13 years of age, although not all children were measured for each age. One of the study objectives was to assess the relative influence of the initial diagnostic category, language proficiency at age 2, and other covariates on the developmental trajectories of the socialization (`vsae`) of these children.

Study participants were children who had had consecutive referrals to one of two autism clinics before the age of 3 years. Social development was assessed at each age using the Vineland Adaptive Behavior Interview survey form, a parent-reported measure of socialization. **VSAE** (Vineland Socialization Age Equivalent), was a combined score that included assessments of interpersonal relationships, play/leisure time activities, and coping skills. Initial language development was assessed using the Sequenced Inventory of Communication Development (SICD) scale; children were placed into one of three groups (`sicdegp`) based on their initial SICD scores on the expressive language subscale at age 2.

-   `childid` Child's identification number for this study

-   `sicdegp` Sequenced Inventory of Communication Development group *(an assessment of expressive language development)* - a factor. Levels are `low`, `med`, and `high`

-   `age2` Age (in years) centered around age 2 (age at diagnosis)

-   `vsae` Vineland Socialization Age Equivalent, Parent-reported measure of socialization, including measures of:

    -   interpersonal relationships
    -   play/leisure time activities
    -   coping skills

-   `gender` Child's gender - a factor. Levels are `male` and `female`

-   `race` Child's race - a factor. Levels are `white` and `nonwhite`

-   `bestest2` Diagnosis at age 2 - a factor. Levels are `autism` and `pdd` (pervasive developmental disorder)

```{r}
data(autism, package = "HLMdiag")   # make the dataset 'active' from this package
```

```{r}
tibble::glimpse(autism)             # first look at the dataset and its varaibles
```

### Long Format

```{r}
data_long <- autism %>%                                    # save the dataset as a new name
  dplyr::mutate(childid = childid %>% factor) %>%          # declare grouping var a factor
  dplyr::mutate(age = 2 + age2) %>%                        # create the original age variable (unequally spaced)
  dplyr::mutate(obs = age %>% factor %>% as.numeric) %>%   # Observation Number = 1, 2, 3, 4, 5 (equally spaced)
  dplyr::select(childid,                                   # choose variables and order to keep
                gender, race, bestest2, sicdegp, 
                obs, age, age2, vsae) %>% 
  dplyr::arrange(childid, age2)                            # sort observations
```

```{r}
data_long %>% 
  psych::headTail(top = 11, bottom = 6)
```

### Wide Format

```{r}
data_wide <- data_long %>%                            # save the dataset as a new name
  dplyr::select(-age2, -obs) %>%                      # delete (by deselecting) this variable 
  tidyr::pivot_wider(names_from  = age,               # repeated indicator
                     values_from = vsae,              # variable repeated
                     names_prefix = "vsae_") %>%      # prefix in from of the 
  dplyr::arrange(childid)                             # sort observations

tibble::glimpse(data_wide)
```

Notice the missing values, displayed as `NA`.

```{r}
data_wide %>% 
  psych::headTail()
```

## Exploratory Data Analysis

### Demographic Summary

#### Using the WIDE formatted dataset

Each person's data is only stored on a single line

```{r}
data_wide %>% 
  dplyr::group_by("Initial Language Development" = sicdegp) %>%     # how split into columns
  furniture::table1("Diagnosis" = bestest2, 
                    "Gender"    = gender, 
                    "Race"      = race,  
                    digits = 2,
                    na.rm  = FALSE,  
                    total  = TRUE,
                    test   = TRUE,               # compare the groups
                    output = "markdown")           
```

#### Using the LONG formatted dataset

Each person's data is stored on multiple lines, one for each time point. To ensure the summary table is correct, you must choose a single time point per person.

```{r}
# Note: One person is missing Age 2
data_long %>% 
  dplyr::filter(age == 2) %>%                                      # restrict to one-line per person
  dplyr::group_by("Initial Language Development" = sicdegp) %>%    # how split into columns
  furniture::table1("Diagnosis" = bestest2, 
                    "Gender"    = gender, 
                    "Race"      = race, 
                    digits = 2,
                    na.rm  = FALSE,
                    total  = TRUE,
                    test   = TRUE,               # compare the groups
                    output = "markdown")           
```

### Baseline Summary

#### Using the WIDE formatted dataset

Each person's data is only stored on a single line

```{r}
data_wide %>%
  dplyr::group_by("Initial Language Development" = sicdegp) %>%    # how split into columns
  furniture::table1(vsae_2,
                    digits = 2,
                    na.rm  = FALSE,
                    total  = TRUE,
                    test   = TRUE,               # compare the groups
                    output = "markdown")
```

#### Using the LONG formatted dataset

Each person's data is stored on multiple lines, one for each time point. To ensure the summary table is correct, you must choose a single time point per person.

```{r, results="asis"}
data_long %>% 
  dplyr::filter(age == 2) %>% 
  dplyr::group_by("Initial Language Development" = sicdegp) %>%    # how split into columns
  furniture::table1(vsae, 
                    test = TRUE,
                    output = "markdown")
```

### Missing Data & Attrition

#### `VIM` package

Plot the amount of missing vlaues and the amount of each patter of missing values.

```{r}
data_wide %>% 
  VIM::aggr(numbers = TRUE,   # shows the number to the far right
            prop    = FALSE)  # shows counts instead of proportions
```

#### `naniar` package

```{r}
data_wide %>% 
  naniar::vis_miss()
```

```{r}
data_wide %>% 
  naniar::gg_miss_var()
```

```{r}
data_wide %>% 
  naniar::gg_miss_var(show_pct = TRUE,     # x-axis is PERCENT, not count
                      facet = sicdegp) +   # create seperate panels
  theme_bw()                               # add ggplot layers as normal
```

```{r}
data_wide %>% 
  naniar::gg_miss_upset() 
```

```{r}
data_wide %>% 
  naniar::gg_miss_upset(sets = c("vsae_13_NA",
                                 "vsae_9_NA",
                                 "vsae_5_NA",
                                 "vsae_3_NA",
                                 "vsae_2_NA"),
                        keep.order = TRUE) 
```

### Means Across Time

#### Using the WIDE formatted dataset

> Default = COMPLETE CASES ONLY!!!

Note - the sample size at each time point is the same, but subjects are only included if they have data at every time point (total n = 41)

```{r}
data_wide %>%
  dplyr::group_by("Initial Language Development" = sicdegp) %>%    # how split into columns
  dplyr::select(starts_with("vsae_")) %>% 
  furniture::table1(digits = 2,
                    total  = TRUE,
                    test   = TRUE,               # compare the groups
                    output = "markdown")
```

> Specify All data:

note - that the smaple sizes will be different for each time point (total n = 155)

```{r, results="asis"}
data_wide %>%
  dplyr::group_by("Initial Language Development" = sicdegp) %>%    # how split into columns
  dplyr::select(starts_with("vsae_")) %>% 
  furniture::table1(digits = 2,
                    na.rm  = FALSE,
                    total  = TRUE,
                    test   = TRUE,               # compare the groups
                    output = "markdown")
```

#### Using the LONG formatted dataset

Each person's data is stored on multiple lines, one for each time point.

> FOR ALL DATA!

```{r}
data_sum_all <- data_long %>% 
  dplyr::group_by(sicdegp, age2) %>%                     # specify the groups
  dplyr::summarise(vsae_n    = n(),                      # count of valid scores
                   vsae_mean = mean(vsae),               # mean score
                   vsae_sd   = sd(vsae),                 # standard deviation of scores
                   vsae_sem  = vsae_sd / sqrt(vsae_n))   # stadard error for the mean of scores

data_sum_all
```

> FOR COMPLETE CASES ONLY!!!

```{r}
data_sum_cc <- data_long %>% 
  dplyr::group_by(childid) %>%        # group-by child
  dplyr::mutate(child_vsae_n = n()) %>%    # count the number of valid VSAE scores
  dplyr::filter(child_vsae_n == 5) %>%     # restrict to only thoes children with 5 valid scores
  dplyr::group_by(sicdegp, age2) %>%              # specify the groups
  dplyr::summarise(vsae_n    = n(),                      # count of valid scores
                   vsae_mean = mean(vsae),               # mean score
                   vsae_sd   = sd(vsae),                 # standard deviation of scores
                   vsae_sem  = vsae_sd / sqrt(vsae_n))   # stadard error for the mean of scores

data_sum_cc
```

### Person Profile Plots

Use the data in LONG format

#### Unequally Spaced

```{r, fig.width=8, fig.height=6}
data_long %>% 
  dplyr::mutate(sicdegp = fct_recode(sicdegp,
                                     "Low Communication" = "low",
                                     "Medium Communication" = "med",
                                     "High Communication" = "high")) %>% 
  ggplot(aes(x = age,
             y = vsae)) +
  geom_point(size = 0.75) +
  geom_line(aes(group = childid),
            alpha = .5,
            size = 1) +
  facet_grid(. ~ sicdegp)  +
  theme_bw() +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +
  labs(x = "Age of Child, Years",
       y = "Vineland Socialization Age Equivalent",
       color = "Sequenced Inventory of Communication Development") +
  geom_smooth(aes(color = "Flexible"),
              method = "loess",
              se = FALSE,) +
  geom_smooth(aes(color = "Linear"),
              method = "lm",
              se = FALSE) +
  scale_color_manual(name = "Smoother Type: ",
                     values = c("Flexible" = "blue", 
                                "Linear"    = "red")) +
  theme(legend.position = "bottom",
        legend.key.width = unit(2, "cm"))
```

#### Equally Spaced

```{r, fig.width=8, fig.height=6}
data_long %>% 
  dplyr::mutate(sicdegp = fct_recode(sicdegp,
                                     "Low Communication" = "low",
                                     "Medium Communication" = "med",
                                     "High Communication" = "high")) %>% 
  ggplot(aes(x = obs,
             y = vsae)) +
  geom_point(size = 0.75) +
  geom_line(aes(group = childid),
            alpha = .5,
            size = 1) +
  facet_grid(. ~ sicdegp)  +
  theme_bw() +
  labs(x = "Observation Number",
       y = "Vineland Socialization Age Equivalent",
       color = "Sequenced Inventory of Communication Development") +
  geom_smooth(aes(color = "Flexible"),
              method = "loess",
              se = FALSE,) +
  geom_smooth(aes(color = "Linear"),
              method = "lm",
              se = FALSE) +
  scale_color_manual(name = "Smoother Type: ",
                     values = c("Flexible" = "blue", 
                                "Linear"    = "red")) +
  theme(legend.position = "bottom",
        legend.key.width = unit(2, "cm"))
```

### Side-by-side Boxplots

```{r, fig.width=8, fig.height=6}
data_long %>% 
  ggplot(aes(x = sicdegp,
             y = vsae,
             fill = sicdegp)) +
  geom_boxplot() +
  theme_bw() +
  facet_grid(. ~ age, 
             labeller = "label_both") +
  theme(legend.position = "none") +
  labs(x = "Initial Language Development\nSequenced Inventory of Communication Development (SICD) at Age 2",
       y = "Parent-Reported Measure of Socialization\nVineland Socialization Age Equivalent")
```

### Means Plots

#### Default `stat_summary`

It is nice that the `stat_summary()` layer computes the standard error for the mean for you using the data in LONG format

```{r, fig.width=8, fig.height=6}
data_long %>% 
  ggplot(aes(x = age,
             y = vsae,
             color = sicdegp)) +
  stat_summary() +                       # default: points at MEAN and extend vertically 1 standard error for the mean
  stat_summary(fun = "mean",             # plot the means
               geom = "line") +          # ...and connect with lines
  theme_bw() +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) 
```

```{r, fig.width=8, fig.height=6}
data_long %>% 
  ggplot(aes(x = obs,
             y = vsae,
             color = sicdegp)) +
  stat_summary() +                       # default: points at MEAN and extend vertically 1 standard error for the mean
  stat_summary(fun = "mean",           # plot the means
               geom = "line") +          # ...and connect with lines
  theme_bw() 
```

#### Manually Summarized

```{r, fig.width=8, fig.height=6}
data_sum_all %>% 
  dplyr::mutate(age = age2 + 2) %>% 
  ggplot() +
  aes(x = age,
      y = vsae_mean,
      color = sicdegp) +
  geom_errorbar(aes(ymin = vsae_mean - vsae_sem,   # mean +/- one SE for the mean
                    ymax = vsae_mean + vsae_sem),
                width = .25) +
  geom_point(aes(shape = sicdegp),
             size = 3) +
  geom_line(aes(group = sicdegp)) +
  theme_bw() +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +
  labs(x = "Age of Child, Years",
       y = "Vineland Socialization Age Equivalent",
       color = "Sequenced Inventory of Communication Development:",
       shape = "Sequenced Inventory of Communication Development:",
       linetype = "Sequenced Inventory of Communication Development:") +
  theme(legend.position = "bottom",
        legend.key.width = unit(2, "cm"))
```

## Model 1: Full model with 'loaded' mean structure

Take top-down approach: Quadratic regression model, describing `vsae` as a function of `age2`

> Each child has a unique parabolic trajectory over time, with coefficients that vary randomly around fixed-effects defining a mean growth curve for each SICD group. Since there is no `age` = 0 in our data, we will use the `age2` variables, which is `age` -2, so that intercept has meaning (mean at baseline age).

> `I()` denotes an internal calculated variable

-   Fixed-effects

    -   `I(age-2)` age
    -   `I((age-2)^2)` quadratic age or age-squared, 
    -   `sicdegp` SICD group (reference group = low)
    -   SICD group x age/age-squared interactions

-   Random-effects

    -   intercept
    -   age and age-squared

### Fit the Model

```{r}
fit_lmer_1_re <- lmerTest::lmer(vsae ~ I(age-2)*sicdegp + I(I(age-2)^2)*sicdegp + 
                                  (I(age-2) + I((age-2)^2) | childid), 
                                data = data_long,
                                REML = TRUE,
                                control = lmerControl(optimizer = "optimx",    # get it to converge
                                                 calc.derivs = FALSE,
                                                 optCtrl = list(method = "nlminb",
                                                                starttests = FALSE,
                                                                kkt = FALSE)))
```

### Table of Prameter Estimates

```{r, results='asis'}
texreg::knitreg(fit_lmer_1_re,
                caption            = "MLM: Full Model",
                caption.above      = TRUE,
                single.row         = TRUE)
```

### Plot of the Estimated Marginal Means

Note: the $x-axis$ is the `age` variable back on its original scale

```{r, fig.cap="Model 1: Loaded Means Structure"}
interactions::interact_plot(model = fit_lmer_1_re, # model name
                            pred = age,            # x-axis variable (must be continuous)
                            modx = sicdegp,        # seperate lines
                            interval = TRUE) +     # shaded bands
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13))
```

## Model 2A: Drop Random Intercepts

Note: There seems to be relatively little variation in baseline measurements of VSAE across individuals in the same SICD group, so the variation at age 2 can be attributed to random error, rather than between-subject variation.

This indicates we may want to try removing the random intercepts, while retaining the same fixed- and other random-effects.

This new model implies that children have common initial VSAE value at age 2, given their SICD group.

### Fit the Model

```{r}
fit_lmer_2a_re <- lmerTest::lmer(vsae ~ I(age-2)*sicdegp + I((age-2)^2)*sicdegp + 
                                   (0 + I(age-2) + I((age-2)^2) | childid), 
                                 data = data_long,
                                 REML = TRUE)
```

### Assess the Signifcance

```{r}
anova(fit_lmer_1_re, fit_lmer_2a_re, refit = FALSE)
```

The more complicated model (including random intercepts) does NOT fit better, thus the random intercepts may be removed from the model. Model 2a is better than Model 1

## Model 2B: Drop Random Quadratic Slope

We should formally test the necessity of quadratic age random-effect.

Comparison of nested models with REML-based LRT using a 50:50 mixture χ2-distribution with 1 and 2 df Difference of 2 covariance parameters

### Fit the Model

```{r}
fit_lmer_2b_re <- lmerTest::lmer(vsae ~ I(age-2)*sicdegp + I((age-2)^2)*sicdegp + 
                                   (0 + I(age-2)  | childid), 
                                 REML = TRUE, 
                                 data = data_long)
```

### Assess the Signifcance

```{r}
anova(fit_lmer_2a_re, fit_lmer_2b_re, refit = FALSE)
```

The more complicated model *(including random intercepts)* **DOES** fit better, thus the random slopes for both the linear AND the quadratic effect of age should be retained in the model. Model 2a is better than model 2b

## Model 3: Drop Quadratic Time Fixed Effect

Fit the previous 'best' model via ML, not REML to compare nested model that differe in terms of fixed effects only

### Fit the Models

```{r}
fit_lmer_2a_ml <- lmerTest::lmer(vsae ~ I(age-2)*sicdegp + I((age-2)^2)*sicdegp + 
                                   (0 + I(age-2) + I((age-2)^2) | childid),  
                                 data = data_long, 
                                 REML = FALSE)

fit_lmer_3_ml <- lmerTest::lmer(vsae ~ I(age-2)*sicdegp  + 
                                  (0 + I(age-2) + I((age-2)^2) | childid), 
                                data = data_long, 
                                REML = FALSE)
```

### Assess the Signifcance

```{r}
anova(fit_lmer_2a_ml, fit_lmer_3_ml)
```

The more complicated model *(including fixed interaction between quadratic time and SICD group)* **DOES** fit better, thus the higher level interaction should be retained in the model. Model 2a is better than model 3.

## Final Model

### Table of Parameter Esitmates


```{r, results='asis'}
texreg::knitreg(fit_lmer_2a_re,
                caption            = "MLM: Final Model",
                caption.above      = TRUE,
                single.row         = TRUE,
                digits             = 4)
```

### Interpretation of Fixed Effects

#### Reference Group: `low` SICD group

-   $\gamma_{0}$ = `r  fixef(fit_lmer_2a_re)["(Intercept)"] %>% round(3)` is the estimated marginal mean VSAE score for children in the `low` SICD, at 2 years of age

-   $\gamma_{a}$ = `r  fixef(fit_lmer_2a_re)["I(age - 2)"] %>% round(3)` and $\gamma_{a^2}$ = `r  fixef(fit_lmer_2a_re)["I((age - 2)^2)"] %>% round(3)` are the fixed effects for age and age-squared on VSAE for children in the `low` SICD group (change over time)

Thus the equation for the estimated marginal mean VASE trajectory for the `low` SICD group is:

$$
\begin{align*}
VASE =& \gamma_{0} + 
        \gamma_{a}   (AGE - 2) + 
        \gamma_{a^2} (AGE - 2)^2 \\
     =& `r  fixef(fit_lmer_2a_re)["(Intercept)"]      %>% round(3)` + 
        `r  fixef(fit_lmer_2a_re)["I(age - 2)"]       %>% round(3)` (AGE - 2)  + 
        `r  fixef(fit_lmer_2a_re)["I((age - 2)^2)"]   %>% round(3)` (AGE - 2)^2 \\ 
\end{align*}
$$

#### First Comparison Group: `medium` SICD group

-   $\gamma_{med}$ = `r  fixef(fit_lmer_2a_re)["sicdegpmed"] %>% round(3)` is the DIFFERENCE in the estimated marginal mean VSAE score for children in the `medium` vs. the `low` SICD, at 2 years of age

-   $\gamma_{med:\;a}$ = `r  fixef(fit_lmer_2a_re)["I(age - 2):sicdegpmed"] %>% round(3)` and $\gamma_{med:\;a^2}$ = `r  fixef(fit_lmer_2a_re)["sicdegpmed:I((age - 2)^2)"] %>% round(3)` are the DIFFERENCE in the fixed effects for age and age-squared on VSAE for children in the `medium` vs. the `low` SICD group

Thus the equation for the estimated marginal mean VASE trajectory for the `medium` SICD group is:

$$
\begin{align*}
VASE =& (\gamma_{0}   + \gamma_{med}) + 
        (\gamma_{a}   + \gamma_{med:\;a})  (AGE - 2) + 
        (\gamma_{a^2} + \gamma_{med:\;a^2})(AGE - 2)^2 \\
     =& (`r  fixef(fit_lmer_2a_re)["(Intercept)"] %>% round(3)` + 
         `r  fixef(fit_lmer_2a_re)["sicdegpmed"] %>% round(3)`) + 
        (`r  fixef(fit_lmer_2a_re)["I(age - 2)"]            %>% round(3)` + 
         `r  fixef(fit_lmer_2a_re)["I(age - 2):sicdegpmed"] %>% round(3)`)  (AGE - 2) + 
        (`r  fixef(fit_lmer_2a_re)["I((age - 2)^2)"]            %>% round(3)` + 
         `r  fixef(fit_lmer_2a_re)["sicdegpmed:I((age - 2)^2)"] %>% round(3)`)(AGE - 2)^2 \\
     =& `r  (fixef(fit_lmer_2a_re)["(Intercept)"] + 
             fixef(fit_lmer_2a_re)["sicdegpmed"] ) %>% round(3)` + 
        `r  (fixef(fit_lmer_2a_re)["I(age - 2)"] + 
             fixef(fit_lmer_2a_re)["I(age - 2):sicdegpmed"]) %>% round(3)`  (AGE - 2) + 
        `r  (fixef(fit_lmer_2a_re)["I((age - 2)^2)"] + 
             fixef(fit_lmer_2a_re)["sicdegpmed:I((age - 2)^2)"]) %>% round(3)` (AGE - 2)^2 \\
\end{align*}
$$

#### Second Comparison Group: `high` SICD group

-   $\gamma_{hi}$ = `r  fixef(fit_lmer_2a_re)["sicdegphigh"] %>% round(3)` is the DIFFERENCE in the estimated marginal mean VSAE score for children in the `high` vs. the `low` SICD, at 2 years of age

-   $\gamma_{hi:\;a}$ = `r  fixef(fit_lmer_2a_re)["I(age - 2):sicdegphigh"] %>% round(3)` and $\gamma_{hi:\;a^2}$ = `r  fixef(fit_lmer_2a_re)["sicdegphigh:I((age - 2)^2)"] %>% round(3)` are the DIFFERENCE in the fixed effects for age and age-squared on VSAE for children in the `high` vs. the `low` SICD group

Thus the equation for the estimated marginal mean VASE trajectory for the `high` SICD group is:

$$
\begin{align*}
VASE =& (\gamma_{0}   + \gamma_{hi}) + 
        (\gamma_{a}   + \gamma_{hi:\;a})  (AGE - 2) + 
        (\gamma_{a^2} + \gamma_{hi:\;a^2})(AGE - 2)^2 \\
     =& (`r  fixef(fit_lmer_2a_re)["(Intercept)"] %>% round(3)` + 
         `r  fixef(fit_lmer_2a_re)["sicdegphigh"] %>% round(3)`) + 
        (`r  fixef(fit_lmer_2a_re)["I(age - 2)"]            %>% round(3)` + 
         `r  fixef(fit_lmer_2a_re)["I(age - 2):sicdegphigh"] %>% round(3)`)  (AGE - 2) + 
        (`r  fixef(fit_lmer_2a_re)["I((age - 2)^2)"]            %>% round(3)` + 
         `r  fixef(fit_lmer_2a_re)["sicdegphigh:I((age - 2)^2)"] %>% round(3)`)(AGE - 2)^2 \\
     =& `r  (fixef(fit_lmer_2a_re)["(Intercept)"] + 
             fixef(fit_lmer_2a_re)["sicdegphigh"] ) %>% round(3)` + 
        `r  (fixef(fit_lmer_2a_re)["I(age - 2)"] + 
             fixef(fit_lmer_2a_re)["I(age - 2):sicdegphigh"]) %>% round(3)`  (AGE - 2) + 
        `r  (fixef(fit_lmer_2a_re)["I((age - 2)^2)"] + 
             fixef(fit_lmer_2a_re)["sicdegphigh:I((age - 2)^2)"]) %>% round(3)` (AGE - 2)^2 \\
\end{align*}
$$

### Interpretation of Random Effects

```{r}
lme4::VarCorr(fit_lmer_2a_re)%>% 
  print(comp = c("Variance", "Std.Dev"),
        digits = 3)
```

```{r, include=FALSE}
lme4::VarCorr(fit_lmer_2a_re) %>% 
  data.frame()

fit_lmer_2a_re_sigmas <- lme4::VarCorr(fit_lmer_2a_re) %>% 
  data.frame()

fit_lmer_2a_re_sigmas
```

**Here a group of observations = a CHILD**

#### Residual Varaince

**Within-child-variance**

-   $e_{ti}$ the residuals associated with observation at time $t$ on child $i$

```{r, echo=FALSE, results="hide"}
sigma2_e <- lme4::VarCorr(fit_lmer_2a_re) %>% 
  data.frame %>% 
  dplyr::filter(grp == "Residual") %>% 
  dplyr::select(vcov)

sigma2_e
```

```{r, echo=FALSE, results="hide"}
sigma_e <- fit_lmer_2a_re_sigmas %>% 
  dplyr::filter(grp == "Residual") %>% 
  dplyr::pull(vcov)

sigma_e
```

$$
\sigma^2 = \sigma^2_e 
         = VAR[e_{ti}] 
         = `r round(sigma2_e, 3)`
$$

#### 2 Variance Components

**Between-children slope variances**

Random LINEAR effect of age variance

-   $u_{1i}$ the DIFFERENCE between child $i$'s specific linear component for age and the fixed linear component for age, given their SICD group

```{r, echo=FALSE, results="hide"}
sigma2_u1 <- lme4::VarCorr(fit_lmer_2a_re) %>% 
  data.frame %>% 
  dplyr::filter(var1 == "I(age - 2)", is.na(var2)) %>% 
  dplyr::select(vcov)

sigma2_u1
```

```{r, echo=FALSE, results="hide"}
tau_11 <- fit_lmer_2a_re_sigmas %>% 
  dplyr::filter(grp == "childid", var1 == "I(age - 2)", is.na(var2)) %>% 
  dplyr::pull(vcov)

tau_11
```

$$
\tau_{11} = \sigma^2_{u1} 
          = VAR[u_{1i}] 
          = `r tau_11 %>% round(2)`
$$

Random QUADRATIC effect of age variance

-   $u_{2i}$ the DIFFERENCE between child $i$'s specific quadratic component for age and the fixed quadratic component for age, given their SICD group

```{r, echo=FALSE, results="hide"}
sigma2_u2 <- lme4::VarCorr(fit_lmer_2a_re) %>% 
  data.frame %>% 
  dplyr::filter(var1 == "I((age - 2)^2)", is.na(var2)) %>% 
  dplyr::select(vcov)

sigma2_u2
```

```{r, echo=FALSE, results="hide"}
tau_22 <- fit_lmer_2a_re_sigmas %>% 
  dplyr::filter(grp == "childid", var1 == "I((age - 2)^2)", is.na(var2)) %>% 
  dplyr::pull(vcov)

tau_22
```

$$
\tau_{22} = \sigma^2_{u2} 
          = VAR[u_{2i}] 
          = `r tau_22 %>% round(2)`
$$

#### 1 Covariance (or correlation) Components

**Slope-slope covariance**

Random LINEAR and Quadratic effect of age covariance:

```{r, echo=FALSE, results="hide"}
sigma2_t12 <- lme4::VarCorr(fit_lmer_2a_re) %>% 
  data.frame %>% 
  dplyr::filter(var1 == "I(age - 2)", var2 == "I((age - 2)^2)") %>% 
  dplyr::select(vcov)

sigma2_t12
```

```{r, echo=FALSE, results="hide"}
tau_12 <- fit_lmer_2a_re_sigmas %>% 
  dplyr::filter(grp == "childid", var1 == "I(age - 2)", var2 == "I((age - 2)^2)") %>% 
  dplyr::pull(vcov)

tau_12
```

$$
\tau_{12} = \sigma^2_{u12} 
          = COV[u_{1i}, u_{2i}] 
          = `r tau_12 %>% round(2)`
$$
$$
Correlation(u_{1i}, u_{2i}) = -0.324
$$





### Assumption Checking

The residuals are:

-   Assumed to be normally, independently, and identically distributed (conditional on other random-effects)

-   Assumed independent of random-effects

$$
e_{ti}  \sim  N(0, \sigma^2)
$$

#### The `ggResidpanel` package

```{r}
ggResidpanel::resid_panel(fit_lmer_2a_re)  # default = pearson residuals
```

#### Manually with `HLMdiag` and `ggplot2`

```{r}
fit_lmer_2a_re %>% 
  HLMdiag::hlm_augment(level = 1,
                       include.ls = FALSE,
                       standardized = TRUE) %>%  
  ggplot(aes(x = .resid)) +
  geom_histogram(bins = 40,
                 color = "gray20",
                 alpha = .2) +
  theme_bw() +
  labs(main = "Histogram")
```

```{r}
fit_lmer_2a_re %>% 
  ranef() %>% 
  data.frame() %>% 
  ggplot(aes(sample = condval)) +
  geom_qq() +
  geom_qq_line() +
  facet_wrap(~ term, scale = "free_y")+
  theme_bw() +
  labs(main = "Random Slopes")
```



```{r}
fit_lmer_2a_re %>% 
  HLMdiag::hlm_augment(level = 1,
                       include.ls = FALSE,
                       standardized = TRUE) %>% 
  ggplot(aes(x = .fitted,
             y = .resid)) +
  geom_hline(yintercept = 0, color = "red") +
  geom_point() +
  geom_smooth() +
  theme_bw() +
  labs(main = "Residual Plot")
```

```{r}
fit_lmer_2a_re %>% 
  HLMdiag::hlm_augment(level = 1,
                       include.ls = FALSE,
                       standardized = TRUE) %>% 
  ggplot(aes(sample = .resid)) +
  geom_qq()+
  stat_qq_line() +
  theme_bw()
```


#### The `sjPlot` package

```{r}
sjPlot::plot_model(fit_lmer_2a_re, 
                   type = "diag")
```

### Plot of the Estimated Marginal Means

#### Quick and Default

Note: the $x-axis$ is the `age` variable, back on its original scale

```{r, fig.cap="Final Model (2a)"}
interactions::interact_plot(model = fit_lmer_2a_re, # model name
                            pred = age,            # x-axis variable (must be continuous)
                            modx = sicdegp,        # seperate lines
                            interval = TRUE) +     # shaded bands
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13))
```



```{r}
interactions::interact_plot(model = fit_lmer_2a_re, # model name
                            pred = age,             # x-axis variable (must be continuous)
                            modx = sicdegp,         # separate lines
                            interval = TRUE,
                            x.label = "Age of Child, in years",
                            y.label = "Estimated Marginal Mean\nVineland Socialization Age Equivalent",
                            legend.main = "Initial Communication (SICD)",
                            modx.labels = c("Low", "Medium", "High"),
                            colors = c("black", "black", "black")) + 
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +
  scale_y_continuous(breaks = seq(from = 0, to = 120, by = 20)) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(2, "cm"))
```









#### More Customized - Color

This version would look better on a poster or in a slide presentation.

```{r}
effects::Effect(focal.predictors = c("age","sicdegp"),  # variables involved in interactions
                mod = fit_lmer_2a_re,
                xlevels = list(age2 = seq(from = 2, to = 13, by = .1))) %>%   # add more values to smooth out the prediction lines and ribbons
  data.frame() %>% 
  dplyr::mutate(sicdegp = factor(sicdegp,
                                 levels = c("high", "med", "low"),
                                 labels = c("High", "Medium", "Low"))) %>% 
  ggplot(aes(x = age,
             y = fit,
             group = sicdegp)) +
  geom_ribbon(aes(ymin = lower,        # 95% Confidence Intervals
                  ymax = upper,
                  fill = sicdegp),
              alpha = .3) +
  geom_line(aes(linetype = sicdegp,
                color = sicdegp),
            size = 1) +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +                  # mark values that were actually measured
  scale_y_continuous(breaks = seq(from = 0, to = 120, by = 20)) +
  scale_linetype_manual(values = c("solid", "longdash", "dotted")) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(2.5, "cm")) +
  labs(x = "Age of Child, in years",
       y = "Estimated Marginal Mean\nVineland Socialization Age Equivalent",
       linetype = "Initial Communication (SICD)",
       fill     = "Initial Communication (SICD)",
       color    = "Initial Communication (SICD)")
```

#### More Customized - Black and White

This version would be better for a publication.

```{r}
effects::Effect(focal.predictors = c("age","sicdegp"),
                mod = fit_lmer_2a_re,
                xlevels = list(age2 = seq(from = 2, to = 13, by = .1))) %>% 
  data.frame() %>% 
  dplyr::mutate(sicdegp = factor(sicdegp,
                                 levels = c("high", "med", "low"),
                                 labels = c("High", "Medium", "Low"))) %>% 
  ggplot(aes(x = age,
             y = fit,
             group = sicdegp)) +
  geom_ribbon(aes(ymin = lower,
                  ymax = upper,
                  fill = sicdegp),
              alpha = .4) +
  geom_line(aes(linetype = sicdegp),
            size = .7) +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +
  scale_y_continuous(breaks = seq(from = 0, to = 120, by = 20)) +
  scale_linetype_manual(values = c("solid", "longdash", "dotted")) +
  scale_fill_manual(values = c("gray10", "gray40", "gray60")) +
  theme_bw() +
  theme(legend.position = c(0, 1),
        legend.justification = c(-0.1, 1.1),
        legend.background = element_rect(color = "black"),
        legend.key.width = unit(2, "cm")) +
  labs(x = "Age of Child, in years",
       y = "Estimated Marginal Mean\nVineland Socialization Age Equivalent",
       linetype = "Initial Communication (SICD)",
       fill     = "Initial Communication (SICD)",
       color    = "Initial Communication (SICD)")
```


### Post Hoc Compairisons

```{r}
fit_lmer_2a_re %>% 
  emmeans::emmeans(pairwise ~ sicdegp,
                   at = list(age = 13))
```





```{r}
totalSD <- VarCorr(fit_lmer_2a_re) %>% 
  data.frame() %>% 
  dplyr::summarise(tot_var = sum(vcov)) %>% 
  dplyr::pull(tot_var) %>% 
  sqrt()
  
totalSD
```

```{r}
fit_lmer_2a_re %>% 
  emmeans::emmeans(~ sicdegp,
                   at = list(age = 13)) %>% 
  emmeans::eff_size(sigma = totalSD,        # which SD to divide by???
                    edf = 50)               # df  
```








### Blups vs. Fixed Effects

**BLUP** = Best Linear Unbiased Predictor

A BLUP is the specific prediction for an individual supject, showin by black lines below. This includes the fixed effects as well as the specific random effects for a given individual.

Comparatively, the blue lines below display the predictions for fixed effects only.

```{r}
data_long %>% 
  dplyr::mutate(sicdegp = fct_recode(sicdegp,
                                     "Low Communication" = "low",
                                     "Medium Communication" = "med",
                                     "High Communication" = "high")) %>% 
  dplyr::mutate(pred_fixed = predict(fit_lmer_2a_re, re.form = NA)) %>% # fixed effects only
  dplyr::mutate(pred_wrand = predict(fit_lmer_2a_re)) %>%               # fixed and random effects together
  ggplot(aes(x = age,
             y = vsae)) +
  geom_line(aes(y = pred_wrand,           # BLUP = fixed and random effects together
                group = childid,
                color = "BLUP",
                size  = "BLUP")) +
  geom_line(aes(y = pred_fixed,           # fixed effects only
                group = sicdegp,
                color = "Fixed",
                size  = "Fixed")) +
  scale_color_manual(name = "Model: ",
                     values = c("BLUP"  = "black",
                                "Fixed" = "blue")) +
  scale_size_manual(name = "Model: ",
                    values = c("BLUP"  = .5,
                               "Fixed" = 1.5)) +
  facet_grid(. ~ sicdegp) +
  theme_bw() +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +
  labs(x = "Age, in years",
       y = "Estimated Marginal Mean\nVineland Socialization Age Equivalent") +
  theme(legend.position = "bottom",
        legend.key.width = unit(1.5, "cm"))
```

```{r, fig.height=8, fig.width=8}
data_long %>% 
  dplyr::mutate(pred_fixed = predict(fit_lmer_2a_re, re.form = NA)) %>% 
  dplyr::mutate(pred_wrand = predict(fit_lmer_2a_re)) %>% 
  dplyr::filter(childid %in% sample(levels(data_long$childid), 25)) %>%  # 25 randomly sampled children
  ggplot(aes(x = age,
             y = vsae)) +
  geom_point(aes(color = sicdegp),
             size = 3) +
  geom_line(aes(y = pred_wrand,
                linetype = "BLUP",
                size     = "BLUP"),
            color = "black") +
  geom_line(aes(y = pred_fixed,
                color = sicdegp,
                linetype = "Fixed",
                size     = "Fixed")) +
  scale_linetype_manual(name = "Model: ",
                        values = c("BLUP"  = "longdash",
                                   "Fixed" = "solid")) +
  scale_size_manual(name = "Model: ",
                    values = c("BLUP"  = .5,
                               "Fixed" = 1)) +
  facet_wrap(. ~ childid, labeller = "label_both") +
  theme_bw() +
  scale_x_continuous(breaks = c(2, 3, 5, 9, 13)) +
  theme(legend.position = "bottom",
        legend.key.width = unit(1.5, "cm")) +
  labs(x = "Age, in years",
       y = "Estimated Marginal Mean\nVineland Socialization Age Equivalent",
       color = "Communication:")
```