Brian Pho edited this page Jul 30, 2021 · 20 revisions

Age

For the PLS modeling, I first tried to replicate the results from Rudolph et al. 2017 and succeeded. The replicated results are shown below for 594 subjects, using 10-fold cross-validation repeated 10 times. The PLS model used 4 components and the Ridge model used an alpha of 6900. This is a good first step because it confirms that our data is consistent with the paper's data.

| Model | Train r^2 | Test r^2 |
| --- | --- | --- |
| Baseline | 0.806 | 0.427 |
| PLS | 0.714 | 0.438 |
| Ridge | 0.943 | 0.510 |

IQ

Next, I used the same method to model IQ, but the results were much worse than the age results: the test r-squared score is far lower (0.487 for age compared to 0.089 for IQ). Results are shown below.

[Figures: IQ Train, IQ Test]

Multivariate Y

Here are the results of using a combination of WISC measures as the target for the PLS.

All WISC Measures

| WISC Measure | 2 | 3 | 4 | 5 | 6 |
| --- | --- | --- | --- | --- | --- |
| WISC_FSIQ | 0.35 | 0.38 | 0.40 | 0.38 | 0.38 |
| WISC_VSI | 0.33 | 0.36 | 0.38 | 0.36 | 0.37 |
| WISC_FRI | 0.27 | 0.30 | 0.33 | 0.31 | 0.31 |
| WISC_WMI | 0.25 | 0.27 | 0.27 | 0.27 | 0.26 |
| WISC_PSI | 0.10 | 0.13 | 0.18 | 0.15 | 0.15 |
| WISC_VCI | 0.35 | 0.37 | 0.38 | 0.36 | 0.36 |
| WISC_BD_Scaled | 0.32 | 0.34 | 0.37 | 0.34 | 0.35 |
| WISC_Similarities_Scaled | 0.31 | 0.33 | 0.34 | 0.32 | 0.32 |
| WISC_MR_Scaled | 0.22 | 0.25 | 0.28 | 0.26 | 0.27 |
| WISC_DS_Scaled | 0.25 | 0.27 | 0.27 | 0.26 | 0.26 |
| WISC_Coding_Scaled | 0.06 | 0.08 | 0.12 | 0.10 | 0.09 |
| WISC_Vocab_Scaled | 0.34 | 0.37 | 0.38 | 0.36 | 0.36 |
| WISC_FW_Scaled | 0.25 | 0.28 | 0.31 | 0.29 | 0.28 |
| WISC_VP_Scaled | 0.29 | 0.32 | 0.34 | 0.32 | 0.32 |
| WISC_PS_Scaled | 0.18 | 0.20 | 0.21 | 0.20 | 0.19 |
| WISC_SS_Scaled | 0.12 | 0.15 | 0.20 | 0.17 | 0.17 |

WISC Primary Indices

# With IQ
Target: WISC_FSIQ | r: 0.35
Target: WISC_VSI | r: 0.33
Target: WISC_FRI | r: 0.27
Target: WISC_WMI | r: 0.25
Target: WISC_PSI | r: 0.10
Target: WISC_VCI | r: 0.35
# Without IQ
Target: WISC_VSI | r: 0.33
Target: WISC_FRI | r: 0.27
Target: WISC_WMI | r: 0.25
Target: WISC_PSI | r: 0.10
Target: WISC_VCI | r: 0.35

Component Overlap

Cosine Similarity / Pearson Correlation

| | All | Bin 1 | Bin 2 | Bin 3 |
| --- | --- | --- | --- | --- |
| All | 1 | 0.0833 | 0.0821 | 0.0157 |
| Bin 1 | - | 1 | 0.0172 | 0.0117 |
| Bin 2 | - | - | 1 | 0.0100 |
| Bin 3 | - | - | - | 1 |

Positive and Negative Clipped Values

[[1. 0.06348429 0.09552818 0.01238531] [0.06348429 1. 0.02611998 0.01143025] [0.09552818 0.02611998 1. 0.00417967] [0.01238531 0.01143025 0.00417967 1. ]]

[[1. 0.10837503 0.07464222 0.02468051] [0.10837503 1. 0.03270236 0.01762623] [0.07464222 0.03270236 1. 0.03451587] [0.02468051 0.01762623 0.03451587 1. ]]

Spearman Correlation

| | All | Bin 1 | Bin 2 | Bin 3 |
| --- | --- | --- | --- | --- |
| All | 1 | 0.0484 | 0.0677 | 0.0255 |
| Bin 1 | - | 1 | 0.0119 | 0.0056 |
| Bin 2 | - | - | 1 | 0.0031 |
| Bin 3 | - | - | - | 1 |
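The overlap measures in this section could be computed along the following lines. This is only a sketch under assumptions: it treats each model (All, Bin 1, Bin 2, Bin 3) as contributing one weight vector, uses random vectors of length 1000 as stand-ins, and interprets "clipped" as keeping only the positive or only the negative weights. The helper `cosine_similarity_matrix` is a hypothetical name.

```python
import numpy as np
from scipy.stats import spearmanr


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T


# Random stand-in weight vectors (assumption), one row per model.
rng = np.random.default_rng(0)
labels = ["All", "Bin 1", "Bin 2", "Bin 3"]
weights = rng.normal(size=(len(labels), 1000))

cosine = cosine_similarity_matrix(weights)

# Keep only positive weights, or only negative weights, before comparing.
positive_only = cosine_similarity_matrix(np.clip(weights, 0, None))
negative_only = cosine_similarity_matrix(np.clip(weights, None, 0))


def spearman_rho(a, b):
    """Spearman rank correlation between two vectors."""
    rho, _ = spearmanr(a, b)
    return rho


spearman = np.array([[spearman_rho(a, b) for b in weights] for a in weights])

print(np.round(cosine, 4))
print(np.round(spearman, 4))
```

Cosine similarity equals the Pearson correlation when each vector is mean-centered first, which is one reason the two measures are often reported together.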

Reducing the Difference Between Train and Test Scores

The main goal of this approach is to reduce overfitting (train score > test score) in the MI + PLS model by reducing the noise in the data. Less noise should make the model generalize better to unseen data (the testing set). I tried two approaches: adding noisy samples and grouping samples.

Adding Noisy Samples

This approach adds noise to the existing samples and then appends those noisy samples to the dataset. Specifically, I added zero-mean Gaussian noise (so the mean of the data is unchanged) and varied the standard deviation. The results are shown below.

| | Baseline | 2x Noise, std / 10 | 2x noise, std | Pure noise | 5x noise |
| --- | --- | --- | --- | --- | --- |
| Num Train Samples | 474 | 1896 | 1896 | 474 | 15168 |
| Train Score | 0.32 | 0.32 | 0.37 | 0.87 | 0.35 |
| Test Score | 0.13 | 0.13 | 0.13 | -0.45 | 0.13 |
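A minimal sketch of the noisy-sample augmentation might look like the following. The function name `augment_with_noise`, its parameters, and the decision to add noise to the features only while copying the targets unchanged are assumptions made for illustration; the data is a random stand-in with 474 training samples.

```python
import numpy as np


def augment_with_noise(X, y, n_copies=1, noise_std=1.0, seed=0):
    """Append `n_copies` noisy copies of X (targets copied unchanged)."""
    rng = np.random.default_rng(seed)
    X_parts, y_parts = [X], [y]
    for _ in range(n_copies):
        # Zero-mean Gaussian noise leaves the expected feature means unchanged.
        noise = rng.normal(loc=0.0, scale=noise_std, size=X.shape)
        X_parts.append(X + noise)
        y_parts.append(y)
    return np.vstack(X_parts), np.concatenate(y_parts)


# Random stand-in training set (assumption): 474 samples, 20 features.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(474, 20))
y_train = rng.normal(size=474)

# Scale the noise relative to the spread of the data, here one tenth of it.
X_aug, y_aug = augment_with_noise(
    X_train, y_train, n_copies=1, noise_std=X_train.std() / 10
)
print(X_aug.shape, y_aug.shape)
```

The augmentation should be applied to the training set only, so that test scores are still measured on unmodified samples.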

Grouping Samples

This approach groups a fixed number of subjects and averages each group to create a pseudo-subject. Averaging reduces the noise, which tends to cancel out, and reinforces the signal shared across subjects. The results are shown below.

| | Baseline | Group 6 | Group 6 IQ |
| --- | --- | --- | --- |
| Num Train Samples | 474 | 113 | 113 |
| Train Score | 0.32 | 0.64 | 0.72 |
| Test Score | 0.13 | 0.04 | 0.45 |
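The grouping idea can be sketched as below. How subjects were assigned to groups is not stated here, so this example assumes random, non-overlapping groups and drops any leftover subjects; the function name `group_and_average` and the stand-in data are hypothetical. If the noise is independent across subjects, averaging k of them shrinks its standard deviation by a factor of sqrt(k).

```python
import numpy as np


def group_and_average(X, y, group_size, seed=0):
    """Average random, non-overlapping groups of subjects into pseudo-subjects."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    n_groups = len(X) // group_size
    # Drop leftover subjects that do not fill a complete group.
    order = order[: n_groups * group_size]
    X_grouped = X[order].reshape(n_groups, group_size, -1).mean(axis=1)
    y_grouped = y[order].reshape(n_groups, group_size).mean(axis=1)
    return X_grouped, y_grouped


# Random stand-in training set (assumption): 474 subjects, 20 features.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(474, 20))
y_train = rng.normal(size=474)

X_grouped, y_grouped = group_and_average(X_train, y_train, group_size=6)
print(X_grouped.shape, y_grouped.shape)
```

Other grouping rules, such as grouping subjects with similar target scores, would only change how `order` is built.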

All Ages - Groups of 3

| WISC Primary Index | Num Connections | r^2 | Pearson | Spearman |
| --- | --- | --- | --- | --- |
| Intelligence Quotient (FSIQ) | 1000 | 0.331 | 0.593 | 0.622 |
| Visual Spatial (VSI) | 1000 | 0.302 | 0.569 | 0.588 |
| Verbal Comprehension (VCI) | 1500 | 0.349 | 0.609 | 0.641 |
| Fluid Reasoning (FRI) | 2000 | 0.225 | 0.498 | 0.499 |
| Working Memory (WMI) | 2000 | 0.201 | 0.477 | 0.493 |
| Processing Speed (PSI) | 3000 | -0.024 | 0.231 | 0.232 |