generated from CogDisResLab/krsa_template
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path_comparison.Rmd
233 lines (176 loc) · 7.94 KB
/
_comparison.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
### {{ case }} vs {{ ctrl}}
```{r}
#| label: groupDiff{{random}}
# This function will run both QC steps (krsa_filter_lowPeps, krsa_filter_nonLinear) and krsa_filter_ref_pep
pep_passed_qc <- krsa_quick_filter(
data = data_pw_max, data2 = data_modeled$scaled,
signal_threshold = 5, r2_threshold = 0.8,
groups = c("{{case}}", "{{ctrl}}")
)
# This function calculates log2 fold change values between the defined groups
# The byChip argument lets you calculates the log2 fold change the results within each chip
differential_phosphorylated_peptides <- krsa_group_diff(data_modeled$scaled, c("{{case}}", "{{ctrl}}"), pep_passed_qc, byChip = T)
# save LFC table
write_csv(differential_phosphorylated_peptides, "results/{{if_else(exists('run_prefix'), str_c(run_prefix, '-'), '')}}dpp_{{case}}_{{ctrl}}-{{chip}}.csv")
# Extract top peptides based on the LFC cutoff using average of LFCs across chips
significant_peptides <-
krsa_get_diff(
differential_phosphorylated_peptides,
totalMeanLFC,
c(0.15, 0.2, 0.3, 0.4)
) %>% list("meanLFC" = .)
# Extract top peptides based on the LFC cutoff using per chip LFCs
sigPepsPerChip <- krsa_get_diff_byChip(differential_phosphorylated_peptides, LFC, c(0.15, 0.2, 0.3, 0.4))
# Combine the peptides hits in one list
sigPeps_total <- list(significant_peptides, sigPepsPerChip) %>%
unlist(recursive = F) %>%
unlist(recursive = F)
```
#### Heatmap
After applying the *Filtering Parameters* for this group comparison, only *`r length(significant_peptides$meanLFC[["0.2"]])`*/141 peptides carried forward in the analysis (i.e. *`r length(significant_peptides$meanLFC[["0.2"]])` hits*). Below are some figures to visualize the differences between these samples for considering these *hits*.
```{r}
#| label: heatmapInd{{random}}
#| fig-cap: "Violin plot of two groups"
# generates a heatmap using the selected groups and peptides
krsa_heatmap(data_modeled$normalized, significant_peptides$meanLFC$`0.2`, groups = c("{{case}}", "{{ctrl}}"), scale = "row")
```
#### Violin Plot
Below, the violin plot visualizes the distribution of selected peptides for the analysis.
```{r}
#| label: violinIndPlot{{random}}
#| fig-width: 6
#| fig-height: 6
#| fig-cap: "Violin plot of two groups"
# generates a violin plot using the selected groups and peptides
krsa_violin_plot(data_modeled$scaled, significant_peptides$meanLFC$`0.2`, "Barcode", groups = c("{{case}}", "{{ctrl}}"))
```
#### Waterfall Plot
This waterfall represents the log2 fold changes between the two groups at each peptide.
```{r}
#| label: waterfall{{random}}
#| fig-height: 8
#| fig-width: 6
#| fig-cap: "Waterfall Plot to show the distribution of change in peptide phosphorylation"
# generates a waterfall of the log2 fold change values for the selected peptide (top peptides)
krsa_waterfall(differential_phosphorylated_peptides, lfc_thr = 0.2, byChip = F)
```
#### Upstream Kinase Analysis
The lab carefully curated and mapped the kinases that can act and phosphorylate each peptide present on the chip. This was achieved by using multiple sources including GPS 3.0, Kinexus Phosphonet, PhosphoELM and PhosphoSite Plus. Based on that association between peptides and kinases, a random sampling analysis is performed for these hits. The basic idea of *KRSA* is: For each iteration (*2000* iterations performed in this analysis), the same number of hits are randomly selected from the total 141/or 193 peptides present on the chip. Predicted kinases are then mapped to this sample list of peptides and number of kinases are determined. The kinase count from the actual hits and random sampling is then compared to determine the significance.
```{r}
#| label: krsa{{random}}
# STK chip
if (params$chip_type == "STK") {
chipCov <- KRSA_coverage_STK_PamChip_87102_v2
KRSA_file <- KRSA_Mapping_STK_PamChip_87102_v1
} else if (params$chip_type == "PTK") {
chipCov <- KRSA_coverage_PTK_PamChip_86402_v1
KRSA_file <- KRSA_Mapping_PTK_PamChip_86402_v1
}
# For parallel computing, load the furrr package:
# opens multiple R sessions to run faster
plan(multisession)
# Run the KRSA function across the different sets of peptides using the furrr package for parallel computing
mutiple_krsa_outputs <-
future_map(sigPeps_total, krsa, .options = furrr_options(seed = T), return_count = TRUE)
saveRDS(mutiple_krsa_outputs, file = "datastore/{{if_else(exists('run_prefix'), str_c(run_prefix, '-'), '')}}multiple_krsa_output_{{case}}_{{ctrl}}_{{chip}}.RDS")
plan(sequential)
mutiple_krsa_outputs <- mutiple_krsa_outputs |>
map(~ pluck(.x, "KRSA_Table"))
# Tidy output
df <- data.frame(matrix(unlist(mutiple_krsa_outputs), ncol = max(lengths(mutiple_krsa_outputs)), byrow = TRUE))
df <- setNames(do.call(rbind.data.frame, mutiple_krsa_outputs), names(mutiple_krsa_outputs$meanLFC.0.2))
df <- df %>%
rownames_to_column("method") %>%
select(Kinase, Z, method) %>%
mutate(method = str_extract(method, "\\w+\\.\\w+\\.\\w+")) %>%
mutate(method = gsub("(^\\w+)[\\.]", "\\1>", method)) %>%
mutate_if(is.numeric, round, 2)
df %>%
pivot_wider(names_from = method, values_from = Z) -> df2
# Creates an average Z score table using the across chip analysis
df %>%
filter(grepl("mean", method)) %>%
select(Kinase, Z, method) %>%
group_by(Kinase) %>%
mutate(AvgZ = mean(Z)) -> AvgZTable
# Creates an average Z score table using the within chip analysis
df %>%
filter(!grepl("mean", method)) %>%
select(Kinase, Z, method) %>%
group_by(Kinase) %>%
mutate(AvgZ = mean(Z)) -> AvgZTable2
AvgZTable2 |>
write_csv("results/{{if_else(exists('run_prefix'), str_c(run_prefix, '-'), '')}}krsa_table_full_{{case}}_{{ctrl}}_{{chip}}.csv") |>
select(Kinase, AvgZ) |>
unique() |>
mutate(Direction = if_else(AvgZ >= 0, "Up", "Down")) |>
group_by(Direction) |>
nest() |>
mutate(filtered = map(data, \(x) {
x |>
mutate(AbsZ = abs(AvgZ)) |>
arrange(desc(AbsZ)) |>
slice_head(n = 10) |>
select(-AbsZ)
})) |>
select(-data) |>
unnest(filtered) |>
ungroup() |>
select(-Direction) |>
arrange(desc(AvgZ)) |>
knitr::kable()
# save file
# AvgZTable %>% write_delim("withinChip_KRSA_Table_comp1.txt", delim = "\t")
# Extract top kinases based on abs(Z) score
kinases_hits <- AvgZTable2 |>
select(Kinase, AvgZ) |>
unique() |>
ungroup() |>
slice_max(AvgZ, n = 10) |>
pull(Kinase)
# krsa_top_hits(AvgZTable2, 1.75)
# krsa_top_hits(AvgZTable2, 1.5)
# Show the number of peptides per each set in a table
krsa_show_peptides(sigPeps_total) |>
knitr::kable()
```
#### Z Scores Plot
We will plot the individual and averaged Z scores using both the across and within chip analyses.
```{r}
#| label: zscoresPlot{{random}}
#| fig-height: 6
#| fig-width: 6
#| fig-cap: "Waterfall plot of the Z Scores of each kinase family"
# Generates Z scores waterfall plots
krsa_zscores_plot(AvgZTable2)
```
#### Reverse KRSA Plot
We will use the reverse KRSA plot function, to plot the log2 fold change values for all peptides mapped to kinase hits. This will help us examine the activity of the kinase
```{r}
#| label: revKRSAPlot{{random}}
#| fig-height: 6
#| fig-width: 6
#| fig-cap: "Kinase Activity summary for each kinase family based on peptide phosphorylation"
# plot the reverse KRSA figure for top kinases to determine their activity levels
krsa_reverse_krsa_plot(chipCov, differential_phosphorylated_peptides, kinases_hits, 0.2, byChip = FALSE)
```
#### Coverage Plot
To view the coverage of kinases across the full list of peptides on the chip, we will use the coverage plot function
```{r}
#| label: covPlot{{random}}
#| fig-height: 6
#| fig-width: 6
#| fig-cap: "Percentage of peptides each kinase family phosphorylates"
# generates a kinase coverage plot
krsa_coverage_plot(chipCov, AvgZTable2, chipType)
```
#### Ball Model Network
We will view the ball model network function, to generate a model representing the protein-protein interactions between kinases
```{r}
#| label: netPlot{{random}}
#| fig-width: 8
#| fig-height: 8
# Plot the network ball model
krsa_ball_model(kinases_hits, AvgZTable2, 10, 2.5, 4.8)
```
\newpage