-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathrecreating_enrichment_map_vignette.Rmd
259 lines (195 loc) · 10.3 KB
/
recreating_enrichment_map_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
---
title: "Recreating EnrichmentMap tutorials with RCy3"
author: "Julia Gustavsen"
date: "23 September 2016"
output:
html_document:
keep_md: true
number_sections: yes
theme: cerulean
toc: yes
toc_depth: 6
pdf_document:
fig_caption: yes
keep_tex: yes
latex_engine: lualatex
number_sections: yes
toc: yes
toc_depth: 6
---
# Purpose:
* Recreating tutorials from [Bader lab](http://www.baderlab.org/Software/EnrichmentMap/Tutorial) using Rcy3 and cytoscape.
* Create and test out functions using RCy3 that make Enrichment Map easy to use in R.
# Enrichment Map
## Functional enrichment analysis
Many scientists use perform experiments to determine which biological pathways are expressed more in certain diseases or conditions. Using processed sequence data from RNAseq experiments, different treatments and/or samples can be visualized for genes that are more or less enriched compared to the baseline. This informs which genes are important for regulating processes related to a disease or a treatment. Visualizing which genes are statistically more highly expressed under certain conditions can be helpful for interpretation and further analysis of the data and it can help determine which pathways are present during the specific disease or treatment.
The functional enrichment analysis is done outside of this vignette. We will use already processed data and make a network from these data in Cytoscape using RCy3.
## Reproducible Functional enrichment analysis
The basic workflow is using R scripts with [RCy3](https://github.com/tmuetze/Bioconductor_RCy3_the_new_RCytoscape) to build [EnrichmentMaps](http://www.baderlab.org/Software/EnrichmentMap) that can be visualized and analysed in [Cytoscape](http://www.cytoscape.org/). The benefit of using R scripts is that it is easier to reproduce your analysis or to repeat it with different data.
# GSEA processed data
So what we will do today is to use data already processed in Gene Set Enrichment Analysis (GSEA which "determines whether an *a priori* defined set of genes shows statistically significant, concordant differences between two biological states").
We will use this processed data to make an Enrichment Map in Cytoscape from R.
## Load the appropriate libraries
```{r, message = FALSE}
library(RCy3)
library(httr)
library(RJSONIO)
```
## Important note
* Make sure Cytoscape is open before running the code below!
## Load functions for creating Enrichment map
```{r, cache = TRUE}
source("./functions_to_add_to_RCy3/working_with_EM.R")
```
Create the connection from R to Cytoscape
```{r}
# first, delete existing windows to save memory:
deleteAllWindows(CytoscapeConnection())
cy <- CytoscapeConnection ()
```
Examine the commands that are available in Enrichment Map.
```{r, cache = TRUE}
getEnrichmentMapCommandsNames(cy, "build")
getEnrichmentMapCommandsNames(cy, "gseabuild")
```
## Description of files that can be used
See the Bader lab website for full explanation:<https://github.com/BaderLab/EnrichmentMapApp/blob/EM_Cyto3_port/EnrichmentMapPlugin/doc/EM_wiki_manual.txt>
- **"gmtFile"**: Tab-separated file where the header has name, description and samples and each row is one gene in the geneset.
- **"analysisType"** Analysis type can be "generic", "GSEA", or "DAVID/BiNGO/Great""
- **"classDataset1"** Classes (optional) for dataset1
- **"classDataset2"** Classes (optional) for dataset2
- **"coeffecients"** Similarity Coefficient type (contains a typo in the name, but you must type it that way). Can be "OVERLAP","JACCARD", or "COMBINED" (both OVERLAP and JACCARD)
- **"similaritycutoff"** Similarity Cutoff for the coefficient chosen, 0.0 is least similar and 1.0 is most similar.
- **"enrichments2Dataset1"** Tab-separated File containing gene names and their p-values from enrichment condition 2 from dataset 1
- **"enrichments2Dataset2"** Tab-separated File containing gene names and their p-values from enrichment condition 2 from dataset 2
- **"enrichmentsDataset1"** Tab-separated File containing gene names and their p-values from enrichment condition 1 from dataset 1
- **"enrichmentsDataset2"** Tab-separated File containing gene names and their p-values from enrichment condition 1 from dataset 2
- **"expressionDataset1"** Expression (optional) tab-separated file containing gene name, description and expression values. For dataset1
- **"expressionDataset2"** Expression (optional) tab-separated file containing gene name, description and expression values. For dataset2
- **"phenotype1Dataset1"** Default is "Up" for Phenotype1. Can change to a descriptive word for enrichment 1
- **"phenotype2Dataset1"** Default is "Down" for Phenotype2. Can change to a descriptive word for enrichment 2
- **"phenotype1Dataset2"** same as above but for Dataset2
- **"phenotype2Dataset2"** same as above but for Dataset2
- **"pvalue"** P-value Cutoff for enrichment data to be used.
- **"qvalue"** False discovery rate (Q-value) cutoff. Used with multiple comparisons.
- **"ranksDataset1"** Ranks (optional) tab-separated file containing genes and their rank (from GSEA) for Dataset1
- **"ranksDataset2"** Ranks (optional) tab-separated file containing genes and their rank (from GSEA) for Dataset2
## Send data to the cytoscape network
So first we read in the data from the supplied files and set the parameters.
**Note on file paths**:
You cannot use relative paths when constructing EnrichmentMaps from R. The filenames need to be given as their absolute paths.
```{r, cache = TRUE}
path_to_file <- "/home/julia_g/windows_school/gsoc/EM-tutorials-docker/notebooks/data/"
enr_file <- paste0(path_to_file,
"gprofiler_results_mesenonly_ordered_computedinR.txt")
exp_file <- paste0(path_to_file,
"MesenchymalvsImmunoreactive_expression.txt")
```
## Set the parameters for use in the Enrichment Map.
```{r}
em_params <- list(analysisType = "generic",
enrichmentsDataset1 = enr_file,
pvalue = "1.0",
qvalue = "0.00001",
expressionDataset1 = exp_file,
similaritycutoff = "0.25",
coeffecients = "JACCARD")
```
No graph details are returned, this is just setting the parameters that will be sent to Cytoscape via RCy3.
Now build the enrichment map
```{r}
EM_1 <- setEnrichmentMapProperties(cy,
"build",
em_params)
```
These parameters can also be set in Cytoscape, but we are setting them here via script. The function `setEnrichmentMapProperties()` also attaches the window created in Cytoscape to our R session, so that we are able to manipulate the stylistic aspects of our network by using "EM_1".
## Save Enrichment map network image
```{r}
fitContent(EM_1)
Sys.sleep(10)
saveImage(EM_1,
"EM_1",
"png",
h = 2000)
```
![](./EM_1.png)
## Change the layout of the network
Can change any of the visual properties of "EM_1" now. For demonstration let's change the layout.
```{r, cache = TRUE}
layoutNetwork(EM_1,
'kamada-kawai')
Sys.sleep(5)
## save network visualization
saveImage(EM_1,
"EM_1_kamada-kawai",
"png",
h = 2000)
```
![](./EM_1_kamada-kawai.png)
We can also save our network file to use at a later date or to send to collaborators.
```{r, cache = TRUE}
saveNetwork(EM_1,
"EM_1") ## Creates "EM_1.cys" which can be reopened in Cytoscape
```
In our R session we are connected to the graph in Cytoscape and can manipulate the graph's visual properties, but we do not have the graph information in R.
```{r}
EM_1@graph
```
If we want the graph data pulled in to R then we can set the argument "copy.graph.to.R" to `TRUE` in `setEnrichmentMapProperties()`. This pulls in our EM analysis graph to Cytoscape and stores it as EM_1_2.
```{r}
EM_1_2 <- setEnrichmentMapProperties(cy,
"build",
em_params,
copy.graph.to.R = TRUE)
```
Now if we have a look at the graph part of the stored object we will see we have retrieved the information from the graph.
```{r}
EM_1_2@graph
print(noa.names(getGraph(EM_1_2))) # retrieves node attributes from the graph
```
# Summarizing Enrichment Results with Enrichment Maps
## Recreating [Protocol 4 - Summarize Enrichment Results with Enrichment Maps](https://github.com/BaderLab/EM-tutorials-docker/blob/master/notebooks/Protocol%204%20-%20Summarize%20Enrichment%20Results%20with%20Enrichment%20Maps.ipynb)
### Option 1: Load enrichment results from g:Profiler
Load in the datafiles
```{r, cache = TRUE}
path_to_file <- "/home/julia_g/windows_school/gsoc/EM-tutorials-docker/notebooks/data/"
enr_file <- paste0(path_to_file,
"gprofiler_results_mesenonly_ordered.txt")
exp_file <- paste0(path_to_file,
"MesenchymalvsImmunoreactive_expression.txt") # this one works.
#expression_RNA_seq <- paste0(path_to_file,
# "MesenchymalvsImmunoreactive_RNSseq_expression_ed.txt") ## not working
# ranks_file <- paste0(path_to_file,
# "MesenchymalvsImmunoreactive_RNA_seq_ranks.rnk") ## does not work
ranks_file <- paste0(path_to_file,
"MesenchymalvsImmunoreactive_edger_ranks.rnk")
classes_file <- paste0(path_to_file,
"MesenchymalvsImmunoreactive_RNAseq_classes.cls")
```
```{r}
em_params <- list(analysisType = "generic",
enrichmentsDataset1 = enr_file,
pvalue = "1.0",
qvalue = "0.0001",
expressionDataset1 = exp_file,
ranksDataset1 = ranks_file, ## not working from RNA seq data
classDataset1 = classes_file,
phenotype1Dataset1 = "Mesenchymal", # shows up as positive, red, nodes.
phenotype2Dataset1 = "Immunoreactive",
similaritycutoff = "0.25",
coeffecients = "JACCARD")
EM_ex_4 <- setEnrichmentMapProperties(cy,
"build",
em_params)
```
```{r}
fitContent(EM_ex_4)
Sys.sleep(10)
saveImage(EM_ex_4,
"EM_ex_4",
"png",
h = 2000)
```
![](./EM_ex_4.png)
## Helpful references:
Reference for the API. This site describes the cyREST API: http://idekerlab.github.io/cyREST/