This repository contains analysis code for the BrainSeq Phase II project from the BrainSeq Consortium carried out by researchers at the Lieber Institute for Brain Development.
If you wish to visualize the eQTL results described in this project, please use the LIBD eQTL browser.
Attribution-NonCommercial: CC BY-NC. This license lets others remix, tweak, and build upon our work non-commercially as long as they acknowledge our work.
View License Deed | View Legal Code
If you use anything in this repository please cite the following publication:
Collado-Torres L, Burke EE, Peterson A, Shin JH, Straub RE, Rajpurohit A, Semick SA, Ulrich WS, BrainSeq Consortium, Price AJ, Valencia C, Tao R, Deep-Soboslay A, Hyde TM, Kleinman JE, Weinberger DR, Jaffe AE. Regional Heterogeneity in Gene Expression, Regulation, and Coherence in the Frontal Cortex and Hippocampus across Development and Schizophrenia. Neuron. 2019. DOI 10.1016/j.neuron.2019.05.013.
Pre-print: bioRxiv, 2018, DOI 10.1101/426213.
directory | contents |
---|---|
brainspan | Code for processing the BrainSpan data. |
browser | Code for creating the files for the eQTL browser. Contains a detailed README file. |
bsp1 | eQTL replication with BrainSeq Phase I DLPFC polyA+ data. See this README for the results. |
caseControl | Initial (unused) exploratory code for the SCZD case-control analysis. Final code is at the qsva_brain repository. |
caseControl_HIPPO_checks | Code for checking the SCZD case-control HIPPO results. Final code is at the qsva_brain repository. |
casecontrolint | Code for the brain region and SCZD diagnosis status interaction DE analysis. |
cellComp | RNA cell fraction deconvolution. Contains a detailed README file. |
check_expr | Check genes not expressed at other feature levels. Contains a detailed README file. |
check_noQsva | Check gene-level SCZD vs control DEG results without adjusting for qSVs. Contains a detailed README file. |
check_protein_coding | Check for protein coding and non-coding enrichment/depletion. Contains a detailed README file. |
check_sex/casecontrol | Check SCZD case-control results by sex. Contains a detailed README file. |
correlation | DLPFC and HIPPO expression correlation analyses. |
demographics | Code for exploring demographic variables such as RIN. |
development | Code for the DE analysis across age using a linear spline model. |
eQTL_GWAS_riskSNPs | eQTL analysis using PGC2 GWAS risk SNPs and neighboring SNPs. |
eQTL_full | Genome-wide eQTL analyses. |
eQTL_full_GTEx | Replication eQTL analyses using GTEx data. |
expr_cutoff | Code for filtering the features with low expression values and creating the RSE objects used throughout the project. |
gtex | Code for processing the HIPPO GTEx data and preparing the genotype data for the eQTL analysis. |
gtex_both | Code for merging the DLPFC and HIPPO GTEx data. |
gtex_dlpfc | Code for processing the DLPFC GTEx data. |
preprocessed_data | Code for processing the RNA-seq reads. Uses the LIBD RNA-seq pipeline developed by EE Burke, L Collado-Torres, and AE Jaffe. |
region_specific | Code for the DE analyses between HIPPO and DLPFC for prenatal and adult controls. |
supp_tabs | Code for creating some supplementary tables. |
twas | Perform TWAS analysis using the FUSION TWAS software by Gusev et al., Nature Genetics, 2016. Contains a detailed README file. |
misc | Files from early explorations including checking for some sample swaps and quality checks. |
KCNQ1_snp_check | Random check. |
Supplementary figures and tables are available via Mendeley Data at DOI 10.17632/3j93ybf4md.1.
File ID | Details | Description | JHPCE path | URL |
---|---|---|---|---|
f1 | details | Table S15. Genome wide significant eQTL snp-feature pairs | /dcl01/lieber/ajaffe/lab/brainseq_phase2/supp_tabs/SupplementaryTableXX_eQTL.tar.gz | AWS |
f2 | details | Unfiltered gene RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_gene_unfiltered.Rdata | AWS |
f3 | details | Unfiltered exon RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_exon_unfiltered.Rdata | AWS |
f4 | details | Unfiltered exon-exon junction RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_jxn_unfiltered.Rdata | AWS |
f5 | details | Unfiltered transcript RangedSummarizedExperiment object | /dcl01/lieber/ajaffe/lab/brainseq_phase2/expr_cutoff/unfiltered/rse_tx_unfiltered.Rdata | AWS |
f6 | details | DLPFC vs HIPPO DEG objects (adult and prenatal, includes BrainSpan replication and cell RNA fraction sensitivity results) | /dcl01/lieber/ajaffe/lab/brainseq_phase2/region_specific/rda/RegionSpecificDEGobjects.tar.gz | AWS |
f7 | details | Development DEG objects (includes BrainSpan replication and cell RNA fraction sensitivity results) | /dcl01/lieber/ajaffe/lab/brainseq_phase2/development/rda/DevelopmentDEGobjects.tar.gz | AWS |
f8 | details | SCZD vs non-psychiatric control DEG objects (includes qSVs as well as results for the interaction and no-qSVA gene-level sensitivity analyses). See BrainSeq Phase I SCZD DE features for more. | /dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/SCZDvsControlDEGobjects.tar.gz | AWS |
f9 | details | Demographic table including cell RNA fraction estimates | /dcl01/lieber/ajaffe/lab/brainseq_phase2/cellComp/methprop_pd.Rdata | AWS |
f10 | details | TWAS DLPFC weights | /dcl01/lieber/ajaffe/lab/brainseq_phase2/twas/DLPFC/DLPFC_weights.tar.gz | AWS |
f11 | details | TWAS HIPPO weights | /dcl01/lieber/ajaffe/lab/brainseq_phase2/twas/HIPPO/HIPPO_weights.tar.gz | AWS |
f12 | details | TWAS results R objects | /dcl01/lieber/ajaffe/lab/brainseq_phase2/twas/rda/TWAS_results.tar.gz | AWS |
f13 | details | FASTQ files for DLPFC | /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/ | Globus Endpoint; collection: jhpce#bsp2-dlpfc |
f14 | details | FASTQ files for HIPPO | /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/ | Globus Endpoint; collection: jhpce#bsp2-hippo |
f15 | details | BAM files for HIPPO and DLPFC | /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/DLPFC_RiboZero/HISAT2_out/ and /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/Hippo_RiboZero/HISAT2_out/ | Globus Endpoints for DLPFC and HIPPO; collections: jhpce#bsp2-dlpfc-bam and jhpce#bsp2-hippo-bam |
f16 | details | BSP1 re-processed using hg38 at the gene, exon, junction, and transcript expression levels | /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/ | gene (AWS), exon (AWS), jxn (AWS), tx (AWS) |
If the information below is insufficient, check the corresponding scripts or use GitHub's search feature to find where each of the R objects was created. If you have questions about the files, please open an issue.
- RangedSummarizedExperiment: RDocumentation, Bioconductor
- limma::eBayes(): RDocumentation, Bioconductor
- limma::topTable(): RDocumentation, Bioconductor
- stats::prcomp(): RDocumentation
- Script that created it: subset_sig_eqtls.R
- Contents:
Tables with the significant eQTL associations (FDR < 1%) for DLPFC, HIPPO and the brain region interaction (DLPFC vs HIPPO) at the gene, exon, exon-exon junction and transcript expression levels.
Depending on the expression level and model, the tables include additional columns from the data used for replication: GTEx, the CAUC-only sub-analysis, and the BrainSeq Phase 1 replication. The main columns are:
- snp: SNP ID.
- feature_id: expression feature ID. You might also want the EnsemblGeneID column (Ensembl gene ID) or the Symbol one (gene symbol, when available).
- statistic: eQTL t-statistic computed by MatrixEQTL.
- pvalue: p-value.
- FDR: FDR adjusted p-value.
- beta: eQTL beta coefficient.
For the reference and alternative alleles (note that some variants are insertions), check the newRef and newCount columns respectively in the SNP annotation file BrainSeqPhaseII_snp_annotation.txt (the column names are lower case in that file) that you can match using the snp column.
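To illustrate how the columns above fit together, here is a minimal Python sketch (the objects in this repository are R objects, so this is only an illustration) that keeps associations with FDR < 1% and looks up the alleles by the snp column. The rows are made-up placeholder values, and the tab-delimited layout and the lower-case annotation column names `newref` and `newcount` are assumptions based on the description above, not verified against the actual files.

```python
import csv
import io

# Made-up placeholder rows; the real tables are far larger.
# Column names follow the description above; the tab-delimited
# layout is an assumption.
EQTL_TSV = (
    "snp\tfeature_id\tstatistic\tpvalue\tFDR\tbeta\n"
    "snpA\tfeature1\t8.1\t1e-14\t2e-10\t0.42\n"
    "snpB\tfeature2\t2.0\t0.046\t0.31\t0.05\n"
    "snpC\tfeature1\t-6.3\t9e-10\t4e-06\t-0.27\n"
)
# Hypothetical excerpt of the SNP annotation file (lower-case column names).
ANNOTATION_TSV = (
    "snp\tnewref\tnewcount\n"
    "snpA\tA\tG\n"
    "snpB\tC\tT\n"
    "snpC\tT\tTA\n"
)


def read_tsv(text):
    """Parse tab-delimited text into a list of dictionaries keyed by column."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def significant_eqtls(eqtl_rows, annotation_rows, fdr_cutoff=0.01):
    """Keep rows with FDR below the cutoff and attach ref/alt alleles by snp."""
    alleles = {
        row["snp"]: (row["newref"], row["newcount"]) for row in annotation_rows
    }
    hits = []
    for row in eqtl_rows:
        if float(row["FDR"]) < fdr_cutoff:
            ref, alt = alleles.get(row["snp"], (None, None))
            hits.append(
                {
                    "snp": row["snp"],
                    "feature_id": row["feature_id"],
                    "beta": float(row["beta"]),
                    "ref": ref,
                    "alt": alt,
                }
            )
    return hits


hits = significant_eqtls(read_tsv(EQTL_TSV), read_tsv(ANNOTATION_TSV))
for hit in hits:
    print(hit["snp"], hit["feature_id"], hit["beta"], hit["ref"], hit["alt"])
```

The third placeholder row mimics an insertion (alternative allele longer than the reference), which is why the alleles are kept as strings rather than single characters.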
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
qSV information for DLPFC. The file name mentions the dropped 'RiboZero Gold' HIPPO samples only for consistency with the other file names, since no HIPPO samples were considered for this set of qSVs.
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/brainseq_phase2_qsvs_age17_noHGold_DLPFC.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
qsvBonf | prcomp | qSVs in the original object format |
qSVs | matrix | Matrix of qSVs used for building the model matrices |
mod | matrix | Model matrix without qSVs |
modQsva | matrix | Model matrix with qSVs |
keepIndex | integer | Vector specifying which samples from the full RSE objects to keep |
- Details:
keepIndex : int [1:379] 1 2 3 4 5 6 7 8 9 10 ...
mod : num [1:379, 1:13] 1 1 1 1 1 1 1 1 1 1 ...
modQsva : num [1:379, 1:28] 1 1 1 1 1 1 1 1 1 1 ...
qsvBonf : List of 5
$ sdev : num [1:379] 15.07 4.28 2.98 2.49 2.15 ...
$ rotation: num [1:1000, 1:379] 0.034 0.0494 0.0322 0.0414 0.0319 ...
$ center : Named num [1:1000] 5.16 6.6 6.15 6.81 5.47 ...
$ scale : logi FALSE
$ x : num [1:379, 1:379] -23.32 4.42 14.46 13.68 -22.33 ...
qSVs : num [1:379, 1:15] -23.32 4.42 14.46 13.68 -22.33 ...
keepIndex mod modQsva qsvBonf qSVs
qSV information for HIPPO after dropping the 'RiboZero Gold' HIPPO samples.
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/brainseq_phase2_qsvs_age17_noHGold_HIPPO.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
qsvBonf | prcomp | qSVs in the original object format |
qSVs | matrix | Matrix of qSVs used for building the model matrices |
mod | matrix | Model matrix without qSVs |
modQsva | matrix | Model matrix with qSVs |
keepIndex | integer | Vector specifying which samples from the full RSE objects to keep |
- Details:
keepIndex : int [1:333] 454 455 456 457 458 459 460 461 462 463 ...
mod : num [1:333, 1:13] 1 1 1 1 1 1 1 1 1 1 ...
modQsva : num [1:333, 1:29] 1 1 1 1 1 1 1 1 1 1 ...
qsvBonf : List of 5
$ sdev : num [1:333] 15.09 5.45 4.18 3.05 2.77 ...
$ rotation: num [1:1000, 1:333] 0.0402 0.051 0.0343 0.0431 0.035 ...
$ center : Named num [1:1000] 4.47 5.62 5.34 6.27 4.55 ...
$ scale : logi FALSE
$ x : num [1:333, 1:333] -5.05 8.6 4.2 17.14 21.68 ...
qSVs : num [1:333, 1:16] -5.05 8.6 4.2 17.14 21.68 ...
keepIndex mod modQsva qsvBonf qSVs
Joint DLPFC and HIPPO qSVs without the HIPPO 'RiboZero Gold' samples.
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/brainseq_phase2_qsvs_age17_noHGold.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
qsvBonf | prcomp | qSVs in the original object format |
qSVs | matrix | Matrix of qSVs used for building the model matrices |
mod | matrix | Model matrix without qSVs |
modQsva | matrix | Model matrix with qSVs |
keepIndex | integer | Vector specifying which samples from the full RSE objects to keep |
- Details:
keepIndex : int [1:712] 1 2 3 4 5 6 7 8 9 10 ...
mod : num [1:712, 1:14] 1 1 1 1 1 1 1 1 1 1 ...
modQsva : num [1:712, 1:36] 1 1 1 1 1 1 1 1 1 1 ...
qsvBonf : List of 5
$ sdev : num [1:712] 18.05 5.07 3.77 3.47 2.34 ...
$ rotation: num [1:1000, 1:712] 0.0361 0.0499 0.0354 0.0376 0.0375 ...
$ center : Named num [1:1000] 4.83 6.14 5.77 6.56 5.04 ...
$ scale : logi FALSE
$ x : num [1:712, 1:712] -13.9 14.1 23.8 22.8 -12.5 ...
qSVs : num [1:712, 1:22] -13.9 14.1 23.8 22.8 -12.5 ...
keepIndex mod modQsva qsvBonf qSVs
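The dimensions reported for the three qSV files above are consistent with modQsva being mod with the qSV columns appended, and with qSVs being the leading columns of qsvBonf$x (the first values shown are identical). The following Python sketch checks that arithmetic using only the sample and column counts copied from the output above. The column-binding interpretation is an assumption inferred from those counts, the matrices are filled with placeholder values, and the helper names are hypothetical.

```python
# Sample and column counts copied from the str() output above.
DIMS = {
    "DLPFC": {"samples": 379, "mod": 13, "qSVs": 15, "modQsva": 28},
    "HIPPO": {"samples": 333, "mod": 13, "qSVs": 16, "modQsva": 29},
    "joint": {"samples": 712, "mod": 14, "qSVs": 22, "modQsva": 36},
}


def append_columns(left, right):
    """Column-bind two row-major matrices that have the same number of rows."""
    if len(left) != len(right):
        raise ValueError("row counts differ")
    return [l_row + r_row for l_row, r_row in zip(left, right)]


def to_zero_based(keep_index):
    """Convert 1-based R indices (as in keepIndex) to 0-based Python indices."""
    return [i - 1 for i in keep_index]


results = {}
for name, d in DIMS.items():
    # Placeholder matrices with the reported shapes (values are not real data).
    mod = [[1.0] * d["mod"] for _ in range(d["samples"])]
    qsvs = [[0.0] * d["qSVs"] for _ in range(d["samples"])]
    combined = append_columns(mod, qsvs)
    results[name] = len(combined[0]) == d["modQsva"]
    print(name, len(combined), len(combined[0]), results[name])

# First HIPPO keepIndex values shown above, shifted for 0-based indexing.
print(to_zero_based([454, 455, 456]))
```

Note also that the DLPFC and HIPPO sample counts (379 and 333) add up to the 712 samples of the joint set.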
Results for the interaction between SCZD diagnosis (case vs neurotypical control) and brain region (DLPFC, HIPPO) at the exon feature level.
- JHPCE path:
/dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_exon.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
fit | MArrayLM | Output from limma::eBayes() with the DE model results |
top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
- Details:
corfit : List of 3
$ consensus.correlation: num 0.174
$ cor : num 0.174
$ atanh.correlations : num [1:396579] 0.109 0.3918 0.0957 0.1391 0.263 ...
exprsNorm : num [1:396579, 1:712] -7.127 -1.77 -1.455 -0.978 -0.209 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame': 396579 obs. of 6 variables:
$ logFC : num 0.1133 -0.1337 -0.0808 0.097 0.025 ...
$ AveExpr : num -5.61 -1.77 -2.9 -2.38 -1.97 ...
$ t : num 0.586 -1.131 -0.603 0.989 0.275 ...
$ P.Value : num 0.558 0.259 0.547 0.323 0.783 ...
$ adj.P.Val: num 0.895 0.749 0.891 0.79 0.958 ...
$ B : num -4.86 -5.08 -5.2 -5.07 -5.56 ...
corfit exprsNorm fit top
Results for the interaction between SCZD diagnosis (case vs neurotypical control) and brain region (DLPFC, HIPPO) at the gene feature level.
- JHPCE path:
/dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_gene.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
fit | MArrayLM | Output from limma::eBayes() with the DE model results |
top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
- Details:
corfit : List of 3
$ consensus.correlation: num 0.275
$ cor : num 0.275
$ atanh.correlations : num [1:24652] 0.601 0.298 0.357 0.548 0.824 ...
exprsNorm : num [1:24652, 1:712] 1.379 0.403 -3.751 0.983 2.263 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame': 24652 obs. of 6 variables:
$ logFC : num 0.02036 0.04351 -0.06924 0.00926 -0.03694 ...
$ AveExpr : num 0.883 -2.08 -3.167 1.411 1.417 ...
$ t : num 0.289 0.362 -0.428 0.152 -0.439 ...
$ P.Value : num 0.773 0.717 0.669 0.879 0.661 ...
$ adj.P.Val: num 0.95 0.937 0.924 0.976 0.922 ...
$ B : num -5.7 -5.16 -5.02 -5.84 -5.76 ...
corfit exprsNorm fit top
Results for the interaction between SCZD diagnosis (case vs neurotypical control) and brain region (DLPFC, HIPPO) at the junction feature level.
- JHPCE path:
/dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_jxn.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
fit | MArrayLM | Output from limma::eBayes() with the DE model results |
top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
- Details:
corfit : List of 3
$ consensus.correlation: num 0.109
$ cor : num 0.109
$ atanh.correlations : num [1:297181] 0.06038 0.04695 0.18061 0.00298 0.02972 ...
exprsNorm : num [1:297181, 1:712] -4.97 -4.97 -4.97 -4.97 -4.97 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame': 297181 obs. of 6 variables:
$ logFC : num -0.206 -0.3286 -0.0249 -0.3713 0.1121 ...
$ AveExpr : num -4.02 -3.72 -3.35 -4.2 -3.41 ...
$ t : num -1.393 -2.046 -0.131 -2.826 0.677 ...
$ P.Value : num 0.16415 0.04113 0.89607 0.00485 0.49882 ...
$ adj.P.Val: num 0.78 0.599 0.99 0.349 0.923 ...
$ B : num -4.56 -3.72 -5.25 -2.29 -5.1 ...
corfit exprsNorm fit top
Results for the interaction between SCZD diagnosis (case vs neurotypical control) and brain region (DLPFC, HIPPO) at the transcript feature level.
- JHPCE path:
/dcl01/lieber/ajaffe/lab/brainseq_phase2/casecontrolint/rda/limma_casecontrol_interaction_tx.Rdata
- Script
- Contents:
object | class | description |
---|---|---|
corfit | list | Output from limma::duplicateCorrelation() for taking into account repeated measures since some brains were sequenced in both DLPFC and HIPPO |
fit | MArrayLM | Output from limma::eBayes() with the DE model results |
top | data.frame | Output from limma::topTable() with the DE results for the interaction term |
exprsNorm | matrix | Normalized expression matrix used for the DE analysis |
- Details:
corfit : List of 3
$ consensus.correlation: num 0.144
$ cor : num 0.144
$ atanh.correlations : num [1:92732] 0.356 0.111 0.418 0.315 0.266 ...
exprsNorm : num [1:92732, 1:712] 1.514 1.791 0.934 0.593 -1 ...
fit : Formal class 'MArrayLM' [package "limma"] with 1 slot
top : 'data.frame': 92732 obs. of 6 variables:
$ logFC : num -0.0749 -0.1086 -0.0715 -0.0267 -0.1306 ...
$ AveExpr : num 1.267 1.854 0.601 0.812 -0.251 ...
$ t : num -1.42 -2.15 -1.8 -1.05 -1.71 ...
$ P.Value : num 0.1553 0.032 0.0719 0.2932 0.0882 ...
$ adj.P.Val: num 0.702 0.507 0.597 0.799 0.625 ...
$ B : num -4.76 -3.63 -4.22 -5.16 -4.37 ...
corfit exprsNorm fit top
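The corfit objects above hold one atanh-transformed correlation per feature plus a single consensus value. As far as the limma documentation describes it, duplicateCorrelation() summarizes the per-feature values with a trimmed mean on the atanh scale and transforms the result back with tanh. The Python sketch below illustrates that calculation; the trim fraction of 0.15 is limma's documented default and is only assumed to be what was used here, and the function names are hypothetical.

```python
import math


def trimmed_mean(values, trim=0.15):
    """Mean after dropping floor(n * trim) values from each end (trim < 0.5)."""
    ordered = sorted(values)
    drop = math.floor(len(ordered) * trim)
    kept = ordered[drop:len(ordered) - drop]
    return sum(kept) / len(kept)


def consensus_correlation(atanh_correlations, trim=0.15):
    """Average on the atanh scale, then map back to a correlation with tanh."""
    return math.tanh(trimmed_mean(atanh_correlations, trim))


# First five exon-level values shown above. The real vector has 396579
# entries, so this result is NOT expected to match the reported 0.174.
example = [0.109, 0.3918, 0.0957, 0.1391, 0.263]
print(round(consensus_correlation(example), 3))
```

Averaging on the atanh scale keeps the summary inside the valid range for a correlation, and trimming limits the influence of features with extreme estimates.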
DLPFC SCZD case vs neurotypical control DE analysis at the gene level. Also contains results for models that either don't adjust for qSVs or don't adjust for any covariates at all (naive model).
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_dlpfc_filtered_qSVA_geneLevel_noHGoldQSV_matchDLPFC.rda
- Script
- Contents:
object | class | description |
---|---|---|
outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outGene0 | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs |
outGeneNoAdj | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs or any other covariates (naive model) |
- Details:
outGene : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.02194 0.00771 0.14248 -0.0459 -0.04052 ...
$ AveExpr : num 1.35 -1.68 -2.82 1.88 1.88 ...
$ t : num -0.3856 0.0862 1.112 -0.9079 -0.5623 ...
$ P.Value : num 0.7 0.931 0.267 0.365 0.574 ...
$ adj.P.Val : num 0.91 0.982 0.661 0.739 0.855 ...
$ B : num -5.82 -5.29 -4.73 -5.62 -5.86 ...
outGene0 : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.07 0.0164 0.335 -0.0653 -0.0333 ...
$ AveExpr : num 1.35 -1.68 -2.82 1.88 1.88 ...
$ t : num -1.238 0.158 2.634 -1.337 -0.485 ...
$ P.Value : num 0.21667 0.87469 0.00878 0.18209 0.62773 ...
$ adj.P.Val : num 0.556 0.959 0.109 0.516 0.852 ...
$ B : num -5.4 -5.61 -2.63 -5.36 -6.11 ...
outGeneNoAdj : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.0793 0.0646 0.4536 -0.0197 0.0293 ...
$ AveExpr : num 1.35 -1.68 -2.82 1.88 1.88 ...
$ t : num -1.377 0.636 3.454 -0.393 0.441 ...
$ P.Value : num 0.169352 0.525159 0.000613 0.694524 0.659251 ...
$ adj.P.Val : num 0.3082 0.6674 0.00567 0.80175 0.77435 ...
$ B : num -5.24 -5.497 -0.556 -6.156 -6.139 ...
outGene outGene0 outGeneNoAdj
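In the tables above, adj.P.Val is the P.Value column adjusted for multiple testing. limma::topTable() uses the Benjamini-Hochberg (BH) method by default; whether the default was used for these objects is an assumption. The Python sketch below shows the BH calculation on toy p-values. It cannot reproduce the adj.P.Val entries shown above from the few p-values printed there, because the adjustment depends on all 24652 p-values in the table.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvalues)
    # Visit p-values from largest to smallest so a running minimum
    # enforces monotonicity of the adjusted values.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy p-values (not taken from the project data).
toy = [0.01, 0.04, 0.03, 0.005]
print([round(p, 4) for p in bh_adjust(toy)])
```

Because the adjustment scales with the number of tests, the same raw p-value can yield different adjusted values in tables with different numbers of features.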
DLPFC SCZD case vs neurotypical control DE analysis for each of the feature levels (gene, exon, junction, transcript) adjusting for qSVs.
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_dlpfc_filtered_qSVA_noHGoldQSV_matchDLPFC.rda
- Script
- Contents:
object | class | description |
---|---|---|
outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outExon | data.frame | Output from limma::topTable() with the exon annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outJxn | data.frame | Output from limma::topTable() with the junction annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outTx | DataFrame | Output from limma::topTable() with the transcript annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
- Details:
outExon : 'data.frame': 396583 obs. of 17 variables:
$ Length : int 37 154 99 147 137 136 198 159 152 34 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" ...
$ gene_type : chr "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" ...
$ Symbol : chr "WASH7P" "WASH7P" "WASH7P" "WASH7P" ...
$ EntrezID : int NA NA NA NA NA NA NA NA NA NA ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 0.638 2.354 1.577 1.498 2.112 ...
$ NumTx : int 1 1 1 1 1 1 1 1 1 1 ...
$ gencodeTx :List of 396583
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.0388 -0.0452 -0.0867 -0.0564 -0.0758 ...
$ AveExpr : num -5.42 -1.53 -2.77 -2.2 -1.82 ...
$ t : num -0.248 -0.521 -0.837 -0.753 -1.059 ...
$ P.Value : num 0.805 0.603 0.403 0.452 0.29 ...
$ adj.P.Val : num 0.962 0.909 0.829 0.852 0.763 ...
$ B : num -4.94 -5.62 -5.11 -5.31 -5.16 ...
outGene : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.02194 0.00771 0.14248 -0.0459 -0.04052 ...
$ AveExpr : num 1.35 -1.68 -2.82 1.88 1.88 ...
$ t : num -0.3856 0.0862 1.112 -0.9079 -0.5623 ...
$ P.Value : num 0.7 0.931 0.267 0.365 0.574 ...
$ adj.P.Val : num 0.91 0.982 0.661 0.739 0.855 ...
$ B : num -5.82 -5.29 -4.73 -5.62 -5.86 ...
outJxn : 'data.frame': 297181 obs. of 24 variables:
$ inGencode : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ inGencodeStart: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ inGencodeEnd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ gencodeGeneID : chr NA NA NA NA ...
$ ensemblID : chr NA NA NA NA ...
$ Symbol : chr NA NA NA NA ...
$ gencodeStrand : chr NA NA NA NA ...
$ gencodeTx :List of 297181
$ numTx : int 0 0 0 0 0 0 0 0 0 0 ...
$ Class : chr "Novel" "Novel" "Novel" "Novel" ...
$ startExon : int NA NA NA NA NA NA NA NA NA NA ...
$ endExon : int NA NA NA NA NA NA NA NA NA NA ...
$ newGeneID : chr NA NA NA NA ...
$ newGeneSymbol : chr NA NA NA NA ...
$ isFusion : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ meanExprs : num 0.5 0.694 1.084 0.659 1.002 ...
$ Length : num 100 100 100 100 100 100 100 100 100 100 ...
$ passExprsCut : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num 0.1667 0.0291 -0.0933 -0.029 -0.1667 ...
$ AveExpr : num -4.35 -3.98 -3.53 -4.49 -3.7 ...
$ t : num 1.606 0.245 -0.647 -0.302 -1.38 ...
$ P.Value : num 0.109 0.807 0.518 0.763 0.169 ...
$ adj.P.Val : num 0.781 0.972 0.959 0.964 0.834 ...
$ B : num -4.29 -5.12 -5 -5.11 -4.51 ...
outTx : Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
outExon outGene outJxn outTx
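In the output above, each ensemblID value is the corresponding gencodeID without its version suffix, which is convenient when matching features across tables or against external annotation. A minimal Python sketch of that conversion, using the ID pairs printed above (the function name is hypothetical):

```python
def strip_version(gencode_id):
    """Drop everything from the first '.' onward, e.g. the '.5' version suffix."""
    return gencode_id.split(".", 1)[0]


# ID pairs copied from the gencodeID and ensemblID columns shown above.
PAIRS = [
    ("ENSG00000227232.5", "ENSG00000227232"),
    ("ENSG00000278267.1", "ENSG00000278267"),
    ("ENSG00000269981.1", "ENSG00000269981"),
    ("ENSG00000279457.3", "ENSG00000279457"),
]

for gencode_id, ensembl_id in PAIRS:
    print(gencode_id, strip_version(gencode_id), strip_version(gencode_id) == ensembl_id)
```

This simple rule also removes anything that follows the version number, so IDs that carry an extra suffix after the version would need separate handling.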
HIPPO SCZD case vs neurotypical control DE analysis at the gene level. Also contains results for models that either don't adjust for qSVs or don't adjust for any covariates at all (naive model).
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_hippo_filtered_qSVA_geneLevel_noHGoldQSV_matchHIPPO.rda
- Script
- Contents:
object | class | description |
---|---|---|
outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outGene0 | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs |
outGeneNoAdj | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model without adjusting for qSVs or any other covariates (naive model) |
- Details:
outGene : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num 0.0326 0.0477 0.2136 -0.0298 -0.0338 ...
$ AveExpr : num 0.356 -2.536 -3.559 0.882 0.889 ...
$ t : num 0.46 0.396 1.383 -0.514 -0.405 ...
$ P.Value : num 0.646 0.692 0.168 0.608 0.685 ...
$ adj.P.Val : num 0.913 0.927 0.647 0.901 0.925 ...
$ B : num -5.64 -5.16 -4.49 -5.72 -5.77 ...
outGene0 : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.0143 -0.0688 0.3141 -0.0869 -0.0372 ...
$ AveExpr : num 0.356 -2.536 -3.559 0.882 0.889 ...
$ t : num -0.195 -0.547 2.019 -1.326 -0.448 ...
$ P.Value : num 0.8452 0.5847 0.0443 0.1858 0.6542 ...
$ adj.P.Val : num 0.972 0.9 0.452 0.681 0.923 ...
$ B : num -5.81 -5.3 -3.77 -5.08 -5.81 ...
outGeneNoAdj : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.0323 -0.021 0.3893 -0.0125 0.0582 ...
$ AveExpr : num 0.356 -2.536 -3.559 0.882 0.889 ...
$ t : num -0.477 -0.191 2.778 -0.201 0.777 ...
$ P.Value : num 0.6335 0.84875 0.00579 0.84089 0.43771 ...
$ adj.P.Val : num 0.7615 0.9091 0.0275 0.9039 0.6007 ...
$ B : num -5.97 -5.65 -2.3 -6.12 -5.86 ...
outGene outGene0 outGeneNoAdj
HIPPO SCZD case vs neurotypical control DE analysis for each of the feature levels (gene, exon, junction, transcript) adjusting for qSVs.
- JHPCE path:
/dcl01/ajaffe/data/lab/qsva_brain/brainseq_phase2_qsv/rdas/dxStats_hippo_filtered_qSVA_noHGoldQSV_matchHIPPO.rda
- Script
- Contents:
object | class | description |
---|---|---|
outGene | data.frame | Output from limma::topTable() with the gene annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outExon | data.frame | Output from limma::topTable() with the exon annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outJxn | data.frame | Output from limma::topTable() with the junction annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
outTx | DataFrame | Output from limma::topTable() with the transcript annotation information for the SCZD case vs neurotypical control model adjusting for qSVs |
- Details:
outExon : 'data.frame': 396583 obs. of 17 variables:
$ Length : int 37 154 99 147 137 136 198 159 152 34 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" "ENSG00000227232.5" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" "ENSG00000227232" ...
$ gene_type : chr "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" "unprocessed_pseudogene" ...
$ Symbol : chr "WASH7P" "WASH7P" "WASH7P" "WASH7P" ...
$ EntrezID : int NA NA NA NA NA NA NA NA NA NA ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 0.638 2.354 1.577 1.498 2.112 ...
$ NumTx : int 1 1 1 1 1 1 1 1 1 1 ...
$ gencodeTx :List of 396583
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num 0.1345 -0.1166 -0.1364 0.0472 -0.0376 ...
$ AveExpr : num -5.82 -2.06 -3.06 -2.58 -2.14 ...
$ t : num 0.87 -1.043 -1.154 0.52 -0.483 ...
$ P.Value : num 0.385 0.298 0.249 0.603 0.629 ...
$ adj.P.Val : num 0.85 0.807 0.778 0.924 0.931 ...
$ B : num -4.76 -5.12 -4.81 -5.34 -5.47 ...
outGene : 'data.frame': 24652 obs. of 17 variables:
$ Length : int 1351 68 284 1982 4039 385 372 1044 1543 89 ...
$ gencodeID : chr "ENSG00000227232.5" "ENSG00000278267.1" "ENSG00000269981.1" "ENSG00000279457.3" ...
$ ensemblID : chr "ENSG00000227232" "ENSG00000278267" "ENSG00000269981" "ENSG00000279457" ...
$ gene_type : chr "unprocessed_pseudogene" "miRNA" "processed_pseudogene" "protein_coding" ...
$ Symbol : chr "WASH7P" "MIR6859-1" "" "" ...
$ EntrezID : int NA 102466751 NA 102723897 NA NA NA NA NA 102465432 ...
$ Class : chr "InGen" "InGen" "InGen" "InGen" ...
$ meanExprs : num 1.697 4.355 0.491 1.587 0.733 ...
$ NumTx : int 1 1 1 3 5 1 1 1 1 1 ...
$ gencodeTx :List of 24652
$ passExprsCut: logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num 0.0326 0.0477 0.2136 -0.0298 -0.0338 ...
$ AveExpr : num 0.356 -2.536 -3.559 0.882 0.889 ...
$ t : num 0.46 0.396 1.383 -0.514 -0.405 ...
$ P.Value : num 0.646 0.692 0.168 0.608 0.685 ...
$ adj.P.Val : num 0.913 0.927 0.647 0.901 0.925 ...
$ B : num -5.64 -5.16 -4.49 -5.72 -5.77 ...
outJxn : 'data.frame': 297181 obs. of 24 variables:
$ inGencode : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ inGencodeStart: logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ inGencodeEnd : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ gencodeGeneID : chr NA NA NA NA ...
$ ensemblID : chr NA NA NA NA ...
$ Symbol : chr NA NA NA NA ...
$ gencodeStrand : chr NA NA NA NA ...
$ gencodeTx :List of 297181
$ numTx : int 0 0 0 0 0 0 0 0 0 0 ...
$ Class : chr "Novel" "Novel" "Novel" "Novel" ...
$ startExon : int NA NA NA NA NA NA NA NA NA NA ...
$ endExon : int NA NA NA NA NA NA NA NA NA NA ...
$ newGeneID : chr NA NA NA NA ...
$ newGeneSymbol : chr NA NA NA NA ...
$ isFusion : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
$ meanExprs : num 0.5 0.694 1.084 0.659 1.002 ...
$ Length : num 100 100 100 100 100 100 100 100 100 100 ...
$ passExprsCut : logi TRUE TRUE TRUE TRUE TRUE TRUE ...
$ logFC : num -0.0328 -0.1762 -0.1837 -0.24 0.0422 ...
$ AveExpr : num -3.66 -3.41 -3.15 -3.88 -3.08 ...
$ t : num -0.239 -1.222 -1.114 -1.962 0.286 ...
$ P.Value : num 0.8116 0.2228 0.266 0.0506 0.7747 ...
$ adj.P.Val : num 0.983 0.873 0.89 0.727 0.98 ...
$ B : num -5.13 -4.64 -4.72 -3.86 -5.12 ...
outTx : Formal class 'DataFrame' [package "S4Vectors"] with 6 slots
outExon outGene outJxn outTx
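As a minimal sketch of how one of these tables might be used once loaded, the example below filters a topTable-style data.frame by FDR and orders it by p-value. The toy data, the object name outGene_toy, the gene symbols, and the 0.05 cutoff are all assumptions made for illustration. Only the column names (Symbol, logFC, P.Value, adj.P.Val) come from the structure listed above.

```r
# Toy stand-in for outGene: it reuses the documented limma::topTable() column
# names, but every value below is made up for illustration.
outGene_toy <- data.frame(
    Symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
    logFC = c(0.21, -0.03, -0.35, 0.05),
    P.Value = c(0.0004, 0.61, 0.00002, 0.17),
    adj.P.Val = c(0.03, 0.90, 0.004, 0.65),
    stringsAsFactors = FALSE
)

# With access to the real file, one would instead load it, for example:
# load("dxStats_hippo_filtered_qSVA_noHGoldQSV_matchHIPPO.rda", verbose = TRUE)

# Hypothetical FDR threshold, chosen only for this example
fdr_cutoff <- 0.05

# Keep features below the cutoff and order them by raw p-value
sig <- outGene_toy[outGene_toy$adj.P.Val < fdr_cutoff, ]
sig <- sig[order(sig$P.Value), ]

# Record only the sign of logFC; which group is the reference level depends
# on the model design and is not assumed here
sig$direction <- ifelse(sig$logFC > 0, "positive", "negative")

print(sig[, c("Symbol", "logFC", "adj.P.Val", "direction")])
```

The same subsetting should apply to outExon and outJxn, which are also plain data.frames. outTx is an S4Vectors DataFrame, so it may need as.data.frame() first, depending on the downstream code.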
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: export.sh
- Contents:
TODO
- Script that created it: .merge-brainseq.dlpfc.sh
- Contents:
$ du -sh /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/
6.0T /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/DLPFC_RiboZero/brainseq/dlpfc/merged_fastq/
These files are named after the SAMPLE_ID CharacterList values stored in the phenotype tables (see any RSE object or the phenotype table). The Globus endpoint also includes the FASTQ files for samples that were excluded by this R script.
- Script that created it: .merge-brainseq.hippo.sh
- Contents:
$ du -sh /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/Hippo_RiboZero/brainseq/hippo/merged_fastq/
5.5T /dcl01/lieber/ajaffe/lab/brainseq_phase2/preprocessed_data/Hippo_RiboZero/brainseq/hippo/merged_fastq/
These files are named after the SAMPLE_ID CharacterList values stored in the phenotype tables (see any RSE object or the phenotype table). The Globus endpoint also includes the FASTQ files for samples that were excluded by this R script.
- Scripts that created them: .step3-hisat2-HIPPO_RiboZero_BrainSeq_Phase2.LIBD.sh and .step3-hisat2-DLPFC_RiboZero_BrainSeq_Phase2.LIBD.sh
- Contents:
$ du -sh /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/
6.5T /dcl01/ajaffe/data/lab/brainseq_phase1/preprocessed_data/HISAT2_out/
This is the BrainSEQ Phase 1 data (DOI: 10.1038/s41593-018-0197-y), re-processed with hg38 (the originally published data used hg19) and then subset to the genes, exons, exon-exon junctions, and transcripts expressed in BrainSEQ Phase 2 using the script bsp1/data/subset_bsp1.R.
- JHPCE path:
/dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/
- Contents:
$ du -sh /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1*
2.0G /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_exon.Rdata
157M /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_gene.Rdata
635M /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_jxn.Rdata
373M /dcl01/lieber/ajaffe/lab/brainseq_phase2/bsp1/data/bsp1_tx.Rdata
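The names of the objects stored inside these .Rdata files are not listed here, so one cautious way to find out what a file holds is to load it into a separate environment and inspect the result. The sketch below uses a small temporary file as a stand-in. The names toy_counts, tmp_file, and inspect_env are hypothetical and do not describe the real contents of the bsp1 files.

```r
# Create a small stand-in .Rdata file so the example runs without JHPCE access
tmp_file <- tempfile(fileext = ".Rdata")
toy_counts <- matrix(
    1:6,
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), c("s1", "s2"))
)
save(toy_counts, file = tmp_file)

# Load into a dedicated environment so nothing in the workspace is overwritten.
# load() returns the names of the objects it restored.
inspect_env <- new.env()
loaded_names <- load(tmp_file, envir = inspect_env)

# Report the class and dimensions of each restored object
for (nm in loaded_names) {
    obj <- get(nm, envir = inspect_env)
    cat(nm, ":", class(obj)[1], "with dim", paste(dim(obj), collapse = " x "), "\n")
}

unlink(tmp_file)
```

With the real data, tmp_file would be replaced by one of the paths listed above, such as bsp1_gene.Rdata. The larger files (for example, the 2.0G exon file) may need a correspondingly large amount of memory to load.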
JHPCE location: /dcl01/lieber/ajaffe/lab/brainseq_phase2
NOTE: since 2023 the updated internal location is /dcs04/lieber/lcolladotor/BrainSEQ_LIBD001/brainseq_phase2