Hello! Welcome to the analysis code for Ho et al. 2025
This folder contains the data necessary for the MPRA analysis in Ho et al. 2025. There are 9 steps to the MPRA analysis, which must be done in order. The CRISPR and V2G analyses are separate (steps 10 and 11).
Order of operations:
Use this to eliminate any SNP which isn't associated with any of the diseases in the study (I already eliminated these SNPs from the Jurkat and T cell files).
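A minimal sketch of this kind of filter, in Python with only the standard library. The column names (`snp`, `disease`, `activity`), the rsIDs and the disease names are hypothetical placeholders, not the columns or values in the study's files.

```python
# Hypothetical disease set; the real list comes from the study.
STUDY_DISEASES = {"disease_A", "disease_B"}

# Hypothetical association rows: one row per SNP-trait pair.
associations = [
    {"snp": "rs0000001", "disease": "disease_A"},
    {"snp": "rs0000002", "disease": "other_trait"},
    {"snp": "rs0000003", "disease": "disease_B"},
    {"snp": "rs0000003", "disease": "other_trait"},
]


def snps_in_study(rows, study_diseases):
    """Return the set of SNPs linked to at least one study disease."""
    return {r["snp"] for r in rows if r["disease"] in study_diseases}


def drop_unassociated(table, keep):
    """Keep only the table rows whose SNP is in `keep`."""
    return [r for r in table if r["snp"] in keep]


keep = snps_in_study(associations, STUDY_DISEASES)

# Hypothetical MPRA rows to be filtered.
mpra_rows = [
    {"snp": "rs0000001", "activity": 1.2},
    {"snp": "rs0000002", "activity": 0.3},
]
filtered = drop_unassociated(mpra_rows, keep)
print(filtered)
```

A SNP is kept if at least one of its associations is a study disease, so a SNP that also has unrelated associations is not dropped.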
Use this to make some columns for the MPRA big table. This code incorporates data from the TF binding site programs motifbreakR and ANANASTRA.
Use this to create a big table for the primary T cell MPRA. This code takes the basic MPRA information and contextualizes it with linkage disequilibrium, epigenetic and transcription factor binding data. Human genome liftover data is incorporated to produce separate hg19 and hg38 tables. You need this table to create all the plots in the code files listed after this one.
Use this to create a big table for the Jurkat MPRA. You need this table to create all the plots in the code files listed after this one.
This code analyses the enrichment of DHS sites among high-activity variants and is used to estimate the cut-offs for high-activity variants (p-CREs) and allele-specific expression variants (emVars). It creates Supp. figure 2 c & d and Supp. tables 32 & 33. If you're doing your own MPRA, use this to calibrate the variant category filter, a.k.a. the mpra_sig column (made in mpra_big_table.Rmd). This code contains the variant category filters for Ho et al., but you can calculate your own!
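To illustrate how a cut-off can be calibrated against DHS overlap, here is a minimal Python sketch that sweeps activity cut-offs and reports the fold enrichment of DHS overlap among the variants passing each one. The field names (`activity`, `in_dhs`), the toy values and the choice of fold enrichment as the statistic are assumptions for illustration; the analysis code itself may use a different statistic.

```python
def dhs_fold_enrichment(variants, cutoff):
    """Fold enrichment of DHS overlap among variants with activity >= cutoff,
    relative to the DHS overlap rate across all tested variants.
    Returns None when no variant passes or the background rate is zero."""
    selected = [v for v in variants if v["activity"] >= cutoff]
    if not selected:
        return None
    background_rate = sum(v["in_dhs"] for v in variants) / len(variants)
    if background_rate == 0:
        return None
    selected_rate = sum(v["in_dhs"] for v in selected) / len(selected)
    return selected_rate / background_rate


# Hypothetical toy data: activity score and DHS overlap per variant.
variants = [
    {"activity": 0.2, "in_dhs": False},
    {"activity": 0.5, "in_dhs": False},
    {"activity": 1.0, "in_dhs": True},
    {"activity": 1.5, "in_dhs": False},
    {"activity": 2.0, "in_dhs": True},
    {"activity": 2.5, "in_dhs": True},
]

# Sweep candidate cut-offs and compare the enrichment at each one.
for cutoff in (0.0, 1.0, 2.0):
    print(cutoff, dhs_fold_enrichment(variants, cutoff))
```

In a sweep like this, one would typically look for the cut-off beyond which the enrichment stops improving while enough variants are still retained.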
Use this to create the UKBB fine-mapping enrichment plots in Supp. figure 5.
Use this to create most of the MPRA plots (Figure 1 b-f, Supp. figures 3 & 4).
This code is used to calculate the TF motif enrichments for MPRA variants and create the relevant plots. (Figure 2, Supp. figure 6)
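One common way to test this kind of enrichment is a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below is a stdlib-only Python illustration of that general technique; the source does not state which test the analysis code uses, and the function and argument names are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k) for a hypergeometric X.

    k: foreground variants that disrupt the motif
    n: total foreground variants (e.g. a selected variant category)
    K: all tested variants that disrupt the motif
    N: all tested variants
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible configurations contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 10 variants tested, 4 disrupt the motif,
# 3 variants in the foreground, 2 of which disrupt the motif.
p = hypergeom_enrichment_p(k=2, n=3, K=4, N=10)
print(p)
```

With many motifs tested, the resulting p-values would normally need a multiple-testing correction before interpretation.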
Finally, using all the MPRA-related tables created so far, I put the tables into the final format which appears in the paper's supplementary tables (not including anything related to CRISPR or V2G).
This Jupyter notebook generates variant-to-gene (V2G) mapping for rsIDs of interest. Key steps include:
Converting rsIDs to variant IDs using genopyc
Mapping variants to genes with the V2G otargen pipeline
Processing T cell expression data from the DICE database
Filtering V2G output based on cell-specific expression
Creating background and foreground datasets for network analysis
Requires Python (pandas, genopyc, polars) and R (otargen, purrr, dplyr, readr) libraries. Outputs include filtered V2G data and gene sets for further analysis.
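The expression filter and gene-set steps listed above can be sketched as follows, in plain Python for portability. Everything specific here is an assumption for illustration: the expression cut-off, the field names (`rsid`, `gene`, `score`), the gene and cell type labels, and the definitions of background (all expressed genes) and foreground (V2G genes that pass the filter). The notebook itself may define these differently.

```python
TPM_CUTOFF = 1.0  # assumed expression threshold, not taken from the notebook


def expressed_genes(expression, cell_types, cutoff=TPM_CUTOFF):
    """Genes whose expression reaches `cutoff` in at least one given cell type."""
    return {
        gene
        for gene, by_cell in expression.items()
        if any(by_cell.get(ct, 0.0) >= cutoff for ct in cell_types)
    }


def filter_v2g(v2g_rows, expressed):
    """Keep only variant-gene pairs whose gene is in the expressed set."""
    return [r for r in v2g_rows if r["gene"] in expressed]


# Hypothetical expression values per gene and cell type.
expression = {
    "GENE_A": {"cell_type_1": 12.0, "cell_type_2": 0.1},
    "GENE_B": {"cell_type_1": 0.0, "cell_type_2": 0.2},
    "GENE_C": {"cell_type_1": 0.5, "cell_type_2": 3.0},
    "GENE_D": {"cell_type_1": 4.0, "cell_type_2": 4.0},
}

# Hypothetical V2G output rows.
v2g_rows = [
    {"rsid": "rs0000001", "gene": "GENE_A", "score": 0.8},
    {"rsid": "rs0000001", "gene": "GENE_B", "score": 0.6},
    {"rsid": "rs0000002", "gene": "GENE_C", "score": 0.4},
]

background = expressed_genes(expression, ["cell_type_1", "cell_type_2"])
filtered = filter_v2g(v2g_rows, background)
foreground = {r["gene"] for r in filtered}
print(sorted(background), sorted(foreground))
```

Defining the background as the expressed genes rather than all genes keeps the foreground a subset of the background, which most enrichment and network analyses require.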
This markdown file uses Seurat and SCEPTRE to analyze single-cell CRISPR screen data.