Code repository for the paper "Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic"

Bornelov-lab/Camformer

Camformer

Predicting gene expression using millions of yeast promoters reveals cis-regulatory logic

Problem: Let $S = \{A,C,G,T,N\}^{110}$ denote the set of promoter sequences of length $110$, where $A$, $C$, $G$, $T$ are the four nucleotides and $N$ represents an unknown nucleotide. The gene expression prediction task is then to learn a mapping $f: S \to \mathbb{R}$.
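To make the input space concrete, here is a minimal sketch of how a length-110 sequence over A, C, G, T, N could be turned into a numeric matrix before being passed to a model. The function name `one_hot_encode` and the choice to encode `N` as a uniform 0.25 row are assumptions made for illustration; the encoding actually used in this repository may differ.

```python
from typing import List

SEQ_LEN = 110      # promoter length from the problem statement
ALPHABET = "ACGT"  # the four nucleotides; N is handled separately


def one_hot_encode(seq: str, seq_len: int = SEQ_LEN) -> List[List[float]]:
    """Return a seq_len x 4 matrix with one row per position.

    Assumption: an unknown nucleotide (N) becomes a uniform row of 0.25.
    """
    seq = seq.upper()
    if len(seq) != seq_len:
        raise ValueError(f"expected length {seq_len}, got {len(seq)}")
    rows = []
    for base in seq:
        if base == "N":
            rows.append([0.25] * 4)
        elif base in ALPHABET:
            row = [0.0] * 4
            row[ALPHABET.index(base)] = 1.0
            rows.append(row)
        else:
            raise ValueError(f"unexpected symbol {base!r}")
    return rows


# Hypothetical example sequence of length 110
encoded = one_hot_encode("ACGTN" * 22)
```

A model $f$ would then map such a matrix to a single real-valued expression prediction.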

Graphical abstract

Data: We use data from the DREAM Challenge, consisting of 7 million random promoter sequences, each with its yellow fluorescent protein (YFP) level. We evaluate our trained model(s) on the official test set from the challenge.

Model: A residual convolutional neural network whose hyperparameters were selected by automated tuning.
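For readers unfamiliar with the term, the following dependency-free sketch illustrates the general idea of a residual convolutional block: the block's input is added back to the output of a convolution followed by a nonlinearity. It is not the Camformer architecture; the function names, the ReLU activation, and the single-convolution layout are assumptions chosen for illustration only.

```python
from typing import List

Matrix = List[List[float]]          # shape: channels x length
Kernel = List[List[List[float]]]    # shape: out_channels x in_channels x kernel_size


def conv1d_same(x: Matrix, weights: Kernel) -> Matrix:
    """1D convolution with zero padding so output length equals input length.

    Assumes an odd kernel_size.
    """
    length = len(x[0])
    pad = len(weights[0][0]) // 2
    out = []
    for w_out in weights:
        row = []
        for pos in range(length):
            total = 0.0
            for c, w_in in enumerate(w_out):
                for j, w in enumerate(w_in):
                    idx = pos + j - pad
                    if 0 <= idx < length:  # positions outside the sequence count as zero
                        total += w * x[c][idx]
            row.append(total)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    return [[max(0.0, v) for v in row] for row in x]


def residual_block(x: Matrix, weights: Kernel) -> Matrix:
    """Compute y = x + relu(conv(x)); requires out_channels == in_channels."""
    fx = relu(conv1d_same(x, weights))
    return [[a + b for a, b in zip(rx, rf)] for rx, rf in zip(x, fx)]
```

With all-zero weights the block reduces to the identity, which is the property that makes deep residual networks easier to optimise.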

Search for a model

The figure above shows the structure of the original (large variant) model, which has 16M parameters. A small variant with roughly 90% fewer parameters (1.4M) performs almost equally well. Please see the associated manuscript (preprint) for more details.
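As a quick sanity check on the stated size difference, the relative reduction can be computed from the two rounded parameter counts given above. The helper name `relative_reduction` is hypothetical, and because the counts are rounded the result is only approximate.

```python
def relative_reduction(large: float, small: float) -> float:
    """Fraction of parameters removed when going from the large to the small model."""
    return 1.0 - small / large


# Rounded counts quoted in this README: 16M (large) and 1.4M (small)
reduction = relative_reduction(16e6, 1.4e6)
print(f"Parameter reduction: {reduction:.0%}")
```

The result is slightly above nine tenths, consistent with the "roughly 90%" figure.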

Assessment: Predictive, comparative

Evaluating a trained model

Assessment: Explanatory, Scientific discovery

Evaluating a trained model for explanatory assessment

File information

The purpose of each file is listed below:

| File | Purpose |
| --- | --- |
| gen_figs.ipynb | A notebook to show (re-generate) some figures in the manuscript. |
| train_rep.py | Program to train several replicates of a Camformer model using training data. |
| score_rep.py | Program to test several replicates of a trained Camformer model on test data. |

Directory structure

| Directory | Contents |
| --- | --- |
| base | Contains core codebase, utility functions, auxiliary helper files etc. |
| manuscript_figures | Contains data, script and figures present in the manuscript. |
| readme_figs | Images used to prepare this nice-looking README file. |
| analysis | Contains some basic analysis of results. Contents may be updated. |

References

Relevant resources and previous Camformer repositories.

  1. Camformer repository (2022 version): DREAM2022 Submission
  2. DREAM 2022 Challenge Wiki Page
  3. Rafi et al., 2023: Paper
  4. Rafi et al., 2023: Official Evaluation
