Jupyter notebooks and associated data for designing enzyme-modified peptides

VoigtLab/ripp-design

ripp-design

To install:

All of the .py files in the packages folder of this repository, along with the .py files in the packages folder of ripp-analysis (github.com/VoigtLab/ripp-analysis), need to be downloaded and placed either on the Python path or in the working directory so that they can be imported.

In particular, the following files are needed:

  • modification_rules.py
  • ripp_design.py
  • enzymeanalysis.py
  • enzymeplots.py
  • lcms.py
  • lcmsanalysis.py
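A quick way to confirm the installation step is to ask Python whether each of the modules listed above can be found. The sketch below is a minimal example, not part of either repository: `add_package_dirs` and `missing_modules` are hypothetical helper names, and the clone locations in the `__main__` block are assumptions to adjust for your machine.

```python
import importlib.util
import sys
from pathlib import Path

# Module names from the file list above.
REQUIRED_MODULES = [
    "modification_rules",
    "ripp_design",
    "enzymeanalysis",
    "enzymeplots",
    "lcms",
    "lcmsanalysis",
]


def add_package_dirs(*dirs):
    """Prepend existing directories to sys.path so their .py files become importable."""
    for d in dirs:
        p = Path(d).expanduser().resolve()
        if p.is_dir() and str(p) not in sys.path:
            sys.path.insert(0, str(p))


def missing_modules(names):
    """Return the module names that Python cannot locate on the current sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Hypothetical clone locations; change these to wherever the repositories live.
    add_package_dirs("~/ripp-design/packages", "~/ripp-analysis/packages")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

`find_spec` only locates a module without executing it, so this check does not require the scientific dependencies to be installed yet.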

Code was written, run, and tested with Python 3.7.10. The following additional packages are required:

  • matplotlib (tested with version 3.0.2)
  • pandas (tested with version 0.23.4)
  • numpy (tested with version 1.19.4)
  • seaborn (tested with version 0.9.0)
  • regex (tested with version 2.5.77)
  • biopython (tested with version 1.79)
  • scipy (tested with version 1.5.4)
  • tqdm (tested with version 4.46.0)
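To compare an environment against the tested versions above, a small report like the following may help. It is a sketch under two assumptions: that each package's `__version__` attribute is a fair stand-in for its installed version (the string a module reports may not match what pip shows for every package), and that a mismatch is only a hint, not proof of incompatibility. `__version__` is used rather than `importlib.metadata` because the latter only exists from Python 3.8 onward. Note that biopython is imported as `Bio`.

```python
import importlib

# Tested versions from the list above, keyed by import name
# (biopython is imported as "Bio").
TESTED_VERSIONS = {
    "matplotlib": "3.0.2",
    "pandas": "0.23.4",
    "numpy": "1.19.4",
    "seaborn": "0.9.0",
    "regex": "2.5.77",
    "Bio": "1.79",
    "scipy": "1.5.4",
    "tqdm": "4.46.0",
}


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def version_report(tested=TESTED_VERSIONS):
    """Map each module name to (tested version, installed version or None)."""
    return {name: (want, installed_version(name)) for name, want in tested.items()}


if __name__ == "__main__":
    for name, (tested, found) in version_report().items():
        status = "missing" if found is None else ("ok" if found == tested else "differs")
        print(f"{name:12s} tested {tested:8s} installed {found or '-':8s} {status}")
```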

Total install time (download of files and installation of dependencies) should take less than an hour.

To Use:

The IPython notebooks are the best way to see how the software is used and serve as a "how-to" guide.

Folder "analysis"

Includes IPython notebooks and related metadata for analyzing LC-MS data of peptide variants to calculate enzyme-peptide constraints on modification.

extracts.xlsx is an Excel spreadsheet that contains all metadata for the extracts, including the peptide and modifying enzyme present in each extract, the expected mass, the expected mass shift after modification, etc. This file is required to parse the raw LC-MS data. Unfortunately, due to space constraints, the raw data is not available through GitHub, but it is available upon request.
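The expected mass and expected mass shift in extracts.xlsx combine through simple arithmetic to give the masses one would look for in the LC-MS data. The sketch below is illustrative only: it assumes monoisotopic neutral masses in Da, identical modifications that add their shifts linearly, and protonated ions [M+zH]z+. The function names are hypothetical and are not the API of lcms.py or lcmsanalysis.py, and the numbers in the example are made up rather than taken from the spreadsheet.

```python
PROTON_MASS = 1.007276  # Da


def modified_mass(unmodified_mass, mass_shift, n_modifications=1):
    """Expected neutral mass after n identical modifications of mass_shift Da each."""
    return unmodified_mass + n_modifications * mass_shift


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion for a neutral mass M and positive charge z."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass + charge * PROTON_MASS) / charge


if __name__ == "__main__":
    # Made-up example: a 2000.0 Da peptide with two dehydrations (-18.010565 Da each).
    m = modified_mass(2000.0, -18.010565, 2)
    for z in (1, 2, 3):
        print(z, round(mz(m, z), 4))
```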

dataset.pickle is a pickled pandas DataFrame that is exported from the "Import extracts to dataframe" notebook. It contains post-processed LC-MS data for all of the extracts and is used by the other notebooks.
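Outside the notebooks, dataset.pickle can be opened with `pandas.read_pickle`. The sketch below is a minimal loader with a hypothetical name, `load_dataset`; it prints the column names instead of assuming them, since the schema is not described here. Pickles are tied to the library versions that wrote them, so loading under a much newer pandas than the tested 0.23.4 may fail, and only pickles from a trusted source should be loaded.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path="dataset.pickle"):
    """Load the pickled DataFrame exported by the import notebook."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(
            f"{path} not found; run 'Import extracts to dataframe.ipynb' first"
        )
    df = pd.read_pickle(str(path))
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(df).__name__}")
    return df


if __name__ == "__main__":
    df = load_dataset()
    print(df.shape)
    print(df.columns.tolist())  # inspect the columns rather than assuming names
```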

Notebooks are split up by goal:

  • Import extracts to dataframe.ipynb -- This notebook must be run first. It analyzes the raw LC-MS data and exports a processed dataframe (a pickled pandas DataFrame) that is used by the other notebooks. With the full dataset, this takes about 3 hours on our server running 20 threads. The folder 'extract_dataframes' and the 'extracts.xlsx' spreadsheet must both be present in the working directory to run this notebook. The raw data files are available upon request.
  • Leader varaint analysis (Figure 1, SI Notes).ipynb -- This notebook details the process used to generate the plots shown in Figure 1 and the Supplementary Notes. It pulls data from 'dataset.pickle'.
  • Core variant analysis (Figure 2, SI Notes).ipynb -- This notebook details the process used to generate the plots shown in Figure 2 and the Supplementary Notes. It pulls data from 'dataset.pickle'.
  • SI Figures 2-5, 8-12 (Mod Validation).ipynb -- This notebook details the process used to generate the plots shown in Supplementary Figures 2-5 and 8-12. It pulls data from 'dataset.pickle'. It also needs the per-extract dataframes containing all of the chromatographic data, which could not be uploaded due to space constraints. They can be generated by running the 'Import extracts to dataframe.ipynb' notebook on the raw data. Raw data and/or extract dataframes are available upon request.
  • Raw Chromatograms (SI Figure 6).ipynb -- This notebook details the process used to generate the plots shown in Supplementary Figure 6. It pulls data from 'dataset.pickle'. It also needs the per-extract dataframes containing all of the chromatographic data, which could not be uploaded due to space constraints. They can be generated by running the 'Import extracts to dataframe.ipynb' notebook on the raw data. Raw data and/or extract dataframes are available upon request.
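Because several notebooks depend on files that are not in the repository, it can save time to check the working directory before launching one. The sketch below encodes the requirements stated in the list above; the dictionary keys are short labels rather than exact notebook filenames, `missing_inputs` is a hypothetical helper, and the assumption that the per-extract dataframes live in 'extract_dataframes' is ours. The check only tests that paths exist, so an empty 'extract_dataframes' folder still passes.

```python
from pathlib import Path

# Inputs each notebook expects in the working directory. Entries ending in "/"
# must be directories; all others must be files. Locating the per-extract
# dataframes in "extract_dataframes/" is an assumption.
REQUIREMENTS = {
    "Import extracts to dataframe": ["extracts.xlsx", "extract_dataframes/"],
    "Leader variant analysis": ["dataset.pickle"],
    "Core variant analysis": ["dataset.pickle"],
    "SI Figures 2-5, 8-12": ["dataset.pickle", "extract_dataframes/"],
    "Raw Chromatograms": ["dataset.pickle", "extract_dataframes/"],
}


def missing_inputs(notebook, workdir="."):
    """List the required inputs for a notebook that are absent from workdir."""
    root = Path(workdir)
    missing = []
    for entry in REQUIREMENTS[notebook]:
        path = root / entry.rstrip("/")
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    for name in REQUIREMENTS:
        gaps = missing_inputs(name)
        print(f"{name}: {'ready' if not gaps else 'missing ' + ', '.join(gaps)}")
```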

Folder "matplotlib"

Located within the "analysis" folder, this folder contains the .pdf and .png plots exported by the code in the IPython notebooks.
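If you generate additional plots and want them stored the same way, a small helper can write each figure as both .pdf and .png. This is a hypothetical sketch, not code from the notebooks: `export_figure`, its default `dpi`, and the `bbox_inches="tight"` setting are our choices. It selects the non-interactive Agg backend so it also runs on a headless server.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, works without a display
import matplotlib.pyplot as plt


def export_figure(fig, name, outdir="matplotlib", formats=("pdf", "png"), dpi=300):
    """Save fig as outdir/name.<fmt> for every format and return the paths written."""
    out = Path(outdir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for fmt in formats:
        path = out / f"{name}.{fmt}"
        fig.savefig(str(path), format=fmt, dpi=dpi, bbox_inches="tight")
        paths.append(path)
    return paths


if __name__ == "__main__":
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [0, 1, 4])
    print(export_figure(fig, "example_plot"))
    plt.close(fig)
```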

Folder "design"

Includes an IPython notebook for designing new RiPPs based on peptide constraints. The example detailed in the notebook is the one shown in Figure 3 of the manuscript.
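To give a feel for what "designing based on constraints" can mean, here is a deliberately simplified toy: constraints are modeled as sets of allowed residues at fixed core positions, and single-substitution variants of a core are kept only if they satisfy every constraint. This representation is our assumption for illustration. It is not how modification_rules.py or ripp_design.py encode constraints, and the core sequence and constraints below are made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def satisfies(seq, constraints):
    """True if every constrained position holds an allowed residue.

    constraints maps a 0-based position to a set of allowed one-letter codes.
    """
    return all(seq[i] in allowed for i, allowed in constraints.items())


def single_mutants(core, constraints, alphabet=AMINO_ACIDS):
    """Return all single-substitution variants of core that satisfy the constraints."""
    variants = []
    for i, original in enumerate(core):
        for aa in alphabet:
            if aa == original:
                continue
            candidate = core[:i] + aa + core[i + 1:]
            if satisfies(candidate, constraints):
                variants.append(candidate)
    return variants


if __name__ == "__main__":
    # Toy core and constraints, purely illustrative.
    core = "GSTC"
    constraints = {2: {"S", "T"}, 3: {"C"}}
    print(len(single_mutants(core, constraints)), "variants pass")
```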

Folder "tgn-sample-dataset"

Contains all of the raw data, the extract.xlsx file, the per-extract dataframes, and a notebook for analyzing the data and exporting plots for the enzyme TgnB only. It is meant as an example dataset for interacting with the software. To use it, open the IPython notebook in the folder, ensure that all packages are importable and all dependencies are installed, and execute the code. Analyzing the raw data should take ~1 hour, and generating all plots takes ~30 minutes. These are rough estimates based on execution times on our 10-core server.
