This repository contains Python code, Jupyter Notebooks, and data for reproducing the results presented in the manuscript "Conformational ensembles of the human intrinsically disordered proteome".
The CSV file IDRome_DB.csv lists amino acid sequences, sequence features, and conformational properties of all 28,058 IDRs.
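The table can be read with any CSV reader. The sketch below uses only the standard library. It assumes a column holding a numeric property; the column names `seq_name`, `fasta`, and `nu` in the in-memory stand-in are hypothetical, so check the header of IDRome_DB.csv for the real names.

```python
import csv
import io
from statistics import mean


def load_idr_table(handle):
    """Read a CSV table of IDRs into a list of dicts, one dict per IDR."""
    return list(csv.DictReader(handle))


def column_mean(rows, column):
    """Average a numeric column, skipping rows where the cell is empty."""
    return mean(float(row[column]) for row in rows if row.get(column))


# Hypothetical two-row stand-in for IDRome_DB.csv (column names are assumed).
example = io.StringIO(
    "seq_name,fasta,nu\n"
    "IDR_A,MSEEKKPAG,0.52\n"
    "IDR_B,GSGSGSGSG,0.58\n"
)
rows = load_idr_table(example)
print(len(rows), column_mean(rows, "nu"))
```

For the real file, open it with `open("IDRome_DB.csv", newline="")` and pass the handle to `load_idr_table`; `pandas.read_csv("IDRome_DB.csv")` is an equivalent one-liner if pandas is installed.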
Simulation trajectories and time series of conformational properties are available for all the IDRs at sid.erda.dk/sharelink/AVZAJvJnCO.
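As an illustration of the kind of per-frame property found in such time series, the sketch below computes the radius of gyration, Rg, from bead coordinates. It assumes coordinates have already been extracted from the trajectory (e.g., with MDAnalysis or mdtraj) as one list of (x, y, z) tuples per frame; mass weighting is optional and is an assumption here, not necessarily how the provided time series were computed.

```python
import math


def radius_of_gyration(coords, masses=None):
    """Radius of gyration of a single frame.

    coords: list of (x, y, z) tuples, one per residue bead.
    masses: optional list of bead masses; all beads weigh the same if omitted.
    """
    if masses is None:
        masses = [1.0] * len(coords)
    total = sum(masses)
    # Centre of mass along each Cartesian axis
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    # Mass-weighted mean squared distance from the centre of mass
    msd = sum(
        m * sum((c[k] - com[k]) ** 2 for k in range(3))
        for m, c in zip(masses, coords)
    ) / total
    return math.sqrt(msd)


def rg_time_series(frames, masses=None):
    """Apply radius_of_gyration to every frame of a trajectory."""
    return [radius_of_gyration(frame, masses) for frame in frames]
```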
We also provide Notebooks on Google Colab to (i) generate conformational ensembles of user-supplied sequences using the CALVADOS model and (ii) predict scaling exponents and conformational entropies per residue using the SVR models:
A new version of the Colab notebook is available in the ColabCALVADOS repository.
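The scaling exponent ν describes how inter-residue distances grow with sequence separation, R_ij ≈ R0 |i − j|^ν. A common generic way to estimate it from simulations is a least-squares fit in log-log space, sketched below; the minimum separation of 5 and the free prefactor R0 are assumptions, and the fitting procedure used for the manuscript (e.g., a fixed prefactor) may differ.

```python
import math


def fit_scaling_exponent(separations, distances, min_sep=5):
    """Least-squares fit of log(R_ij) = log(R0) + nu * log(|i - j|).

    separations: sequence separations |i - j| (positive integers).
    distances: mean inter-residue distance for each separation.
    min_sep: separations <= min_sep are ignored (hypothetical cutoff).
    Returns (nu, R0).
    """
    pts = [
        (math.log(s), math.log(d))
        for s, d in zip(separations, distances)
        if s > min_sep
    ]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    nu = sxy / sxx
    r0 = math.exp(mean_y - nu * mean_x)
    return nu, r0


# Synthetic, noise-free data generated with nu = 0.5 and R0 = 0.55 nm
seps = list(range(1, 100))
dists = [0.55 * s ** 0.5 for s in seps]
nu, r0 = fit_scaling_exponent(seps, dists)
```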
seq_conf_prop.ipynb reproduces Fig. 1, 3, and Extended Data Fig. 2, 5, 6e-t, and 7
go_analysis.ipynb reproduces Fig. 2
conservation_analysis.ipynb reproduces Fig. 4
clinvar_fmug.ipynb reproduces Fig. 5 and Extended Data Fig. 9
uniprot_domains.ipynb reproduces Extended Data Fig. 1
svr_models.ipynb reproduces Extended Data Fig. 8
go_uniprot_calls.ipynb performs API calls to obtain Gene Ontology terms from UniProt
calc_seq_prop.ipynb and calc_seq_prop_SPOT.ipynb compute sequence descriptors and generate the IDRome_DB.csv and IDRome_DB_SPOT.csv files
CALVADOS_tests.ipynb reproduces Extended Data Fig. 3
AF2_PAEs.ipynb reproduces Extended Data Fig. 4
CD-CODE.ipynb reproduces Extended Data Fig. 6a-d
md_simulations/ contains code and data related to single-chain simulations performed using the CALVADOS model and HOOMD-blue v2.9.3 installed with mphowardlab/azplugins
idr_selection/ contains code and data to generate the pLDDT-based and SPOT-based sets of IDRs
idr_orthologs/ contains code and data to generate the set of orthologs of human IDRs
svr_models/ contains scikit-learn SVR models generated in svr_models.ipynb
zscores/ contains code and data to calculate NARDINI z-scores
go_analyses/ contains input and output data related to the Gene Ontology analyses in go_analysis.ipynb
QCDPred/ contains code and data related to QCD calculations
clinvar_fmug_cdcode/ contains code and data related to the analysis of the ClinVar, FMUG, and CD-CODE databases
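GO terms for a protein can be retrieved from the UniProt REST API. The sketch below is an assumption about one way to do this (the `rest.uniprot.org/uniprotkb/search` endpoint with the `go_id` return field in TSV format); it is not necessarily the query used in go_uniprot_calls.ipynb, and only the URL builder runs without network access.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

UNIPROT_SEARCH = "https://rest.uniprot.org/uniprotkb/search"


def go_query_url(accession):
    """Build a UniProt search URL returning the accession and its GO IDs as TSV."""
    params = {
        "query": f"accession:{accession}",
        "fields": "accession,go_id",
        "format": "tsv",
    }
    return f"{UNIPROT_SEARCH}?{urlencode(params)}"


def fetch_go_ids(accession, timeout=30):
    """Download and parse GO IDs for one accession (requires network access)."""
    with urlopen(go_query_url(accession), timeout=timeout) as resp:
        lines = resp.read().decode("utf-8").splitlines()
    if len(lines) < 2:  # header only: no matching entry
        return []
    # Assumed layout: header row, then "accession<TAB>GO:...; GO:..."
    _, go_field = lines[1].split("\t", 1)
    return [go for go in go_field.split("; ") if go]
```

Calling `fetch_go_ids("P04637")` would return the GO IDs listed for that entry, provided the endpoint and field names behave as assumed.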
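Two widely used sequence descriptors of IDRs are the fraction of charged residues (FCR) and the net charge per residue (NCPR). The sketch below computes them, assuming K and R count as positive and D and E as negative, with histidine treated as neutral; whether calc_seq_prop.ipynb uses exactly these conventions is an assumption.

```python
POSITIVE = set("KR")
NEGATIVE = set("DE")


def charge_descriptors(seq):
    """Return FCR and NCPR for a one-letter amino acid sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    n_pos = sum(aa in POSITIVE for aa in seq)
    n_neg = sum(aa in NEGATIVE for aa in seq)
    return {
        "FCR": (n_pos + n_neg) / len(seq),   # fraction of charged residues
        "NCPR": (n_pos - n_neg) / len(seq),  # net charge per residue
    }
```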
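In CALVADOS, nonbonded interactions between residues include an Ashbaugh-Hatch potential: a Lennard-Jones potential whose attractive part is scaled by a per-pair stickiness λ. The sketch below evaluates that functional form with arithmetic-mean combination of σ and λ. The default ε = 0.8368 kJ/mol is assumed here, distances are in nm, and the cutoff, potential shift, Debye-Hückel, and bonded terms used in the actual HOOMD-blue simulations are omitted.

```python
def lennard_jones(r, sigma, eps):
    """Standard 12-6 Lennard-Jones potential."""
    sr6 = (sigma / r) ** 6
    return 4.0 * eps * (sr6 ** 2 - sr6)


def ashbaugh_hatch(r, sigma_i, sigma_j, lambda_i, lambda_j, eps=0.8368):
    """Ashbaugh-Hatch energy (kJ/mol) between residues i and j at distance r (nm)."""
    sigma = 0.5 * (sigma_i + sigma_j)  # pair size
    lam = 0.5 * (lambda_i + lambda_j)  # pair stickiness
    r_min = 2.0 ** (1.0 / 6.0) * sigma  # position of the LJ minimum
    if r <= r_min:
        # Repulsive branch: shift so the well depth becomes lam * eps
        return lennard_jones(r, sigma, eps) + eps * (1.0 - lam)
    # Attractive branch: scale the LJ tail by lam
    return lam * lennard_jones(r, sigma, eps)
```

Both branches equal −λε at the LJ minimum, so the potential is continuous there; λ = 0 removes the attraction entirely while keeping the excluded-volume repulsion.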
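A pLDDT-based IDR set starts from stretches of consecutive residues with low AlphaFold confidence. The sketch below finds such stretches; the threshold of 70 and the minimum length of 30 residues are hypothetical parameters, and the procedure in idr_selection/ may additionally smooth the scores or merge nearby segments.

```python
def low_plddt_segments(plddt, threshold=70.0, min_len=30):
    """Return 0-based (start, end) pairs, end exclusive, of runs with
    pLDDT below threshold that are at least min_len residues long."""
    segments = []
    start = None
    for i, score in enumerate(plddt):
        if score < threshold:
            if start is None:
                start = i  # a low-confidence run begins
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i))
            start = None  # the run ended
    # Close a run that extends to the C-terminus
    if start is not None and len(plddt) - start >= min_len:
        segments.append((start, len(plddt)))
    return segments
```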
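NARDINI z-scores compare how residue types are patterned in a sequence with the patterning expected for scrambled sequences of the same composition. The sketch below shows only the general z-score idea; the statistic `like_charge_pairs` is a hypothetical toy stand-in, not one of the NARDINI patterning parameters, and the number of scrambles is arbitrary.

```python
import random
from statistics import mean, pstdev

CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def like_charge_pairs(seq):
    """Toy statistic: adjacent residue pairs carrying charges of the same sign."""
    return sum(
        1 for a, b in zip(seq, seq[1:])
        if CHARGE.get(a, 0) * CHARGE.get(b, 0) > 0
    )


def scrambled_values(seq, statistic, n_scrambles=1000, seed=0):
    """Evaluate statistic on random permutations of seq (composition preserved)."""
    rng = random.Random(seed)
    return [
        statistic("".join(rng.sample(seq, len(seq))))
        for _ in range(n_scrambles)
    ]


def z_score(observed, null_values):
    """Standardise an observed value against a null distribution."""
    sd = pstdev(null_values)
    if sd == 0:
        raise ValueError("null distribution has zero variance")
    return (observed - mean(null_values)) / sd
```

For example, `z_score(like_charge_pairs(seq), scrambled_values(seq, like_charge_pairs))` is positive when like charges in `seq` are more clustered than in its scrambles.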
To open the Notebooks, install Miniconda and make sure all required packages are installed by issuing the following terminal commands:
conda env create -f environment.yml
source activate idrome
jupyter-notebook
Commands to install HOOMD-blue v2.9.3 with mphowardlab/azplugins v0.11.0:
curl -LO https://github.com/glotzerlab/hoomd-blue/releases/download/v2.9.3/hoomd-v2.9.3.tar.gz
tar xvfz hoomd-v2.9.3.tar.gz
git clone https://github.com/mphowardlab/azplugins.git
cd azplugins
git checkout tags/v0.11.0
cd ..
cd hoomd-v2.9.3
mkdir build
cd build
cmake ../ -DCMAKE_INSTALL_PREFIX=<path to python> \
-DENABLE_CUDA=ON -DENABLE_MPI=ON -DSINGLE_PRECISION=ON -DENABLE_TBB=OFF \
-DCMAKE_CXX_COMPILER=<path to g++> -DCMAKE_C_COMPILER=<path to gcc>
make -j4
cd ../hoomd
ln -s ../../azplugins/azplugins azplugins
cd ../build && make install -j4
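After installation, a quick sanity check from Python can confirm that HOOMD-blue is importable and that azplugins was built in. The sketch assumes that, when symlinked into the HOOMD source tree as above, the plugin is importable as `hoomd.azplugins`; it reports rather than raises if either import fails.

```python
import importlib


def check_hoomd_install():
    """Report the HOOMD-blue version and whether hoomd.azplugins imports."""
    status = {"hoomd": None, "azplugins": False}
    try:
        hoomd = importlib.import_module("hoomd")
    except ImportError:
        return status  # HOOMD-blue is not on the Python path
    status["hoomd"] = getattr(hoomd, "__version__", "unknown")
    try:
        importlib.import_module("hoomd.azplugins")
        status["azplugins"] = True
    except ImportError:
        pass  # HOOMD-blue found, but the plugin was not built in
    return status


print(check_hoomd_install())
```

With the build above, the reported HOOMD-blue version should be 2.9.3 and `azplugins` should be `True`.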