Code and datasets used in the manuscript "Evolutionary and Structural Constraints Influencing Apolipoprotein A-I Amyloid Behavior"

tomasMasson/APOA1_evolution

Evolutionary and Structural Constraints Influencing Apolipoprotein A-I Amyloid Behavior 📄


This repository contains the code and data used to prepare the figures and the manuscript.

A manuscript describing the results of this work has been posted on bioRxiv (Preprint) and is also available in the folder BioRxiv_manuscript.

Sequence datasets (data/)

Evolutionary studies were conducted using sequence data from the Ensembl and RefSeq databases.

Below you can find the link to retrieve the protein sequences:

Nucleotide sequences from Ensembl can be downloaded from the same link detailed above. In the case of RefSeq entries, nucleotide sequences were retrieved using the NCBI Entrez Programming Utilities.
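
As an illustration of the RefSeq retrieval step, the sketch below builds an E-utilities efetch request that asks for the coding nucleotide sequences of a list of protein accessions. It is not the script used in this repo: the choice of db=protein with rettype=fasta_cds_na is an assumption about how such a query could be phrased, and the function name and the accessions in the usage comment are placeholders.

```python
from urllib.parse import urlencode

# Public base URL of the NCBI E-utilities efetch endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_cds_fetch_url(protein_accessions):
    """Return an efetch URL requesting FASTA coding sequences for protein accessions.

    Hypothetical helper: the parameter choices are an assumption,
    not the query actually used for this dataset.
    """
    params = {
        "db": "protein",                      # query by protein accession
        "id": ",".join(protein_accessions),   # comma-separated ID list
        "rettype": "fasta_cds_na",            # nucleotide CDS in FASTA format
        "retmode": "text",
    }
    return f"{EFETCH_URL}?{urlencode(params)}"


# Usage (placeholder accessions, replace with real RefSeq protein IDs):
# url = build_cds_fetch_url(["ACC_1", "ACC_2"])
# The URL can then be fetched with urllib.request.urlopen(url).
```

The helper only builds the URL, so it can be inspected before anything is sent to NCBI.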

Phylogeny and molecular evolution of apoA-I (molecular_evolution/)

This folder contains the phylogenetic reconstruction of apoA-I evolution with IQ-TREE (APOA1_phylogeny.treefile) and the evolutionary rates inferred with HyPhy (the file evolution_dataset.csv contains all the data used for visualizations).
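
IQ-TREE writes its .treefile in Newick format, so the taxa in the tree can be listed with a few lines of standard-library Python. The function below is a minimal sketch that assumes unquoted leaf labels; for anything beyond a quick check, a dedicated tree library is the safer choice. The function name is hypothetical and not part of this repo.

```python
import re


def leaf_names(newick):
    """Return the leaf labels of a Newick string in order of appearance.

    Assumes unquoted labels. A leaf label directly follows "(" or ",",
    whereas internal-node labels (e.g. support values) follow ")".
    """
    return [name.strip() for name in re.findall(r"[(,]\s*([^():,;]+)", newick)]


# Usage with the tree file from this folder:
# with open("APOA1_phylogeny.treefile") as handle:
#     print(leaf_names(handle.read()))
```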

Amyloid aggregation tendency (amyloid_aggregation/)

To compute the aggregation propensity of each protein sequence in our dataset, we employed TANGO with default settings. The file aprs_dataset.csv contains all the aggregation-prone regions predicted for apoA-I sequences.
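
To show how aggregating regions can be derived from per-residue TANGO output, the sketch below scans a list of aggregation scores for contiguous stretches above a threshold. The cutoffs (score above 5% over at least 5 consecutive residues) follow a convention commonly used with TANGO; whether aprs_dataset.csv was built with exactly these values is an assumption, and the function name is hypothetical.

```python
def find_aprs(scores, threshold=5.0, min_length=5):
    """Return (start, end) residue ranges, 1-based and inclusive, where the
    score stays above `threshold` for at least `min_length` residues.

    The default cutoffs are an assumed convention, not confirmed by this repo.
    """
    regions = []
    start = None  # first residue of the stretch currently being tracked
    for position, score in enumerate(scores, start=1):
        if score > threshold:
            if start is None:
                start = position
        else:
            # The stretch (if any) ended at the previous residue.
            if start is not None and position - start >= min_length:
                regions.append((start, position - 1))
            start = None
    # Handle a stretch that runs to the end of the sequence.
    if start is not None and len(scores) - start + 1 >= min_length:
        regions.append((start, len(scores)))
    return regions
```

For example, a profile with five consecutive residues above the threshold followed by a two-residue stretch yields only the first region, because the second is shorter than min_length.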

Structural features (structural_features/)

Gaussian network model fluctuations (apoa1_msf.csv) and weighted contact numbers (apoa1.wcn.csv) were computed with ProDy and with a custom script from clauswilke/proteinER, respectively. We used CamSol (structurally corrected protein solubility prediction) and ZipperDB (a database of fibril-forming protein segments) to understand the contribution of apoA-I structure (link) to its aggregation tendency (camsol_solubility.txt and zipperdb.csv).
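
For readers unfamiliar with the weighted contact number, it is usually defined for residue i as the sum over all other residues j of 1 / r_ij^2, where r_ij is the distance between the two residues. The sketch below implements that definition for a list of 3D coordinates. It is not the proteinER script: which atoms represent each residue (for example C-alpha atoms or side-chain centroids) is left to the caller and is an assumption here.

```python
def weighted_contact_numbers(coords):
    """Compute WCN_i = sum over j != i of 1 / r_ij^2 for each point.

    `coords` is a list of (x, y, z) tuples, one per residue. Two residues
    at identical coordinates would cause a division by zero.
    """
    wcn = []
    for i, (xi, yi, zi) in enumerate(coords):
        total = 0.0
        for j, (xj, yj, zj) in enumerate(coords):
            if i == j:
                continue
            squared_distance = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            total += 1.0 / squared_distance
        wcn.append(total)
    return wcn
```

Densely packed residues accumulate many short-distance terms and therefore get high values, while exposed residues get low ones.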

In silico saturation mutagenesis (in-silico_mutagenesis/)

We used FoldX to calculate the theoretical thermodynamic destabilization caused by each possible amino acid substitution in the apoA-I sequence (foldx_dataset.csv) and automated this task with the MutateX pipeline.

The MutateX command was run inside mutatex-env (as recommended in the MutateX repo):

~/mutatex/bin/mutatex apoa1-hdl.pdb --foldx-binary ~/foldx5Linux64.tar__0/foldx --rotabase rotabase.txt --np 4 --binding-energy --foldx-log --clean deep --compress
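
To make the scale of a saturation scan concrete, the sketch below enumerates every single substitution for a sequence: 19 alternatives per position. MutateX generates its own mutation lists internally, so this is only an illustration; the function name is hypothetical and the labels use the generic wild-type/position/mutant notation (e.g. M1A), not a FoldX or MutateX input format.

```python
# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def saturation_mutations(sequence, first_position=1):
    """List all single substitutions as labels like 'M1A'.

    `first_position` is the residue number assigned to the first
    character of `sequence` (an illustrative parameter).
    """
    mutations = []
    for offset, wild_type in enumerate(sequence):
        position = first_position + offset
        for mutant in AMINO_ACIDS:
            if mutant != wild_type:  # skip the identity "mutation"
                mutations.append(f"{wild_type}{position}{mutant}")
    return mutations
```

A sequence of N standard residues therefore produces 19 × N labels, which is why the scan is automated and run on several cores (--np 4 in the command above).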

Pathogenicity scores (rhapsody_dataset.csv) were calculated with the Rhapsody server. ApoA-I natural variants were extracted from gnomAD. The file variants_dataset.csv contains all the data used for visualizations.

Visualization (viz/)

Code and datasets used for visualization, together with the .svg figure files.
