The script parses and processes PDB files generated by AlphaFold. It expects the pLDDT score in the B-factor column. As a mandatory intermediate step, it calculates the Relative Solvent Accessibility (RSA) as provided by DSSP and BioPython.
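To illustrate where the pLDDT score lives, here is a minimal sketch that reads it from the B-factor field (columns 61-66 of the fixed-width PDB format) of each C-alpha atom. The function name `read_plddt` is hypothetical and not part of the script. The `disorder = 1 - pLDDT` relation is an assumption inferred from the `lddt` and `disorder` columns in the example output further down, which sum to 1 in every row.

```python
def read_plddt(pdb_lines):
    """Return [(residue_number, residue_name, plddt, disorder)] for CA atoms.

    Hypothetical helper, not the script's own code.
    """
    rows = []
    for line in pdb_lines:
        # Fixed-width PDB record: atom name in columns 13-16, B-factor in 61-66
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20]
            res_num = int(line[22:26])
            # AlphaFold writes pLDDT on a 0-100 scale; rescale to 0-1
            plddt = float(line[60:66]) / 100.0
            # Assumption: disorder is the complement of the rescaled pLDDT
            rows.append((res_num, res_name, round(plddt, 3), round(1.0 - plddt, 3)))
    return rows


# Build one correctly aligned ATOM record instead of hand-typing the columns
example = "ATOM  {:>5d} {:<4s} {:>3s} {:1s}{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    2, " CA", "MET", "A", 1, -1.234, 5.678, 9.012, 1.00, 68.80
)
print(read_plddt([example]))
```

The script itself relies on BioPython for parsing; the plain string slicing above is only meant to show which field is read.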
- Python3
- NumPy
- Pandas
- BioPython
- DSSP 3.x ("mkdssp" executable)
The script takes a folder of PDB files as input and writes two TSV files.
python3 alphafold_disorder.py -i pdbs/ -o out.tsv
- rsa_window (default 25) - RSA values are smoothed over a window centered on the residue to predict
- rsa_threshold (default 0.581) - Binding predictions are overweighted when disorder prediction is above this threshold
Both parameters take a space-separated list of values (floats). The program generates an output for each combination of the provided values.
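The two ideas above, window smoothing and one output per parameter combination, can be sketched as follows. This is an illustration under stated assumptions, not the script's implementation: the names `smooth` and `output_columns` are hypothetical, truncating the window at the sequence ends is an assumption, and the column names simply follow the `disorder-<rsa_window>` and `binding-<rsa_window>-<rsa_threshold>` pattern used in the output.

```python
from itertools import product


def smooth(values, window):
    """Centered moving average over `window` residues.

    Assumption: near the termini the window is truncated to the residues
    that exist, rather than padded.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def output_columns(rsa_windows, rsa_thresholds):
    """List one disorder column per window and one binding column per
    (window, threshold) pair. The ordering is an assumption."""
    columns = ["disorder-{}".format(w) for w in rsa_windows]
    columns += [
        "binding-{}-{}".format(w, t) for w, t in product(rsa_windows, rsa_thresholds)
    ]
    return columns


print(output_columns([25], [0.581]))
print(output_columns([15, 25], [0.5, 0.581]))
```

With two windows and two thresholds, the sketch yields two disorder columns and four binding columns, one for each pair.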
By default, the program uses the TSV format and generates two files, out_data.tsv and out_pred.tsv, containing the intermediate calculations (DSSP output) and the final predictions, respectively. The last two columns (disorder-<rsa_window>, binding-<rsa_window>-<rsa_threshold>) are the relevant ones: they report the disorder and binding propensities.
name pos aa lddt disorder rsa disorder-25 binding-25-0.581
P47710 1 M 0.688 0.312 1.000 0.680 0.869
P47710 2 R 0.832 0.168 0.879 0.691 0.929
P47710 3 L 0.850 0.150 0.854 0.696 0.937
P47710 4 L 0.863 0.137 0.756 0.705 0.943
...
Q5RJL0 67 V 0.502 0.498 0.951 0.896 0.791
Q5RJL0 68 L 0.511 0.489 1.000 0.881 0.795
Q5RJL0 69 P 0.449 0.551 0.787 0.866 0.769
Q5RJL0 70 R 0.514 0.486 1.000 0.864 0.796
...
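For downstream analysis, a table like the one above can be loaded with the standard library alone. The sketch below assumes the file is tab-delimited with the header row shown; `load_predictions` is a hypothetical helper name, and the in-memory example reuses the first two rows of the table.

```python
import csv
import io


def load_predictions(handle, score_column):
    """Group (pos, aa, score) tuples by protein name from a TSV table.

    Hypothetical helper; assumes a tab-delimited file with a header row.
    """
    per_protein = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        per_protein.setdefault(row["name"], []).append(
            (int(row["pos"]), row["aa"], float(row[score_column]))
        )
    return per_protein


# In-memory stand-in for out_pred.tsv, using rows from the example above
example = io.StringIO(
    "name\tpos\taa\tlddt\tdisorder\trsa\tdisorder-25\tbinding-25-0.581\n"
    "P47710\t1\tM\t0.688\t0.312\t1.000\t0.680\t0.869\n"
    "P47710\t2\tR\t0.832\t0.168\t0.879\t0.691\t0.929\n"
)
scores = load_predictions(example, "disorder-25")
print(scores)
```

To read a real file, pass `open("out_pred.tsv", newline="")` as the handle instead of the `StringIO` object.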
The CAID format can be generated with the command below.
python3 alphafold_disorder.py -i pdbs/ -o out.tsv -f caid
The program generates a separate file for each type of prediction and each combination of parameters:
- out_disorder.dat, disorder based on pLDDT
- out_disorder-<rsa_window>.dat, disorder based on RSA and smoothed over a window
- out_binding-<rsa_window>-<rsa_threshold>.dat, binding prediction weighted based on a threshold on the smoothed RSA
>P47710
1 M 0.68
2 R 0.691
3 L 0.696
4 L 0.705
...
67 V 0.896
68 L 0.881
69 P 0.866
70 R 0.864
...
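A file in this layout can be read back with a few lines of Python. The sketch below assumes each protein starts with a `>` header line followed by whitespace-separated `position residue score` rows, as in the excerpt above; `parse_caid` is a hypothetical helper name, and lines with fewer than three fields are skipped.

```python
def parse_caid(lines):
    """Return {protein_id: [(pos, aa, score), ...]} from CAID-style lines.

    Hypothetical helper; assumes '>' headers and whitespace-separated rows.
    """
    entries = {}
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line opens a new protein entry
            current = line[1:]
            entries[current] = []
            continue
        fields = line.split()
        if current is None or len(fields) < 3:
            # Ignore rows before the first header and incomplete rows
            continue
        entries[current].append((int(fields[0]), fields[1], float(fields[2])))
    return entries


example = [">P47710", "1 M 0.68", "2 R 0.691", "3 L 0.696", "4 L 0.705"]
print(parse_caid(example))
```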
Piovesan D, Monzon AM, Tosatto SCE.
Intrinsic protein disorder and conditional folding in AlphaFoldDB.
Protein Sci. 2022 Nov;31(11):e4466.
PMID: 36210722
PMCID: PMC9601767.