ISA-Tab annotation for the "SARS-CoV-2 infected host cell proteomics reveal potential therapy targets" publication
This is part of an effort to (re-)annotate the dataset described in https://dx.doi.org/10.21203/rs.3.rs-17218/v1
The data are available from PRIDE at https://www.ebi.ac.uk/pride/archive/projects/PXD017710, and a MassIVE/CCMS Maestro+MSstats reanalysis is available as MSV000085096 / PXD017710.
The formatting and reannotation are based on information extracted from:
- the original publication
- the supplementary tables available from the publisher's site
- the 'filtered-results.csv' helper file as supplied to @sneumann during the HUPO-PSI
Viewing the ISA-tab formatted and reannotated PXD017710 with ISATab-Viewer
To view the ISA-Tab formatted and reannotated PXD017710 locally, do the following:
python -m http.server 8000
Then point your browser to http://0.0.0.0:8000/isaviewer-demo.html
- initial structure of the study design in ISA format
- linkage of Proteome and Translatome data (supplementary material) to ISA assay tables (via Derived Data File)
- processing the Proteome and Translatome data (supplementary material) with python pandas library to generate the following csv files:
- proteome_intensities_long_table_ggplot2.txt
- proteome_diffanal_ratio_pvalue_long_table_ggplot2.txt
- translatome_intensities_long_table_ggplot2.txt
- translatome_diffanal_ratio_pvalue_long_table_ggplot2
The files are long tables, corresponding to a melt of the Excel file originally generated by the users, and can be readily loaded into the R ggplot2 library for graphical representation. The statistically relevant elements have been annotated with the STATO ontology, and the tables comply with the Frictionless.io Data Package specification. The Jupyter notebook for the transformation is available.
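The melt step described above can be sketched as follows. This is a minimal illustration, not the code from the notebook: the column names (`protein`, `control_2h_rep1`, `virus_2h_rep1`), the values, and the helper `to_long_table` are hypothetical stand-ins for one sheet of the supplementary Excel file.

```python
import pandas as pd


def to_long_table(wide, id_col):
    """Melt a wide table (one column per sample) into a long table
    with one row per (identifier, sample) pair."""
    return wide.melt(id_vars=[id_col], var_name="sample", value_name="intensity")


# Hypothetical wide table: one row per protein, one column per sample.
wide = pd.DataFrame(
    {
        "protein": ["P1", "P2"],
        "control_2h_rep1": [10.0, 20.0],
        "virus_2h_rep1": [12.0, 18.0],
    }
)

long_table = to_long_table(wide, "protein")
print(long_table)
```

A long table of this shape has one observation per row, which is the layout ggplot2 expects. It could then be written out with `long_table.to_csv(path, sep="\t", index=False)`, assuming a tab-delimited file is wanted.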
- conversion of raw data to mzML format:
install docker:
>brew update
>brew install docker
sign in to docker
>docker start
>docker login
pull docker container for ProteoWizard:
>docker pull chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
the image is published at https://hub.docker.com/r/chambm/pwiz-skyline-i-agree-to-the-vendor-licenses
run the pwiz tool from the container over the raw data:
docker run -it --rm -e WINEDEBUG=-all -v /Users/philippe/Downloads/PXD017710/raw/:/data chambm/pwiz-skyline-i-agree-to-the-vendor-licenses wine msconvert /data/*.raw --mzML
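The command above hard-codes one user's download directory. A small Python sketch can assemble the same call for any local directory. The helper `build_msconvert_command` and the path `PXD017710/raw` are hypothetical; the image name and msconvert arguments are copied from the command above. The sketch only prints the command and does not run docker.

```python
import shlex
from pathlib import Path

# Image name as used in the docker run command above.
IMAGE = "chambm/pwiz-skyline-i-agree-to-the-vendor-licenses"


def build_msconvert_command(raw_dir):
    """Return the docker argument list that mounts raw_dir as /data
    and asks msconvert to convert the .raw files to mzML."""
    host_dir = Path(raw_dir).expanduser().resolve()
    return [
        "docker", "run", "-it", "--rm",
        "-e", "WINEDEBUG=-all",
        "-v", "{}:/data".format(host_dir),
        IMAGE,
        # The wildcard is passed to msconvert unexpanded, as in the original command.
        "wine", "msconvert", "/data/*.raw", "--mzML",
    ]


command = build_msconvert_command("PXD017710/raw")  # hypothetical local path
print(shlex.join(command))
```

The printed line can be pasted into a terminal. If the list is instead passed to `subprocess.run` from a script without a terminal attached, the `-it` flags will likely need to be dropped, since docker refuses to allocate a TTY in that case.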
- ontology markup for:
- declaration of independent variables as ISA Study Factors: {biological agent, dose, timepoint, replicate} -> OBI
- Taxonomic information (host cells and virus) -> NCBITaxonomy
- Cell line: CaCo-2 cells -> Cell Line Ontology
- Disease: Colon Cancer -> Human Phenotype Ontology
- MS specific aspect (TMT reagent, instrument ... ) -> PSI-MS
- Statistical Tests -> STATO
- ambiguities related to the Tandem Mass Tag (TMT) labeling protocol:
- the publication mentions TMT11 (see Figure 2 in https://www.researchsquare.com/article/rs-17218/v1)
- the information available from PRIDE mentions TMT6 (https://www.ebi.ac.uk/pride/archive/projects/PXD017710)
This may require another round of annotation of the TMT reagents and fractions in the ISA a_assay representation.
- SARS-CoV-2 isolate: no clear NCBI Taxonomy anchoring and unclear origin -> the markup is made to the parent class (as of 06.04.2020)
The default ISA configuration from https://isa-tools.org/format/configurations/index.html was used for validation.
Code snippet showing how to invoke the Python ISA validator from the isatools API:
import os
from isatools import isatab

# Validate the investigation file; the file handle is closed when the block exits
with open(os.path.join('PXD017710', 'i_PXD017710.txt')) as fp:
    my_json_report = isatab.validate(fp)
print(my_json_report)
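The printed report can be long, so a short summary may be easier to read. The sketch below assumes the report is a dictionary with `errors` and `warnings` lists, which should be checked against the installed isatools version. The helper `summarize_report` and the example report are hypothetical and do not come from an actual validation run of PXD017710.

```python
def summarize_report(report):
    """Condense a validation report into a single status line.

    Assumes the report holds 'errors' and 'warnings' lists;
    missing keys are treated as empty lists.
    """
    errors = report.get("errors", [])
    warnings = report.get("warnings", [])
    status = "valid" if not errors else "invalid"
    return "{}: {} error(s), {} warning(s)".format(status, len(errors), len(warnings))


# Hypothetical report, shaped like the assumed structure.
example_report = {
    "errors": [],
    "warnings": [{"message": "hypothetical warning", "supplemental": "", "code": 0}],
}

summary = summarize_report(example_report)
print(summary)
```

In practice `summarize_report(my_json_report)` would be called on the report returned by the validator.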
Data Formats | Terminologies | Models |
---|---|---|
Investigation Study Assay (ISA) | CLO | |
mzML | OBI | |
| | NCBI taxonomy | |
| | HP | |
| | MS | |
| | STATO | |
Name | Affiliation | ORCID | CRediT role |
---|---|---|---|
Steffen Neumann | IPB-Halle | 0000-0002-7899-7192 | Writing - First Draft |
Philippe Rocca-Serra | Data Readiness Group, Department of Engineering Science, University of Oxford | 0000-0001-9853-5668 | Writing - Review |