Skip to content

oloBion/oloPATH

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

oloPATH

oloPATH

Package for metabolomic pathways analysis.

Installation

oloPATH requires Python 3.10 and it can be installed from GitHub repository via:

pip install olopath@git+https://github.com/oloBion/oloPATH.git

Input data

oloPATH requires the following input data in CSV format:

  1. data: is a matrix containing metabolite information, where rows represent metabolites and columns correspond to metabolite IDs (alignid), metabolite names (metabolite_name), metabolite InChiKeys (inchikey) and individual samples with their associated intensities.
alignid metabolite_name inchikey C1 C2 C3 K1 K2 K3
1 Phenol sulfate CTYRPMDGLDAWRQ-UHFFFAOYSA-N 37833 19019 2648536 127311 2368521 19525
2 Nicotinamide DFPAKSUCGFBDDF-UHFFFAOYSA-N 122401 54811 418613 95612 101095 153269

  1. study_design: is a table of two columns that associates study groups (group) with individual samples (sample).
group sample
Control C1
Control C2
Control C3
KO K1
KO K2
KO K3

Species databases structure

Each database consists of two dictionaries: molecules and pathways.

  • molecules dictionary has ChEBI or PlantCyc molecules identifiers as keys and dictionaries containing the molecule name, short InChiKey and associated Reactome or PlantCyc pathways identfiers as values.
"molecules": {"10055": {"name": "Xamoterol",
                        "inchikey": "DXPOSRCHIDYWHW",
                        "pathways": ["R-HSA-162582",
                                     "R-HSA-372790",
                                     "R-HSA-373076",
                                     "R-HSA-375280",
                                     "R-HSA-390696",
                                     "R-HSA-500792"]
                       }
             }

  • pathways dictionary has Reactome or PlantCyc pathways identifiers as keys and dictionaries containing the pathway name as values.
"pathways": {"R-HSA-162582": {"name": "Signal Transduction"},
             "R-HSA-372790": {"name": "Signaling by GPCR"},
             "R-HSA-373076": {"name": "Class A/1 (Rhodopsin-like receptors)"},
             "R-HSA-375280": {"name": "Amine ligand-binding receptors"},
             "R-HSA-390696": {"name": "Adrenoceptors"},
             "R-HSA-500792": {"name": "GPCR ligand binding"}
            }

Running oloPATH

Import oloPATH as a Python library and perform your pathway analysis following the instructions:

import olopath.oloutils as ut
from olopath.DataObj import DataSource
from olopath.PATHAnalysis import PATHAnalysis

  1. Load your data

Ensure your input files are in the proper format as specified in the Input Data section and run the load_data function, specifying the case and control groups for analysis. The function will return annotation_df, intensity_df and std_design objects.

annotation_df, intensity_df, study_design =\
    ut.load_data(data = "data.csv", study_design = "study_design.csv",
                 case='KO', control='Control')

  1. Initialize DataSource object

Create the DataSource object with the loaded data in the previous step, specifiying the species, pvalue and foldchange2, logscale and mode parameters.

ds = DataSource(intensity_df=intesity_df,
                annotation_df=annotation_df,
                study_design=study_design,
                species='Homo sapiens',
                pvalue=0.05,
                foldchange2=[-0.5, 0.5],
                logscale=False,
                mode='1/10')

Considerations for the object parameters:

  • species: indicates the species for which the analysis is conducted. oloPATH supports humans (Homo sapiens), mouse (Mus musculus) and plants (PlantCyc).
  • pvalue: specifies the threshold for statistical significance. It is set to 0.05 by default.
  • foldchange2: specifies the cutoff values for log2(fold change). It is set to [-0.5, 0.5] by default.
  • logscale: indicates if data will be log-transformed for p-values computation. It set to False by default.
  • mode: indicates the approach for handling missing values, with three options: 1/10 (replace missing values with 1/10 of minimum positive value of each analyte), 1/5 (replace with 1/5 of minimum positive value of each analyte) and 1 (replace with 1).

  1. Run PATHAnalysis

Run PATHAnalysis with the DataSource object created in the previous step to perform pathway analysis. The class will return pathway_df and metabolites_df objects.

PATH = PATHAnalysis(ds)
pathway_df, metabolites_df = PATH.get_results(filter_by_hits=1)

Pathways can be filtered by hits using the filter_by_hits parameter, which is set to 1 by default.

About

Package for metabolic pathway analysis

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages