# MTBLS233 with PhenoMeNal Jupyter
This page introduces an OpenMS preprocessing workflow and an R downstream analysis that you can run using the Jupyter frontend provided by PhenoMeNal.

## Introduction
The aim of [the study](http://www.sciencedirect.com/science/article/pii/S000326701630647X) performed on MTBLS233 was to produce quantitative information on the highest possible number of reliable features in untargeted metabolomics. Three different approaches to tuning the mass spectrometric acquisition parameters were tested to see which gave the highest number of spectral features.

In this proof-of-principle workflow we recreate the workflow used in the MTBLS233 study in a distributed manner so that it runs on the PhenoMeNal platform. The workflow was originally implemented in [OpenMS](https://www.openms.de/) v. 1.1.1, followed by the downstream analysis in [KNIME](https://www.knime.org/). Here we launch and control the pipeline from Jupyter: the OpenMS preprocessing has been wrapped in Docker containers to facilitate scaling, and the downstream analysis, written in R, has been extracted and implemented directly in Jupyter.

## Run the preprocessing workflow

Start by opening Jupyter in your browser at: 

`http://notebook.<deployment-id>.phenomenal.cloud/`

### Ingest the MTBLS233 dataset from MetaboLights

[MetaboLights](http://www.ebi.ac.uk/metabolights/) offers an FTP service, so we can ingest the MTBLS233 dataset with Linux commands. 

1. First open a Jupyter terminal: `New > Terminal`
2. Ingest the dataset using **wget**:

```bash
wget 'ftp://anonymous@ftp.ebi.ac.uk/pub/databases/metabolights/studies/public/MTBLS233/*.mzML' -P MTBLS233/data/
```
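
Before starting the heavy preprocessing step, it can help to confirm that the download actually produced usable files. The following is a minimal sketch, not part of the original workflow: the `DATA_DIR` variable and the two helper functions are hypothetical names introduced here, and the default path simply mirrors the `-P` target of the `wget` command above.

```bash
#!/usr/bin/env bash
# Sanity-check the download: count the mzML files and list any that are empty.
# DATA_DIR is a hypothetical override; it defaults to the wget target used above.
DATA_DIR="${DATA_DIR:-MTBLS233/data}"

count_mzml() {
  # Number of regular *.mzML files directly inside directory $1
  find "$1" -maxdepth 1 -type f -name '*.mzML' | wc -l | tr -d ' '
}

empty_mzml() {
  # Paths of zero-byte *.mzML files, which may point to an interrupted transfer
  find "$1" -maxdepth 1 -type f -name '*.mzML' -size 0
}

if [ -d "$DATA_DIR" ]; then
  echo "mzML files found: $(count_mzml "$DATA_DIR")"
  empty_mzml "$DATA_DIR"
else
  echo "Directory $DATA_DIR does not exist yet" >&2
fi
```

If any empty files are listed, re-running the `wget` command for those files is probably the simplest remedy.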

### Run the preprocessing workflow with Luigi

To run the preprocessing analysis we use the [Luigi](https://github.com/spotify/luigi) workflow system. Note that this is a heavy analysis: to run it successfully you will need to deploy a moderately large number of fat nodes with your cloud provider. Start the preprocessing workflow with:

```bash
cd MTBLS233 
export PYTHONPATH=./ 
luigi --module preprocessing_workflow AllGroups \
  --scheduler-host luigi.default \
  --workers <parallelism-level>
```

> **Warning**: Remember to replace `<parallelism-level>` with the number of parallel processes that you want to spawn in the cluster.
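
One way to avoid leaving the placeholder in by mistake is to build the command from a variable and review it before running it. This is only a sketch under assumptions: `WORKERS` and `luigi_cmd` are hypothetical names introduced here, and the default of 4 is an arbitrary placeholder, not a recommendation for this dataset.

```bash
#!/usr/bin/env bash
# WORKERS is a hypothetical variable name; 4 is an arbitrary placeholder default.
WORKERS="${WORKERS:-4}"

luigi_cmd() {
  # Print the luigi command for worker count $1; reject empty, non-numeric, or zero values.
  case "$1" in
    ''|*[!0-9]*|0)
      echo "workers must be a positive integer: '$1'" >&2
      return 1
      ;;
  esac
  echo "luigi --module preprocessing_workflow AllGroups --scheduler-host luigi.default --workers $1"
}

# Only prints the command so it can be reviewed; it does not start the workflow.
luigi_cmd "$WORKERS"
```

Once the printed command looks right, run it from the `MTBLS233` directory with `PYTHONPATH=./` exported, as in the block above.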

If everything goes well, you will be able to monitor the progress of your analysis at:

`http://luigi.<deployment-id>.phenomenal.cloud/`

## Run the downstream analysis

To open the downstream analysis notebook, please go to: 

`MTBLS233 > downstream-analysis > downstream-analysis.ipynb` 

The CSV output generated by the OpenMS TextExporter is saved in the `MTBLS233/results` directory, and each file is already set as the notebook input for its respective mass range.
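
To check from a Jupyter terminal that the preprocessing step has written its output before running the notebook, a short listing of the result files can be useful. The sketch below is an illustration only: `RESULTS_DIR` and `summarize_results` are hypothetical names, and the script makes no assumption about the file names or the internal layout of the TextExporter output beyond the `.csv` extension.

```bash
#!/usr/bin/env bash
# RESULTS_DIR is a hypothetical override; it defaults to the directory named above.
RESULTS_DIR="${RESULTS_DIR:-MTBLS233/results}"

summarize_results() {
  # Print "<file name><TAB><line count>" for every *.csv directly inside directory $1
  local f
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue   # skip the unexpanded pattern when nothing matches
    printf '%s\t%s\n' "$(basename "$f")" "$(wc -l < "$f" | tr -d ' ')"
  done
}

summarize_results "$RESULTS_DIR"
```

An empty listing suggests that the preprocessing workflow has not finished yet or has written its output elsewhere.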

To run the workflow click `Cell > Run All`.

Once the whole workflow has run successfully, you can change the parameters to see how they affect the results.