BioWardrobe backend (airflow+cwl)

About

A Python package that replaces BioWardrobe's python/cron scripts. It uses Apache Airflow to run pipelines described with CWL v1.0.

Install

  1. Add the biowardrobe MySQL connection to Airflow's connections table (a sanity check is shown after this list):
    select * from airflow.connection;
    insert into airflow.connection values(NULL,'biowardrobe','mysql','localhost','ems','wardrobe','',null,'{"cursor":"dictcursor"}',0,0);
  2. Install the package:
    sudo pip3 install .
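
A quick sanity check for step 1 (a sketch; it assumes the mysql command-line client is installed and your account can read the airflow metadata database):

    # Confirm the connection row created above was stored as expected.
    mysql -e "SELECT conn_id, conn_type, host, login FROM airflow.connection WHERE conn_id='biowardrobe';"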

Requirements

  1. Make sure your system satisfies the following criteria (a quick toolchain check is shown after this list):

    • Ubuntu 16.04.3
      • python3.6
        sudo add-apt-repository ppa:jonathonf/python-3.6
        sudo apt-get update
        sudo apt-get install python3.6
      • pip3
        curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
        pip3 install --upgrade pip
      • setuptools
        pip3 install setuptools
      • docker
        sudo apt-get update
        sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
        curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
        sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
        sudo apt-get update
        sudo apt-get install docker-ce
        sudo groupadd docker
        sudo usermod -aG docker $USER
        Log out and log back in so that your group membership is re-evaluated.
      • libmysqlclient-dev
        sudo apt-get install libmysqlclient-dev
      • nodejs
        sudo apt-get install nodejs
  2. Get the latest version of cwl-airflow-parser. If Apache Airflow or cwltool is not installed, it will be installed automatically at the recommended version. Set the AIRFLOW_HOME environment variable to Airflow's config directory (the default is ~/airflow/).

    git clone https://github.com/datirium/cwl-airflow-parser.git
    cd cwl-airflow-parser
    sudo pip3 install .
  3. If required, add extra Airflow packages to extend Airflow's functionality; for instance, pip3 install apache-airflow[mysql] adds MySQL support.
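
Before moving on, the toolchain check mentioned in step 1 (a sketch; exact version output differs per system, and on Ubuntu 16.04 the Node.js binary is named nodejs):

    # Verify the interpreters and services installed above are reachable.
    python3.6 --version
    pip3 --version
    docker run --rm hello-world    # should print "Hello from Docker!"
    nodejs --version
    airflow version                # Apache Airflow is pulled in by cwl-airflow-parser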

Running

  1. To create BioWardrobe's DAGs, run biowardrobe-init in Airflow's dags directory:

    cd ~/airflow/dags
    ./biowardrobe-init 
    
  2. Run the Airflow scheduler:

    airflow scheduler
  3. Use airflow trigger_dag with the input parameter --conf "JSON", where JSON is either a job definition or a biowardrobe_uid, together with an explicitly specified CWL descriptor dag_id (a biowardrobe_uid example is shown after this list).

    airflow trigger_dag --conf "{\"job\":$(cat ./hg19.job)}" "bowtie-index"

    where hg19.job is:

    {
      "fasta_input_file": {
        "class": "File", 
        "location": "file:///wardrobe/indices/bowtie/hg19/chrM.fa", 
        "format":"http://edamontology.org/format_1929",
        "size": 16909,
        "basename": "chrM.fa",
        "nameroot": "chrM",
        "nameext": ".fa"
      },
      "output_folder": "/wardrobe/indices/bowtie/hg19/",
      "threads": 6,
      "genome": "hg19"
    }
  4. All output will be moved from the temporary directory into the directory given by the job's output_folder parameter.
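
For the biowardrobe_uid form from step 3, the call looks like the sketch below. The UID value and the DAG id are hypothetical placeholders; use an experiment UID from your BioWardrobe database and one of the DAG ids generated by biowardrobe-init:

    # List the DAGs created by biowardrobe-init, then trigger one by experiment UID.
    airflow list_dags
    airflow trigger_dag --conf '{"biowardrobe_uid": "4ba26b3e"}' "fastq-bowtie-macs2"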