Python package to replace BioWardrobe's python/cron scripts. It uses Apache Airflow functionality with CWL v1.0.
- Add the biowardrobe MySQL connection to Airflow's connections:

  ```sql
  select * from airflow.connection;
  insert into airflow.connection values(NULL,'biowardrobe','mysql','localhost','ems','wardrobe','',null,'{"cursor":"dictcursor"}',0,0);
  ```
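  The same connection can also be created programmatically through Airflow's ORM instead of raw SQL. A minimal sketch, assuming a default Airflow installation; the values mirror the insert statement above:

  ```python
  # Minimal sketch: register the 'biowardrobe' MySQL connection
  # through Airflow's session/Connection model instead of raw SQL.
  from airflow import settings
  from airflow.models import Connection

  session = settings.Session()
  session.add(Connection(conn_id="biowardrobe",
                         conn_type="mysql",
                         host="localhost",
                         schema="ems",
                         login="wardrobe",
                         password="",
                         extra='{"cursor":"dictcursor"}'))
  session.commit()
  session.close()
  ```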
- Install the package:

  ```bash
  sudo pip3 install .
  ```
- Make sure your system satisfies the following criteria:

  - Ubuntu 16.04.3

  - python3.6

    ```bash
    sudo add-apt-repository ppa:jonathonf/python-3.6
    sudo apt-get update
    sudo apt-get install python3.6
    ```

  - pip3

    ```bash
    curl https://bootstrap.pypa.io/get-pip.py | sudo python3.6
    pip3 install --upgrade pip
    ```
  - setuptools

    ```bash
    pip3 install setuptools
    ```
  - docker

    ```bash
    sudo apt-get update
    sudo apt-get install apt-transport-https ca-certificates curl software-properties-common
    curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
    sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
    sudo apt-get update
    sudo apt-get install docker-ce
    sudo groupadd docker
    sudo usermod -aG docker $USER
    ```

    Log out and log back in so that your group membership is re-evaluated.
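    To confirm that docker is usable by the current user without `sudo` after re-login, a quick check along these lines may help (a minimal sketch, not part of this package):

    ```python
    # Quick check: exit code 0 means the docker daemon is reachable
    # by the current user, i.e. the docker group membership took effect.
    import subprocess

    result = subprocess.run(["docker", "info"],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    print("docker OK" if result.returncode == 0
          else "docker not accessible - log out and back in, or check the docker group")
    ```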
  - libmysqlclient-dev

    ```bash
    sudo apt-get install libmysqlclient-dev
    ```
  - nodejs

    ```bash
    sudo apt-get install nodejs
    ```
- Get the latest version of `cwl-airflow-parser`. If Apache Airflow or cwltool are not installed, they will be installed automatically at the recommended versions. Set the `AIRFLOW_HOME` environment variable to the Airflow configuration directory (the default is `~/airflow/`).

  ```bash
  git clone https://github.com/datirium/cwl-airflow-parser.git
  cd cwl-airflow-parser
  sudo pip3 install .
  ```
- If required, add extra Airflow packages to extend Airflow's functionality, for instance with MySQL support:

  ```bash
  pip3 install apache-airflow[mysql]
  ```
- To create BioWardrobe's DAGs, run `biowardrobe-init` in Airflow's DAGs directory:

  ```bash
  cd ~/airflow/dags
  ./biowardrobe-init
  ```
- Run the Airflow scheduler:

  ```bash
  airflow scheduler
  ```
- Use `airflow trigger_dag` with the input parameter `--conf "JSON"`, where JSON is either a job definition or a `biowardrobe_uid`, and with the CWL descriptor's `dag_id` explicitly specified:

  ```bash
  airflow trigger_dag --conf "{\"job\":$(cat ./hg19.job)}" "bowtie-index"
  ```

  where `hg19.job` is:

  ```json
  {
    "fasta_input_file": {
      "class": "File",
      "location": "file:///wardrobe/indices/bowtie/hg19/chrM.fa",
      "format": "http://edamontology.org/format_1929",
      "size": 16909,
      "basename": "chrM.fa",
      "nameroot": "chrM",
      "nameext": ".fa"
    },
    "output_folder": "/wardrobe/indices/bowtie/hg19/",
    "threads": 6,
    "genome": "hg19"
  }
  ```
- All output files will be moved from the temporary directory into the directory specified by the job's `output_folder` parameter.