Crux Pipeline #7

Open
max-mapper opened this issue Jul 13, 2022 · 1 comment

Comments

@max-mapper

Here's an idea for how to parallelize building crux databases. Every worker type, plus the scheduler and the combiner, is its own Docker image, and all of them are deployed with Kubernetes.

  • scheduler: runs a Redis database, issues jobs to workers, and keeps track of job state
  • obi-downloader workers: get assigned a list of SRA accessions and download them in parallel. After downloading, they convert them to FASTA using fasterq-dump, build an OBITools database from them, and store both the FASTA files and the OBITools database in a Ceph folder
  • ecopcr workers: get assigned a set of OBITools databases and a set of primers, run ecoPCR against the databases using those primers, and store the results in a Ceph folder
  • blastn workers: get a set of ecoPCR output queries and a set of BLAST databases to query against, run blastn in parallel, and store the results in a Ceph folder
  • combiner: takes all of the BLAST results from Ceph, combines them (including deduplication), and builds a Bowtie2 database, which is the final output stored in Ceph
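A minimal sketch of the scheduler's job-state tracking, in Python. It assumes a simple lifecycle (pending, running, then done or failed) with a retry limit, and treats a stage as finished once all of its jobs are done; none of those rules are specified above. A plain dict stands in for Redis so the sketch runs on its own, and the names `Scheduler`, `Job`, `JobState`, `STAGES`, and `max_attempts` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from typing import Optional


class JobState(Enum):
    PENDING = "pending"
    RUNNING = "running"
    DONE = "done"
    FAILED = "failed"


# Worker stages from the list above, in pipeline order (hypothetical labels).
STAGES = ("obi-downloader", "ecopcr", "blastn", "combiner")


@dataclass
class Job:
    job_id: str
    stage: str
    payload: dict
    state: JobState = JobState.PENDING
    attempts: int = 0


class Scheduler:
    """Tracks job state per stage. A plain dict stands in for Redis here."""

    def __init__(self, max_attempts: int = 3) -> None:
        self.jobs: dict[str, Job] = {}
        self.max_attempts = max_attempts

    def submit(self, job_id: str, stage: str, payload: dict) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.jobs[job_id] = Job(job_id, stage, payload)

    def claim(self, stage: str) -> Optional[Job]:
        """Mark the first pending job of `stage` as running and return it."""
        for job in self.jobs.values():
            if job.stage == stage and job.state is JobState.PENDING:
                job.state = JobState.RUNNING
                job.attempts += 1
                return job
        return None

    def complete(self, job_id: str) -> None:
        self.jobs[job_id].state = JobState.DONE

    def fail(self, job_id: str) -> None:
        """Re-queue a failed job, or mark it FAILED once attempts run out."""
        job = self.jobs[job_id]
        job.state = JobState.PENDING if job.attempts < self.max_attempts else JobState.FAILED

    def stage_finished(self, stage: str) -> bool:
        """True when every job of `stage` is DONE (vacuously True if it has none)."""
        return all(j.state is JobState.DONE for j in self.jobs.values() if j.stage == stage)
```

With several workers polling at once, `claim` would have to be atomic; on a real Redis backend that is commonly done with list commands such as `LMOVE` or with a Lua script, but that choice is left open here.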

There are approximately 1.2 million SRA accessions for WGS projects and ~64 NT chunks (nt.00.tar.gz, etc.). So blastn workers, for example, will each receive a subset of the 1.2 million SRA accessions plus an assignment to BLAST against one of the 64 NT chunks.
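The blastn sharding described above can be sketched as a cross product of accession batches and NT chunks. The batch size of 1,000 accessions is an arbitrary assumption, as are the names `batched`, `blastn_jobs`, `job_count`, and `NT_CHUNKS`; chunk names follow the nt.00.tar.gz pattern. With those numbers, 1,200,000 / 1,000 = 1,200 batches, and 1,200 × 64 = 76,800 blastn jobs.

```python
from __future__ import annotations

from itertools import islice, product
from typing import Iterable, Iterator

# ~64 NT chunks named like nt.00.tar.gz (the count is approximate).
NT_CHUNKS = [f"nt.{i:02d}.tar.gz" for i in range(64)]


def batched(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def blastn_jobs(accessions: Iterable[str], nt_chunks: list[str], batch_size: int) -> Iterator[dict]:
    """Pair every accession batch with every NT chunk: one blastn job per pair."""
    batches = list(batched(accessions, batch_size))
    for i, (batch, chunk) in enumerate(product(batches, nt_chunks)):
        yield {"job_id": f"blastn-{i}", "accessions": batch, "nt_chunk": chunk}


def job_count(n_accessions: int, n_chunks: int, batch_size: int) -> int:
    """Number of blastn jobs, computed without materializing them."""
    n_batches = -(-n_accessions // batch_size)  # ceiling division
    return n_batches * n_chunks


if __name__ == "__main__":
    print(job_count(1_200_000, len(NT_CHUNKS), 1000))  # 76800
```

One consequence of splitting by NT chunk is that each query's hits end up spread across as many as 64 result sets, so the combiner would need to merge them per query before deduplicating.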

@max-mapper

QC command

 /home/max/miniconda3/pkgs/singularity-3.8.6-h9c2343c_0/bin/singularity exec \
   -B /home/max/src/crux/anacapa \
   /home/max/src/cruxcontainer/anacapa/anacapa-1.5.0.img \
   /bin/bash -c "/home/max/src/crux/anacapa/anacapa_db/anacapa_QC_dada2.sh \
     -i /home/max/src/crux/tronko-test \
     -o /home/max/src/crux/tronko-test/out/12S \
     -d /home/max/src/crux/anacapa/anacapa_db \
     -f /home/max/src/crux/tronko-test/forward_primers.txt \
     -r /home/max/src/crux/tronko-test/reverse_primers.txt \
     -e /home/max/src/crux/anacapa/anacapa_db/metabarcode_loci_min_merge_length.txt \
     -a nextera \
     -t MiSeq \
     -l \
     -m 50 \
     -q 30"
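If the QC step is launched from a worker rather than by hand, building the command as an argument list avoids most shell-quoting pitfalls. This sketch reproduces the command above with Python's `shlex` and `subprocess`; the constant names, `build_qc_command`, and `run_qc` are hypothetical, and the flag values are copied verbatim without interpreting what each flag means.

```python
import shlex
import subprocess
from typing import List, Sequence

# Paths and flags copied from the QC command above.
SINGULARITY = "/home/max/miniconda3/pkgs/singularity-3.8.6-h9c2343c_0/bin/singularity"
BIND_DIR = "/home/max/src/crux/anacapa"
IMAGE = "/home/max/src/cruxcontainer/anacapa/anacapa-1.5.0.img"
QC_SCRIPT = "/home/max/src/crux/anacapa/anacapa_db/anacapa_QC_dada2.sh"

QC_ARGS = [
    "-i", "/home/max/src/crux/tronko-test",
    "-o", "/home/max/src/crux/tronko-test/out/12S",
    "-d", "/home/max/src/crux/anacapa/anacapa_db",
    "-f", "/home/max/src/crux/tronko-test/forward_primers.txt",
    "-r", "/home/max/src/crux/tronko-test/reverse_primers.txt",
    "-e", "/home/max/src/crux/anacapa/anacapa_db/metabarcode_loci_min_merge_length.txt",
    "-a", "nextera",
    "-t", "MiSeq",
    "-l",
    "-m", "50",
    "-q", "30",
]


def build_qc_command(script_args: Sequence[str]) -> List[str]:
    """Return the full `singularity exec` argv for the Anacapa QC script."""
    # The script runs through `bash -c` inside the container, so the script
    # path and its arguments are joined into one shell-quoted string.
    inner = shlex.join([QC_SCRIPT, *script_args])
    return [SINGULARITY, "exec", "-B", BIND_DIR, IMAGE, "/bin/bash", "-c", inner]


def run_qc(script_args: Sequence[str] = QC_ARGS) -> None:
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(build_qc_command(script_args), check=True)


if __name__ == "__main__":
    # Print the command instead of running it.
    print(shlex.join(build_qc_command(QC_ARGS)))
```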
