Manually supplied reference indexes not found if not ending in .fasta #150

Closed
jfy133 opened this issue Feb 22, 2019 · 9 comments
Labels
bug Something isn't working

Comments

@jfy133
Member

jfy133 commented Feb 22, 2019

Describe the bug

When supplying a reference FASTA file together with manually supplied, pre-generated reference indices (bwa, samtools, picard etc.), EAGER2 currently renames the supplied FASTA so that its file ending is specifically .fasta, which breaks the lookup of the corresponding indices.

i.e. if the reference and index file are supplied as reference.fa and reference.fa.idx, it appears EAGER2 renames reference.fa to reference.fasta and then looks for reference.fasta.idx, which does not exist.
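
To make the mismatch concrete, here is a minimal shell sketch. It is not EAGER2's actual code: the helper name `expected_index_after_rename` is hypothetical, and the renaming rule (swap the supplied suffix for .fasta, then append the index suffix) is assumed from the behaviour described above.

```shell
# Hypothetical helper: name of the index a pipeline would look for if it
# first renamed the supplied reference to end in .fasta (assumed behaviour).
expected_index_after_rename() {
    fasta=$1          # supplied reference, e.g. reference.fa
    index_suffix=$2   # index suffix without the dot, e.g. idx
    # Drop the last suffix of the reference, then append .fasta.<index_suffix>
    printf '%s.fasta.%s\n' "${fasta%.*}" "$index_suffix"
}

supplied_index="reference.fa.idx"
looked_up=$(expected_index_after_rename "reference.fa" "idx")

# The supplied index keeps its original name, so the two names differ.
if [ "$looked_up" != "$supplied_index" ]; then
    echo "mismatch: supplied $supplied_index, pipeline looks for $looked_up"
fi
```

With a reference that already ends in .fasta the two names coincide, which would explain why only other suffixes are affected.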

See #131, as this is the original source of the issue.

Note: the error message is quite unclear. Tracking down the cause requires some hunting through, and familiarity with, the code, so the message would not be useful for a new user.

N E X T F L O W  ~  version 0.28.0
Launching `nf-core/eager` [silly_jang] - revision: 2786af36de [2.0.5]
ERROR ~ Unknown argument 'checkIfExists' for operator 'path' -- Possible arguments: type, followLinks, hidden, maxDepth, exists, glob, relative

 -- Check script 'main.nf' at line: 256 or see '.nextflow.log' file for more details

To Reproduce
This was run using EAGER version 2.0.5

nextflow run nf-core/eager \
-c "$NXF_PROFILE" \
-profile shh_custom \
--reads "$PROJDIR/03-preprocessing/unmapped_nonhg19_reads/*_R1_*fastq.gz" \
--singleEnd \
--fasta "$PROJDIR/01-data/reference_genomes/$reference/$reference.fa" \
--bwa_index "$PROJDIR/01-data/reference_genomes/r$reference/" \
--seq_dict "$PROJDIR/01-data/reference_genomes/$reference/$reference.fa.dict" \
--fasta_index "$PROJDIR/01-data/reference_genomes/$reference/$reference.fa.fai" \
--outdir "$PROJDIR/04-analysis/mappings/$reference" \
--name '$reference_mapping' \
--max_cpus 4 \
--max_mem '32.GB' \
--skip_preseq \
--min_adap_overlap 1 \
--clip_readlength 30 \
--clip_min_read_quality 20 \
--bwaalnn 0.01 \
--bwaalnl 32 \
--bam_discard_unmapped \
--bam_unmapped_type fastq \
--dedupper dedup \
--dedup_all_merged \
-with-dag flowchart.pdf \
-resume \
-r 2.0.5
@jfy133
Member Author

jfy133 commented Feb 24, 2019

@apeltzer thinking about it, was there a Nextflow-specific reason why you decided to rename it to .fasta? Couldn't you just extract the file suffix of the supplied reference and store it as a variable?
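
As a rough illustration of the suffix-extraction idea, here is a shell sketch using plain parameter expansion. The function name `fasta_suffix` and the example path are hypothetical, and this says nothing about how it would be wired into the Nextflow script itself.

```shell
# Hypothetical sketch: read the suffix off the supplied reference instead of
# forcing it to .fasta.
fasta_suffix() {
    name=${1##*/}                # strip any leading directories
    case $name in
        *.*) printf '%s\n' "${name##*.}" ;;  # text after the last dot
        *)   printf '\n' ;;                  # no suffix present
    esac
}

ref="/data/genomes/reference.fa"   # hypothetical path
suffix=$(fasta_suffix "$ref")
echo "supplied reference suffix: $suffix"
```

Note that this only looks at the last suffix, so a compressed reference such as reference.fa.gz would report gz and would need extra handling.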

@jfy133 jfy133 added the bug Something isn't working label Feb 24, 2019
@apeltzer
Member

I just thought it would make sense to have a single point of failure for these kinds of things. Having references being named (fa, fna, fn, fastA, fasta ...) makes things potentially a bit annoying, especially when some tools expect e.g. *.fasta .... so my idea was initially to just always rename to .fasta ....

@apeltzer
Member

I can store it in a single process, but you cannot have a process setting a params.ref_extension for example...

@jfy133
Member Author

jfy133 commented Feb 24, 2019

I just thought it would make sense to have a single point of failure for these kinds of things. Having references being named (fa, fna, fn, fastA, fasta ...) makes things potentially a bit annoying, especially when some tools expect e.g. *.fasta .... so my idea was initially to just always rename to .fasta ....

Mm, true. But we could support the most common suffixes as far as possible (.fna is the default from NCBI), and otherwise specify in the documentation/help message that references must use one of those.

I can store it in a single process, but you cannot have a process setting a params.ref_extension for example...

Fair enough. If that makes life more complicated, don't worry. Maybe it also makes sense to add a note to the documentation saying it's 'better' if the reference name ends in .fasta. Although making sure the indices also get renamed if someone does give a different suffix would still be important.
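
One way the indices could follow the reference, sketched in shell under stated assumptions: the helper `link_as_fasta` is hypothetical, it assumes the index files sit next to the reference and are named `<reference>.<something>`, and it assumes absolute paths so the symlinks resolve. It is an illustration of the idea, not what the pipeline does.

```shell
# Hypothetical sketch: expose a reference and its sibling index files under
# a .fasta name via symlinks, so lookups of <stem>.fasta.* succeed.
link_as_fasta() {
    src=$1                       # absolute path, e.g. /path/reference.fa
    dest_dir=$2                  # existing directory to receive the links
    base=${src##*/}              # reference.fa
    stem=${base%.*}              # reference
    ln -s "$src" "$dest_dir/$stem.fasta"
    for idx in "$src".*; do
        [ -e "$idx" ] || continue            # glob matched nothing
        # Keep everything after the original name, e.g. .fai or .bwt
        ln -s "$idx" "$dest_dir/$stem.fasta${idx#"$src"}"
    done
}

# Example call (hypothetical paths):
# link_as_fasta /path/to/reference.fa /path/to/workdir
```

Symlinks leave the user's original files untouched, which matters when several groups share the same pre-built indices.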

@apeltzer
Member

Agreed! I’ll give this some thought once this week is over 😅😅

@apeltzer
Member

Is this the case when you recreate an index once, or is it just the case because you already have existing reference indices with the formerly allowed *.fa extensions? In the former case (when EAGER2 creates the indices properly), I would just like to "push" users to recreate the indices once, which is okayish I guess.

@jfy133
Member Author

jfy133 commented Feb 25, 2019

The latter case. We had already created the indices a year or so ago. Many groups who work on the same organism will already have them...

@apeltzer
Member

Working on this in #194

@apeltzer
Member

Fixed in #194
