Missing input files at dry-run #425

Open
nacasfer opened this issue Jan 24, 2023 · 13 comments

Comments

@nacasfer

Dear Drop-Team
I am running the DROP dry-run (release 1.2.4) with the command snakemake -n but get the following error. The config file and sample sheet are attached.

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmp6sr26_ja

Building DAG of jobs...
WorkflowError:
WorkflowError:
WorkflowError:
WorkflowError (rule AberrantExpression_pipeline_Counting_mergeCounts_R, line 135, /tmp/tmp6sr26_ja):
Function did not return str or list of str.
MissingInputException: Missing input files for rule markdown:
output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/AberrantExpression/Counting/gtf108/Summary_abexp.html
wildcards: file=AberrantExpression/Counting/gtf108/Summary_abexp
affected files:
AberrantExpression/Counting/gtf108/Summary_abexp.md
MissingInputException: Missing input files for rule markdown:
output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_AberrantExpression_pipeline_Counting_Datasets.html
wildcards: file=Scripts_AberrantExpression_pipeline_Counting_Datasets
affected files:
Scripts_AberrantExpression_pipeline_Counting_Datasets.md
MissingInputException: Missing input files for rule markdown:
output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/aberrant-expression-pipeline_index.html
wildcards: file=aberrant-expression-pipeline_index
affected files:
aberrant-expression-pipeline_index.md

Also, --verbose output has been attached.
What am I messing up?
Many thanks in advance, and thanks for this awesome tool. Cheers!

config.yaml.txt
sample_annotation.tsv.txt
verboseOutput.txt
Snakefile.txt

@vyepez88
Collaborator

Hi, it seems the values of the GENE_ANNOTATION column are missing for the external counts. It should be: gtf108 to match your gtf file. Have a look here: https://gagneurlab-drop.readthedocs.io/en/latest/prepare.html#external-count-examples
You can then check first by running:
snakemake -n sampleAnnotation
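
Before re-running the dry-run, the sample annotation can also be checked directly. The following is only a minimal sketch, assuming the sample annotation is tab-separated and uses the column names RNA_ID, GENE_COUNTS_FILE and GENE_ANNOTATION; the function name and the example rows are made up for illustration. It lists the rows that point to an external count file but leave GENE_ANNOTATION empty.

```python
import csv
import io

# Values treated as "not filled in" (assumption: empty cells or a literal NA).
EMPTY = {"", "NA"}


def rows_missing_annotation(handle):
    """Return the RNA_IDs of rows with GENE_COUNTS_FILE set but no GENE_ANNOTATION."""
    missing = []
    for row in csv.DictReader(handle, delimiter="\t"):
        counts_file = (row.get("GENE_COUNTS_FILE") or "").strip()
        annotation = (row.get("GENE_ANNOTATION") or "").strip()
        if counts_file not in EMPTY and annotation in EMPTY:
            missing.append(row.get("RNA_ID", ""))
    return missing


# Made-up three-row example: S1 is a local sample, C1 and C2 use external counts.
example = (
    "RNA_ID\tGENE_COUNTS_FILE\tGENE_ANNOTATION\n"
    "S1\t\t\n"
    "C1\tgeneCounts_60C.tsv.gz\t\n"
    "C2\tgeneCounts_60C.tsv.gz\tgtf108\n"
)
print(rows_missing_annotation(io.StringIO(example)))

# For a real file (path is a placeholder):
# with open("sample_annotation.tsv", newline="") as handle:
#     print(rows_missing_annotation(handle))
```

Any RNA_ID printed here would need its GENE_ANNOTATION set to the annotation name used in the config (gtf108 in this thread).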

@nacasfer
Author

nacasfer commented Jan 24, 2023

Hi Vicente,
Many thanks for such a fast response.
I updated the sample sheet, but I'm still having issues with the exportCounts module:
snakemake -c6 exportCounts (verbose log file attached)

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmpo5jhoczo

Building DAG of jobs...
WorkflowError in file /tmp/tmpo5jhoczo, line 51:
Function did not return str or list of str.

Also, here is the output of the command snakemake -n sampleAnnotation, which apparently works now.

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmpx0_z_0op

Building DAG of jobs...
Job stats:
job                            count    min threads    max threads
Pipeline_SampleAnnotation_R        1              1              1
sampleAnnotation                   1              1              1
total                              2              1              1

[Tue Jan 24 15:27:21 2023]
rule Pipeline_SampleAnnotation_R:
input: /media/bio/datosbio2/antonio/drop_discovery/sample_annotation.tsv, Scripts/Pipeline/SampleAnnotation.R
output: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
log: /media/bio/datosbio2/antonio/drop_discovery/.drop/tmp/SampleAnnotation.Rds
jobid: 1
reason: Missing output files: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
resources: tmpdir=/tmp

[Tue Jan 24 15:27:21 2023]
localrule sampleAnnotation:
input: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
jobid: 0
reason: Input files updated by another job: /media/bio/datosbio2/antonio/drop_discovery/Output/processed_data/sample_anno/genes_overlapping_HPO_terms.tsv, /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_Pipeline_SampleAnnotation.html
resources: tmpdir=/tmp

Job stats:
job                            count    min threads    max threads
Pipeline_SampleAnnotation_R        1              1              1
sampleAnnotation                   1              1              1
total                              2              1              1

Reasons:
(check individual jobs above for details)
input files updated by another job:
sampleAnnotation
missing output files:
Pipeline_SampleAnnotation_R

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.
2023-01-24T153510.757027.snakemake.log

Again, many thanks in advance
Best!

@vyepez88
Collaborator

Hi, the error seems to be in the mergeCounts_R rule. A potential reason is that the gene annotation used to generate the external count matrix differs from your provided GTF file. Can you verify this? One way to do that is to load one of your counted samples under {root}/processed_data/aberrant_expression/gtf108/counts and compare its row names with the ones from your provided count matrix (geneCounts_60C.tsv.gz).
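
If no counted sample is available yet, a rough alternative is to compare the gene IDs of the external count matrix with the GTF itself. This is only a sketch under assumptions: the first column of the count matrix holds the gene IDs, the matrix has one header line, and the GTF stores IDs in a gene_id "..." attribute. All function names and the file paths in the comments are placeholders, not part of DROP.

```python
import gzip
import re

GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def open_text(path):
    """Open a plain or gzip-compressed text file for reading."""
    return gzip.open(path, "rt") if str(path).endswith(".gz") else open(path, "rt")


def gtf_gene_ids(lines):
    """Collect gene_id values from the attribute column of GTF lines."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue  # header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = GENE_ID_RE.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def matrix_gene_ids(lines):
    """Collect the first column of a tab-separated count matrix, skipping the header."""
    rows = iter(lines)
    next(rows, None)  # skip the header line
    return {line.split("\t", 1)[0].strip() for line in rows if line.strip()}


def compare(gtf_lines, matrix_lines):
    """Summarise the overlap between GTF gene IDs and count-matrix gene IDs."""
    gtf = gtf_gene_ids(gtf_lines)
    matrix = matrix_gene_ids(matrix_lines)
    return {
        "shared": len(gtf & matrix),
        "only_in_matrix": sorted(matrix - gtf),
        "only_in_gtf": sorted(gtf - matrix),
    }


# Usage with real files (both paths are placeholders):
# with open_text("annotation.gtf") as gtf, open_text("geneCounts_60C.tsv.gz") as matrix:
#     summary = compare(gtf, matrix)
#     print(summary["shared"], len(summary["only_in_matrix"]), len(summary["only_in_gtf"]))
```

A large only_in_matrix list would suggest that the matrix was counted against a different annotation. IDs that differ only by a version suffix would also show up there, since the comparison is an exact string match.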

@nacasfer
Author

nacasfer commented Jan 24, 2023

Hi!
I am afraid the only folder inside {root}/processed_data/aberrant_expression/gtf108 is params/; there is no counts/ folder at all.
Is that a clue for you?
There is a {root}/processed_data/aberrant_expression/gtf108/params/counts/ folder. I am attaching one of its files, but they seem OK to me.

Anyway, I only use release 108 as the GTF file (I checked that the count matrix and the GTF have the same labels, just in case something had gone wrong).
40623_countParams.csv
I've tried removing the samples with the external count matrix (C1 to N45) from the sample annotation file, but the error is still the same, so I guess the error must be in the way the BAM files are fed to that script. I am completely stuck at this point.

Thanks again for your kind help

@vyepez88
Collaborator

Oh true, that folder will only be populated after you count.
Check that all BAM files exist and that each has a corresponding index file (.bai).
If they all exist, can you execute snakemake --cores X sampleAnnotation? An HTML file will be generated. Check in the histogram at the bottom of the HTML that the DROP groups contain the number of samples they should have.
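
The BAM and index check can be scripted instead of done by hand. The sketch below makes several assumptions: the sample annotation is tab-separated, BAM paths are in a column named RNA_BAM_FILE, sample IDs are in RNA_ID, and an index may be named either sample.bam.bai or sample.bai. The function names are made up for illustration.

```python
import csv
from pathlib import Path


def index_candidates(bam):
    """Return the two index file names commonly seen next to a BAM file."""
    bam = Path(bam)
    return [Path(str(bam) + ".bai"), bam.with_suffix(".bai")]


def check_bams(sample_annotation, column="RNA_BAM_FILE"):
    """Return (RNA_ID, BAM path, problem) for every missing BAM or index file."""
    problems = []
    with open(sample_annotation, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            bam = (row.get(column) or "").strip()
            if not bam or bam == "NA":
                continue  # rows without a BAM, e.g. external-count samples
            rna_id = row.get("RNA_ID", "")
            if not Path(bam).is_file():
                problems.append((rna_id, bam, "BAM missing"))
            elif not any(path.is_file() for path in index_candidates(bam)):
                problems.append((rna_id, bam, "index missing"))
    return problems


# Usage (the path is a placeholder):
# for rna_id, bam, problem in check_bams("sample_annotation.tsv"):
#     print(f"{rna_id}\t{problem}\t{bam}")
```

An empty result only means that the files exist; it does not show that they are readable or that an index is up to date with its BAM.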

@nacasfer
Author

Hi Vicente!
Everything is OK with the sampleAnnotation output. BAM and VCF files are correctly detected, and the groups in the histogram are correctly formed.
Is it mandatory to have the BAM files in the same folder where the drop init command was run? I mean, I ran drop init in the folder set as {root} in the config file, while all the BAM files are outside it (their paths are set in the sample annotation file).

@vyepez88
Collaborator

Hi, sorry, I don't understand whether you have executed the pipeline partially or not.
Can you execute:

snakemake -n aberrantExpression

@nacasfer
Author

Hi, sure!
snakemake -n aberrantExpression

WARNING: Less than 30 IDs in DROP_GROUP sp
check for missing R packages
MonoallelicExpression has been turned off in the config file
rnaVariantCalling has been turned off in the config file
Structuring dependencies...
Dependencies file generated at: /tmp/tmpp606shop

Building DAG of jobs...
WorkflowError:
WorkflowError:
MissingInputException: Missing input files for rule markdown:
output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/AberrantExpression/Counting/gtf108/Summary_abexp.html
wildcards: file=AberrantExpression/Counting/gtf108/Summary_abexp
affected files:
AberrantExpression/Counting/gtf108/Summary_abexp.md
WorkflowError (rule AberrantExpression_pipeline_Counting_mergeCounts_R, line 135, /tmp/tmpp606shop):
Function did not return str or list of str.
MissingInputException: Missing input files for rule markdown:
output: /media/bio/datosbio2/antonio/drop_discovery/Output/html/Scripts_AberrantExpression_pipeline_Counting_Datasets.html
wildcards: file=Scripts_AberrantExpression_pipeline_Counting_Datasets
affected files:
Scripts_AberrantExpression_pipeline_Counting_Datasets.md

@geocarvalho

geocarvalho commented Nov 6, 2023

Hi @nacasfer, did you solve this issue? I'm dealing with something similar.

Regards.

@vyepez88
Collaborator

vyepez88 commented Nov 7, 2023

Hi, so sorry this slipped. Can you please share your sample annotation file?

@geocarvalho

drop_test.zip
Hi @vyepez88, can you check it out, please?

@vyepez88
Collaborator

Can you share your sample annotation file as well?

@geocarvalho

Sorry!
drop_test.zip
