-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove multiqc versions channel mix #1966
Remove multiqc versions channel mix #1966
Conversation
Codecov Report
@@ Coverage Diff @@
## dev #1966 +/- ##
==========================================
- Coverage 61.14% 60.72% -0.42%
==========================================
Files 37 37
Lines 4676 4637 -39
==========================================
- Hits 2859 2816 -43
- Misses 1817 1821 +4
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. |
Should we report the multiqc version earlier in the script then, so it is still recorded? |
Is it not already? I thought this was already taken care of. Let me check. |
So technically MultiQC reports it's own version, although not in the same place as the rest of the tools. I'm inclined to leave it this way unless there is strong opposition to this. This seems one of those chicken and egg scenarios, if we try to do this with the |
yeah, wasn't sure how easy it is. The multiqc version on top should probably be enough. |
@@ -82,7 +82,7 @@ workflow {{ short_name|upper }} { | |||
ch_versions = ch_versions.mix(FASTQC.out.versions.first()) | |||
|
|||
CUSTOM_DUMPSOFTWAREVERSIONS ( | |||
ch_versions.unique().collectFile(name: 'collated_versions.yml') | |||
ch_versions.unique{ it.text }.collectFile(name: 'collated_versions.yml') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change appears to have introduced a bug, a lot of people's pipelines are hanging after the latest pipeline merge. See discussion in Slack here.
I guess that it was to make it more similar to other instances such as:
versions = versions.mix(BISMARK.out.versions.unique{ it.baseName })
However, we're not in a mix()
here and it seems to silently hang forever without any errors. I think we need to revert this change unless anyone has any objections? I'll make an issue..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand why the template CI checks didn't pick up on this. I found it because the methylseq CI failed due to timeouts. I would have expected the tools CI tests to do the same 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a mixed channel (a few lines above). It's surprising that checking the text for uniqueness is causing a hang. There would appear to be a deeper cause, but I'm on leave now, so I don't have time to dig into it. It's definitely strange though. If it fixes the issue then revert it, but need to make sure duplicate versions.yml are not kept. It's mostly to do with duplicates emitted from subworkflows, since you can't use first()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, perhaps check if the dumpsoftware versions script is also removing duplicate entries by loading them into a dictionary. Then the unique
is not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, my mistake on it's need. It's for the case when a subworkflow is used in multiple places.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ewels OK. I figured out why unique{ it.text }
is causing a hang.
People are doing this in their workflows:
ch_versions.mix(TOOL.out.versions.first().ifEmpty(null))
specifically, people are putting null
objects in their versions channels. They don't need to emit anything ...
The correct code should be
ch_versions.mix(TOOL.out.versions.first())
Mixing an empty channel is fine.
My guess is there are null pointer exceptions under the hood, resulting in the hang.
That's why this was never caught before being integrated into workflows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, nice detective work! Sorry to bother you whilst you're away.
Is there any functional difference between unique()
and unique{ it.text }
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, perhaps check if the dumpsoftware versions script is also removing duplicate entries by loading them into a dictionary. Then the unique is not needed.
I have a feeling that even if the Python code is doing this, having duplicate keys within the YAML might make it invalid and crash the loading process? (not tested)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any functional difference between
unique()
andunique{ it.text }
?
Yes, but I don't know how often the issue of duplication occurs. unique()
is only checking for uniquely named files ( I think it must be checking the full path which includes the work directory ), which means here it's actually doing nothing, since the work directories are always unique. While most people use first()
in their workflows, it doesn't help removing occurrences of modules used under different aliases for example from subworkflows. For example, versions
from, WORKFLOW_A:SAMTOOLS
and WORKFLOW_B:SAMTOOLS
, should both be present in the versions channel, and since their work directories are unique, unique()
won't do anything, but unique{ it.text }
should remove the second occurrence.
unique
operator to filter on version file contents.Fixes #1962
PR checklist
CHANGELOG.md
is updateddocs
is updated