Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove multiqc versions channel mix #1966

Merged
merged 2 commits into from
Oct 25, 2022
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions nf_core/pipeline-template/workflows/pipeline.nf
Original file line number Diff line number Diff line change
Expand Up @@ -82,7 +82,7 @@ workflow {{ short_name|upper }} {
ch_versions = ch_versions.mix(FASTQC.out.versions.first())

CUSTOM_DUMPSOFTWAREVERSIONS (
ch_versions.unique().collectFile(name: 'collated_versions.yml')
ch_versions.unique{ it.text }.collectFile(name: 'collated_versions.yml')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change appears to have introduced a bug, a lot of people's pipelines are hanging after the latest pipeline merge. See discussion in Slack here.

I guess that it was to make it more similar to other instances such as:

versions = versions.mix(BISMARK.out.versions.unique{ it.baseName })

However, we're not in a mix() here and it seems to silently hang forever without any errors. I think we need to revert this change unless anyone has any objections? I'll make an issue..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really understand why the template CI checks didn't pick up on this. I found it because the methylseq CI failed due to timeouts. I would have expected the tools CI tests to do the same 🤔

Copy link
Member Author

@mahesh-panchal mahesh-panchal Dec 16, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a mixed channel (a few lines above). It's surprising that checking the text for uniqueness is causing a hang. There would appear to be a deeper cause, but I'm on leave now, so I don't have time to dig into it. It's definitely strange though. If it fixes the issue then revert it, but need to make sure duplicate versions.yml are not kept. It's mostly to do with duplicates emitted from subworkflows, since you can't use first().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps check if the dumpsoftware versions script is also removing duplicate entries by loading them into a dictionary. Then the unique is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, my mistake on it's need. It's for the case when a subworkflow is used in multiple places.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ewels OK. I figured out why unique{ it.text } is causing a hang.
People are doing this in their workflows:

ch_versions.mix(TOOL.out.versions.first().ifEmpty(null))

specifically, people are putting null objects in their versions channels. They don't need to emit anything ...

The correct code should be

ch_versions.mix(TOOL.out.versions.first())

Mixing an empty channel is fine.
My guess is there are null pointer exceptions under the hood, resulting in the hang.

That's why this was never caught before being integrated into workflows.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, nice detective work! Sorry to bother you whilst you're away.

Is there any functional difference between unique() and unique{ it.text }?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, perhaps check if the dumpsoftware versions script is also removing duplicate entries by loading them into a dictionary. Then the unique is not needed.

I have a feeling that even if the Python code is doing this, having duplicate keys within the YAML might make it invalid and crash the loading process? (not tested)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any functional difference between unique() and unique{ it.text }?

Yes, but I don't know how often the issue of duplication occurs. unique() is only checking for uniquely named files ( I think it must be checking the full path which includes the work directory ), which means here it's actually doing nothing, since the work directories are always unique. While most people use first() in their workflows, it doesn't help removing occurrences of modules used under different aliases for example from subworkflows. For example, versions from, WORKFLOW_A:SAMTOOLS and WORKFLOW_B:SAMTOOLS, should both be present in the versions channel, and since their work directories are unique, unique() won't do anything, but unique{ it.text } should remove the second occurrence.

)

//
Expand All @@ -107,7 +107,6 @@ workflow {{ short_name|upper }} {
ch_multiqc_logo.toList()
)
multiqc_report = MULTIQC.out.report.toList()
ch_versions = ch_versions.mix(MULTIQC.out.versions)
}

/*
Expand Down