Various MultiQC issues: FastQC sections for raw and trimmed reads // umi-tools dedup and extraction plots, custom content styling. #1308

MatthiasZepper · 2024-05-29T20:29:01Z

This draft PR comprises my current progress towards fixing issue #1303.

It modifies the publishDir directives in the FastQC module config such that the reports are consistently published in ${params.outdir}/fastqc/raw and ${params.outdir}/fastqc/trim regardless of the chosen trimmer (TrimGalore!, Fastp), and adapts the custom MultiQC config of the pipeline accordingly.

This is, however, not sufficient to fix the issue, because recent versions of MultiQC have a bug that prevents running the same module twice. There are still separate entries and columns in the General Statistics table, but the modules are not shown in the report and navigation bar:

[Screenshots of the navigation bar: MultiQC 1.23dev, 1.22, 1.21 (left) vs. MultiQC 1.18 (right)]

For both screenshots, I ran MultiQC on the output directory of a test profile run of this pipeline using the custom config in workflows/rnaseq/assets/multiqc/multiqc_config.yml.

It should be stressed that the FastQC module itself works in modern versions, because if the custom config is omitted, it is also shown. But forcing the module to run twice via a custom config seemingly breaks it. Only in the General Statistics table does it still work like a charm. Thus, the reports are parsed, but the module output is not displayed in the report.

Further issues

In the course of troubleshooting this issue, I discovered more issues that need to be tackled. Help would be greatly appreciated with those:

Inconsistent naming of FastQC output:

For FastP, the file names are retained before and after trimming:

fastqc
├── raw
│   ├── RAP1_IAA_30M_REP1_1_fastqc.html
│   ├── RAP1_IAA_30M_REP1_1_fastqc.zip
│   ├── RAP1_IAA_30M_REP1_2_fastqc.html
│   ├── RAP1_IAA_30M_REP1_2_fastqc.zip
│   ├── RAP1_UNINDUCED_REP1_fastqc.html
│   ├── RAP1_UNINDUCED_REP1_fastqc.zip
│   ├── RAP1_UNINDUCED_REP2_fastqc.html
│   ├── RAP1_UNINDUCED_REP2_fastqc.zip
│   ├── WT_REP1_1_fastqc.html
│   ├── WT_REP1_1_fastqc.zip
│   ├── WT_REP1_2_fastqc.html
│   ├── WT_REP1_2_fastqc.zip
│   ├── WT_REP2_1_fastqc.html
│   ├── WT_REP2_1_fastqc.zip
│   ├── WT_REP2_2_fastqc.html
│   └── WT_REP2_2_fastqc.zip
└── trim
    ├── RAP1_IAA_30M_REP1_1_fastqc.html
    ├── RAP1_IAA_30M_REP1_1_fastqc.zip
    ├── RAP1_IAA_30M_REP1_2_fastqc.html
    ├── RAP1_IAA_30M_REP1_2_fastqc.zip
    ├── RAP1_UNINDUCED_REP1_fastqc.html
    ├── RAP1_UNINDUCED_REP1_fastqc.zip
    ├── RAP1_UNINDUCED_REP2_fastqc.html
    ├── RAP1_UNINDUCED_REP2_fastqc.zip
    ├── WT_REP1_1_fastqc.html
    ├── WT_REP1_1_fastqc.zip
    ├── WT_REP1_2_fastqc.html
    ├── WT_REP1_2_fastqc.zip
    ├── WT_REP2_1_fastqc.html
    ├── WT_REP2_1_fastqc.zip
    ├── WT_REP2_2_fastqc.html
    └── WT_REP2_2_fastqc.zip

For TrimGalore!, the RAP1_UNINDUCED samples are renamed with a _trimmed suffix and the others receive _val_1 and _val_2 suffixes.

fastqc
├── raw
│   ├── RAP1_IAA_30M_REP1_1_fastqc.html
│   ├── RAP1_IAA_30M_REP1_1_fastqc.zip
│   ├── RAP1_IAA_30M_REP1_2_fastqc.html
│   ├── RAP1_IAA_30M_REP1_2_fastqc.zip
│   ├── RAP1_UNINDUCED_REP1_fastqc.html
│   ├── RAP1_UNINDUCED_REP1_fastqc.zip
│   ├── RAP1_UNINDUCED_REP2_fastqc.html
│   ├── RAP1_UNINDUCED_REP2_fastqc.zip
│   ├── WT_REP1_1_fastqc.html
│   ├── WT_REP1_1_fastqc.zip
│   ├── WT_REP1_2_fastqc.html
│   ├── WT_REP1_2_fastqc.zip
│   ├── WT_REP2_1_fastqc.html
│   ├── WT_REP2_1_fastqc.zip
│   ├── WT_REP2_2_fastqc.html
│   └── WT_REP2_2_fastqc.zip
└── trim
    ├── RAP1_IAA_30M_REP1_1_val_1_fastqc.html
    ├── RAP1_IAA_30M_REP1_1_val_1_fastqc.zip
    ├── RAP1_IAA_30M_REP1_2_val_2_fastqc.html
    ├── RAP1_IAA_30M_REP1_2_val_2_fastqc.zip
    ├── RAP1_UNINDUCED_REP1_trimmed_fastqc.html
    ├── RAP1_UNINDUCED_REP1_trimmed_fastqc.zip
    ├── RAP1_UNINDUCED_REP2_trimmed_fastqc.html
    ├── RAP1_UNINDUCED_REP2_trimmed_fastqc.zip
    ├── WT_REP1_1_val_1_fastqc.html
    ├── WT_REP1_1_val_1_fastqc.zip
    ├── WT_REP1_2_val_2_fastqc.html
    ├── WT_REP1_2_val_2_fastqc.zip
    ├── WT_REP2_1_val_1_fastqc.html
    ├── WT_REP2_1_val_1_fastqc.zip
    ├── WT_REP2_2_val_2_fastqc.html
    └── WT_REP2_2_val_2_fastqc.zip

Unfortunately, I have no idea why. I have quadruple-checked the publishDir directives and can't explain it. Help and inspiration needed!
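One possible explanation, offered as an assumption rather than something verified against this pipeline: Trim Galore itself names paired-end outputs with `_val_1` / `_val_2` and single-end outputs with `_trimmed`, and the FastQC reports simply inherit those names. That would be consistent with the listing above, where only the samples without a `_1` / `_2` mate suffix receive `_trimmed`. A minimal Python sketch (the helper name `strip_trimgalore_suffix` is hypothetical and not part of the pipeline) of how the trimmed report names map back to the raw ones:

```python
import re

# Matches a Trim Galore style suffix that sits directly in front of
# "_fastqc.zip" or "_fastqc.html" (assumed naming, taken from the listing).
_TRIMGALORE_SUFFIX = re.compile(r"_(?:val_[12]|trimmed)(?=_fastqc\.(?:zip|html)$)")


def strip_trimgalore_suffix(filename: str) -> str:
    """Return the FastQC report name with the trimmer-specific suffix removed."""
    return _TRIMGALORE_SUFFIX.sub("", filename)


# File names copied from the TrimGalore! listing in this comment.
trimmed_reports = [
    "RAP1_IAA_30M_REP1_1_val_1_fastqc.zip",
    "RAP1_UNINDUCED_REP1_trimmed_fastqc.html",
    "WT_REP2_2_val_2_fastqc.zip",
]

for name in trimmed_reports:
    print(name, "->", strip_trimgalore_suffix(name))
```

If this assumption holds, the renaming happens inside the trimmer and not in the publishDir directives, which would explain why checking those directives did not reveal anything.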

Duplicate column is actually shown in the General Statistics table (FIXED!)

According to the config, the duplicate column from FastQC should be hidden in the General Statistics table. However, it is shown. Might be another MultiQC bug or that I just stared myself blind.

# Don't show % Dups in the General Stats table (we have this from Picard)
table_columns_visible:
  fastqc:
    percent_duplicates: False

umi-tools dedup stats not shown (Fixed)

According to our current master / dev branch config, the umi_tools module is not run. Seeing this, I believed that would be an easy fix for #1277 and added the module in the config. However, no reports are shown. Either the module is broken or the deduplication stats are not channelled to MultiQC. Either way, there is no quick solution in sight here either.

github-actions · 2024-05-29T20:33:02Z

`nf-core lint` overall result: Passed ✅ ⚠️

Posted for pipeline commit 1c75669

✅ 173 tests passed
❔ 9 tests were ignored
❗ 7 tests had warnings

❗ Test warnings:

files_exist - File not found: assets/multiqc_config.yml
files_exist - File not found: .github/workflows/awstest.yml
files_exist - File not found: .github/workflows/awsfulltest.yml
pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
pipeline_todos - TODO string in methods_description_template.yml: #Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline

❔ Tests ignored:

files_exist - File is ignored: conf/modules.config
nextflow_config - Config default ignored: params.ribo_database_manifest
files_unchanged - File ignored due to lint config: assets/email_template.html
files_unchanged - File ignored due to lint config: assets/email_template.txt
files_unchanged - File ignored due to lint config: .gitignore or .prettierignore
actions_ci - actions_ci
actions_awstest - 'awstest.yml' workflow not found: /home/runner/work/rnaseq/rnaseq/.github/workflows/awstest.yml
multiqc_config - multiqc_config
modules_config - modules_config

✅ Tests passed:

files_exist - File found: .gitattributes
files_exist - File found: .gitignore
files_exist - File found: .nf-core.yml
files_exist - File found: .editorconfig
files_exist - File found: .prettierignore
files_exist - File found: .prettierrc.yml
files_exist - File found: CHANGELOG.md
files_exist - File found: CITATIONS.md
files_exist - File found: CODE_OF_CONDUCT.md
files_exist - File found: LICENSE or LICENSE.md or LICENCE or LICENCE.md
files_exist - File found: nextflow_schema.json
files_exist - File found: nextflow.config
files_exist - File found: README.md
files_exist - File found: .github/.dockstore.yml
files_exist - File found: .github/CONTRIBUTING.md
files_exist - File found: .github/ISSUE_TEMPLATE/bug_report.yml
files_exist - File found: .github/ISSUE_TEMPLATE/config.yml
files_exist - File found: .github/ISSUE_TEMPLATE/feature_request.yml
files_exist - File found: .github/PULL_REQUEST_TEMPLATE.md
files_exist - File found: .github/workflows/branch.yml
files_exist - File found: .github/workflows/ci.yml
files_exist - File found: .github/workflows/linting_comment.yml
files_exist - File found: .github/workflows/linting.yml
files_exist - File found: assets/email_template.html
files_exist - File found: assets/email_template.txt
files_exist - File found: assets/sendmail_template.txt
files_exist - File found: assets/nf-core-rnaseq_logo_light.png
files_exist - File found: conf/test.config
files_exist - File found: conf/test_full.config
files_exist - File found: docs/images/nf-core-rnaseq_logo_light.png
files_exist - File found: docs/images/nf-core-rnaseq_logo_dark.png
files_exist - File found: docs/output.md
files_exist - File found: docs/README.md
files_exist - File found: docs/README.md
files_exist - File found: docs/usage.md
files_exist - File found: main.nf
files_exist - File found: conf/base.config
files_exist - File found: conf/igenomes.config
files_exist - File found: modules.json
files_exist - File not found check: .github/ISSUE_TEMPLATE/bug_report.md
files_exist - File not found check: .github/ISSUE_TEMPLATE/feature_request.md
files_exist - File not found check: .github/workflows/push_dockerhub.yml
files_exist - File not found check: .markdownlint.yml
files_exist - File not found check: .nf-core.yaml
files_exist - File not found check: .yamllint.yml
files_exist - File not found check: bin/markdown_to_html.r
files_exist - File not found check: conf/aws.config
files_exist - File not found check: docs/images/nf-core-rnaseq_logo.png
files_exist - File not found check: lib/Checks.groovy
files_exist - File not found check: lib/Completion.groovy
files_exist - File not found check: lib/NfcoreTemplate.groovy
files_exist - File not found check: lib/Utils.groovy
files_exist - File not found check: lib/Workflow.groovy
files_exist - File not found check: lib/WorkflowMain.groovy
files_exist - File not found check: lib/WorkflowRnaseq.groovy
files_exist - File not found check: parameters.settings.json
files_exist - File not found check: pipeline_template.yml
files_exist - File not found check: Singularity
files_exist - File not found check: lib/nfcore_external_java_deps.jar
files_exist - File not found check: .travis.yml
nextflow_config - Config variable found: manifest.name
nextflow_config - Config variable found: manifest.nextflowVersion
nextflow_config - Config variable found: manifest.description
nextflow_config - Config variable found: manifest.version
nextflow_config - Config variable found: manifest.homePage
nextflow_config - Config variable found: timeline.enabled
nextflow_config - Config variable found: trace.enabled
nextflow_config - Config variable found: report.enabled
nextflow_config - Config variable found: dag.enabled
nextflow_config - Config variable found: process.cpus
nextflow_config - Config variable found: process.memory
nextflow_config - Config variable found: process.time
nextflow_config - Config variable found: params.outdir
nextflow_config - Config variable found: params.input
nextflow_config - Config variable found: params.validationShowHiddenParams
nextflow_config - Config variable found: params.validationSchemaIgnoreParams
nextflow_config - Config variable found: manifest.mainScript
nextflow_config - Config variable found: timeline.file
nextflow_config - Config variable found: trace.file
nextflow_config - Config variable found: report.file
nextflow_config - Config variable found: dag.file
nextflow_config - Config variable (correctly) not found: params.nf_required_version
nextflow_config - Config variable (correctly) not found: params.container
nextflow_config - Config variable (correctly) not found: params.singleEnd
nextflow_config - Config variable (correctly) not found: params.igenomesIgnore
nextflow_config - Config variable (correctly) not found: params.name
nextflow_config - Config variable (correctly) not found: params.enable_conda
nextflow_config - Config timeline.enabled had correct value: true
nextflow_config - Config report.enabled had correct value: true
nextflow_config - Config trace.enabled had correct value: true
nextflow_config - Config dag.enabled had correct value: true
nextflow_config - Config manifest.name began with nf-core/
nextflow_config - Config variable manifest.homePage began with https://github.com/nf-core/
nextflow_config - Config dag.file ended with .html
nextflow_config - Config variable manifest.nextflowVersion started with >= or !>=
nextflow_config - Config manifest.version ends in dev: 3.15.0dev
nextflow_config - Config params.custom_config_version is set to master
nextflow_config - Config params.custom_config_base is set to https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Lines for loading custom profiles found
nextflow_config - nextflow.config contains configuration profile test
nextflow_config - Config default value correct: params.hisat2_build_memory= 200.GB
nextflow_config - Config default value correct: params.gtf_extra_attributes= gene_name
nextflow_config - Config default value correct: params.gtf_group_features= gene_id
nextflow_config - Config default value correct: params.featurecounts_group_type= gene_biotype
nextflow_config - Config default value correct: params.featurecounts_feature_type= exon
nextflow_config - Config default value correct: params.igenomes_base= s3://ngi-igenomes/igenomes/
nextflow_config - Config default value correct: params.trimmer= trimgalore
nextflow_config - Config default value correct: params.min_trimmed_reads= 10000
nextflow_config - Config default value correct: params.umitools_extract_method= string
nextflow_config - Config default value correct: params.umitools_grouping_method= directional
nextflow_config - Config default value correct: params.aligner= star_salmon
nextflow_config - Config default value correct: params.pseudo_aligner_kmer_size= 31
nextflow_config - Config default value correct: params.min_mapped_reads= 5.0
nextflow_config - Config default value correct: params.kallisto_quant_fraglen= 200
nextflow_config - Config default value correct: params.kallisto_quant_fraglen_sd= 200
nextflow_config - Config default value correct: params.stranded_threshold= 0.8
nextflow_config - Config default value correct: params.unstranded_threshold= 0.1
nextflow_config - Config default value correct: params.deseq2_vst= true
nextflow_config - Config default value correct: params.rseqc_modules= bam_stat,inner_distance,infer_experiment,junction_annotation,junction_saturation,read_distribution,read_duplication
nextflow_config - Config default value correct: params.skip_bbsplit= true
nextflow_config - Config default value correct: params.skip_preseq= true
nextflow_config - Config default value correct: params.custom_config_version= master
nextflow_config - Config default value correct: params.custom_config_base= https://raw.githubusercontent.com/nf-core/configs/master
nextflow_config - Config default value correct: params.max_cpus= 16
nextflow_config - Config default value correct: params.max_memory= 128.GB
nextflow_config - Config default value correct: params.max_time= 240.h
nextflow_config - Config default value correct: params.publish_dir_mode= copy
nextflow_config - Config default value correct: params.max_multiqc_email_size= 25.MB
nextflow_config - Config default value correct: params.validate_params= true
nextflow_config - Config default value correct: params.pipelines_testdata_base_path= https://raw.githubusercontent.com/nf-core/test-datasets/7f1614baeb0ddf66e60be78c3d9fa55440465ac8/
files_unchanged - .gitattributes matches the template
files_unchanged - .prettierrc.yml matches the template
files_unchanged - CODE_OF_CONDUCT.md matches the template
files_unchanged - LICENSE matches the template
files_unchanged - .github/.dockstore.yml matches the template
files_unchanged - .github/CONTRIBUTING.md matches the template
files_unchanged - .github/ISSUE_TEMPLATE/bug_report.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/config.yml matches the template
files_unchanged - .github/ISSUE_TEMPLATE/feature_request.yml matches the template
files_unchanged - .github/PULL_REQUEST_TEMPLATE.md matches the template
files_unchanged - .github/workflows/branch.yml matches the template
files_unchanged - .github/workflows/linting_comment.yml matches the template
files_unchanged - .github/workflows/linting.yml matches the template
files_unchanged - assets/sendmail_template.txt matches the template
files_unchanged - assets/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_light.png matches the template
files_unchanged - docs/images/nf-core-rnaseq_logo_dark.png matches the template
files_unchanged - docs/README.md matches the template
readme - README Nextflow minimum version badge matched config. Badge: 23.04.0, Config: 23.04.0
readme - README Zenodo placeholder was replaced with DOI.
pipeline_name_conventions - Name adheres to nf-core convention
template_strings - Did not find any Jinja template strings (553 files)
schema_lint - Schema lint passed
schema_lint - Schema title + description lint passed
schema_lint - Input mimetype lint passed: 'text/csv'
schema_params - Schema matched params returned from nextflow config
system_exit - No System.exit calls found
actions_schema_validation - Workflow validation passed: clean-up.yml
actions_schema_validation - Workflow validation passed: linting_comment.yml
actions_schema_validation - Workflow validation passed: cloud_tests_small.yml
actions_schema_validation - Workflow validation passed: linting.yml
actions_schema_validation - Workflow validation passed: branch.yml
actions_schema_validation - Workflow validation passed: ci.yml
actions_schema_validation - Workflow validation passed: cloud_tests_full.yml
actions_schema_validation - Workflow validation passed: release-announcements.yml
actions_schema_validation - Workflow validation passed: fix-linting.yml
actions_schema_validation - Workflow validation passed: download_pipeline.yml
merge_markers - No merge markers found in pipeline files
modules_json - Only installed modules found in modules.json
modules_structure - modules directory structure is correct 'modules/nf-core/TOOL/SUBTOOL'
base_config - conf/base.config found and not ignored.
nfcore_yml - Repository type in .nf-core.yml is valid: pipeline
nfcore_yml - nf-core version in .nf-core.yml is set to the latest version: 2.14.1

Run details

nf-core/tools version 2.14.1
Run at 2024-07-11 17:39:54

MatthiasZepper · 2024-06-03T18:52:40Z

Some progress:

Vlad Savelyev speedily fixed the "Multi Module Multi QC issue" for us, and MultiQC release 1.22.2 was pushed just for us. In conjunction with my proposed changes to the publishDir directives, patching the MultiQC module to the latest version thus fixes "MultiQC report is missing fastQC results on the dev branch" (#1303). One issue is down, three to go.
After some testing, I finally understood that the YAML config in table_columns_visible expects the actual module names and not the original module name used by MultiQC. Thus, I could now successfully suppress the display of the unwanted column. Two issues are down, two to go.

drpatelh · 2024-06-19T09:10:11Z

Thanks @MatthiasZepper !!

Two issues are down, two to go.

I read through your write-up but was a little unclear as to what is still missing here?

pinin4fjords · 2024-06-20T13:10:53Z

To copy in @MatthiasZepper's note on this from Slack:

I am somewhat stuck with #1308, both because of a lack of time recently and also a lack of ideas. I believed that I had fixed 3 of the 4 issues, with the 4th, the inconsistent naming of the TrimGalore! output, being somewhat negligible.

However, it turns out that I did not fix the main issue yet. The reports generated by MultiQC when run inside the pipeline and manually on the outdir of the pipeline differ. The manual runs look exactly how I want them, so I thought it should be good, but the pipeline version does not behave the same way.

In the pipeline version, the path_filters in the MultiQC config (workflows/rnaseq/assets/multiqc/multiqc_config.yml) are not applied:

module_order:
  - fastqc:
      name: "FastQC (raw)"
      anchor: "fastqc_raw"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "**/raw/*.zip"
  - cutadapt
  - fastp
  - fastqc:
      name: "FastQC (trimmed)"
      anchor: "fastqc_trimmed"
      info: "This section of the report shows FastQC results after adapter trimming."
      path_filters:
        - "**/trim/*.zip"

I think that is because the file paths in ch_multiqc_files still point to the work dir and do not yet correspond to the final folder structure specified by the publishDir directives when I mix the output into the channel…

ch_multiqc_files = ch_multiqc_files.mix(FASTQ_FASTQC_UMITOOLS_FASTP.out.fastqc_raw_zip.collect{it[1]})

… but since I can’t do a proper introspection into the channel (a .view() or .collectFile() completely crashes the pipeline), I don’t know for sure.

pinin4fjords · 2024-06-20T13:29:54Z

OK, I know the fix @MatthiasZepper, I sorted this in riboseq. The issue is that the file structure is flat by the time it gets to MultiQC.

We need to do like:

    if (params.trimmer == 'trimgalore') {
        process {
            withName: '.*:FASTQ_FASTQC_UMITOOLS_TRIMGALORE:FASTQC' {
                ext.prefix = { "${meta.id}_raw" }
                ext.args   = '--quiet'
                publishDir = [
                    path: { "${params.outdir}/preprocessing/fastqc" },
                    mode: params.publish_dir_mode,
                    saveAs: { filename -> filename.equals('versions.yml') ? null : filename }
                ]
            }
        }
    }

... and then:

module_order:
  - fastqc:
      name: "FastQC (raw)"
      info: "This section of the report shows FastQC results before adapter trimming."
      path_filters:
        - "*_raw_fastqc.zip"

So we're using the prefix sent to FASTQC to mark the outputs appropriately. I'll push a commit to your branch if I can, but this is the way to solve it.
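To illustrate why the directory-based filter stops matching once the files are staged flat, here is a small Python sketch. It assumes that path_filters behave like fnmatch-style globs applied to the path MultiQC sees; that is an assumption about MultiQC, not a quote of its code. The helper `passes_filters` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase


def passes_filters(path: str, path_filters: list[str]) -> bool:
    """Return True if the path matches at least one glob-style filter."""
    return any(fnmatchcase(path, pattern) for pattern in path_filters)


# Path as it looks in the published output directory (manual MultiQC run).
published = "results/fastqc/raw/RAP1_UNINDUCED_REP1_fastqc.zip"
# The same report staged flat into the MultiQC task directory (pipeline run).
staged_flat = "RAP1_UNINDUCED_REP1_fastqc.zip"
# Flat file that carries the "_raw" marker via ext.prefix, as proposed above.
staged_prefixed = "RAP1_UNINDUCED_REP1_raw_fastqc.zip"

print(passes_filters(published, ["**/raw/*.zip"]))             # directory filter, published tree
print(passes_filters(staged_flat, ["**/raw/*.zip"]))           # directory filter, flat staging
print(passes_filters(staged_prefixed, ["*_raw_fastqc.zip"]))   # prefix filter, flat staging
```

Note that a pattern this strict only matches names ending exactly in `_raw_fastqc.zip`; a hypothetical name such as `WT_REP1_raw_1_fastqc.zip` would not pass it.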

MatthiasZepper · 2024-06-20T13:51:11Z

OK, I know the fix @MatthiasZepper, I sorted this in riboseq. The issue is that the file structure is flat by the time it gets to MultiQC.

So we're using the prefix sent to FASTQC to mark the outputs appropriately. I'll push a commit to your branch if I can, but this is the way to solve it.

Thank you so much! That would be fantastic! You should be able to push to the branch since you are a maintainer, but just in case, I have also invited you as a collaborator to my fork!

pinin4fjords · 2024-06-20T13:56:58Z

@MatthiasZepper OK, committed! Had a quick check and I think this works, though I note that the trimgalore subworkflow doesn't do a post-trim FastQC, which we might want to address at some point....

Anyway, I'll let you take it home from here :-)

MatthiasZepper · 2024-06-20T15:27:23Z

Thank you so much! I will try my best to finish this quickly now!

though I note that the trimgalore subworkflow doesn't do a post-trim FastQC, which we might want to address at some point....

Oh, it does. It is just confusing, because TrimGalore! in itself is a wrapper script around cutadapt and FastQC. So FastQC is not run as a Nextflow process but by the TrimGalore Perl script.

pinin4fjords · 2024-06-20T15:43:54Z

Ahh right, thought I was forgetting something ;-). So there is probably a missing bit to get those outputs prefixed correctly, but you know what to do.

pinin4fjords · 2024-06-21T11:46:41Z

@MatthiasZepper in case it's impacting your work, we've noticed that the latest MultiQC has generated some issues in the workflow. We're looking into it.

MatthiasZepper · 2024-07-03T13:09:16Z

I think/hope/wish I am done with this PR. It now fixes 3 of the 4 issues that were spotted, with only the TrimGalore! renaming left. However, I perceive this as a minor issue and think that it could be tackled at some later point if needed.

pinin4fjords · 2024-07-04T08:19:11Z

Great, thanks @MatthiasZepper ! Just to be clear, you don't need an updated MultiQC?

MatthiasZepper · 2024-07-04T09:37:28Z

Great, thanks @MatthiasZepper ! Just to be clear, you don't need an updated MultiQC?

It did need changes to MultiQC, since the previous version was not working. However, the critical bug was fixed with 1.22.2 and my updates to the umi-tools module were already contained within 1.22.3.

Therefore, with this PR, we should now see (re)introduced:

The MultiQC report has a FastQC (raw) and a FastQC (trimmed) section again; closes "MultiQC report is missing fastQC results on the dev branch" (#1303).
The MultiQC report now features umi-tools extract statistics. While not very helpful for the basic extraction, they will be quite useful for the regex mode of umi-tools.
The FastQC duplicate estimate is hidden from the General Statistics table (since the umi-tools / Picard duplicate estimates are run on the aligned reads and are thus more accurate).
The MultiQC report now correctly picks up and displays the umi-tools dedup statistics. This should close, or at least represent significant progress towards a solution of, "Improve/add UMI deduplication metrics" (#1277).
Accounts for some MultiQC config changes, e.g. reverseColors is now reverse_colors in the custom content. Since the custom content is, however, not displayed, it is hard to fully test this. It at least tackles all the warning messages about deprecated config that were displayed before.
I have sneaked in instructions for processing the Watchmaker UMIs with the pipeline. This is unrelated to the purpose of this PR, but since it was a tiny update, it felt excessive to make a separate PR for it.

pinin4fjords · 2024-07-10T12:24:16Z

Hope you don't mind @MatthiasZepper - just illustrating in those last couple of commits what I meant. So use the module in its updated form, but also have a patch to help with updates.

I also removed something I added to the patch earlier and which shouldn't have been there, and bumped the module (think it was just Maxime mucking about with stubs)

MatthiasZepper · 2024-07-10T17:54:32Z

Hope you don't mind @MatthiasZepper - just illustrating in those last couple of commits what I meant. So use the module in its updated form, but also have a patch to help with updates.

No, I don't mind at all. On the contrary, I highly appreciate your help here! Please also push your changes to the draft PR in the modules repo right away so they don't get lost in translation!

Fixing the dupradar module was not even in the original scope of this PR. I think the first changes got introduced by rebasing my draft PR onto the dev branch, and then I packed some more changes in there because a colleague suggested them and I felt that they were too minor for a PR in their own right?!

Either way, I would like to see the MultiQC fixes merged and am happy to take everything else out if it complicates the review and decision.

pinin4fjords

For me this is good to go. I've given the MultiQC report a check, and it's looking good to me when I check for the recent issues.

Module state is per @MatthiasZepper's module PR, temporarily achieved via a patch pending the merge of that PR. We can merge as-is, or just merge that module PR (since everything seems to be working) and update here, removing the patch.

pinin4fjords · 2024-07-11T11:13:09Z

@MatthiasZepper I think the failure here is because the nf-test didn't run on the module PR (maybe touching the template file isn't enough), and we need to update the tests to reflect the changes. I'll take a look.

pinin4fjords · 2024-07-11T11:24:44Z

nf-core/modules#5966

MatthiasZepper · 2024-07-11T13:00:21Z

@MatthiasZepper I think the failure here is because the nf-test didn't run on the module PR (maybe touching the template file isn't enough), and we need to update the tests to reflect the changes. I'll take a look.

Thanks. But don't overthink it, since I probably just screwed up undoing the local changes.

nf-core modules patch -r dupradar did not work (it complained about the presence of a nextflow.config in the module's directory, so it is evidently not yet adapted to the new pipeline structure with separate module configs), and thus I ended up removing the .diff file with git rm, which of course left a dangling reference in the modules.json that I was not aware of. So the failing test was presumably just a layer 8 issue.
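A dangling reference of this kind can be spotted with a few lines of Python. The sketch below is only an illustration under two assumptions: that patched modules are recorded in modules.json under a "patch" key, and that the value is a path relative to the pipeline root. The function name `find_dangling_patches` is hypothetical and not part of nf-core/tools.

```python
import json
from pathlib import Path


def find_dangling_patches(pipeline_dir):
    """List (location, patch_path) pairs whose patch file does not exist.

    Walks the whole modules.json generically, so it does not depend on the
    exact nesting of repositories and modules.
    """
    root = Path(pipeline_dir)
    modules_json = json.loads((root / "modules.json").read_text())
    dangling = []

    def walk(node, trail):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "patch" and isinstance(value, str):
                    # Assumed: the patch path is relative to the pipeline root.
                    if not (root / value).is_file():
                        dangling.append((" > ".join(trail), value))
                else:
                    walk(value, trail + [key])
        elif isinstance(node, list):
            for item in node:
                walk(item, trail)

    walk(modules_json, [])
    return dangling
```

Calling `find_dangling_patches(".")` from the pipeline root would return an empty list when every recorded patch file is present.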

pinin4fjords · 2024-07-11T13:52:31Z

Thanks @MatthiasZepper! Module update done, lights are green. Merge away when you're ready.

MatthiasZepper · 2024-07-11T14:46:06Z

For me this is good to go. I've given the MultiQC report a check, and it's looking good to me when I check for the recent issues.

I did not have time to test the latest iteration of this PR until just now, but to me it seems the MultiQC issues are not fixed (or new ones emerged - can't tell if I overlooked something before, because I have deleted the results from the previous test runs already).

1.) The FastQC section is missing samples. Only 2 samples in the FastQC, but 5 samples in the umi-tools module, if I use the test profile:

Also, the sample names are oddly mixed up. I never paid much attention to the contents of the test data, but it seems that some tools only process parts of the data or perform some weird renaming of the samples?

2.) The Dupradar plot is there, but no lines are shown and the sample values are all 0,0. That might be due to the small test data being poorly suited to test the tool, but of course it could also be due to an invalid config or incorrect data processing.

As a first step, could you please let me know whether you see the same?

pinin4fjords · 2024-07-11T16:15:47Z

Oh goddamn, you're right on FastQC, I'll have a dig. Dupradar doesn't look like that for me though:

pinin4fjords · 2024-07-11T16:22:37Z

Only 2 samples in the FastQC

This is actually because there are only two e.g. _raw_fastqc.zip files getting to the multiqc process, so probably a workflow issue. I'll figure it out

pinin4fjords · 2024-07-11T17:39:10Z

@MatthiasZepper think I fixed the FASTQC thing at least (see last commit) - could you check again?

pinin4fjords · 2024-07-12T09:00:17Z

Also, could you give me the UMI params you're testing with, and which are not set in the test profile by default?

MatthiasZepper · 2024-07-12T10:32:54Z

could you check again?

Pipeline run is queued and about to start as we speak (as I type).

Also, could you give me the UMI params you're testing with, and which are not set in the test profile by default?

Of course. Mind, however, that this is a completely nonsensical pattern. I am defining three fixed bases at the start, but allowing for two random substitutions. I just wanted to make the UMI-tools extract plot a little more informative (and put that module to a true test), because with a fixed pattern there are no failures when extracting.

with_umi: true
umitools_extract_method: "regex"
umitools_bc_pattern: "^(?P<umi_1>CGA.{8}){s<=2}.*"
umi_dedup_stats: true
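To make the effect of this pattern concrete, here is a rough Python emulation using only the standard library. It is a sketch of the reading described above (three fixed bases CGA, followed by eight arbitrary bases, with up to two substitutions allowed), not the code umi-tools runs; the `{s<=2}` syntax is fuzzy matching from the third-party `regex` package and is not supported by the standard `re` module. The function name `umi_extractable` is hypothetical.

```python
def umi_extractable(read: str, anchor: str = "CGA", umi_len: int = 8, max_subs: int = 2) -> bool:
    """Emulate '^(CGA.{8}){s<=2}' under the stated assumptions.

    Only substitutions are allowed, so the read needs at least
    len(anchor) + umi_len bases, and at most max_subs of the anchor
    positions may differ from the expected bases.
    """
    if len(read) < len(anchor) + umi_len:
        return False
    mismatches = sum(1 for expected, seen in zip(anchor, read) if expected != seen)
    return mismatches <= max_subs


# Made-up example reads, not taken from the test data.
for read in ("CGAACGTACGTTTTT", "TTAACGTACGTTTTT", "TTTACGTACGTTTTT", "CGAACGT"):
    print(read, umi_extractable(read))
```

Under this reading, extraction only fails when all three leading bases differ from CGA or when the read is shorter than eleven bases, which is what makes the extract plot show some failures at all.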

Dupradar doesn't look like that for me though

Then this is probably a side effect of my random UMI specification that I did not think through properly or an issue with my browser.

pinin4fjords · 2024-07-12T10:44:06Z

I've also looked into the UMI thing. I don't think it can ever have worked (unless there is some regression I was unaware of).

MultiQC is parsing log lines like:

# output generated by extract -I SRR6357076_1.fastq.gz --read2-in=SRR6357076_2.fastq.gz -S RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz --read2-out=RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz --extract-method=string --bc-pattern=NNNN

Those input files are taken directly from the sample sheet. Other processes where FASTQ files have been merged end up using a prefix on their output, so the log for umitools extract looks like:

# output generated by extract -I WT_REP1_1.merged.fastq.gz --read2-in=WT_REP1_2.merged.fastq.gz -S WT_REP1.umi_extract_1.fastq.gz --read2-out=WT_REP1.umi_extract_2.fastq.gz --extract-method=regex --bc-pattern=^(?P<umi_1>CGA.{8}){s<=2}.*

To fix this, someone will probably have to alter umitools/extract to allow symlinking of input files to use the prefix. Suggest you create an issue, but probably not something to address here.
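To make the mismatch concrete, the following Python sketch pulls both the input (`-I`) and output (`-S`) file names out of the two log lines quoted above and reduces them to a sample-like name. It is not MultiQC's parser; the function `names_from_extract_log` and the suffix-stripping rules are assumptions made for illustration only.

```python
import re

_INPUT = re.compile(r"(?:^|\s)-I\s+(\S+)")
_OUTPUT = re.compile(r"(?:^|\s)-S\s+(\S+)")


def names_from_extract_log(line: str) -> tuple[str, str]:
    """Return (name derived from the input file, name derived from the output file)."""
    input_file = _INPUT.search(line).group(1)
    output_file = _OUTPUT.search(line).group(1)
    # Assumed clean-up rules: drop mate suffix, optional ".merged", and extension.
    input_name = re.sub(r"(?:_[12])?(?:\.merged)?\.fastq\.gz$", "", input_file)
    output_name = re.sub(r"\.umi_extract(?:_[12])?\.fastq\.gz$", "", output_file)
    return input_name, output_name


unmerged = (
    "# output generated by extract -I SRR6357076_1.fastq.gz "
    "--read2-in=SRR6357076_2.fastq.gz -S RAP1_IAA_30M_REP1.umi_extract_1.fastq.gz "
    "--read2-out=RAP1_IAA_30M_REP1.umi_extract_2.fastq.gz "
    "--extract-method=string --bc-pattern=NNNN"
)
merged = (
    "# output generated by extract -I WT_REP1_1.merged.fastq.gz "
    "--read2-in=WT_REP1_2.merged.fastq.gz -S WT_REP1.umi_extract_1.fastq.gz "
    "--read2-out=WT_REP1.umi_extract_2.fastq.gz "
    "--extract-method=regex --bc-pattern=^(?P<umi_1>CGA.{8}){s<=2}.*"
)

for line in (unmerged, merged):
    print(names_from_extract_log(line))
```

For the unmerged sample the two names disagree (the SRR accession versus the sample ID), while for the merged sample they agree, which is the inconsistency visible in the extract plots.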

pinin4fjords · 2024-07-12T11:02:53Z

Dupradar doesn't look like that for me though

Then this is probably a side effect of my random UMI specification that I did not think through properly or an issue with my browser.

OK, I see it now with your parameters, so it's not your browser! Not sure of the fix though as it relates to your params (I haven't dug much into the UMI stuff), and don't think it's MultiQC related. So probably one for a separate issue as well.

MatthiasZepper · 2024-07-12T11:05:49Z

I've also looked into the UMI thing. I don't think it can ever have worked (unless there is some regression I was unaware of).

I fear I can't follow unless you are a tad more specific than "thing". 🙃

I think you are aware that there are two umi-tools steps in the pipeline:

umi-tools extract: This indeed has never worked, because there was no module for that tool. I wrote it recently and it was included in MultiQC 1.22.3. Hence, a summary of the extraction success is now newly included in rnaseq 3.15.dev, since we are using 1.23 now.
umi-tools dedup: There has been a module for this subtool, but the file search pattern was incorrect for a while. I fixed that as well to address Improve/add UMI deduplication metrics #1277.

The examples you show evidently refer to extract, but I fail to see the difference. Both take some SRR sample names as input and have some proper output file names? If the pipeline supports automatically merging multipart input FastQs into single samples, maybe we need to generate some additional MultiQC config to rename the samples there as well?
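As a sketch of what such generated renaming input could look like: MultiQC accepts a two-column, tab-separated file of search and replacement names via `--replace-names`. The helper `build_rename_tsv` below is hypothetical, and the single mapping is only inferred from the input and output names in the first log line quoted above.

```python
import csv
import io
from typing import Dict


def build_rename_tsv(mapping: Dict[str, str]) -> str:
    """Render {name_in_logs: desired_sample_name} as two-column TSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Sort for a stable, reproducible file regardless of dict order.
    for old_name in sorted(mapping):
        writer.writerow([old_name, mapping[old_name]])
    return buffer.getvalue()


# Assumed mapping, taken from the extract command line shown earlier.
tsv_text = build_rename_tsv({"SRR6357076": "RAP1_IAA_30M_REP1"})
```

How the pipeline would collect this mapping per sample and hand the file to the MultiQC process is left open here.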

pinin4fjords · 2024-07-12T11:59:47Z

Yes, I meant your earlier flagged inconsistency in the names on the extract plots.

Being more specific, the sample IDs are derived like this, which means they're derived from data lines like:

# stdin                                   : <_io.TextIOWrapper name='SRR6357073_1.fastq.gz' encoding='ascii'>

... i.e. from the actual, bare, FASTQ file names, exactly as supplied to the pipeline. The FASTQs that . I might suggest that the simplest thing would be to work off the stdout line instead:

# stdout                                  : <_io.TextIOWrapper name='RAP1_UNINDUCED_REP1.umi_extract.fastq.gz' encoding='ascii'>

... but obviously we'd then be waiting on a release. MultiQC renaming looks good, and I just had a quick stab, but I can't see how to do it quickly (the MultiQC module really needs a file input for the renaming TSV).
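To illustrate the difference between the two header lines, here is a minimal sketch that reads the file name from either stream line. The function names and the suffix handling are assumptions for illustration; this is not the actual MultiQC module code.

```python
import re
from typing import Optional

# Matches header lines such as:
# "# stdout   : <_io.TextIOWrapper name='sample.umi_extract.fastq.gz' encoding='ascii'>"
STREAM_LINE = re.compile(
    r"^#\s*(stdin|stdout)\s*:\s*<_io\.TextIOWrapper name='([^']+)'"
)

EXAMPLE_LOG = "\n".join([
    "# stdin                                   : "
    "<_io.TextIOWrapper name='SRR6357073_1.fastq.gz' encoding='ascii'>",
    "# stdout                                  : "
    "<_io.TextIOWrapper name='RAP1_UNINDUCED_REP1.umi_extract.fastq.gz' encoding='ascii'>",
])


def stream_file_name(log_text: str, stream: str) -> Optional[str]:
    """Return the file name recorded for 'stdin' or 'stdout', if present."""
    for line in log_text.splitlines():
        match = STREAM_LINE.match(line)
        if match and match.group(1) == stream:
            return match.group(2)
    return None


def sample_from_output(file_name: str) -> str:
    """Strip the assumed '.umi_extract...' suffix to recover the sample prefix."""
    return file_name.split(".umi_extract", 1)[0]
```

Working from the stdout line would yield the prefixed sample name regardless of whether the inputs were merged, at the cost of depending on the output naming convention.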

To my mind we should probably get this merged (assuming you confirm the FASTQC fix works), and deal with this down the line.

MatthiasZepper mentioned this pull request May 29, 2024

MultiQC report is missing fastQC results on the dev branch #1303

Closed

pinin4fjords linked an issue May 30, 2024 that may be closed by this pull request

MultiQC report is missing fastQC results on the dev branch #1303

Closed

MatthiasZepper force-pushed the MultiQC_FastQC_bug branch from 5a8912a to e75b875 Compare June 3, 2024 12:24

MatthiasZepper force-pushed the MultiQC_FastQC_bug branch 3 times, most recently from c005701 to 3ad2adf Compare July 2, 2024 16:55

MatthiasZepper marked this pull request as ready for review July 3, 2024 13:06

MatthiasZepper requested a review from pinin4fjords July 3, 2024 13:07

MatthiasZepper added 9 commits July 4, 2024 11:40

Modify FastQC output publishing to comply with the paths in MultiQC config.

a2443e7

MultiQC config updates.

00d6113

Linting.

7dade35

Use the correct module names to supress the Percent Duplicate column in the General Stats table of MultiQC.

1bf9390

Publish logfiles from umi-tools dedup steps.

850cb4c

Modify subworkflow bam_dedup_stats_samtools_umitools to publish dedup logs.

2cb8eff

Update bam_dedup_stats_samtools_umitools subworkflow and MultiQC module.

6648b35

Add UMI information for Watchmaker mRNA Library Prep Kit.

2924f9e

Publish the UMI-tools extract logs for MultiQC.

8a4aa8a

Update MultiQC to v1.23

9230216

pinin4fjords changed the title ~~MultiQC report: Issues with FastQC~~ Various MultiQC issues: FastQC sections for raw and trimmed reads // umi-tools dedup and extraction plots, custom content styling. Jul 11, 2024

pinin4fjords approved these changes Jul 11, 2024

pinin4fjords mentioned this pull request Jul 11, 2024

Update dupradar.r according to new MultiQC (>=v1.22) config values nf-core/modules#5943

Merged

17 tasks

pinin4fjords added this to the 3.15.0 milestone Jul 11, 2024

Update dupradar from modules repo and remove patch.

8e42ab1

Update module for snapshot fix

d2d21cf

Fix prefixes for FASTQC

1c75669

MatthiasZepper merged commit 18d8d10 into nf-core:dev Jul 12, 2024
34 checks passed

MatthiasZepper deleted the MultiQC_FastQC_bug branch July 12, 2024 13:20

MatthiasZepper mentioned this pull request Jul 12, 2024

Improve/add UMI deduplication metrics #1277

Closed

This was referenced Jul 12, 2024

Add rename in the MultiQC report for samples without techreps #1341

Merged

Custom content plots missing in MultiQC ouptut #1332

Closed

pinin4fjords linked an issue Jul 19, 2024 that may be closed by this pull request

Custom content plots missing in MultiQC ouptut #1332

Closed
