v2.0.0-beta #318

Merged
merged 15 commits into from
Jun 19, 2024
6 changes: 3 additions & 3 deletions conf/modules.config
Expand Up @@ -38,8 +38,8 @@ process {
ext.conda = "$projectDir/environments/pgscatalog_utils/environment.yml"
ext.docker = 'ghcr.io/pgscatalog/pygscatalog'
ext.singularity = 'oras://ghcr.io/pgscatalog/pygscatalog'
ext.docker_version = ':pgscatalog-utils-1.0.2'
ext.singularity_version = ':pgscatalog-utils-1.0.2-singularity'
ext.docker_version = ':pgscatalog-utils-1.1.2'
ext.singularity_version = ':pgscatalog-utils-1.1.2-singularity'
}

withLabel: plink2 {
Expand Down Expand Up @@ -79,7 +79,7 @@ process {
ext.singularity = 'oras://ghcr.io/pgscatalog/fraposa_pgsc'
ext.singularity_version = ':v0.1.0-singularity'
ext.docker = 'ghcr.io/pgscatalog/fraposa_pgsc'
ext.docker_version = ':v0.1.0'
ext.docker_version = ':v0.1.1'
}

// output configuration
1 change: 1 addition & 0 deletions docs/_templates/globaltoc.html
Expand Up @@ -11,6 +11,7 @@ <h3>Contents</h3>
<li><a href="{{ pathto('explanation/index') }}">Explanations</a></li>
<ul>
<li><a href="{{ pathto('explanation/plink2') }}">Why not use plink2?</a></li>
<li><a href="{{ pathto('explanation/match') }}">Match rate errors</a></li>
<li><a href="{{ pathto('explanation/geneticancestry') }}">Adjusting PGS with genetic ancestry</a></li>
<li><a href="{{ pathto('explanation/output') }}">Outputs & report</a></li>
</ul>
1 change: 1 addition & 0 deletions docs/explanation/index.rst
Expand Up @@ -7,5 +7,6 @@ Explanation
:maxdepth: 1

output
match
geneticancestry
plink2
63 changes: 63 additions & 0 deletions docs/explanation/match.rst
@@ -0,0 +1,63 @@
.. _matchrates:

Why do I get match rate errors?
===============================

When you're running the PGS Catalog Calculator you might see errors like:

.. code-block:: console

pgscatalog.core.lib.pgsexceptions.ZeroMatchesError: All scores fail to meet match threshold 0.75

You might also see that some scoring files are coloured red in the report and excluded from the output.

By default pgsc_calc will continue calculating if at least one score passes the **match rate threshold**, which is controlled by the ``--min_overlap`` parameter.

The default value is 0.75. We chose it based on our experience applying PGS to new cohorts, where most scores match better than this threshold.

If scores match your target genomes poorly, it's typically because of a problem with the input data (target genomes or scoring files).
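
The threshold behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the calculator's code: the function names and the example match rates are hypothetical, and it assumes a score passes when its match rate is greater than or equal to the threshold.

```python
def passing_scores(match_rates, min_overlap=0.75):
    """Return the score IDs whose match rate meets the threshold (assumed >=)."""
    return [pgs_id for pgs_id, rate in match_rates.items() if rate >= min_overlap]


def check_match_rates(match_rates, min_overlap=0.75):
    """Raise if no score passes; otherwise return the scores that would be kept."""
    kept = passing_scores(match_rates, min_overlap)
    if not kept:
        # Mirrors the wording of the error message, not the real exception class
        raise ValueError(f"All scores fail to meet match threshold {min_overlap}")
    return kept


rates = {"PGS000001": 0.98, "PGS000002": 0.41}  # hypothetical match rates
print(check_match_rates(rates))  # ['PGS000001']
```

In this sketch the second score would be the one coloured red in the report, while the calculation continues with the first.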

What is matching?
-----------------

The calculator carefully checks that variants (rows) in a scoring file are present in your target genomes.

The matching procedure `is described in the preprint supplement <https://www.medrxiv.org/content/10.1101/2024.05.29.24307783v1.supplementary-material>`_.

The matching procedure never makes any changes to target genome data and only seeks to match variants in the scoring file to the genome.

Adjusting ``--min_overlap`` is a bad idea
------------------------------------------

The aim of the PGS Catalog Calculator is to faithfully recalculate scores submitted by authors to the PGS Catalog on new target genomes.

If few variants in a published scoring file are present in a target genome, then the calculated score isn't a good representation of the original published score.

When you evaluate the predictive performance of a score with low match rates it will be less likely to reproduce the metrics reported in the PGS Catalog.

If you reduce ``--min_overlap`` then the calculator will output scores calculated with the remaining variants, **but these scores may not be representative of the original data submitted to the PGS Catalog.**

Are your target genomes imputed? Are they WGS?
----------------------------------------------

The calculator assumes that target genotyping data were called from a limited number of markers on a genotyping array and imputed using a larger reference panel to increase variant density.

WGS data are not natively supported by the calculator (as homozygous REF sites are excluded from the variant sites). However, it's `possible to create compatible gVCFs from WGS data. <https://github.com/PGScatalog/pgsc_calc/discussions/123#discussioncomment-6469422>`_

In the future we plan to improve support for WGS.

Did you set the correct genome build?
-------------------------------------

The calculator automatically fetches scoring files in the correct genome build from the PGS Catalog. If match rates are low, you may have specified the wrong genome build. If you're using custom scoring files and the match rate is low, the ``--liftover`` parameter may have been omitted.
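
One way to check which build a custom scoring file declares is to read its header. The sketch below assumes the header uses ``#key=value`` comment lines with ``genome_build`` and ``HmPOS_build`` fields, as PGS Catalog formatted files normally do; the function name and the example content are hypothetical.

```python
import io


def read_header_builds(lines):
    """Collect build-related fields from the '#key=value' header lines."""
    builds = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        key, sep, value = line[1:].strip().partition("=")
        # Field names are an assumption about the scoring file header
        if sep and key in ("genome_build", "HmPOS_build"):
            builds[key] = value
    return builds


example = io.StringIO(
    "#pgs_id=PGS000001\n"
    "#genome_build=hg19\n"
    "#HmPOS_build=GRCh38\n"
    "rsID\tchr_name\teffect_allele\teffect_weight\n"
)
print(read_header_builds(example))
```

For a real compressed file, the same function could be given the handle returned by ``gzip.open(path, "rt")``. If the declared build differs from the build of your target genomes, that is a likely cause of a low match rate.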

I'm still getting match rate errors. How do I figure out what's wrong?
----------------------------------------------------------------------

Problems with matching are normally because of problems with input data rather than the matching procedure.

If you're trying to reproduce a specific score and are experiencing problems, then some manual work is required.

Try checking the full variant matching log, found in the work directory reported in the Nextflow error, to see which variants are missing.

It can be a good idea to manually search your target genotypes for missing variants to see what's happening.
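
A manual search can start from something as simple as the sketch below, which lists scoring file variants whose chromosome and position are absent from the target genotypes. It is deliberately cruder than the calculator's matching procedure (it ignores alleles entirely), and the function name and example variants are hypothetical.

```python
def missing_variants(scoring_variants, target_variants):
    """Return scoring-file variants whose (chromosome, position) is absent from the target."""
    target_positions = {(chrom, pos) for chrom, pos, *_ in target_variants}
    return [v for v in scoring_variants if (v[0], v[1]) not in target_positions]


# Hypothetical variants as (chromosome, position, effect allele, other allele)
scoring = [("1", 1000, "A", "G"), ("1", 2000, "C", "T"), ("2", 500, "G", "A")]
target = [("1", 1000, "A", "G"), ("2", 500, "G", "A")]

for chrom, pos, effect, other in missing_variants(scoring, target):
    print(f"missing {chrom}:{pos} ({effect}/{other})")  # missing 1:2000 (C/T)
```

If almost every position is reported missing, a genome build mismatch is more likely than genuinely absent variants.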
42 changes: 39 additions & 3 deletions docs/how-to/offline.rst
Expand Up @@ -7,9 +7,10 @@ pgsc_calc has been deployed on secure platforms like Trusted Research
Environments (TREs). Running pgsc_calc is a little bit more difficult in this
case. The basic set up approach is to:

1. Download containers
2. Download reference data
3. Download scoring files
1. Set up Nextflow
2. Download containers
3. Download reference data
4. Download scoring files

And transfer everything to your offline environment.

Expand All @@ -22,6 +23,41 @@ if you are having problems and we'll try our best to help you.

.. _open a discussion on Github: https://github.com/PGScatalog/pgsc_calc/discussions

Set up Nextflow
----------------

From the Nextflow documentation for `offline usage <https://www.nextflow.io/docs/latest/plugins.html#offline-usage>`_:

1. Run the test profile of the calculator with ``nextflow run pgscatalog/pgsc_calc -profile test,<docker/singularity/conda>``

.. tip::

It doesn't matter if the profile you use on your computer with internet access is different to the profile you use in the airlocked environment.

The important thing is that Nextflow automatically configures itself using an internet connection.

2. Transfer the Nextflow binary and the ``$NXF_HOME`` directory to your airlocked environment

.. tip::

``$NXF_HOME`` is ``$HOME/.nextflow`` by default, so the directory is probably ``~/.nextflow``

.. warning::

Make sure to transfer the Nextflow binary even if the airlocked environment already has Nextflow installed. It's important that the Nextflow versions match across both environments.

3. Remember to always set the environment variable ``NXF_OFFLINE='true'`` in the offline environment


.. tip::

You shouldn't need to:

1. Edit any Nextflow configuration files
2. Manually download any plugins

Unless you want to use a special plugin in the airlocked environment
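
Before launching the calculator offline, a small preflight check can catch the two most likely mistakes from the steps above. This is a sketch only: the function name is hypothetical, and it assumes the Nextflow home directory is ``~/.nextflow`` unless ``NXF_HOME`` is set.

```python
import os
from pathlib import Path


def offline_problems(env, home):
    """List likely misconfigurations before launching Nextflow offline."""
    problems = []
    if env.get("NXF_OFFLINE") != "true":
        problems.append("NXF_OFFLINE is not set to 'true'")
    # Assumption: Nextflow state lives in ~/.nextflow unless NXF_HOME overrides it
    nxf_home = Path(env.get("NXF_HOME", Path(home) / ".nextflow"))
    if not nxf_home.is_dir():
        problems.append(f"Nextflow home directory not found: {nxf_home}")
    return problems


for problem in offline_problems(os.environ, Path.home()):
    print(problem)
```

An empty result does not guarantee that every plugin was transferred; it only confirms that the environment variable and the directory are in place.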

Preload container images
------------------------

10 changes: 5 additions & 5 deletions environments/fraposa/environment.yml
@@ -1,6 +1,6 @@
name: fraposa
name: fraposa-pgsc
channels:
- conda-forge
- bioconda
dependencies:
- python=3.10
- pip
- pip:
- fraposa-pgsc==0.1.0
- fraposa-pgsc=0.1.1
10 changes: 5 additions & 5 deletions environments/pgscatalog_utils/environment.yml
@@ -1,6 +1,6 @@
name: pgscatalog_utils
name: pgscatalog-utils
channels:
- conda-forge
- bioconda
dependencies:
- python=3.11
- pip
- pip:
- pgscatalog_utils==1.0.2
- pgscatalog.utils=1.1.2
9 changes: 5 additions & 4 deletions environments/pyyaml/environment.yml
@@ -1,6 +1,7 @@
name: pyyaml
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- python=3.10
- pip
- pip:
- pyyaml==6.0
- pyyaml=6.0.1
2 changes: 1 addition & 1 deletion environments/report/environment.yml
@@ -1,7 +1,7 @@
name: report
channels:
- conda-forge
- defaults
- bioconda
dependencies:
- r-jsonlite
- r-dplyr
3 changes: 3 additions & 0 deletions environments/zstd/environment.yml
@@ -1,3 +1,6 @@
name: zstd
channels:
- conda-forge
- bioconda
dependencies:
- zstd=1.4.8
3 changes: 2 additions & 1 deletion lib/WorkflowMain.groovy
Expand Up @@ -12,11 +12,12 @@ class WorkflowMain {
public static String citation(workflow) {
return "If you use ${workflow.manifest.name} for your analysis please cite:\n\n" +
"* The Polygenic Score Catalog\n" +
" https://doi.org/10.1101/2024.05.29.24307783\n" +
" https://doi.org/10.1038/s41588-021-00783-5\n\n" +
"* The nf-core framework\n" +
" https://doi.org/10.1038/s41587-020-0439-x\n\n" +
"* Software dependencies\n" +
" https://github.com/${workflow.manifest.name}/blob/master/CITATIONS.md"
" https://github.com/${workflow.manifest.name}/blob/main/CITATIONS.md"
}


10 changes: 5 additions & 5 deletions main.nf
Expand Up @@ -15,13 +15,13 @@ nextflow.enable.dsl = 2
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
*/

include { paramsHelp } from 'plugin/nf-schema'

// Print help message if needed
if (params.help) {
def logo = NfcoreTemplate.logo(workflow, params.monochrome_logs)
def citation = '\n' + WorkflowMain.citation(workflow) + '\n'
def String command = '\n' + "\$ nextflow run ${workflow.manifest.name} -profile test,docker" + '\n'
log.info logo + command + citation + NfcoreTemplate.dashedLine(params.monochrome_logs)
System.exit(0)
log.info paramsHelp("nextflow run pgscatalog/pgsc_calc --input input_file.csv")
log.info "See https://pgsc-calc.readthedocs.io/en/latest/getting-started.html for more help"
exit 0
}

WorkflowMain.initialise(workflow, params, log, args)
2 changes: 1 addition & 1 deletion modules/local/ancestry/ancestry_analysis.nf
Expand Up @@ -38,7 +38,7 @@ process ANCESTRY_ANALYSIS {

cat <<-END_VERSIONS > versions.yml
${task.process.tokenize(':').last()}:
pgscatalog_utils: \$(echo \$(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
pgscatalog.calc: \$(echo \$(python -c 'import pgscatalog.calc; print(pgscatalog.calc.__version__)'))
END_VERSIONS
"""
}
2 changes: 1 addition & 1 deletion modules/local/ancestry/oadp/fraposa_pca.nf
Expand Up @@ -33,7 +33,7 @@ process FRAPOSA_PCA {

cat <<-END_VERSIONS > versions.yml
${task.process.tokenize(':').last()}:
fraposa: TODO
fraposa_pgsc: \$(echo \$(python -c 'import fraposa_pgsc; print(fraposa_pgsc.__version__)'))
END_VERSIONS
"""
}
2 changes: 1 addition & 1 deletion modules/local/ancestry/oadp/fraposa_project.nf
Expand Up @@ -38,7 +38,7 @@ process FRAPOSA_PROJECT {

cat <<-END_VERSIONS > versions.yml
${task.process.tokenize(':').last()}:
fraposa: TODO
fraposa_pgsc: \$(echo \$(python -c 'import fraposa_pgsc; print(fraposa_pgsc.__version__)'))
END_VERSIONS
"""
}
2 changes: 1 addition & 1 deletion modules/local/ancestry/relabel_afreq.nf
Expand Up @@ -48,7 +48,7 @@ process RELABEL_AFREQ {

cat <<-END_VERSIONS > versions.yml
${task.process.tokenize(':').last()}:
pgscatalog_utils: \$(echo \$(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
pgscatalog.core: \$(echo \$(python -c 'import pgscatalog.core; print(pgscatalog.core.__version__)'))
END_VERSIONS
"""
}
2 changes: 1 addition & 1 deletion modules/local/ancestry/relabel_ids.nf
Expand Up @@ -48,7 +48,7 @@ process RELABEL_IDS {

cat <<-END_VERSIONS > versions.yml
${task.process.tokenize(':').last()}:
pgscatalog_utils: \$(echo \$(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
pgscatalog.core: \$(echo \$(python -c 'import pgscatalog.core; print(pgscatalog.core.__version__)'))
END_VERSIONS
"""
}
2 changes: 1 addition & 1 deletion modules/local/ancestry/relabel_scorefiles.nf
Expand Up @@ -39,7 +39,7 @@ process RELABEL_SCOREFILES {

cat <<-END_VERSIONS > versions.yml
${task.process.tokenize(':').last()}:
pgscatalog_utils: \$(echo \$(python -c 'import pgscatalog_utils; print(pgscatalog_utils.__version__)'))
pgscatalog.core: \$(echo \$(python -c 'import pgscatalog.core; print(pgscatalog.core.__version__)'))
END_VERSIONS
"""
}
2 changes: 0 additions & 2 deletions modules/local/match_variants.nf
Expand Up @@ -23,7 +23,6 @@ process MATCH_VARIANTS {

script:
def args = task.ext.args ?: ''
def fast = params.fast_match ? '--fast' : ''
def ambig = params.keep_ambiguous ? '--keep_ambiguous' : ''
def multi = params.keep_multiallelic ? '--keep_multiallelic' : ''
def match_chrom = meta.chrom.contains("ALL") ? '' : "--chrom $meta.chrom"
Expand All @@ -42,7 +41,6 @@ process MATCH_VARIANTS {
$match_chrom \
$ambig \
$multi \
$fast \
--outdir \$PWD \
-v

40 changes: 25 additions & 15 deletions nextflow.config
Expand Up @@ -44,7 +44,6 @@ params {
normalization_method = "empirical mean mean+var"
n_normalization = 4


// compatibility params
liftover = false
target_build = null
Expand All @@ -54,11 +53,9 @@ params {
min_overlap = 0.75
keep_ambiguous = false
keep_multiallelic = false
fast_match = false
copy_genomes = false
genotypes_cache = null


// Debug params
only_bootstrap = false
only_input = false
Expand All @@ -68,9 +65,6 @@ params {
only_score = false
skip_ancestry = true

// deprecated params
platform = null

// Boilerplate options
outdir = "$launchDir/results"
publish_dir_mode = 'copy'
Expand All @@ -96,14 +90,6 @@ params {
max_memory = '128.GB'
max_cpus = 16
max_time = '240.h'

// Schema validation default options
validationFailUnrecognisedParams = false
validationLenientMode = false
validationSchemaIgnoreParams = 'genomes,igenomes_base,platform,only_bootstrap,only_input,only_compatible,only_match,only_score'
validationShowHiddenParams = false
validate_params = true

}

// Load base.config by default for all pipelines
Expand Down Expand Up @@ -270,7 +256,7 @@ manifest {
description = 'The Polygenic Score Catalog Calculator is a nextflow pipeline for polygenic score calculation'
mainScript = 'main.nf'
nextflowVersion = '>=23.10.0'
version = '2.0.0-alpha.6'
version = '2.0.0-beta'
}

// Load modules.config for DSL2 module specific options
Expand Down Expand Up @@ -308,3 +294,27 @@ def check_max(obj, type) {
}
}
}

plugins {
id 'nf-schema@2.0.0' // validation of parameters
id 'nf-prov@1.2.2' // workflow provenance
}

prov {
enabled = true
formats {
bco {
file = "${params.outdir}/pipeline_info/manifest_${trace_timestamp}.bco.json"
}
}
}

validation {
// Schema validation default options
monochromeLogs = params.monochrome_logs
failUnrecognisedParams = false
lenientMode = false
defaultIgnoreParams = ['platform']
ignoreParams = ['genomes','igenomes_base','only_bootstrap','only_input','only_compatible','only_match','only_score']
showHiddenParams = false
}