Fix hamronization fargene input #411

Merged (8 commits, Aug 20, 2024)
CHANGELOG.md (7 changes: 4 additions & 3 deletions)
@@ -7,7 +7,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### `Breaking change`

[#391](https://github.com/nf-core/funcscan/pull/391) Made all "database" parameter names consistent, skip hmmsearch by default. (by @jasmezz)
- [#391](https://github.com/nf-core/funcscan/pull/391) Made all "database" parameter names consistent, skip hmmsearch by default. (by @jasmezz)

| Old parameter | New parameter |
| ------------------------------------------------ | --------------------------------------- |
@@ -27,6 +27,9 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
| `amp_skip_hmmsearch` | `amp_run_hmmsearch` |
| `bgc_skip_hmmsearch` | `bgc_run_hmmsearch` |

- [#343](https://github.com/nf-core/funcscan/pull/343) Standardized the resulting workflow summary tables to always start with 'sample_id\tcontig_id\t..'. Reformatted the output of `hamronization/summarize` module. (by @darcy220606)
- [#411](https://github.com/nf-core/funcscan/pull/411) Optimised hAMRonization input: only high-quality hits from fARGene output are reported. (by @jasmezz, @jfy133)

### `Added`

- [#322](https://github.com/nf-core/funcscan/pull/322) Updated all modules: introduce environment.yml files. (by @jasmezz)
@@ -44,7 +47,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

### `Fixed`

- [#348](https://github.com/nf-core/funcscan/pull/348) Updated samplesheet for pipeline tests to 'samplesheet_reduced.csv' with smaller datasets to reduce resource consumption. Updated prodigal module to fix pigz issue. Removed `tests/` from `.gitignore`. (by @darcy220606)
- [#362](https://github.com/nf-core/funcscan/pull/362) Save annotations from bakta in subdirectories per sample. (by @jasmezz)
- [#363](https://github.com/nf-core/funcscan/pull/363) Removed warning from DeepBGC usage docs. (by @jasmezz)
@@ -53,7 +55,6 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- [#376](https://github.com/nf-core/funcscan/pull/376) Fixed an occasional RGI process failure when certain files not produced. (❤️ to @amizeranschi for reporting, fix by @amizeranschi & @jfy133)
- [#386](https://github.com/nf-core/funcscan/pull/386) Updated DeepBGC module to fix output file names, separate annotation step for all BGC tools, add warning if no BGCs found, fix MultiQC reporting of annotation workflow. (by @jfy133, @jasmezz)
- [#392](https://github.com/nf-core/funcscan/pull/392) & [#397](https://github.com/nf-core/funcscan/pull/397) Fixed a docker/singularity-only error appearing when running with conda. (❤️ to @ewissel for reporting, fix by @jfy133 & @jasmezz)
- [#394](https://github.com/nf-core/funcscan/pull/394) Fixed BGC input channel: pre-annotated input is picked up correctly now. (by @jfy133, @jasmezz)
- [#391](https://github.com/nf-core/funcscan/pull/391) Skip hmmsearch by default so the pipeline does not crash if the user provides no HMM files, updated docs. (by @jasmezz)
- [#397](https://github.com/nf-core/funcscan/pull/397) Removed deprecated AMPcombi module, fixed variable name in BGC workflow, updated minor parts in docs (usage, parameter schema). (by @jasmezz)
- [#402](https://github.com/nf-core/funcscan/pull/402) Fixed BGC length calculation for antiSMASH hits by comBGC. (by @jasmezz)
conf/modules.config (4 changes: 2 additions & 2 deletions)
@@ -279,13 +279,13 @@ process {
path: { "${params.outdir}/arg/fargene/${meta.id}" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*/{predictedGenes,retrievedFragments}/*"
pattern: "*/{hmmsearchresults,predictedGenes,retrievedFragments}/*"
],
[
path: { "${params.outdir}/arg/fargene/${meta.id}/" },
mode: params.publish_dir_mode,
saveAs: { filename -> filename.equals('versions.yml') ? null : filename },
pattern: "*/{hmmsearchresults,tmpdir}/*",
pattern: "*/{tmpdir}/*",
enabled: params.arg_fargene_savetmpfiles
]
]
docs/output.md (2 changes: 1 addition & 1 deletion)
@@ -327,7 +327,7 @@ Output Summaries:
- `fargene/`
- `fargene_analysis.log`: logging output that fARGene produced during its run
- `<sample_name>/`:
- `hmmsearchresults/`: output from intermediate hmmsearch step (only if `--arg_fargene_savetmpfiles` supplied)
- `hmmsearchresults/`: output from intermediate hmmsearch step
- `predictedGenes/`:
- `*-filtered.fasta`: nucleotide sequences of predicted ARGs
- `*-filtered-peptides.fasta`: amino acid sequences of predicted ARGs
docs/usage.md (103 changes: 65 additions & 38 deletions)
@@ -44,6 +44,31 @@

```
work # Directory containing temporary files required for the run
# Other nextflow hidden files, eg. history of pipeline runs and old logs
```

If you wish to repeatedly use the same parameters for multiple runs, rather than specifying each flag in the command, you can specify these in a params file.

Pipeline settings can be provided in a `yaml` or `json` file via `-params-file <file>`.

:::warning
Do not use `-c <file>` to specify parameters as this will result in errors. Custom config files specified with `-c` must only be used for [tuning process resource specifications](https://nf-co.re/docs/usage/configuration#tuning-workflow-resources), other infrastructural tweaks (such as output directories), or module arguments (args).
:::

The above pipeline run specified with a params file in yaml format:

```bash
nextflow run nf-core/funcscan -profile docker -params-file params.yaml
```

with `params.yaml` containing:

```yaml
input: './samplesheet.csv'
outdir: './results/'
genome: 'GRCh37'
<...>
```

You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch).

## Samplesheet input

nf-core/funcscan takes FASTA files as input, typically contigs or whole genome sequences. To supply these to the pipeline, you will need to create a samplesheet with information about the samples you would like to analyse. Use the `--input` parameter to specify its location.
Expand Down Expand Up @@ -95,13 +120,15 @@ The implementation of some tools in the pipeline may have some particular behavi

MMseqs2 is currently the only taxonomic classification tool used in the pipeline to assign a taxonomic lineage to the input contigs. The database used to assign the taxonomic lineage can either be:

- a custom based database created by the user using `mmseqs createdb` externally and beforehand. If this flag is assigned, this database takes precedence over the default database in `--mmseqs_db_id`.
- A custom database created by the user with `mmseqs createdb`, externally and beforehand. If this flag is assigned, this database takes precedence over the default database in `--mmseqs_db_id`.

```bash
--taxa_classification_mmseqs_db 'path/to/mmsesqs_custom_database/dir'
--taxa_classification_mmseqs_db '<path>/<to>/<mmsesqs_custom_database>/<directory>'
```

- an MMseqs2 ready database. These databases were compiled by the developers of MMseqs2 and can be called using their labels. All available options can be found [here](https://github.com/soedinglab/MMseqs2/wiki#downloading-databases). Only use those databases that have taxonomy files available (i.e., Taxonomy == Yes). By default mmseqs2 in the pipeline uses '[Kalamari](https://github.com/lskatz/Kalamari)', and runs an aminoacid based alignment. However, if the user requires a more comprehensive taxonomic classification, we recommend the use of [GTDB](https://gtdb.ecogenomic.org/), but for that please remember to increase the memory, CPU threads and time required for the process `MMSEQS_TAXONOMY`.
The contents of the directory should have files such as `<dbname>.version` and `<dbname>.taxonomy` in the top level.

- An MMseqs2 ready database. These databases were compiled by the developers of MMseqs2 and can be called using their labels. All available options can be found [here](https://github.com/soedinglab/MMseqs2/wiki#downloading-databases). Only use databases that have taxonomy files available (i.e., Taxonomy == Yes). By default, the pipeline uses '[Kalamari](https://github.com/lskatz/Kalamari)' and runs an amino acid-based alignment. However, if you require a more comprehensive taxonomic classification, we recommend [GTDB](https://gtdb.ecogenomic.org/); in that case, remember to increase the memory, CPU threads and time for the process `MMSEQS_TAXONOMY`.

```bash
--taxa_classification_mmseqs_db_id 'Kalamari'
```

@@ -146,9 +173,11 @@ tar xvzf db.tar.gz
And then passed to the pipeline with:

```bash
--annotation_bakta_db /<path>/<to>/db/
--annotation_bakta_db /<path>/<to>/<db>/
```

The contents of the directory should have files such as `*.dmnd` in the top level.
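Alternatively, a local copy can be fetched with Bakta's own downloader (a sketch; assumes the `bakta` conda package is installed, and note that the database lands in a `db/` or `db-light/` subfolder of the chosen output path):

```bash
# Download the full Bakta database to a local directory
bakta_db download --output /<path>/<to> --type full
```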

:::info
The flag `--save_db` saves the pipeline-downloaded databases in your results directory. You can then move these to a central cache directory of your choice for re-use in the future.
:::
@@ -174,9 +203,11 @@ Ensure to wrap this path in double quotes if using an asterisk, to ensure Nextflow
For AMPcombi, nf-core/funcscan will by default download the most recent version of the [DRAMP](http://dramp.cpu-bioinfor.org/) database as a reference database for aligning the AMP hits in the AMP workflow. However, the user can also supply their own custom AMP database by following the guidelines in [AMPcombi](https://github.com/Darcy220606/AMPcombi). This can then be passed to the pipeline with:

```bash
--amp_ampcombi_db '/<path>/<to>/<amp_ref_database>
--amp_ampcombi_db '/<path>/<to>/<ampcombi_database>
```

The contents of the directory should have files such as `*.dmnd` and `*.fasta` in the top level.
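The `*.dmnd` file is a DIAMOND index of the reference FASTA. If you assemble a custom database yourself, it could be generated along these lines (a sketch; file names are placeholders and assume DIAMOND is installed):

```bash
# Build a DIAMOND index next to the custom reference FASTA
diamond makedb --in amp_ref_database.fasta --db amp_ref_database
```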

:::warning
The pipeline will automatically run Pyrodigal instead of Prodigal if the parameters `--run_annotation_tool prodigal --run_amp_screening` are both provided. This is due to an incompatibility issue of Prodigal's output `.gbk` file with multiple downstream tools.
:::
@@ -210,21 +241,28 @@

```bash
conda activate abricate

## Download the bacmet2 database
abricate-get_db --db bacmet2 ## the logging will tell you where the database is downloaded to, e.g. /home/<user>/bin/miniconda3/envs/abricate/db/bacmet2/sequences

## Run nextflow
nextflow run nf-core/funcscan -r <version> -profile docker --input samplesheet.csv --outdir <outdir> --run_arg_screening --arg_abricate_db /home/<user>/bin/miniconda3/envs/abricate/db/ --arg_abricate_db_id bacmet2
```
The resulting directory and database name can be passed to the pipeline as follows:

```bash
--arg_abricate_db /<path>/<to>/<abricate>/db/ --arg_abricate_db_id bacmet2
```

The top level of the directory should contain a subdirectory named after the database (e.g. `bacmet2/`).

### AMRFinderPlus

AMRFinderPlus relies on NCBI's curated Reference Gene Database and curated collection of Hidden Markov Models.

nf-core/funcscan will download this database for you, unless the path to a local version is given with:

```bash
--arg_amrfinderplus_db '/<path>/<to>/<amrfinderplus_db>/'
--arg_amrfinderplus_db '/<path>/<to>/<amrfinderplus_db>/latest'
```

You must pass the `latest` directory to the pipeline; the top level of this directory should include files such as `*.nbd`, `*.nhr`, `versions.txt` etc.
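The versioned layout, including the `latest` symlink, is what `amrfinder_update` produces, so once the tool is installed (see the steps below) the download itself boils down to (a sketch; the target directory is a placeholder):

```bash
# Download or update the local AMRFinderPlus database; this creates versioned
# subdirectories plus a 'latest' symlink inside the target directory
amrfinder_update -d /<path>/<to>/<amrfinderplus_db>
```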

To obtain a local version of the database:

1. Install AMRFinderPlus from [bioconda](https://bioconda.github.io/recipes/ncbi-amrfinderplus/README.html?highlight=amrfinderplus). To ensure database compatibility, please use the same version as is used in your nf-core/funcscan release (check version in file `<installation>/<path>/funcscan/modules/nf-core/amrfinderplus/run/environment.yml`).
Expand Down Expand Up @@ -284,6 +322,8 @@ You can then supply the path to resulting database directory with:
--arg_deeparg_db '/<path>/<to>/<deeparg>/<db>/'
```

The contents of the directory should include directories such as `database` and `model`, and files such as `deeparg.gz`, in the top level.

Note that if you supply your own database that is not downloaded by the pipeline, make sure to also supply `--arg_deeparg_db_version` along
with the version number so hAMRonization will correctly display the database version in the summary report.
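For example (a sketch; the version number shown is illustrative):

```bash
--arg_deeparg_db '/<path>/<to>/<deeparg>/<db>/' --arg_deeparg_db_version 2
```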

@@ -304,6 +344,8 @@ You can then supply the path to the resulting database directory with:

```bash
--arg_rgi_db '/<path>/<to>/<card>/'
```

The contents of the directory should include files such as `card.json`, `aro_index.tsv`, `snps.txt` etc. in the top level.
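If you prefer to fetch CARD yourself rather than let the pipeline download it, the steps look roughly like this (a sketch; URL and archive layout as documented on the CARD website at the time of writing):

```bash
# Fetch the latest CARD release and unpack card.json into a local directory
wget https://card.mcmaster.ca/latest/data -O card_data.tar.bz2
mkdir -p card
tar -xjf card_data.tar.bz2 -C card ./card.json
```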

:::info
The flag `--save_db` saves the pipeline-downloaded databases in your results directory. You can then move these to a central cache directory of your choice for re-use in the future.
:::
@@ -324,24 +366,34 @@ To supply the database directories to the pipeline:

```bash
--bgc_antismash_db '/<path>/<to>/<antismash>/<db>/'
--bgc_antismash_installdir '/<path>/<to>/<antismash>/<dir>/'
--bgc_antismash_installdir '/<path>/<to>/<antismash>/<dir>/antismash'
```

Note that the names of the supplied folders must differ from each other (e.g. `antismash_db` and `antismash_dir`). If they are not provided, the databases will be auto-downloaded upon each BGC screening run of the pipeline.
The contents of the database directory should include directories such as `as-js/`, `clusterblast/`, `clustercompare/` etc. in the top level.
The contents of the installation directory should include directories such as `common/` and `config/`, and files such as `custom_typing.py` and `custom_typing.pyi`, in the top level.


Note that the names of the two required folders must differ from each other (i.e., the `--bgc_antismash_db` directory must not be called `antismash`).
If they are not provided, the databases will be auto-downloaded upon each BGC screening run of the pipeline.
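If you want to pre-download the databases yourself instead, antiSMASH ships a helper script for this (a sketch; assumes a local antiSMASH installation providing `download-antismash-databases`):

```bash
# Fetch all antiSMASH databases into a directory of your choice
download-antismash-databases --database-dir /<path>/<to>/<antismash>/<db>
```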

:::info
If installing with conda, the installation directory will be `lib/python3.10/site-packages/antismash` from the base directory of your conda install or conda environment directory.
The flag `--save_db` saves the pipeline-downloaded databases in your results directory. You can then move these to a central cache directory of your choice for re-use in the future.
:::

### DeepBGC

DeepBGC relies on trained models and Pfams to run its analysis. nf-core/funcscan will download these databases for you. If the flag `--save_db` is set, the downloaded files will be stored in the output directory under `databases/deepbgc/`.

Alternatively, if you already downloaded the database locally with `deepbgc download`, you can indicate the path to the database folder with `--bgc_deepbgc_db <path>/<to>/<deepbgc_db>/`. The folder has to contain the subfolders as in the database folder downloaded by `deepbgc download`:
Alternatively, if you already downloaded the database locally with `deepbgc download`, you can indicate the path to the database folder with:

```bash
--bgc_deepbgc_db <path>/<to>/<deepbgc_db>/
```

The contents of the database directory should include directories such as `common/` and `0.1.0/` in the top level:

```console
deepbgc_db/
├── common
└── 0.1.0
    ├── classifier
    │   └── myClassifiers*.pkl
    └── detector
        └── myDetectors*.pkl
```


## Updating the pipeline

When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline:
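For example:

```bash
nextflow pull nf-core/funcscan
```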
modules.json (12 changes: 6 additions & 6 deletions)
@@ -55,6 +55,11 @@
"git_sha": "4e5f4687318f24ba944a13609d3ea6ebd890737d",
"installed_by": ["modules"]
},
"argnorm": {
"branch": "master",
"git_sha": "e4fc46af5ec30070e6aef780aba14f89a28caa88",
"installed_by": ["modules"]
},
"bakta/bakta": {
"branch": "master",
"git_sha": "9d0f89b445e1f5b2fb30476f4be9a8b519c07846",
Expand Down Expand Up @@ -87,7 +92,7 @@
},
"fargene": {
"branch": "master",
"git_sha": "a7231cbccb86535529e33859e05d19ac93f3ea04",
"git_sha": "9cf6f5e4ad9cc11a670a94d56021f1c4f9a91ec1",
"installed_by": ["modules"]
},
"gecco/run": {
Expand Down Expand Up @@ -205,11 +210,6 @@
"git_sha": "4e5f4687318f24ba944a13609d3ea6ebd890737d",
"installed_by": ["modules"],
"patch": "modules/nf-core/untar/untar.diff"
},
"argnorm": {
"branch": "master",
"git_sha": "e4fc46af5ec30070e6aef780aba14f89a28caa88",
"installed_by": ["modules"]
}
}
},