Release PR for 2.0.0 #410

Merged 418 commits into master on Aug 27, 2024

jasmezz (Collaborator) commented Jul 29, 2024

Added

Fixed

  • #343 Standardized the resulting workflow summary tables to always start with 'sample_id\tcontig_id\t..'. Reformatted the output of hamronization/summarize module. (by @Darcy220606)
  • #348 Updated samplesheet for pipeline tests to 'samplesheet_reduced.csv' with smaller datasets to reduce resource consumption. Updated prodigal module to fix pigz issue. Removed tests/ from .gitignore. (by @Darcy220606)
  • #362 Save annotations from bakta in subdirectories per sample. (by @jasmezz)
  • #363 Removed warning from DeepBGC usage docs. (by @jasmezz)
  • #365 Fixed AMRFinderPlus module and usage docs for manual database download. (by @jasmezz)
  • #371 Fixed AMRFinderPlus parameter arg_amrfinderplus_name. (by @m3hdad)
  • #376 Fixed an occasional RGI process failure when certain files were not produced. (❤️ to @amizeranschi for reporting, fix by @amizeranschi & @jfy133)
  • #386 Updated DeepBGC module to fix output file names, separate annotation step for all BGC tools, add warning if no BGCs found, fix MultiQC reporting of annotation workflow. (by @jfy133, @jasmezz)
  • #392 & #397 Fixed a Docker/Singularity-only error appearing when running with conda. (❤️ to @ewissel for reporting, fix by @jfy133 & @jasmezz)
  • #394 Fixed BGC input channel: pre-annotated input is picked up correctly now. (by @jfy133, @jasmezz)
  • #391 Skip hmmsearch by default so the pipeline does not crash if the user provides no HMM files; updated docs. (by @jasmezz)
  • #391 Made all "database" parameter names consistent. (by @jasmezz)
  • #397 Removed deprecated AMPcombi module, fixed variable name in BGC workflow, updated minor parts in docs (usage, parameter schema). (by @jasmezz)
  • #402 Fixed BGC length calculation for antiSMASH hits by comBGC. (by @jasmezz)
  • #406 Fixed prediction tools not being executed if annotation workflow skipped. (by @jasmezz)
  • #407 Fixed comBGC bug when parsing multiple antiSMASH files. (by @jasmezz)
  • #409 Fixed argNorm overwriting its output for DeepARG. (by @jasmezz, @jfy133)
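
The column convention from #343 (summary tables always start with `sample_id` and `contig_id`) can be illustrated with a short sketch. This is not the pipeline's code: the helper names `standardize_header` and `reorder_row`, and every column name other than `sample_id` and `contig_id`, are hypothetical.

```python
# Hypothetical helpers; only the leading column names come from the changelog.
def standardize_header(columns):
    """Return the columns reordered so sample_id and contig_id come first."""
    leading = ["sample_id", "contig_id"]
    missing = [c for c in leading if c not in columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # Keep the remaining columns in their original relative order.
    return leading + [c for c in columns if c not in leading]


def reorder_row(row, columns):
    """Render one record (a dict) as a tab-separated line in standardized order."""
    return "\t".join(str(row[c]) for c in standardize_header(columns))
```

Raising on a missing key column, rather than silently adding it, keeps malformed tool output from slipping into a merged summary.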

Dependencies

| Tool | Previous version | New version |
| --- | --- | --- |
| AMPcombi | 0.1.7 | 0.2.2 |
| AMPlify | 1.1.0 | 2.0.0 |
| AMRFinderPlus | 3.11.18 | 3.12.8 |
| antiSMASH | 6.1.1 | 7.1.0 |
| argNorm | NA | 0.5.0 |
| bioawk | 1.0 | NA |
| comBGC | 1.6.1 | 1.6.2 |
| DeepARG | 1.0.2 | 1.0.4 |
| DeepBGC | 0.1.30 | 0.1.31 |
| GECCO | 0.9.8 | 0.9.10 |
| hAMRonization | 1.1.1 | 1.1.4 |
| HMMER | 3.3.2 | 3.4 |
| MMSeqs | NA | 2:15.6f452 |
| MultiQC | 1.15 | 1.23 |
| Pyrodigal | 2.1.0 | 3.3.0 |
| RGI | 5.2.1 | 6.0.3 |
| seqkit | NA | 2.8.1 |
| tabix/htslib | 1.11 | 1.19.1 |

Deprecated

  • #384 Deprecated AMPcombi and replaced it with the full suite of AMPcombi2 submodules. (by @Darcy220606)
  • #382 Optimised BGC screening run time and prevented crashes due to too-short contigs by adding contig length filtering for the BGC workflow only. Bioawk is replaced with seqkit. (by @jfy133, @Darcy220606)
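
The contig length filtering from #382 is done in the pipeline with seqkit. As a rough illustration of the idea only, a plain-Python sketch might look like the following; the function name `filter_contigs` and the `min_len` threshold are hypothetical and not taken from the pipeline.

```python
# Illustration only: the pipeline uses seqkit for this step, not this code.
def filter_contigs(fasta_lines, min_len):
    """Yield (header, sequence) for FASTA records with length >= min_len."""
    header, chunks = None, []
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one if it is long enough.
            if header is not None:
                seq = "".join(chunks)
                if len(seq) >= min_len:
                    yield header, seq
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    # Emit the final record, which has no following header to trigger it.
    if header is not None:
        seq = "".join(chunks)
        if len(seq) >= min_len:
            yield header, seq
```

Dropping short contigs before BGC screening means the downstream tools never see the inputs that made them crash, and they have less sequence to scan.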

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs?
  • If necessary, also make a PR on the nf-core/funcscan branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

jasmezz and others added 30 commits May 3, 2024 10:40
Important! Template update for nf-core/tools v2.14.1
…GC + taxonomy merge due to wrong sample names
Co-authored-by: Jasmin Frangenberg <73216762+jasmezz@users.noreply.github.com>
jasmezz and others added 2 commits August 6, 2024 10:52
Co-authored-by: Matthias Hörtenhuber <mashehu@users.noreply.github.com>
adamrtalbot (Contributor) left a comment

Congrats on a herculean effort. Looks like a very good job.

I've left some comments about improvements you could make, but none of these are blocking. I would hate for a release of this magnitude to get delayed for minor code stuff. It is readable and maintainable. I assume it works, so it's good to go!

# grab the unique sample names from the taxonomy table
samples_taxa = taxa_df['sample_id'].unique()
# for every sampleID in taxadf merge the results
for sampleID in samples_taxa:
Contributor

This is a bit too much looping over a Pandas DataFrame for my liking. I would consider using a vector operation if you can for memory and speed reasons.

Since this looks pretty quick I don't think it's critical but it may slow down on really large data.
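
To make the suggestion concrete, here is a minimal sketch contrasting a per-sample loop with a single vectorized join. It assumes the loop's job is to attach taxonomy rows to result rows by sample and contig; the real script may do more than that. Only `sample_id` and `taxa_df` appear in the snippet under review; `results_df`, `contig_id`, `lineage`, `hit`, and both function names are hypothetical.

```python
import pandas as pd

# Hypothetical stand-ins for the taxonomy table and a tool's result table.
taxa_df = pd.DataFrame({
    "sample_id": ["s1", "s1", "s2"],
    "contig_id": ["c1", "c2", "c1"],
    "lineage": ["Bacteria;A", "Bacteria;B", "Archaea;C"],
})
results_df = pd.DataFrame({
    "sample_id": ["s1", "s2", "s2"],
    "contig_id": ["c2", "c1", "c9"],
    "hit": ["AMP_1", "AMP_2", "AMP_3"],
})


def merge_per_sample(results, taxa):
    """Loop-based version: one merge per sample, then concatenate."""
    parts = []
    for sample in taxa["sample_id"].unique():
        res_sub = results[results["sample_id"] == sample]
        taxa_sub = taxa[taxa["sample_id"] == sample]
        parts.append(
            res_sub.merge(taxa_sub, on=["sample_id", "contig_id"], how="left")
        )
    return pd.concat(parts, ignore_index=True)


def merge_vectorized(results, taxa):
    """Single join on both keys; no Python-level loop over samples."""
    return results.merge(taxa, on=["sample_id", "contig_id"], how="left")
```

On this toy input both functions return the same table. The two versions differ when a sample has results but no taxonomy rows: the loop drops it, while the left join keeps it with empty taxonomy columns, so that behavior would need to be decided explicitly.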

Collaborator Author
The reason will be displayed to describe this comment to others. Learn more.

@Darcy220606 Comment for you on the taxa merge script – you might come back to this for 2.1?

when:
task.ext.when == null || task.ext.when

script: // This script is bundled with the pipeline, in nf-core/funcscan/bin/
Contributor

templates?

Collaborator Author

What do you mean by templates? 😶

luispedro added a commit to luispedro/argNorm that referenced this pull request Aug 21, 2024
Big change is adding GROOT support

Full Changelog:

- argNorm supports the GROOT v1.1.2 ARG annotation tool: https://github.com/will-rowe/groot
- GROOT support is via the `GrootNormalizer` (for use in python scripts) and the `groot` tool parameter with the `groot-db`, `groot-core-db`, `groot-argannot`, `groot-card`, and `groot-resfinder` `db` parameters in the CLI.

Other
-----

- `__version__` attribute added to the package (accessible as `argnorm.__version__` or `argnorm.lib.__version__`)
- Use atomic writing for outputs (https://github.com/untitaker/python-atomicwrites/tree/master)
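
As background on what atomic writing means here, the following is a minimal standard-library sketch, not argNorm's implementation (the changelog links to the `atomicwrites` package instead). The helper name `atomic_write_text` is hypothetical. The idea is to write to a temporary file in the target directory and then rename it over the destination, so a reader never sees a half-written output.

```python
import os
import tempfile


# Hypothetical helper; a stdlib stand-in for the technique, not argNorm's code.
def atomic_write_text(path, text):
    """Write text to path so readers never see a partially written file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the same directory so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        # os.replace swaps the file into place in one step, overwriting any old copy.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

In a workflow manager this matters because a task killed mid-write would otherwise leave a truncated output file that looks like a finished result.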

funcscan integration
--------------------

- argNorm has been included as an nf-core module: https://nf-co.re/modules/argnorm/
- argNorm will also be available on the funcscan pipeline: nf-core/funcscan#410

DB harmonisation
----------------

- SARG db link was changed in `crude_db_harmonisation` to https://raw.githubusercontent.com/xinehc/args_oap/a3e5cff4a6c09f81e4834cfd9a31e6ce7d678d71/src/args_oap/db/sarg.fasta as old link (Galaxy instance, http://smile.hku.hk/SARGs) is down
- RGI outputs in `crude_db_harmonisation` are concatenated so frequencies of `perfect`, `strict`, and `loose` hits can be calculated from concatenated file
@jasmezz jasmezz merged commit 571d7eb into master Aug 27, 2024
57 checks passed
7 participants