Merge pull request #84 from gbouras13/mmseqs
Mmseqs
gbouras13 authored Nov 20, 2024
2 parents f844d5d + 7e19b60 commit 1c739b9
Showing 153 changed files with 111,690 additions and 4,026 deletions.
5 changes: 2 additions & 3 deletions .github/workflows/ci.yaml
@@ -22,20 +22,19 @@ jobs:
fetch-depth: 0

# Setup env
- uses: conda-incubator/setup-miniconda@v2
- uses: conda-incubator/setup-miniconda@v3
with:
activate-environment: dnaapler
environment-file: build/environment.yaml
python-version: ${{ matrix.python-version }}
auto-activate-base: false
miniforge-variant: Mambaforge
channels: conda-forge,bioconda,defaults
channel-priority: strict
auto-update-conda: true
- name: Install project
shell: bash -l {0}
run: |
mamba install python=${{ matrix.python-version }}
conda install python=${{ matrix.python-version }}
python -m pip install --upgrade pip
pip install -e .
pip install black
3 changes: 1 addition & 2 deletions .github/workflows/release.yaml
@@ -12,13 +12,12 @@ jobs:

steps:
- uses: actions/checkout@v2
- uses: conda-incubator/setup-miniconda@v2
- uses: conda-incubator/setup-miniconda@v3
with:
python-version: 3.9
activate-environment: dnaapler
environment-file: build/environment.yaml
auto-activate-base: false
miniforge-variant: Mambaforge
channels: conda-forge,bioconda,defaults
channel-priority: strict
auto-update-conda: true
9 changes: 9 additions & 0 deletions HISTORY.md
@@ -1,5 +1,14 @@
# History

# 1.0.0 (2024-11-21)

* **BREAKING CHANGE** - `dnaapler` now uses `MMseqs2` rather than `BLAST`. You will need to install `MMseqs2` if you upgrade (if you use conda, this should be handled for you).
* There are 2 reasons for this:
1. Users reported problems installing BLAST on macOS with Apple Silicon (see e.g. [here](https://github.com/gbouras13/pharokka/issues/368)). MMseqs2 works on all platforms and is diligently maintained.
2. MMseqs2 is much faster than BLAST (what took BLAST a few minutes takes MMseqs2 seconds). We should have written `dnaapler` with `MMseqs2` to begin with.
* The alignment results may not be identical (i.e. the two tools might find different top hits), but the actual reorientation is likely to be identical (at least in my tests). Please reach out or make an issue if you notice any discrepancies.
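
Because upgrading now requires MMseqs2 to be installed, a quick pre-flight check can catch a missing install early. The following is a minimal sketch and not part of `dnaapler` itself: the helper name `find_tool` is hypothetical, and it assumes the MMseqs2 executable is named `mmseqs`.

```python
import shutil
from typing import Optional


def find_tool(name: str = "mmseqs") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None if absent.

    `find_tool` is a hypothetical helper for illustration only;
    it is not a dnaapler function.
    """
    return shutil.which(name)


if __name__ == "__main__":
    path = find_tool()
    if path is None:
        print("mmseqs not found on PATH; install MMseqs2 before upgrading dnaapler")
    else:
        print(f"mmseqs found at {path}")
```

If the check reports that `mmseqs` is missing, installing the pinned `mmseqs2` package from the project's conda environment file is the route the release notes suggest.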


# 0.8.1 (2024-09-16)

* Minor release - adds `--db dnaa,repa,cog1474` as an option for `dnaapler all` to allow for archaea orientation in hybracter
16 changes: 14 additions & 2 deletions README.md
@@ -52,11 +52,22 @@ Additionally, please consider citing the dependencies where relevant:
```
Altschul S.F., Gish W., Miller W., Myers E.W., Lipman D.J. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-10. doi: 10.1016/S0022-2836(05)80360-2. PMID: 2231712.
Steinegger M, Söding J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol. 2017 Nov;35(11):1026-1028. doi: 10.1038/nbt.3988.
Larralde, M., (2022). Pyrodigal: Python bindings and interface to Prodigal, an efficient method for gene prediction in prokaryotes. Journal of Open Source Software, 7(72), 4296, https://doi.org/10.21105/joss.04296.
Hyatt, D., Chen, GL., LoCascio, P.F. et al. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11, 119 (2010). https://doi.org/10.1186/1471-2105-11-119.
```

## v1.0.0

* **BREAKING CHANGE** - `dnaapler` now uses `MMseqs2 v13.45111` rather than `BLAST`. You will need to install `MMseqs2` if you upgrade (if you use conda, this should be handled for you).
* There are 2 reasons for this:
1. Users reported problems installing BLAST on macOS with Apple Silicon (see e.g. [here](https://github.com/gbouras13/pharokka/issues/368)). MMseqs2 works on all platforms and is diligently maintained.
2. MMseqs2 is much faster than BLAST (what took BLAST a few minutes takes MMseqs2 seconds). We should have written `dnaapler` with `MMseqs2` to begin with. `MMseqs2 v13.45111` was chosen to ensure interoperability with [pharokka](https://github.com/gbouras13/pharokka).
* The alignment results may not be identical to `dnaapler v0.8.1` (i.e. the two tools might find different top hits), but the actual reorientation is likely to be identical (at least in my tests). Please reach out or make an issue if you notice any discr


# Google Colab Notebooks

If you don't want to install `dnaapler` locally, you can run `dnaapler all` without any code using the [Google Colab notebook](https://colab.research.google.com/github/gbouras13/dnaapler/blob/master/run_dnaapler.ipynb).
@@ -65,6 +76,7 @@ If you don't want to install `dnaapler` locally, you can run `dnaapler all` with
- [dnaapler](#dnaapler)
- [Quick Start](#quick-start)
- [Paper](#paper)
- [v1.0.0](#v100)
- [Google Colab Notebooks](#google-colab-notebooks)
- [Table of Contents](#table-of-contents)
- [Description](#description)
@@ -86,7 +98,7 @@ If you don't want to install `dnaapler` locally, you can run `dnaapler all` with
<img src="paper/Dnaapler_figure.png" alt="Dnaapler Figure">
</p>

`dnaapler` is a simple python program that takes a single nucleotide input sequence (in FASTA format), finds the desired start gene using `blastx` against an amino acid sequence database, checks that the start codon of this gene is found, and if so, then reorients the chromosome to begin with this gene on the forward strand.
`dnaapler` is a simple python program that takes a single nucleotide input sequence (in FASTA format), finds the desired start gene using `MMseqs2` against an amino acid sequence database, checks that the start codon of this gene is found, and if so, then reorients the chromosome to begin with this gene on the forward strand.

It was originally designed to replicate the reorientation functionality of [Unicycler](https://github.com/rrwick/Unicycler/blob/main/unicycler/gene_data/repA.fasta) with dnaA, but for long-read-first assembled chromosomes. We have extended it to work with plasmids (`dnaapler plasmid`) and phages (`dnaapler phage`), or for any input FASTA desired with `dnaapler custom`, `dnaapler mystery` or `dnaapler nearest`.
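
The reorientation step described above can be illustrated with a short sketch. This is not `dnaapler`'s actual implementation: the function names, the 0-based `start` coordinate, and the `strand` convention (`1` for forward, `-1` for reverse) are assumptions made for illustration. It treats the input as a circular sequence and rotates it so that the first base of the chosen gene becomes position 0 on the forward strand.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def reorient(seq: str, start: int, strand: int) -> str:
    """Rotate a circular sequence so the gene begins at index 0, forward strand.

    Hypothetical helper, not dnaapler's API.
    start  -- 0-based index, in the coordinates of `seq`, of the gene's first base
              (for a reverse-strand gene this is its highest coordinate).
    strand -- 1 if the gene is on the forward strand, -1 if on the reverse strand.
    """
    if not 0 <= start < len(seq):
        raise ValueError("start must be a valid index into seq")
    if strand == 1:
        # Forward strand: a plain rotation is enough.
        return seq[start:] + seq[:start]
    if strand == -1:
        # Reverse strand: flip the whole sequence, then rotate.
        # Index i in seq maps to index len(seq) - 1 - i in the reverse complement.
        flipped = reverse_complement(seq)
        new_start = len(seq) - 1 - start
        return flipped[new_start:] + flipped[:new_start]
    raise ValueError("strand must be 1 or -1")
```

For example, `reorient("AAATGCCC", 2, 1)` and `reorient("GGGCATTT", 5, -1)` both yield a sequence that begins with `ATG`, since the second input is the reverse complement of the first.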

@@ -177,7 +189,7 @@ Options:
-t, --threads INTEGER Number of threads to use with BLAST [default: 1]
-p, --prefix TEXT Prefix for output files [default: dnaapler]
-f, --force Force overwrites the output directory
-e, --evalue TEXT e value for blastx [default: 1e-10]
-e, --evalue TEXT e value for MMseqs2 [default: 1e-10]
--ignore PATH Text file listing contigs (one per row) that are to
be ignored
-a, --autocomplete TEXT Choose an option to autocomplete reorientation if
2 changes: 1 addition & 1 deletion build/environment.yaml
@@ -4,7 +4,7 @@ channels:
- bioconda
- defaults
dependencies:
- blast >=2.9
- mmseqs2 ==13.45111
- just
- poetry
- python >=3.8,<4.0
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "dnaapler"
version = "0.8.1" # change VERSION too
version = "1.0.0" # change VERSION too
description = "Reorients assembled microbial sequences"
authors = ["George Bouras <george.bouras@adelaide.edu.au>"]
license = "MIT"