Merge pull request #109 from cokelaer/main
add levenshtein function
cokelaer authored Aug 22, 2024
2 parents 09c5cbe + 1958859 commit 1ac37fc
Showing 13 changed files with 79 additions and 20 deletions.
10 changes: 5 additions & 5 deletions README.rst
@@ -110,7 +110,7 @@ The **sequana_pipetools** package provide a standalone called **sequana_pipetool

 .. figure:: https://raw.githubusercontent.com/sequana/sequana_pipetools/main/doc/UI.png

-There are currently 3-4 main application. The first one is for Linux users under
+There are several applications. The first one is for Linux users under
 bash to obtain completion of a sequana pipeline command line arguments::

     sequana_pipetools --completion fastqc
@@ -126,7 +126,7 @@ This is used within th pipeline but can be used manually as well and is useful t
 The following command provides statistics about Sequana pipelines installed on your system (number of rules, wrappers
 used)::

-    sequana_pipetools -- stats
+    sequana_pipetools --stats

 And for developpers, a quick creation of schema file given a config file (experimental, developers would still need to edit the schema but it does 90% of the job)::

@@ -283,15 +283,14 @@ so, you will need to create an environment variable::

     export SEQUANA_WRAPPERS="git+file:///home/user/github/sequana-wrappers"

-If you decide to use singularity/apptainer, one common error on a cluster is that non-standard paths are not found. You can bind them using the -B option but a more general set up is to create thos environment variable::
+If you decide to use singularity/apptainer, one common error on a cluster is that non-standard paths are not found. You can bind them using the -B option but a more general set up is to create this environment variable::

     export SINGULARITY_BINDPATH="/path_to_bind"

-for Singularity setup, or ::
+for Apptainer setup ::

     export APPTAINER_BINDPATH="/path_to_bind"

-for Apptainer setup.


 What is Sequana ?
@@ -318,6 +317,7 @@ Changelog
 ========= ======================================================================
 Version   Description
 ========= ======================================================================
+1.0.3     * add levenshtein function. some typo corrections.
 1.0.2     * add the dot2png command. pin docutils <0.21 due to pip error
 1.0.1     * hot fix in the profile creation (regression)
 1.0.0     * Stable release
Binary file added doc/_static/logo_256x256.png
4 changes: 2 additions & 2 deletions doc/conf.py
@@ -157,11 +157,11 @@
 # html_title = None

 # A shorter title for the navigation bar. Default is the same as html_title.
-html_short_title = "sequana"
+html_short_title = "sequana_pipetools"

 # The name of an image file (relative to this directory) to place at the top
 # of the sidebar.
-# html_logo = "../../share/data/images/crx_logo.png"
+html_logo = "_static/logo_256x256.png"

 # The name of an image file (within the static path) to use as favicon of the
 # docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "poetry.core.masonry.api"
 #maintainer ?#maintainer email
 [tool.poetry]
 name = "sequana_pipetools"
-version = "1.0.2"
+version = "1.0.3"
 description = "A set of tools to help building or using Sequana pipelines"
 authors = ["Sequana Team"]
 license = "BSD-3"
4 changes: 2 additions & 2 deletions sequana_pipetools/__init__.py
@@ -13,14 +13,14 @@ def get_package_version(package_name):

 from easydev.logging_tools import Logging

-logger = Logging("sequana_pipetools", "WARNING")
+logger = Logging("sequana_pipetools", "WARNING", "cyan")
 # To keep the inheritance/propagation of levels. Logging from easydev will do
 # the formatting only.
 import colorlog

 logger = colorlog.getLogger(logger.name)

-from .misc import url2hash
+from .misc import url2hash, levenshtein_distance
 from .sequana_manager import SequanaManager # , get_pipeline_location
 from .snaketools import (
     Pipeline,
6 changes: 4 additions & 2 deletions sequana_pipetools/info.py
@@ -18,11 +18,13 @@


 sequana_epilog = Colors().purple(
-"""If you use or like the Sequana project,
+"""\U00002139\U0000FE0F To use completion, type 'sequana_pipetools --completion NAME'
+\u2705 If you use or like the Sequana project,
 please consider citing us (visit sequana.readthedocs.io for details) or use this
 citation:
-Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of
+\U0001F4DA Cokelaer et al, (2017), ‘Sequana’: a Set of Snakemake NGS pipelines, Journal of
 Open Source Software, 2(16), 352, JOSS DOI doi:10.21105/joss.00352
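The new epilog prefixes its lines with Unicode escape sequences. For readers who do not decode these mentally, the small standard-library snippet below prints the Unicode character name behind each escape used in the hunk. It only inspects the escapes; `ESCAPES` and `describe` are illustrative names and are not part of `sequana_pipetools`.

```python
import unicodedata

# Escape sequences copied from the new sequana_epilog string (keys are the
# literal escapes, values are the decoded characters).
ESCAPES = {
    r"\U00002139\U0000FE0F": "\U00002139\U0000FE0F",
    r"\u2705": "\u2705",
    r"\U0001F4DA": "\U0001F4DA",
}


def describe(text: str) -> str:
    """Return the Unicode names of every code point in ``text``, joined by ' + '."""
    return " + ".join(unicodedata.name(ch) for ch in text)


if __name__ == "__main__":
    for escape, char in ESCAPES.items():
        print(f"{escape:<24} {describe(char)}")
```

The first escape decodes to INFORMATION SOURCE followed by VARIATION SELECTOR-16, which asks terminals to render the preceding character in its emoji form; the other two are WHITE HEAVY CHECK MARK and BOOKS.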
51 changes: 50 additions & 1 deletion sequana_pipetools/misc.py
@@ -15,9 +15,58 @@

 from sequana_pipetools import get_package_version

-__all__ = ["Colors", "print_version", "error", "url2hash"]
+__all__ = ["Colors", "print_version", "error", "url2hash", "levenshtein_distance"]


+def levenshtein_distance(token1: str, token2: str) -> int:
+    """Computes the Levenshtein distance between two strings using dynamic programming.
+
+    The Levenshtein distance is a measure of the minimum number of single-character edits
+    (insertions, deletions, or substitutions) required to change one word into the other.
+
+    :param str token1: The first input string.
+    :param str token2: The second input string.
+    :return: Levenshtein distance between the two input strings.
+
+    Example::
+
+        >>> levenshtein_distance("kitten", "sitting")
+        3
+        >>> levenshtein_distance("flaw", "lawn")
+        2
+
+    Notes:
+
+    - The function uses a 2D list to store the distances, which requires O(m * n) space,
+      where m and n are the lengths of the input strings.
+    - The time complexity is O(m * n) since each cell of the matrix is filled once.
+    """
+    len1, len2 = len(token1), len(token2)
+
+    # Initialize the matrix with zeros
+    distances = [[0 for _ in range(len2 + 1)] for _ in range(len1 + 1)]
+
+    # Fill the first row and column
+    for t1 in range(len1 + 1):
+        distances[t1][0] = t1
+
+    for t2 in range(len2 + 1):
+        distances[0][t2] = t2
+
+    # Compute the Levenshtein distance
+    for t1 in range(1, len1 + 1):
+        for t2 in range(1, len2 + 1):
+            if token1[t1 - 1] == token2[t2 - 1]:
+                distances[t1][t2] = distances[t1 - 1][t2 - 1]
+            else:
+                distances[t1][t2] = min(
+                    distances[t1][t2 - 1], # Insertion
+                    distances[t1 - 1][t2], # Deletion
+                    distances[t1 - 1][t2 - 1] # Substitution
+                ) + 1
+
+    return distances[len1][len2]
+
 def url2hash(url):
     md5hash = hashlib.md5()
     md5hash.update(url.encode())
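The docstring notes that the full matrix costs O(m * n) memory. Because each row of the matrix depends only on the row above it, the same recurrence can be evaluated with two rows, bringing memory down to O(min(m, n)) while keeping O(m * n) time. The sketch below is not part of this commit: `levenshtein_two_rows` is a hypothetical name, and `levenshtein_full` is a local copy of the full-matrix function from the hunk, kept only so the two can be cross-checked.

```python
def levenshtein_full(token1: str, token2: str) -> int:
    """Full-matrix version, equivalent to levenshtein_distance in the hunk above."""
    len1, len2 = len(token1), len(token2)
    distances = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for t1 in range(len1 + 1):
        distances[t1][0] = t1
    for t2 in range(len2 + 1):
        distances[0][t2] = t2
    for t1 in range(1, len1 + 1):
        for t2 in range(1, len2 + 1):
            if token1[t1 - 1] == token2[t2 - 1]:
                distances[t1][t2] = distances[t1 - 1][t2 - 1]
            else:
                distances[t1][t2] = 1 + min(
                    distances[t1][t2 - 1],      # insertion
                    distances[t1 - 1][t2],      # deletion
                    distances[t1 - 1][t2 - 1],  # substitution
                )
    return distances[len1][len2]


def levenshtein_two_rows(token1: str, token2: str) -> int:
    """Hypothetical two-row variant of the same dynamic-programming recurrence."""
    # The distance is symmetric, so swap to make token2 the shorter string;
    # each row then has min(m, n) + 1 cells.
    if len(token2) > len(token1):
        token1, token2 = token2, token1
    previous = list(range(len(token2) + 1))  # row 0: "" -> each prefix of token2
    for t1 in range(1, len(token1) + 1):
        current = [t1] + [0] * len(token2)  # column 0: delete t1 characters
        for t2 in range(1, len(token2) + 1):
            if token1[t1 - 1] == token2[t2 - 1]:
                current[t2] = previous[t2 - 1]
            else:
                current[t2] = 1 + min(
                    current[t2 - 1],   # insertion
                    previous[t2],      # deletion
                    previous[t2 - 1],  # substitution
                )
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for a, b in [("kitten", "sitting"), ("flaw", "lawn"), ("", "abc")]:
        print(a, b, levenshtein_full(a, b), levenshtein_two_rows(a, b))
```

Both functions return the same distances. The full matrix is still the better choice when the actual sequence of edits must be recovered by backtracking, which the two-row version discards.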
2 changes: 1 addition & 1 deletion sequana_pipetools/options.py
@@ -198,7 +198,7 @@ def from_project_callback(ctx, param, value):
 if not value:
 return
 else:
-# When --from-project is called, all value of arguments are are replaced by the ones
+# When --from-project is called, all value of arguments are replaced by the ones
 # found in the config file. Therefore, users may ommit all arguments. However, some
 # may be compulsary, so we need to reset all 'required' arguments to False
 for option in ctx.command.params:
7 changes: 4 additions & 3 deletions sequana_pipetools/scripts/main.py
@@ -227,15 +227,16 @@ def main(**kwargs):
 subprocess.call(cmd.split())

 elif kwargs["completion"]:
+name = kwargs["completion"]
+
 if kwargs["force"] is True:
 choice = "y"
 else: # pragma: no cover
-msg = "This action will replace files stored in ./config/sequana/pipelines. Do you want to proceed y/n: "
+msg = f"This action will replace the {name}.sh file stored in ~/.config/sequana/pipelines. Do you want to proceed y/n: "
 choice = input(msg)
 if choice != "y": # pragma: no cover
 sys.exit(0)

-name = kwargs["completion"]
 try:
 c = ClickComplete(name)
 c.save_completion_script()
@@ -244,7 +245,7 @@ def main(**kwargs):
 finally:
 click.echo("Please source the files using:: \n")
 click.echo(" source ~/.config/sequana/pipelines/{}.sh".format(name))
-click.echo("\nto activate the completion")
+click.echo("\nto activate the completion. Add the line above in your environement")
 elif kwargs["stats"]:
 wrappers, rules = get_pipeline_statistics()
 click.echo("\n ==== Number of wrappers per pipeline")
1 change: 1 addition & 0 deletions sequana_pipetools/snaketools/pipeline_manager.py
@@ -172,6 +172,7 @@ def teardown(self, extra_dirs_to_remove=[], extra_files_to_remove=[], outdir="."
 print("\u2705 Another successful analysis. Open summary.html in your browser. Have fun.")
 else:
 print("\u2705 Another successful analysis. Have fun.")
+print("\u2705 Please consider citing us would you use Sequana in your research. See https://sequana.readthedocs.io or cite: \n\n\tCokelaer et al. Sequana': a Set of Snakemake NGS pipelines, (2007) JOSS 2(16)")

 def get_html_summary(self, float="left", width=30):
 import pandas as pd
2 changes: 1 addition & 1 deletion sequana_pipetools/snaketools/profile.py
@@ -17,7 +17,7 @@ def create_profile(workdir: Path, profile: str, **kwargs) -> str:
 with open(profile_file, "r") as fin:
 profile_text = fin.read()
 profile_text = profile_text.format(**kwargs)
-except AttributeError:
+except AttributeError: #pragma: no cover
 # python 3.8 support for back compatibility
 with resources.path("sequana_pipetools.resources", f"{profile}.yaml") as profile_file:
 profile_text = profile_file.read_text().format(**kwargs)
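Both `except AttributeError` branches that this commit marks `#pragma: no cover` (here and in `sequana_config.py`) follow the same compatibility pattern: `importlib.resources.files()` only exists from Python 3.9, so on 3.8 the attribute lookup raises `AttributeError` and the older `resources.path()` context manager is used instead. A minimal standalone sketch of that pattern follows; `read_package_text` is a hypothetical helper, and the standard-library `json` package stands in for `sequana_pipetools.resources`.

```python
from importlib import resources


def read_package_text(package: str, name: str) -> str:
    """Read a text file shipped inside ``package`` (hypothetical helper)."""
    try:
        # Python >= 3.9: files() returns a Traversable rooted at the package.
        return resources.files(package).joinpath(name).read_text()
    except AttributeError:  # pragma: no cover
        # Python 3.8: importlib.resources has no files(); path() yields a real
        # filesystem path that is valid only inside the with-block.
        with resources.path(package, name) as path:
            return path.read_text()


if __name__ == "__main__":
    source = read_package_text("json", "__init__.py")
    print(len(source.splitlines()), "lines read from json/__init__.py")
```

On Python 3.9 and later the fallback branch never runs, which is why a coverage pragma on it is reasonable.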
2 changes: 1 addition & 1 deletion sequana_pipetools/snaketools/sequana_config.py
@@ -185,7 +185,7 @@ def check_config_with_schema(self, schemafile):
 try:
 ext_name = resources.files("sequana_pipetools.resources").joinpath("ext.py")
 extensions = [str(ext_name)]
-except AttributeError:
+except AttributeError: #pragma: no cover
 with resources.path("sequana_pipetools.resources", "ext.py") as ext_name:
 extensions = [str(ext_name)]

8 changes: 7 additions & 1 deletion tests/test_misc.py
@@ -1,4 +1,10 @@
-from sequana_pipetools.misc import Colors, print_version, error, url2hash
+from sequana_pipetools.misc import Colors, print_version, error, url2hash, levenshtein_distance


+
+def test_levenshtein():
+    assert levenshtein_distance("kitten", "sitting") == 3
+    assert levenshtein_distance("flaw", "lawn") == 2
+
+
 def test_url2hash():
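The new test covers two textbook pairs. A few further cases, not part of this commit, would also exercise the first-row/column initialisation (empty strings) and basic properties of an edit distance (identity, symmetry). The sketch below inlines a copy of the function from the `misc.py` hunk so that it runs on its own; in the repository one would import `levenshtein_distance` from `sequana_pipetools.misc`, as `tests/test_misc.py` does. `test_levenshtein_edge_cases` is a hypothetical test name.

```python
def levenshtein_distance(token1: str, token2: str) -> int:
    # Inlined copy of the full-matrix implementation from the misc.py hunk.
    len1, len2 = len(token1), len(token2)
    distances = [[0] * (len2 + 1) for _ in range(len1 + 1)]
    for t1 in range(len1 + 1):
        distances[t1][0] = t1
    for t2 in range(len2 + 1):
        distances[0][t2] = t2
    for t1 in range(1, len1 + 1):
        for t2 in range(1, len2 + 1):
            if token1[t1 - 1] == token2[t2 - 1]:
                distances[t1][t2] = distances[t1 - 1][t2 - 1]
            else:
                distances[t1][t2] = 1 + min(
                    distances[t1][t2 - 1],
                    distances[t1 - 1][t2],
                    distances[t1 - 1][t2 - 1],
                )
    return distances[len1][len2]


def test_levenshtein_edge_cases():
    # Empty strings: the distance is the length of the other string.
    assert levenshtein_distance("", "") == 0
    assert levenshtein_distance("", "fastqc") == 6
    assert levenshtein_distance("fastqc", "") == 6
    # Identity: no edit is needed.
    assert levenshtein_distance("fastqc", "fastqc") == 0
    # Symmetry: insertions one way are deletions the other way.
    assert levenshtein_distance("flaw", "lawn") == levenshtein_distance("lawn", "flaw")
    # A single substitution.
    assert levenshtein_distance("rnaseq", "rnaseg") == 1


if __name__ == "__main__":
    test_levenshtein_edge_cases()
    print("ok")
```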
