Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make module/subworkflow names case insensitive #2869

Merged
merged 2 commits into from
Mar 21, 2024
Merged

Conversation

pmoris
Copy link
Contributor

@pmoris pmoris commented Mar 18, 2024

PR checklist

  • This comment contains a description of changes (with reason)
  • CHANGELOG.md is updated
  • If you've fixed a bug or added code that should be tested, add tests!
  • Documentation in docs is updated

Quick attempt at a fix for #2759.

EDIT: New approach is summarized in the main commit message body. Original PR history preserved below:


I went for the naive approach and simply converted all occurrences of tool/subworkflow names to lowercase, whenever they were being used as an input to a function.

A cleaner solution might be to use a decorator for this, but I couldn't find an option for click (token normalization only appears to work on choices or options, not arguments).

Alternatively, a check could be implemented inside the various functions of the ComponentCommand class, but I must admit I felt a bit lost trying to understand the overall structure of how these are currently implementend. I couldn't identify a uniform approach to verify input options used by the various components:

  • The ComponentInstall class contains a collect_and_verify_name() function where the input can be standardized.
  • ComponentUpdate uses a different approach than ComponentPatch, which itself has a remove function that differs from ModuleRemove.
  • Need to take into account single module install vs automatic collection of all modules inside a workflow.
  • Then there's another install option used by synced_repo, which does not reuse the ComponentInstall class?

If my current fix is "good enough", I will update the changelog and look at tests, but if another approach is preferred, I'll need some help from someone who's more familiar with the code base.

Copy link

codecov bot commented Mar 18, 2024

Codecov Report

Attention: Patch coverage is 88.88889% with 2 lines in your changes are missing coverage. Please review.

Project coverage is 74.80%. Comparing base (9e756c0) to head (2cdb907).
Report is 1 commits behind head on dev.

Files Patch % Lines
nf_core/__main__.py 88.88% 2 Missing ⚠️
Additional details and impacted files

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -818,7 +818,7 @@ def modules_install(ctx, tool, dir, prompt, force, sha):
ctx.obj["modules_repo_branch"],
ctx.obj["modules_repo_no_pull"],
)
exit_status = module_install.install(tool)
exit_status = module_install.install(tool.lower())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.casefold() might be safer than .lower(). See here for an explanation and here for an example of it in action.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd never heard of this one! Good to know!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though in this case, I don't think it matters - module names must already be simple alphanumeric, I don't think that you could get this far with ß or anything 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one is new to me as well, thanks!

Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding .lower() every time that tool is referenced, I think that it'd be simpler to just do the following right at the start of def modules_install?

tool = tool.lower()

Then it only needs to be called once.

@pmoris
Copy link
Contributor Author

pmoris commented Mar 19, 2024

I've incorporated both of your suggestions! Added a small comment to each call to make it more explicit what is happening too.

I also renamed one of the subworkflow function arguments from tool to subworkflow, similar to the other subworkflow function arguments.

To deal with the AttributeError: 'NoneType' object has no attribute 'casefold', would you prefer me to change all lines to

subworkflow = subworkflow.casefold() if subworkflow is not None

or just add AttributeError to the list of exceptions in the except block?

@adamrtalbot
Copy link
Contributor

My guidelines:

  1. Avoid doing something if it's not possible (i.e. if there is no subworkflow, don't call subworkflows_info)
  2. If it is possible and you control it use an if statement
  3. If it is possible and you don't control it (and you don't want to kill the software when it happens) use an except

@awgymer
Copy link
Contributor

awgymer commented Mar 19, 2024

The modules/subworkflow commands tend to allow the user to supply a module/subworkflow name OR provide nothing and get prompted with a dialogue.

So it's entirely expected that subworkflow (or module or tool) could be None in most cases.

In this case I wonder if it is better to actually use a callback in the parameter declaration to avoid faffing with the case when a user supplies no option?

e.g.

def lowerstr(x):
    return x.casefold()

...

@click.argument("subworkflow", type=str, callback=lowerstr, required=False, metavar="subworkflow name")

NB: That code is a prototype YMMV

@awgymer
Copy link
Contributor

awgymer commented Mar 19, 2024

Turns out that callback still fires when there is no value supplied I believe.

So I actually think perhaps a custom click parameter type is best:


class CasefoldedStr(click.ParamType):
    name = "casefold_str"

    def convert(self, value, param, ctx):
        if value:
            return str(value).casefold()


@click.argument("subworkflow", type=CasefoldedStr(), required=False, metavar="subworkflow name")

@pmoris
Copy link
Contributor Author

pmoris commented Mar 19, 2024

I'm currently also mapping out the different functions that can ever receive a module/subworkflow name. Seems like it's possible to do the conversion there, after any other verifications and after the user selection prompt code for when no name is supplied.

If that doesn't work out, the click approach looks promising!

Will report back later!

@awgymer
Copy link
Contributor

awgymer commented Mar 19, 2024

Oh nice, might just be a case of checking all the init for the underlying component commands and switching self.component = component to self.component = component.casefold() if component else None

@pmoris
Copy link
Contributor Author

pmoris commented Mar 19, 2024

Actually, on first glance your initial suggestion seems to work just fine @awgymer:

def normalise_case(ctx, param, component_name):
    if component_name is not None:
        return component_name.casefold()

@click.argument("tool", type=str, callback=normalise_case, required=False, metavar="<tool> or <tool/subtool>")

If a component is provided by the user, it is converted to lowercase. If not, the value remains None and all downstream user choice prompts should still fire as expected. Slight caveat here 1.

I'll try converting things to this format tomorrow and report back.

Footnotes

  1. if I understand the docs correctly, a callback function without a return should cause nothing to happen (https://click.palletsprojects.com/en/8.1.x/api/#click.Command.callback), but from my brief testing, it seems that None is still being assigned to the value in those cases. Fortunately, either way works for our intended use case.

@awgymer
Copy link
Contributor

awgymer commented Mar 19, 2024

Yes, your callback works here as you are catching the None case explicitly, I was not.

@pmoris pmoris force-pushed the case branch 2 times, most recently from 846196a to 7ff4f55 Compare March 20, 2024 08:33
@pmoris
Copy link
Contributor Author

pmoris commented Mar 20, 2024

There we go. This seems like a far better solution. See details in the commit message: 78b2a77 (I rebased locally to clean things up).

Thanks for the input everyone, I've added you all as co-authors, but you might need to verify them.

Some remaining questions:

  • Should I update the changelog myself or is this done whenever dev is merged into master?
  • Are there any tests I should add? Or is only internal logic checked (i.e., python functions and methods, not simulated CLI invocations).
  • What's up with these failings tests?
[INFO] This may take a few minutes...
ruff.....................................................................Failed
- hook id: ruff
- exit code: 1
- files were modified by this hook

Found 1 error (1 fixed, 0 remaining).

ruff-format..............................................................Passed
prettier.................................................................Passed
Check .editorconfig rules................................................Passed
mypy.....................................................................Passed
Error: Process completed with exit code 1.

@adamrtalbot
Copy link
Contributor

What's up with these failings tests?

You should be able to run Ruff locally, see the problem and fix it. I'll take a look now...

@pmoris
Copy link
Contributor Author

pmoris commented Mar 20, 2024

What's up with these failings tests?

You should be able to run Ruff locally, see the problem and fix it. I'll take a look now...

I am running ruff via the VSCode extension, that's why I was so confused :o

@awgymer
Copy link
Contributor

awgymer commented Mar 20, 2024

Seems like the branch was out of sync with dev?
Looks like updating it fixed the linting

@pmoris
Copy link
Contributor Author

pmoris commented Mar 20, 2024

branch was out of sync with dev?
Looks like updating it fixed the linting

I thought I had run a git rebase upstream/dev prior to pushing, but I guess I must have messed up somehow... Cheers!

@awgymer
Copy link
Contributor

awgymer commented Mar 20, 2024

GitHub usually gives you a button at the bottom of the MR to update the branch to be in sync (assuming it won't create conflicts) 😄

@pmoris
Copy link
Contributor Author

pmoris commented Mar 20, 2024

Updated the changelog (squashed into previous commit); ruff and rebase are no longer acting up this time around either :)

pmoris and others added 2 commits March 20, 2024 15:11
Uses a click callback function to normalize the name of a module/subworkflow
that is provided by the user on the CLI, so that names written in uppercase
still resolve to the expected lowercase name.

In case no argument is provided, the callback function simply returns None,
and thus the current behaviour of a user being asked to select a component
via an interactive prompt is maintained.

The .casefold() method is used in favour of .lower() for slightly
more robust case normalization.
See: https://stackoverflow.com/questions/45745661/lower-vs-casefold-in-string-matching-and-converting-to-lowercase

The callback is applied to all occurrences of "tool/module/subworkflow" arguments,
except those in the the "create" functions, because these already contain an internal
check that prompts to user to adhere to the lowercase naming convention.

Updates to CHANGELOG.md.

Co-authored-by: Adam Talbot <adam.talbot@seqera.io>
Co-authored-by: Phil Ewels <phil.ewels@seqera.io>
Co-authored-by: Arthur Gymer <24782660+awgymer@users.noreply.github.com>
The subworkflow info command referred to the subworkflow name
as "tool" in its list of arguments. It has been renamed to
"subworkflow" to align with the other subworkflow commands.
Copy link
Member

@ewels ewels left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@pmoris pmoris merged commit 1148c4e into nf-core:dev Mar 21, 2024
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants