Allow the concurrent run of multiple pipeline revisions #2870
Comments
@pditommaso one question: do we need to check if the repo is present in the |
Good point. If it already exists, I think it should report a warning message, maybe? |
so, report with a warning message (maybe with some instructions to remove the current repo) and stop the command, right? |
No I mean, show a warning message i.e. |
ah ok, so if I understand correctly we try to identify which kind of repo we are working on at startup
|
I see your point. In principle the bare should have been created when the feature has been enabled with a config flag or env variable, right? If so, I think when this option does not match the repo format a warning should be reported |
@jorgeaguileraseqera any ETA for this? |
I hope to have it in the next few days (it's a little tedious because the API rate limit sometimes breaks the run of all the tests) |
Can you please open at least a draft PR asap? |
Do you mean Github rate limits? Are you using your GITHUB_TOKEN for tests? |
yes, I've created one and configured the env to run the tests |
Weird, but such tests should not depend on GitHub. They can create a small test repo and then use it for testing. There's something similar for testing Git submodules |
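A minimal sketch of the idea in the comment above: build a tiny throwaway repository on disk and clone it through a local path, so the test never talks to GitHub and cannot hit a rate limit. The directory names, the `main.nf` content and the `rocket` branch are illustrative assumptions, not the fixtures Nextflow's test suite actually uses.

```shell
set -euo pipefail

# Hypothetical scratch location; nothing below touches the network.
tmp="$(mktemp -d)"
origin="$tmp/origin/hello"
clone="$tmp/assets/hello"

# Build a tiny stand-in for the remote repository, with two branches.
mkdir -p "$origin" "$tmp/assets"
git -C "$origin" init -q
git -C "$origin" symbolic-ref HEAD refs/heads/main
echo 'println "hello"' > "$origin/main.nf"
git -C "$origin" add main.nf
git -C "$origin" -c user.name=test -c user.email=test@example.com \
    commit -q -m "initial commit"
git -C "$origin" branch rocket

# Clone it through a local path, the way a test fixture could.
git clone -q "$origin" "$clone"
branches="$(git -C "$clone" branch -r)"
echo "$branches"
```

Because the remote is just a directory, the same fixture can be rebuilt for every test case.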
Implementing the functionality in this issue would also solve issue #2655. Maybe also clarify in the issue title that "concurrent run" only applies to runs from different working directories (with different work/ and .nextflow subdirs). |
Note that we're talking about the |
We lost the momentum with this feature :/ |
Hi! This was recently brought to my attention. Just flagging that this would likely impact our engineers who might be developing on different feature branches but on the same workflow repo, on our development environments (which currently only run on our on-prem infrastructure). |
Impacting in a good or bad way? |
Hi @pditommaso, it would impact us in a bad way, I'm afraid! Our current idea for developing workflows within our organisation is for engineers to have their own branch in a workflow repository. They would implement changes in their own branch, and potentially run said workflows on our on-prem infrastructure to test their implementations. I believe that due to this bug, the engineers would end up overwriting each other's workflow implementations if multiple implementations of the same workflow are tested at the same time. |
Understood, but it's not a bug. Nextflow has always worked this way. The goal of this issue is exactly to overcome this limitation |
Paolo, I have found a Git functionality for this. Let's
# for ease of description
ROOT_DIR="/path/to/.nextflow/assets"
repo="nextflow-io/hello"
revision="rocket"
def_remote="origin"

# user
nextflow run "$repo" -r "$revision"

# behind the scenes
# only if revision is not there already
if [ ! -d "$ROOT_DIR/$repo/$revision" ] ; then
    if [ ! -d "$ROOT_DIR/$repo" ] ; then
        # first revision requested
        mkdir -p "$ROOT_DIR/$repo/first"
        git clone -b "$revision" "https://github.com/$repo" "$ROOT_DIR/$repo/first"
        cd "$ROOT_DIR/$repo/first"
        # ask the remote which branch is its default one
        def_branch=$( git remote show "$def_remote" | sed -n '/HEAD branch/s/.*: //p' )
        cd ..
        mv first "$def_branch"
        ln -s "$def_branch" first_branch
    else
        # additional revision
        cd "$ROOT_DIR/$repo/first_branch"
        git worktree add --track -b "$revision" "../$revision" "$def_remote/$revision"
    fi
fi

The key functionality is this one:
git worktree add --track -b dsl2 ../dsl2 origin/dsl2

Docs: https://git-scm.com/docs/git-worktree
Found here: https://stackoverflow.com/questions/2048470/git-working-on-two-branches-simultaneously

What do you think? If you like it, I can give it a shot myself, soon after I have worked on another couple of pending work items. |
Forgot to mention the key advantage: only the repo file tree is duplicated, whereas all the Git related files such as in |
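To illustrate the advantage mentioned above, here is a small, hedged sketch showing that a linked worktree carries only a file tree plus a pointer back to the main clone's Git directory. The repository is a throwaway one built locally; the paths and the `rocket` branch are assumptions made for the example.

```shell
set -euo pipefail

tmp="$(mktemp -d)"
main="$tmp/hello/main"        # hypothetical location of the main checkout
linked="$tmp/hello/rocket"    # hypothetical location of the extra worktree

# Throwaway repository with one commit and a second branch.
mkdir -p "$main"
git -C "$main" init -q
git -C "$main" symbolic-ref HEAD refs/heads/main
echo 'println "hello"' > "$main/main.nf"
git -C "$main" add main.nf
git -C "$main" -c user.name=test -c user.email=test@example.com \
    commit -q -m "initial commit"
git -C "$main" branch rocket

# Check out the second branch in its own directory.
git -C "$main" worktree add "$linked" rocket

# In a linked worktree, .git is a small pointer file, not a directory.
if [ -f "$linked/.git" ]; then
    echo "rocket/.git is a file"
fi

# Both working trees resolve to the same shared Git directory.
common="$(cd "$linked" && cd "$(git rev-parse --git-common-dir)" && pwd -P)"
expected="$(cd "$main/.git" && pwd -P)"
echo "shared git dir: $common"
```

Objects, refs and configuration live once in the main clone, so adding a revision costs roughly the size of its checked-out files.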
@pditommaso keen on your take on my proposed solution before I work on the implementation |
This is indeed an excellent idea. This could simplify the solution compared to the use of the bare repository approach. Using the worktree solution, the main/master checkout should remain in the current location. Instead, when Likely use of the |
maybe this path for non-master revisions: |
Apologies @pditommaso, I had to prioritise other activities with larger customer impact. I am keen to get this one done; it is at the top of my list for when I am back in January. |
Ideally, the worktree should be checked out with all submodules recursively cloned, or there should be an option to do so. But if this complicates things, can be left for a later release. Thanks a lot for working on this! |
Working on it. Proposed steps for way forward:
At this stage, I believe 1. can already be good enough. In its basic implementation it would duplicate the So, going to proceed with 1. to begin with. |
Just double checking if this feature has been implemented. I cannot find a link to any doc clearly indicating this feature is now working. Thank you. |
Summary
Nextflow relies on built-in integration with Git to pull and run a workflow.
When the user specifies the Git repository URL on the run command line, Nextflow carries out a Git clone command, stores the pipeline code in the $HOME/.nextflow/assets directory and launches the execution from there.
When the user specifies the -r (revision) CLI option, the repository is checked out at the specified revision, i.e. a branch, tag or even commit id.
This however poses a problem when two or more users run different versions at the same time, because the last one performing the operation would overwrite the previous repository code, which could be a disruptive operation.
This is not such an unlikely event considering a pipeline execution can last for hours or even days.
To mitigate this problem, Nextflow refuses to perform a run if the project is currently checked out at a non-default version and the run command does not explicitly specify the revision to be executed. However, this is the cause of other unexpected side effects. See here.
Goal
The goal of this enhancement is to allow the concurrent use of multiple pipeline revisions on the same computer and deprecate the need for the sticky revision check.
This could be achieved by downloading the Git repository with a bare clone instead of a normal clone, and checking out the work tree into a separate subdirectory named after the commit id associated with the specified revision.
For example, if the user runs
Nextflow should clone the repo above with the bare option and store it in the path $HOME/.nextflow/assets/nextflow-io/hello.git
Then the default branch is implicitly checked out, so the associated commit should be retrieved, e.g. 4eab81bd42eed592f4371cd91b755ec78df25fe9, and the following path should be created containing the work tree accessible for the execution
When the user specifies a different revision e.g.
A new subdirectory with the corresponding commit id should be created.
The commit id should be resolved against the local Git clone, unless the -latest option is specified.
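The proposal above can be sketched with plain Git commands. This is only an illustration under stated assumptions, not how Nextflow implements it: the remote is a throwaway local repository, `worktrees_root` is a hypothetical directory (the issue's actual work tree path is not reproduced here), `resolve` is a helper invented for the example, and the `git fetch` line merely stands in for what a `-latest` run might do.

```shell
set -euo pipefail

tmp="$(mktemp -d)"
origin="$tmp/origin/hello"                 # stand-in for the remote repo
bare="$tmp/assets/nextflow-io/hello.git"   # mirrors the hello.git path above
worktrees_root="$tmp/worktrees"            # hypothetical work tree location

# Throwaway "remote" with a single commit on main.
mkdir -p "$origin" "$(dirname "$bare")" "$worktrees_root"
git -C "$origin" init -q
git -C "$origin" symbolic-ref HEAD refs/heads/main
echo 'println "hello"' > "$origin/main.nf"
git -C "$origin" add main.nf
git -C "$origin" -c user.name=test -c user.email=test@example.com \
    commit -q -m "initial commit"

# Bare clone: Git data only, no working files.
git clone -q --bare "$origin" "$bare"

# Hypothetical helper: resolve a revision to a commit id using local data only.
resolve() { git -C "$bare" rev-parse --verify -q "$1^{commit}"; }

# Check out the default branch into a directory named after its commit id.
commit="$(resolve HEAD)"
worktree="$worktrees_root/$commit"
[ -d "$worktree" ] || git -C "$bare" worktree add --detach "$worktree" "$commit"

# A new commit lands upstream after the initial clone.
echo 'println "hello again"' > "$origin/main.nf"
git -C "$origin" -c user.name=test -c user.email=test@example.com \
    commit -q -am "second commit"

stale="$(resolve main)"    # local resolution still sees the first commit
git -C "$bare" fetch -q origin '+refs/heads/*:refs/heads/*'   # "-latest"-like refresh
fresh="$(resolve main)"    # now points at the second commit
echo "before fetch: $stale"
echo "after fetch:  $fresh"
```

Since each work tree directory is keyed by an immutable commit id, a run already using the first commit is unaffected when a later run fetches and checks out the second.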