diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml index 030138a..191fabd 100644 --- a/.github/.dockstore.yml +++ b/.github/.dockstore.yml @@ -3,3 +3,4 @@ version: 1.2 workflows: - subclass: nfl primaryDescriptorPath: /nextflow.config + publish: True diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 538f355..284970f 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -9,9 +9,7 @@ Please use the pre-filled template to save time. However, don't be put off by this template - other more general issues and suggestions are welcome! Contributions to the code are even more welcome ;) -> If you need help using or modifying nf-core/hic then the best place to ask is on the nf-core -Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)). - +> If you need help using or modifying nf-core/hic then the best place to ask is on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)). ## Contribution workflow @@ -20,8 +18,9 @@ If you'd like to write some code for nf-core/hic, the standard workflow is as fo 1. Check that there isn't already an issue about your idea in the [nf-core/hic issues](https://github.com/nf-core/hic/issues) to avoid duplicating work * If there isn't one already, please create one so that others know you're working on this 2. [Fork](https://help.github.com/en/github/getting-started-with-github/fork-a-repo) the [nf-core/hic repository](https://github.com/nf-core/hic) to your GitHub account -3. Make the necessary changes / additions within your forked repository -4. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged +3. Make the necessary changes / additions within your forked repository following [Pipeline conventions](#pipeline-contribution-conventions) +4. Use `nf-core schema build .` and add any new parameters to the pipeline JSON schema (requires [nf-core tools](https://github.com/nf-core/tools) >= 1.10). +5. Submit a Pull Request against the `dev` branch and wait for the code to be reviewed and merged If you're not used to this workflow with git, you can start with some [docs from GitHub](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests) or even their [excellent `git` resources](https://try.github.io/). @@ -32,14 +31,14 @@ Typically, pull-requests are only fully reviewed when these tests are passing, t There are typically two types of tests that run: -### Lint Tests +### Lint tests `nf-core` has a [set of guidelines](https://nf-co.re/developers/guidelines) which all pipelines must adhere to. To enforce these and ensure that all pipelines stay in sync, we have developed a helper tool which runs checks on the pipeline code. This is in the [nf-core/tools repository](https://github.com/nf-core/tools) and once installed can be run locally with the `nf-core lint ` command. If any failures or warnings are encountered, please follow the listed URL for more documentation. -### Pipeline Tests +### Pipeline tests Each `nf-core` pipeline should be set up with a minimal set of test-data. `GitHub Actions` then runs the pipeline on this data to ensure that it exits successfully. @@ -55,8 +54,75 @@ These tests are run both with the latest available version of `Nextflow` and als * A PR should be made on `master` from patch to directly this particular bug. 
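Before opening a pull request against either branch, it is worth running the same checks that the CI runs (see the *Lint tests* and *Pipeline tests* sections above). A minimal sketch, assuming [nf-core/tools](https://github.com/nf-core/tools), Nextflow and Docker are installed locally:

```bash
# Lint the pipeline code against the nf-core guidelines
nf-core lint .

# Rebuild the JSON schema if any parameters were added or changed
nf-core schema build .

# Run the pipeline on the minimal test dataset
nextflow run . -profile test,docker
```

If these commands succeed locally, the corresponding GitHub Actions checks on the pull request should pass as well.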
## Getting help -For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/nf-core/hic/docs) and -don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel -([join our Slack here](https://nf-co.re/join/slack)). -For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/hic/docs) and don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)). +For further information/help, please consult the [nf-core/hic documentation](https://nf-co.re/hic/usage) and don't hesitate to get in touch on the nf-core Slack [#hic](https://nfcore.slack.com/channels/hic) channel ([join our Slack here](https://nf-co.re/join/slack)). + +## Pipeline contribution conventions + +To make the nf-core/hic code and processing logic more understandable for new contributors and to ensure quality, we semi-standardise the way the code and other contributions are written. + +### Adding a new step + +If you wish to contribute a new step, please use the following coding standards: + +1. Define the corresponding input channel into your new process from the expected previous process channel +2. Write the process block (see below). +3. Define the output channel if needed (see below). +4. Add any new flags/options to `nextflow.config` with a default (see below). +5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`). +6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter). +7. Add sanity checks for all relevant parameters. +8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`. +9. Do local tests that the new code works properly and as expected. +10. Add a new test command in `.github/workflow/ci.yaml`. +11. If applicable add a [MultiQC](https://https://multiqc.info/) module. +12. Update MultiQC config `assets/multiqc_config.yaml` so relevant suffixes, name clean up, General Statistics Table column order, and module figures are in the right order. +13. Optional: Add any descriptions of MultiQC report sections and output files to `docs/output.md`. + +### Default values + +Parameters should be initialised / defined with default values in `nextflow.config` under the `params` scope. + +Once there, use `nf-core schema build .` to add to `nextflow_schema.json`. + +### Default processes resource requirements + +Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels. + +The process resources can be passed on to the tool dynamically within the process with the `${task.cpu}` and `${task.memory}` variables in the `script:` block. 
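For illustration, Nextflow makes these values available inside the `script:` block as `task.cpus` and `task.memory`. A minimal sketch of how a hypothetical mapping step might forward them to the tool (the `bowtie2` call and the `index` / `reads` variables are purely illustrative, not taken from the pipeline code):

```bash
# task.cpus and task.memory resolve to whatever the matching withLabel:
# selector in conf/base.config granted to this task
bowtie2 --threads ${task.cpus} -x ${index} -U ${reads} -S aligned.sam

# memory-aware tools can be given the value in GB with ${task.memory.toGiga()}
```

Because the values come from the `withLabel:` selectors, changing the resource configuration propagates to the command line without editing the process itself.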
+ +### Naming schemes + +Please use the following naming schemes, to make it easy to understand what is going where. + +* initial process channel: `ch_output_from_` +* intermediate and terminal channels: `ch__for_` + +### Nextflow version bumping + +If you are using a new feature from core Nextflow, you may bump the minimum required version of nextflow in the pipeline with: `nf-core bump-version --nextflow . [min-nf-version]` + +### Software version reporting + +If you add a new tool to the pipeline, please ensure you add the information of the tool to the `get_software_version` process. + +Add to the script block of the process, something like the following: + +```bash + --version &> v_.txt 2>&1 || true +``` + +or + +```bash + --help | head -n 1 &> v_.txt 2>&1 || true +``` + +You then need to edit the script `bin/scrape_software_versions.py` to: + +1. Add a Python regex for your tool's `--version` output (as in stored in the `v_.txt` file), to ensure the version is reported as a `v` and the version number e.g. `v2.1.1` +2. Add a HTML entry to the `OrderedDict` for formatting in MultiQC. + +### Images and figures + +For overview images and other documents we follow the nf-core [style guidelines and examples](https://nf-co.re/developers/design_guidelines). diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md index dea1805..81c1e33 100644 --- a/.github/ISSUE_TEMPLATE/bug_report.md +++ b/.github/ISSUE_TEMPLATE/bug_report.md @@ -1,3 +1,9 @@ +--- +name: Bug report +about: Report something that is broken or incorrect +labels: bug +--- + +## Check Documentation + +I have checked the following places for your error: + +- [ ] [nf-core website: troubleshooting](https://nf-co.re/usage/troubleshooting) +- [ ] [nf-core/hic pipeline documentation](https://nf-co.re/hic/usage) + ## Description of the bug @@ -22,6 +35,13 @@ Steps to reproduce the behaviour: +## Log files + +Have you provided the following extra information/files: + +- [ ] The command used to run the pipeline +- [ ] The `.nextflow.log` file + ## System - Hardware: @@ -35,7 +55,7 @@ Steps to reproduce the behaviour: ## Container engine -- Engine: +- Engine: - version: - Image tag: diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml new file mode 100644 index 0000000..887f045 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/config.yml @@ -0,0 +1,8 @@ +blank_issues_enabled: false +contact_links: + - name: Join nf-core + url: https://nf-co.re/join + about: Please join the nf-core community here + - name: "Slack #hic channel" + url: https://nfcore.slack.com/channels/hic + about: Discussion about the nf-core/hic pipeline diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md index ab6ff7c..2cec9b3 100644 --- a/.github/ISSUE_TEMPLATE/feature_request.md +++ b/.github/ISSUE_TEMPLATE/feature_request.md @@ -1,3 +1,9 @@ +--- +name: Feature request +about: Suggest an idea for the nf-core/hic pipeline +labels: enhancement +--- + ## Is your feature request related to a problem? 
Please describe @@ -25,4 +30,3 @@ Please delete this text and anything that's not relevant from the template below ## Additional context - diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index 0bdd575..ab821a6 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -10,12 +10,18 @@ Remember that PRs should be made against the dev branch, unless you're preparing Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/hic/tree/master/.github/CONTRIBUTING.md) --> + ## PR checklist -- [ ] This comment contains a description of changes (with reason) -- [ ] `CHANGELOG.md` is updated +- [ ] This comment contains a description of changes (with reason). - [ ] If you've fixed a bug or added code that should be tested, add tests! -- [ ] Documentation in `docs` is updated -- [ ] If necessary, also make a PR on the [nf-core/hic branch on the nf-core/test-datasets repo](https://github.com/nf-core/test-datasets/pull/new/nf-core/hic) - + - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py` + - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/hic/tree/master/.github/CONTRIBUTING.md) + - [ ] If necessary, also make a PR on the nf-core/hic _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository. +- [ ] Make sure your code lints (`nf-core lint .`). +- [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`). +- [ ] Usage Documentation in `docs/usage.md` is updated. +- [ ] Output Documentation in `docs/output.md` is updated. +- [ ] `CHANGELOG.md` is updated. +- [ ] `README.md` is updated (including new tool citations and authors/contributors). diff --git a/.github/markdownlint.yml b/.github/markdownlint.yml index 96b12a7..8d7eb53 100644 --- a/.github/markdownlint.yml +++ b/.github/markdownlint.yml @@ -1,5 +1,12 @@ # Markdownlint configuration file -default: true, +default: true line-length: false no-duplicate-header: siblings_only: true +no-inline-html: + allowed_elements: + - img + - p + - kbd + - details + - summary diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml index 8344dc5..cefb14a 100644 --- a/.github/workflows/awsfulltest.yml +++ b/.github/workflows/awsfulltest.yml @@ -1,10 +1,23 @@ name: nf-core AWS full size tests -# This workflow is triggered on push to the master branch. +# This workflow is triggered on published releases. +# It can be additionally triggered manually with GitHub actions workflow dispatch. 
# It runs the -profile 'test_full' on AWS batch on: - release: - types: [published] + workflow_run: + workflows: ["nf-core Docker push (release)"] + types: [completed] + workflow_dispatch: + + +env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} + AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} + AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} + AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} + jobs: run-awstest: @@ -13,7 +26,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Setup Miniconda - uses: goanpeca/setup-miniconda@v1.0.2 + uses: conda-incubator/setup-miniconda@v2 with: auto-update-conda: true python-version: 3.7 @@ -23,13 +36,6 @@ jobs: # Add full size test data (but still relatively small datasets for few samples) # on the `test_full.config` test runs with only one set of parameters # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command - env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} run: | aws batch submit-job \ --region eu-west-1 \ diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml index 94bca24..c9eafe6 100644 --- a/.github/workflows/awstest.yml +++ b/.github/workflows/awstest.yml @@ -1,11 +1,20 @@ name: nf-core AWS test # This workflow is triggered on push to the master branch. -# It runs the -profile 'test' on AWS batch +# It can be additionally triggered manually with GitHub actions workflow dispatch. +# It runs the -profile 'test' on AWS batch. 
on: - push: - branches: - - master + workflow_dispatch: + + +env: + AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} + AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} + TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} + AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} + AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} + AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} + jobs: run-awstest: @@ -14,7 +23,7 @@ jobs: runs-on: ubuntu-latest steps: - name: Setup Miniconda - uses: goanpeca/setup-miniconda@v1.0.2 + uses: conda-incubator/setup-miniconda@v2 with: auto-update-conda: true python-version: 3.7 @@ -23,17 +32,10 @@ jobs: - name: Start AWS batch job # For example: adding multiple test runs with different parameters # Remember that you can parallelise this by using strategy.matrix - env: - AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }} - AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }} - TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }} - AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }} - AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }} - AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }} run: | aws batch submit-job \ --region eu-west-1 \ --job-name nf-core-hic \ --job-queue $AWS_JOB_QUEUE \ --job-definition $AWS_JOB_DEFINITION \ - --container-overrides '{"command": ["nf-core/hic", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/hic/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/hic/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'hic/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' + --container-overrides '{"command": ["nf-core/hic", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/hic/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/hic/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}' diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml index 04dbb3d..3521022 100644 --- a/.github/workflows/branch.yml +++ b/.github/workflows/branch.yml @@ -2,7 +2,7 @@ name: nf-core branch protection # This workflow is triggered on PRs to master branch on the repository # It fails when someone tries to make a PR against the nf-core `master` branch instead of `dev` on: - pull_request: + pull_request_target: branches: [master] jobs: @@ -13,7 +13,7 @@ jobs: - name: Check PRs if: github.repository == 'nf-core/hic' run: | - { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/hic ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] + { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/hic ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]] # If the above check failed, post a comment on the PR explaining the failure @@ -23,13 +23,22 @@ jobs: uses: mshick/add-pr-comment@v1 with: message: | + ## This PR is against the `master` branch :x: + + * Do not close this PR + * Click _Edit_ and change the `base` to `dev` + * This CI test will remain failed until you push a new commit + + --- + Hi @${{ github.event.pull_request.user.login }}, - It looks like this pull-request is has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch. 
+ It looks like this pull-request is has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch. The `master` branch on nf-core repositories should always contain code from the latest release. - Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch. + Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch. You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page. + Note that even after this, the test will continue to show as failing until you push a new commit. Thanks again for your contribution! repo-token: ${{ secrets.GITHUB_TOKEN }} diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index a8a8ba5..0f0db1a 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -8,6 +8,9 @@ on: release: types: [published] +# Uncomment if we need an edge release of Nextflow again +# env: NXF_EDGE: 1 + jobs: test: name: Run workflow tests @@ -20,33 +23,35 @@ jobs: strategy: matrix: # Nextflow versions: check pipeline minimum and current latest - nxf_ver: ['19.10.0', ''] + nxf_ver: ['20.04.0', ''] steps: - name: Check out pipeline code uses: actions/checkout@v2 - name: Check if Dockerfile or Conda environment changed - uses: technote-space/get-diff-action@v1 + uses: technote-space/get-diff-action@v4 with: - PREFIX_FILTER: | + FILES: | Dockerfile environment.yml - name: Build new docker image - if: env.GIT_DIFF - run: docker build --no-cache . -t nfcore/hic:1.2.2 + if: env.MATCHED_FILES + run: docker build --no-cache . -t nfcore/hic:1.3.0 - name: Pull docker image - if: ${{ !env.GIT_DIFF }} + if: ${{ !env.MATCHED_FILES }} run: | docker pull nfcore/hic:dev - docker tag nfcore/hic:dev nfcore/hic:1.2.2 + docker tag nfcore/hic:dev nfcore/hic:1.3.0 - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ - name: Run pipeline with test data run: | - nextflow run ${GITHUB_WORKSPACE} -profile test,docker + nextflow run ${GITHUB_WORKSPACE} -profile test,docker \ No newline at end of file diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml index 7a41b0e..fcde400 100644 --- a/.github/workflows/linting.yml +++ b/.github/workflows/linting.yml @@ -19,6 +19,34 @@ jobs: run: npm install -g markdownlint-cli - name: Run Markdownlint run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml + + # If the above check failed, post a comment on the PR explaining the failure + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## Markdown linting is failing + + To keep the code consistent with lots of contributors, we run automated code consistency checks. + To fix this CI test, please run: + + * Install `markdownlint-cli` + * On Mac: `brew install markdownlint-cli` + * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`) + * Fix the markdown errors + * Automatically: `markdownlint . --config .github/markdownlint.yml --fix` + * Manually resolve anything left from `markdownlint . 
--config .github/markdownlint.yml` + + Once you push these changes the test should pass, and you can hide this comment :+1: + + We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + + Thanks again for your contribution! + repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + + YAML: runs-on: ubuntu-latest steps: @@ -29,13 +57,44 @@ jobs: - name: Install yaml-lint run: npm install -g yaml-lint - name: Run yaml-lint - run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml") + run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml") + + # If the above check failed, post a comment on the PR explaining the failure + - name: Post PR comment + if: failure() + uses: mshick/add-pr-comment@v1 + with: + message: | + ## YAML linting is failing + + To keep the code consistent with lots of contributors, we run automated code consistency checks. + To fix this CI test, please run: + + * Install `yaml-lint` + * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`) + * Fix the markdown errors + * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")` + * Fix any reported errors in your YAML files + + Once you push these changes the test should pass, and you can hide this comment :+1: + + We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help! + + Thanks again for your contribution! + repo-token: ${{ secrets.GITHUB_TOKEN }} + allow-repeats: false + + nf-core: runs-on: ubuntu-latest steps: + - name: Check out pipeline code uses: actions/checkout@v2 + - name: Install Nextflow + env: + CAPSULE_LOG: none run: | wget -qO- get.nextflow.io | bash sudo mv nextflow /usr/local/bin/ @@ -55,11 +114,19 @@ jobs: GITHUB_COMMENTS_URL: ${{ github.event.pull_request.comments_url }} GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} GITHUB_PR_COMMIT: ${{ github.event.pull_request.head.sha }} - run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} + run: nf-core -l lint_log.txt lint ${GITHUB_WORKSPACE} --markdown lint_results.md + + - name: Save PR number + if: ${{ always() }} + run: echo ${{ github.event.pull_request.number }} > PR_number.txt - name: Upload linting log file artifact if: ${{ always() }} uses: actions/upload-artifact@v2 with: - name: linting-log-file - path: lint_log.txt + name: linting-logs + path: | + lint_log.txt + lint_results.md + PR_number.txt + diff --git a/.github/workflows/linting_comment.yml b/.github/workflows/linting_comment.yml new file mode 100644 index 0000000..90f03c6 --- /dev/null +++ b/.github/workflows/linting_comment.yml @@ -0,0 +1,29 @@ + +name: nf-core linting comment +# This workflow is triggered after the linting action is complete +# It posts an automated comment to the PR, even if the PR is coming from a fork + +on: + workflow_run: + workflows: ["nf-core linting"] + +jobs: + test: + runs-on: ubuntu-latest + steps: + - name: Download lint results + uses: dawidd6/action-download-artifact@v2 + with: + workflow: linting.yml + + - name: Get PR number + id: pr_number + run: echo "::set-output name=pr_number::$(cat linting-logs/PR_number.txt)" + + - name: Post PR comment + uses: marocchino/sticky-pull-request-comment@v2 + with: + GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} + number: ${{ steps.pr_number.outputs.pr_number }} + path: linting-logs/lint_results.md 
+ diff --git a/.github/workflows/push_dockerhub_dev.yml b/.github/workflows/push_dockerhub_dev.yml new file mode 100644 index 0000000..d6fc716 --- /dev/null +++ b/.github/workflows/push_dockerhub_dev.yml @@ -0,0 +1,28 @@ +name: nf-core Docker push (dev) +# This builds the docker image and pushes it to DockerHub +# Runs on nf-core repo releases and push event to 'dev' branch (PR merges) +on: + push: + branches: + - dev + +jobs: + push_dockerhub: + name: Push new Docker image to Docker Hub (dev) + runs-on: ubuntu-latest + # Only run for the nf-core repo, for releases and merged PRs + if: ${{ github.repository == 'nf-core/hic' }} + env: + DOCKERHUB_USERNAME: ${{ secrets.DOCKERHUB_USERNAME }} + DOCKERHUB_PASS: ${{ secrets.DOCKERHUB_PASS }} + steps: + - name: Check out pipeline code + uses: actions/checkout@v2 + + - name: Build new docker image + run: docker build --no-cache . -t nfcore/hic:dev + + - name: Push Docker image to DockerHub (dev) + run: | + echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin + docker push nfcore/hic:dev diff --git a/.github/workflows/push_dockerhub.yml b/.github/workflows/push_dockerhub_release.yml similarity index 68% rename from .github/workflows/push_dockerhub.yml rename to .github/workflows/push_dockerhub_release.yml index 280f8ba..eda09cc 100644 --- a/.github/workflows/push_dockerhub.yml +++ b/.github/workflows/push_dockerhub_release.yml @@ -1,16 +1,13 @@ -name: nf-core Docker push +name: nf-core Docker push (release) # This builds the docker image and pushes it to DockerHub # Runs on nf-core repo releases and push event to 'dev' branch (PR merges) on: - push: - branches: - - dev release: types: [published] jobs: push_dockerhub: - name: Push new Docker image to Docker Hub + name: Push new Docker image to Docker Hub (release) runs-on: ubuntu-latest # Only run for the nf-core repo, for releases and merged PRs if: ${{ github.repository == 'nf-core/hic' }} @@ -24,15 +21,7 @@ jobs: - name: Build new docker image run: docker build --no-cache . -t nfcore/hic:latest - - name: Push Docker image to DockerHub (dev) - if: ${{ github.event_name == 'push' }} - run: | - echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin - docker tag nfcore/hic:latest nfcore/hic:dev - docker push nfcore/hic:dev - - name: Push Docker image to DockerHub (release) - if: ${{ github.event_name == 'release' }} run: | echo "$DOCKERHUB_PASS" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin docker push nfcore/hic:latest diff --git a/.nf-core-lint.yml b/.nf-core-lint.yml new file mode 100644 index 0000000..a24fddf --- /dev/null +++ b/.nf-core-lint.yml @@ -0,0 +1,3 @@ +files_unchanged: + - .github/ISSUE_TEMPLATE/bug_report.md + - .github/PULL_REQUEST_TEMPLATE.md diff --git a/CHANGELOG.md b/CHANGELOG.md index 719ea08..0a2d45f 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -3,6 +3,32 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html). +## v1.3.0 - 2021-22-05 + +* Change the `/tmp/` folder to `./tmp/` folder so that all tmp files are now in the work directory (#24) +* Add `--hicpro_maps` options to generate the raw and normalized HiC-Pro maps. 
The default is now to use cooler +* Add chromosome compartments calling with cooltools (#53) +* Add HiCExplorer distance decay quality control (#54) +* Add HiCExplorer TADs calling (#55) +* Add insulation score TADs calling (#55) +* Generate cooler/txt contact maps +* Normalize Hi-C data with cooler instead of iced +* New `--digestion` parameter to automatically set the restriction_site and ligation_site motifs +* New `--keep_multi` and `keep_dup` options. Default: false +* Template update for nf-core/tools +* Minor fix to summary log messages in pipeline header + +### `Fixed` + +* Fix bug in stats report which were not all correcly exported in the results folder +* Fix recurrent bug in input file extension (#86) +* Fix bug in `--bin_size` parameter (#85) +* `--min_mapq` is ignored if `--keep_multi` is used + +### `Deprecated` + +* `--rm_dup` and `--rm_multi` are replaced by `--keep_dups` and `--keep_multi` + ## v1.2.2 - 2020-09-02 ### `Added` diff --git a/CODE_OF_CONDUCT.md b/CODE_OF_CONDUCT.md index 9d68eed..f4fd052 100644 --- a/CODE_OF_CONDUCT.md +++ b/CODE_OF_CONDUCT.md @@ -1,76 +1,111 @@ -# Contributor Covenant Code of Conduct +# Code of Conduct at nf-core (v1.0) ## Our Pledge -In the interest of fostering an open and welcoming environment, we as -contributors and maintainers pledge to making participation in our project -and our community a harassment-free experience for everyone, regardless of -age, body size, disability, ethnicity, gender identity and expression, level -of experience, nationality, personal appearance, race, religion, or sexual -identity and orientation. +In the interest of fostering an open, collaborative, and welcoming environment, we as contributors and maintainers of nf-core, pledge to making participation in our projects and community a harassment-free experience for everyone, regardless of: -## Our Standards +- Age +- Body size +- Familial status +- Gender identity and expression +- Geographical location +- Level of experience +- Nationality and national origins +- Native language +- Physical and neurological ability +- Race or ethnicity +- Religion +- Sexual identity and orientation +- Socioeconomic status -Examples of behavior that contributes to creating a positive environment -include: +Please note that the list above is alphabetised and is therefore not ranked in any order of preference or importance. -* Using welcoming and inclusive language -* Being respectful of differing viewpoints and experiences -* Gracefully accepting constructive criticism -* Focusing on what is best for the community -* Showing empathy towards other community members +## Preamble -Examples of unacceptable behavior by participants include: +> Note: This Code of Conduct (CoC) has been drafted by the nf-core Safety Officer and been edited after input from members of the nf-core team and others. "We", in this document, refers to the Safety Officer and members of the nf-core core team, both of whom are deemed to be members of the nf-core community and are therefore required to abide by this Code of Conduct. This document will amended periodically to keep it up-to-date, and in case of any dispute, the most current version will apply. 
-* The use of sexualized language or imagery and unwelcome sexual attention -or advances -* Trolling, insulting/derogatory comments, and personal or political attacks -* Public or private harassment -* Publishing others' private information, such as a physical or electronic -address, without explicit permission -* Other conduct which could reasonably be considered inappropriate in a -professional setting +An up-to-date list of members of the nf-core core team can be found [here](https://nf-co.re/about). Our current safety officer is Renuka Kudva. + +nf-core is a young and growing community that welcomes contributions from anyone with a shared vision for [Open Science Policies](https://www.fosteropenscience.eu/taxonomy/term/8). Open science policies encompass inclusive behaviours and we strive to build and maintain a safe and inclusive environment for all individuals. + +We have therefore adopted this code of conduct (CoC), which we require all members of our community and attendees in nf-core events to adhere to in all our workspaces at all times. Workspaces include but are not limited to Slack, meetings on Zoom, Jitsi, YouTube live etc. + +Our CoC will be strictly enforced and the nf-core team reserve the right to exclude participants who do not comply with our guidelines from our workspaces and future nf-core activities. + +We ask all members of our community to help maintain a supportive and productive workspace and to avoid behaviours that can make individuals feel unsafe or unwelcome. Please help us maintain and uphold this CoC. + +Questions, concerns or ideas on what we can include? Contact safety [at] nf-co [dot] re ## Our Responsibilities -Project maintainers are responsible for clarifying the standards of acceptable -behavior and are expected to take appropriate and fair corrective action in -response to any instances of unacceptable behavior. +The safety officer is responsible for clarifying the standards of acceptable behavior and are expected to take appropriate and fair corrective action in response to any instances of unacceptable behaviour. + +The safety officer in consultation with the nf-core core team have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct, or to ban temporarily or permanently any contributor for other behaviors that they deem inappropriate, threatening, offensive, or harmful. + +Members of the core team or the safety officer who violate the CoC will be required to recuse themselves pending investigation. They will not have access to any reports of the violations and be subject to the same actions as others in violation of the CoC. + +## When are where does this Code of Conduct apply? + +Participation in the nf-core community is contingent on following these guidelines in all our workspaces and events. This includes but is not limited to the following listed alphabetically and therefore in no order of preference: + +- Communicating with an official project email address. +- Communicating with community members within the nf-core Slack channel. +- Participating in hackathons organised by nf-core (both online and in-person events). +- Participating in collaborative work on GitHub, Google Suite, community calls, mentorship meetings, email correspondence. +- Participating in workshops, training, and seminar series organised by nf-core (both online and in-person events). 
This applies to events hosted on web-based platforms such as Zoom, Jitsi, YouTube live etc. +- Representing nf-core on social media. This includes both official and personal accounts. + +## nf-core cares šŸ˜Š + +nf-core's CoC and expectations of respectful behaviours for all participants (including organisers and the nf-core team) include but are not limited to the following (listed in alphabetical order): + +- Ask for consent before sharing another community memberā€™s personal information (including photographs) on social media. +- Be respectful of differing viewpoints and experiences. We are all here to learn from one another and a difference in opinion can present a good learning opportunity. +- Celebrate your accomplishments at events! (Get creative with your use of emojis šŸŽ‰ šŸ„³ šŸ’Æ šŸ™Œ !) +- Demonstrate empathy towards other community members. (We donā€™t all have the same amount of time to dedicate to nf-core. If tasks are pending, donā€™t hesitate to gently remind members of your team. If you are leading a task, ask for help if you feel overwhelmed.) +- Engage with and enquire after others. (This is especially important given the geographically remote nature of the nf-core community, so letā€™s do this the best we can) +- Focus on what is best for the team and the community. (When in doubt, ask) +- Graciously accept constructive criticism, yet be unafraid to question, deliberate, and learn. +- Introduce yourself to members of the community. (Weā€™ve all been outsiders and we know that talking to strangers can be hard for some, but remember weā€™re interested in getting to know you and your visions for open science!) +- Show appreciation and **provide clear feedback**. (This is especially important because we donā€™t see each other in person and it can be harder to interpret subtleties. Also remember that not everyone understands a certain language to the same extent as you do, so **be clear in your communications to be kind.**) +- Take breaks when you feel like you need them. +- Using welcoming and inclusive language. (Participants are encouraged to display their chosen pronouns on Zoom or in communication on Slack.) + +## nf-core frowns on šŸ˜• + +The following behaviours from any participants within the nf-core community (including the organisers) will be considered unacceptable under this code of conduct. Engaging or advocating for any of the following could result in expulsion from nf-core workspaces. + +- Deliberate intimidation, stalking or following and sustained disruption of communication among participants of the community. This includes hijacking shared screens through actions such as using the annotate tool in conferencing software such as Zoom. +- ā€œDoxingā€ i.e. posting (or threatening to post) another personā€™s personal identifying information online. +- Spamming or trolling of individuals on social media. +- Use of sexual or discriminatory imagery, comments, or jokes and unwelcome sexual attention. +- Verbal and text comments that reinforce social structures of domination related to gender, gender identity and expression, sexual orientation, ability, physical appearance, body size, race, age, religion or work experience. + +### Online Trolling + +The majority of nf-core interactions and events are held online. Unfortunately, holding events online comes with the added issue of online trolling. This is unacceptable, reports of such behaviour will be taken very seriously, and perpetrators will be excluded from activities immediately. 
+ +All community members are required to ask members of the group they are working within for explicit consent prior to taking screenshots of individuals during video calls. + +## Procedures for Reporting CoC violations -Project maintainers have the right and responsibility to remove, edit, or -reject comments, commits, code, wiki edits, issues, and other contributions -that are not aligned to this Code of Conduct, or to ban temporarily or -permanently any contributor for other behaviors that they deem inappropriate, -threatening, offensive, or harmful. +If someone makes you feel uncomfortable through their behaviours or actions, report it as soon as possible. -## Scope +You can reach out to members of the [nf-core core team](https://nf-co.re/about) and they will forward your concerns to the safety officer(s). -This Code of Conduct applies both within project spaces and in public spaces -when an individual is representing the project or its community. Examples of -representing a project or community include using an official project e-mail -address, posting via an official social media account, or acting as an -appointed representative at an online or offline event. Representation of a -project may be further defined and clarified by project maintainers. +Issues directly concerning members of the core team will be dealt with by other members of the core team and the safety manager, and possible conflicts of interest will be taken into account. nf-core is also in discussions about having an ombudsperson, and details will be shared in due course. -## Enforcement +All reports will be handled with utmost discretion and confidentially. -Instances of abusive, harassing, or otherwise unacceptable behavior may be -reported by contacting the project team on -[Slack](https://nf-co.re/join/slack). The project team will review -and investigate all complaints, and will respond in a way that it deems -appropriate to the circumstances. The project team is obligated to maintain -confidentiality with regard to the reporter of an incident. Further details -of specific enforcement policies may be posted separately. +## Attribution and Acknowledgements -Project maintainers who do not follow or enforce the Code of Conduct in good -faith may face temporary or permanent repercussions as determined by other -members of the project's leadership. +- The [Contributor Covenant, version 1.4](http://contributor-covenant.org/version/1/4) +- The [OpenCon 2017 Code of Conduct](http://www.opencon2017.org/code_of_conduct) (CC BY 4.0 OpenCon organisers, SPARC and Right to Research Coalition) +- The [eLife innovation sprint 2020 Code of Conduct](https://sprint.elifesciences.org/code-of-conduct/) +- The [Mozilla Community Participation Guidelines v3.1](https://www.mozilla.org/en-US/about/governance/policies/participation/) (version 3.1, CC BY-SA 3.0 Mozilla) -## Attribution +## Changelog -This Code of Conduct is adapted from the [Contributor Covenant][homepage], -version 1.4, available at -[https://www.contributor-covenant.org/version/1/4/code-of-conduct/][version] +### v1.0 - March 12th, 2021 -[homepage]: https://contributor-covenant.org -[version]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/ +- Complete rewrite from original [Contributor Covenant](http://contributor-covenant.org/) CoC. 
diff --git a/Dockerfile b/Dockerfile index 3c8d019..05547b6 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,19 +1,19 @@ -FROM nfcore/base:1.10.2 - +FROM nfcore/base:1.14 LABEL authors="Nicolas Servant" \ description="Docker image containing all software requirements for the nf-core/hic pipeline" ## Install gcc for pip iced install RUN apt-get update && apt-get install -y gcc g++ && apt-get clean -y +# Install the conda environment COPY environment.yml / RUN conda env create --quiet -f /environment.yml && conda clean -a # Add conda installation dir to PATH (instead of doing 'conda activate') -ENV PATH /opt/conda/envs/nf-core-hic-1.2.2/bin:$PATH +ENV PATH /opt/conda/envs/nf-core-hic-1.3.0/bin:$PATH # Dump the details of the installed packages to a file for posterity -RUN conda env export --name nf-core-hic-1.2.2 > nf-core-hic-1.2.2.yml +RUN conda env export --name nf-core-hic-1.3.0 > nf-core-hic-1.3.0.yml # Instruct R processes to use these empty files instead of clashing with a local version RUN touch .Rprofile diff --git a/README.md b/README.md index 4ffcce8..cb88454 100644 --- a/README.md +++ b/README.md @@ -1,10 +1,10 @@ -# ![nf-core/hic](docs/images/nfcore-hic_logo.png) +# ![nf-core/hic](docs/images/nf-core-hic_logo.png) **Analysis of Chromosome Conformation Capture data (Hi-C)**. [![GitHub Actions CI Status](https://github.com/nf-core/hic/workflows/nf-core%20CI/badge.svg)](https://github.com/nf-core/hic/actions) [![GitHub Actions Linting Status](https://github.com/nf-core/hic/workflows/nf-core%20linting/badge.svg)](https://github.com/nf-core/hic/actions) -[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A519.10.0-brightgreen.svg)](https://www.nextflow.io/) +[![Nextflow](https://img.shields.io/badge/nextflow-%E2%89%A520.04.0-brightgreen.svg)](https://www.nextflow.io/) [![install with bioconda](https://img.shields.io/badge/install%20with-bioconda-brightgreen.svg)](https://bioconda.github.io/) [![Docker](https://img.shields.io/docker/automated/nfcore/hic.svg)](https://hub.docker.com/r/nfcore/hic) @@ -14,7 +14,7 @@ ## Introduction -This pipeline is based on the +This pipeline was originally set up from the [HiC-Pro workflow](https://github.com/nservant/HiC-Pro). It was designed to process Hi-C data from raw FastQ files (paired-end Illumina data) to normalized contact maps. @@ -24,6 +24,10 @@ In practice, this workflow was successfully applied to many data-sets including dilution Hi-C, in situ Hi-C, DNase Hi-C, Micro-C, capture-C, capture Hi-C or HiChip data. +Contact maps are generated in standard formats including HiC-Pro, and cooler for +downstream analysis and visualization. +Addition analysis steps such as compartments and TADs calling are also available. + The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It comes with docker / singularity containers making installation trivial and @@ -31,65 +35,49 @@ results highly reproducible. ## Pipeline summary -1. Mapping using a two steps strategy to rescue reads spanning the ligation -sites (bowtie2) -2. Detection of valid interaction products -3. Duplicates removal -4. Create genome-wide contact maps at various resolution -5. Contact maps normalization using the ICE algorithm (iced) -6. Quality controls and report (MultiQC) -7. Addition export for visualisation and downstream analysis (cooler) +1. HiC-Pro data processing ([`HiC-Pro`](https://github.com/nservant/HiC-Pro)) + 1. 
Mapping using a two steps strategy to rescue reads spanning the ligation + sites ([`bowtie2`](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml)) + 2. Detection of valid interaction products + 3. Duplicates removal + 4. Generate raw and normalized contact maps ([`iced`](https://github.com/hiclib/iced)) +2. Create genome-wide contact maps at various resolutions ([`cooler`](https://github.com/open2c/cooler)) +3. Contact maps normalization using balancing algorithm ([`cooler`](https://github.com/open2c/cooler)) +4. Export to various contact maps formats ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`cooler`](https://github.com/open2c/cooler)) +5. Quality controls ([`HiC-Pro`](https://github.com/nservant/HiC-Pro), [`HiCExplorer`](https://github.com/deeptools/HiCExplorer)) +6. Compartments calling ([`cooltools`](https://cooltools.readthedocs.io/en/latest/)) +7. TADs calling ([`HiCExplorer`](https://github.com/deeptools/HiCExplorer), [`cooltools`](https://cooltools.readthedocs.io/en/latest/)) +8. Quality control report ([`MultiQC`](https://multiqc.info/)) ## Quick Start -i. Install [`nextflow`](https://nf-co.re/usage/installation) - -ii. Install either [`Docker`](https://docs.docker.com/engine/installation/) -or [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/) -for full pipeline reproducibility (please only use [`Conda`](https://conda.io/miniconda.html) -as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles)) +1. Install [`nextflow`](https://nf-co.re/usage/installation) (`>=20.04.0`) -iii. Download the pipeline and test it on a minimal dataset with a single command +2. Install any of [`Docker`](https://docs.docker.com/engine/installation/), [`Singularity`](https://www.sylabs.io/guides/3.0/user-guide/), [`Podman`](https://podman.io/), [`Shifter`](https://nersc.gitlab.io/development/shifter/how-to-use/) or [`Charliecloud`](https://hpc.github.io/charliecloud/) for full pipeline reproducibility _(please only use [`Conda`](https://conda.io/miniconda.html) as a last resort; see [docs](https://nf-co.re/usage/configuration#basic-configuration-profiles))_ -```bash -nextflow run nf-core/hic -profile test, -``` +3. Download the pipeline and test it on a minimal dataset with a single command -> Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) -to see if a custom config file to run nf-core pipelines already exists for your Institute. -If so, you can simply use `-profile ` in your command. -This will enable either `docker` or `singularity` and set the appropriate execution -settings for your local compute environment. + ```bash + nextflow run nf-core/hic -profile test, + ``` -iv. Start running your own analysis! + > Please check [nf-core/configs](https://github.com/nf-core/configs#documentation) + to see if a custom config file to run nf-core pipelines already exists for your Institute. + If so, you can simply use `-profile ` in your command. + This will enable either `docker` or `singularity` and set the appropriate execution + settings for your local compute environment. -```bash -nextflow run nf-core/hic -profile --reads '*_R{1,2}.fastq.gz' --genome GRCh37 -``` +4. Start running your own analysis! -See [usage docs](https://nf-co.re/hic/usage) for all of the available options when running -the pipeline. 
+ ```bash + nextflow run nf-core/hic -profile --input '*_R{1,2}.fastq.gz' --genome GRCh37 + ``` ## Documentation -The nf-core/hic pipeline comes with documentation about the pipeline, -found in the `docs/` directory: - -1. [Installation](https://nf-co.re/usage/installation) -2. Pipeline configuration - * [Local installation](https://nf-co.re/usage/local_installation) - * [Adding your own system config](https://nf-co.re/usage/adding_own_config) - * [Reference genomes](https://nf-co.re/usage/reference_genomes) -3. [Running the pipeline](docs/usage.md) -4. [Output and how to interpret the results](docs/output.md) -5. [Troubleshooting](https://nf-co.re/usage/troubleshooting) - -The nf-core/hic pipeline comes with documentation about the pipeline which -you can read at [https://nf-co.re/hic/usage](https://nf-co.re/hic/usage) or -find in the [`docs/` directory](docs). +The nf-core/hic pipeline comes with documentation about the pipeline: [usage](https://nf-co.re/hic/usage) and [output](https://nf-co.re/hic/output). -For further information or help, don't hesitate to get in touch on -[Slack](https://nfcore.slack.com/channels/hic). +For further information or help, don't hesitate to get in touch on [Slack](https://nfcore.slack.com/channels/hic). You can join with [this invite](https://nf-co.re/join/slack). ## Credits @@ -100,9 +88,7 @@ nf-core/hic was originally written by Nicolas Servant. If you would like to contribute to this pipeline, please see the [contributing guidelines](.github/CONTRIBUTING.md). -For further information or help, don't hesitate to get in touch on the -[Slack `#hic` channel](https://nfcore.slack.com/channels/hic) -(you can join with [this invite](https://nf-co.re/join/slack)). +For further information or help, don't hesitate to get in touch on the [Slack `#hic` channel](https://nfcore.slack.com/channels/hic) (you can join with [this invite](https://nf-co.re/join/slack)). ## Citation @@ -113,8 +99,14 @@ You can cite the `nf-core` publication as follows: > **The nf-core framework for community-curated bioinformatics pipelines.** > -> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, -Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. +> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen. > > _Nat Biotechnol._ 2020 Feb 13. doi: [10.1038/s41587-020-0439-x](https://dx.doi.org/10.1038/s41587-020-0439-x). -> ReadCube: [Full Access Link](https://rdcu.be/b1GjZ) + +In addition, references of tools and data used in this pipeline are as follows: + +> **HiC-Pro: An optimized and flexible pipeline for Hi-C processing.** +> +> Nicolas Servant, Nelle Varoquaux, Bryan R. Lajoie, Eric Viara, Chongjian Chen, Jean-Philippe Vert, Job Dekker, Edith Heard, Emmanuel Barillot. 
+> +> Genome Biology 2015, 16:259 doi: [10.1186/s13059-015-0831-x](https://dx.doi.org/10.1186/s13059-015-0831-x) diff --git a/assets/email_template.html b/assets/email_template.html index 177bccd..d207f01 100644 --- a/assets/email_template.html +++ b/assets/email_template.html @@ -1,6 +1,5 @@ - diff --git a/assets/nf-core-hic_logo.png b/assets/nf-core-hic_logo.png index 6b36416..37461d9 100644 Binary files a/assets/nf-core-hic_logo.png and b/assets/nf-core-hic_logo.png differ diff --git a/assets/nf-core-hic_social_preview.png b/assets/nf-core-hic_social_preview.png deleted file mode 100644 index 54784f0..0000000 Binary files a/assets/nf-core-hic_social_preview.png and /dev/null differ diff --git a/assets/nf-core-hic_social_preview.svg b/assets/nf-core-hic_social_preview.svg deleted file mode 100644 index bc2e2a3..0000000 --- a/assets/nf-core-hic_social_preview.svg +++ /dev/null @@ -1,448 +0,0 @@ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - image/svg+xml - - - - - - - Analysis of Chromosome Conformation Capture data (Hi-C) - hic - - - - - - - - - - - - - - - - - - - - - - - - - diff --git a/assets/sendmail_template.txt b/assets/sendmail_template.txt index 9afc480..bdf9058 100644 --- a/assets/sendmail_template.txt +++ b/assets/sendmail_template.txt @@ -14,7 +14,7 @@ Content-Transfer-Encoding: base64 Content-ID: Content-Disposition: inline; filename="nf-core-hic_logo.png" -<% out << new File("$baseDir/assets/nf-core-hic_logo.png"). +<% out << new File("$projectDir/assets/nf-core-hic_logo.png"). bytes. encodeBase64(). toString(). diff --git a/bin/mergeSAM.py b/bin/mergeSAM.py index 12917b1..a907fd7 100755 --- a/bin/mergeSAM.py +++ b/bin/mergeSAM.py @@ -52,16 +52,16 @@ def get_args(): def is_unique_bowtie2(read): - ret = False - if not read.is_unmapped and read.has_tag('AS'): - if read.has_tag('XS'): - primary = read.get_tag('AS') - secondary = read.get_tag('XS') - if (primary > secondary): - ret = True - else: - ret = True - return ret + ret = False + if not read.is_unmapped and read.has_tag('AS'): + if read.has_tag('XS'): + primary = read.get_tag('AS') + secondary = read.get_tag('XS') + if (primary > secondary): + ret = True + else: + ret = True + return ret ## Remove everything after "/" or " " in read's name def get_read_name(read): @@ -71,249 +71,239 @@ def get_read_name(read): def sam_flag(read1, read2, hr1, hr2): - f1 = read1.flag - f2 = read2.flag - - if r1.is_unmapped == False: - r1_chrom = hr1.get_reference_name(r1.reference_id) - else: - r1_chrom = "*" - if r2.is_unmapped == False: - r2_chrom = hr2.get_reference_name(r2.reference_id) - else: - r2_chrom="*" - - - ##Relevant bitwise flags (flag in an 11-bit binary number) - ##1 The read is one of a pair - ##2 The alignment is one end of a proper paired-end alignment - ##4 The read has no reported alignments - ##8 The read is one of a pair and has no reported alignments - ##16 The alignment is to the reverse reference strand - ##32 The other mate in the paired-end alignment is aligned to the reverse reference strand - ##64 The read is the first (#1) mate in a pair - ##128 The read is the second (#2) mate in a pair + f1 = read1.flag + f2 = read2.flag + + if r1.is_unmapped == False: + r1_chrom = hr1.get_reference_name(r1.reference_id) + else: + r1_chrom = "*" + if r2.is_unmapped == False: + r2_chrom = hr2.get_reference_name(r2.reference_id) + else: + r2_chrom="*" + + ##Relevant bitwise flags 
(flag in an 11-bit binary number) + ##1 The read is one of a pair + ##2 The alignment is one end of a proper paired-end alignment + ##4 The read has no reported alignments + ##8 The read is one of a pair and has no reported alignments + ##16 The alignment is to the reverse reference strand + ##32 The other mate in the paired-end alignment is aligned to the reverse reference strand + ##64 The read is the first (#1) mate in a pair + ##128 The read is the second (#2) mate in a pair - ##The reads were mapped as single-end data, so should expect flags of - ##0 (map to the '+' strand) or 16 (map to the '-' strand) - ##Output example: a paired-end read that aligns to the reverse strand - ##and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1) + ##The reads were mapped as single-end data, so should expect flags of + ##0 (map to the '+' strand) or 16 (map to the '-' strand) + ##Output example: a paired-end read that aligns to the reverse strand + ##and is the first mate in the pair will have flag 83 (= 64 + 16 + 2 + 1) - if f1 & 0x4: - f1 = f1 | 0x8 + if f1 & 0x4: + f1 = f1 | 0x8 - if f2 & 0x4: - f2 = f2 | 0x8 + if f2 & 0x4: + f2 = f2 | 0x8 - if (not (f1 & 0x4) and not (f2 & 0x4)): - ##The flag should now indicate this is paired-end data - f1 = f1 | 0x1 - f1 = f1 | 0x2 - f2 = f2 | 0x1 - f2 = f2 | 0x2 - + if (not (f1 & 0x4) and not (f2 & 0x4)): + ##The flag should now indicate this is paired-end data + f1 = f1 | 0x1 + f1 = f1 | 0x2 + f2 = f2 | 0x1 + f2 = f2 | 0x2 - ##Indicate if the pair is on the reverse strand - if f1 & 0x10: - f2 = f2 | 0x20 + ##Indicate if the pair is on the reverse strand + if f1 & 0x10: + f2 = f2 | 0x20 - if f2 & 0x10: - f1 = f1 | 0x20 + if f2 & 0x10: + f1 = f1 | 0x20 - ##Is this first or the second pair? - f1 = f1 | 0x40 - f2 = f2 | 0x80 + ##Is this first or the second pair? + f1 = f1 | 0x40 + f2 = f2 | 0x80 ##Insert the modified bitwise flags into the reads - read1.flag = f1 - read2.flag = f2 + read1.flag = f1 + read2.flag = f2 - ##Determine the RNEXT and PNEXT values (i.e. the positional values of a read's pair) - #RNEXT - if r1_chrom == r2_chrom: - read1.next_reference_id = r1.reference_id - read2.next_reference_id = r1.reference_id - else: - read1.next_reference_id = r2.reference_id - read2.next_reference_id = r1.reference_id - #PNEXT - read1.next_reference_start = read2.reference_start - read2.next_reference_start = read1.reference_start + ##Determine the RNEXT and PNEXT values (i.e. 
the positional values of a read's pair) + #RNEXT + if r1_chrom == r2_chrom: + read1.next_reference_id = r1.reference_id + read2.next_reference_id = r1.reference_id + else: + read1.next_reference_id = r2.reference_id + read2.next_reference_id = r1.reference_id + #PNEXT + read1.next_reference_start = read2.reference_start + read2.next_reference_start = read1.reference_start - return(read1, read2) + return(read1, read2) if __name__ == "__main__": ## Read command line arguments - opts = get_args() - inputFile = None - outputFile = None - mapq = None - report_single = False - report_multi = False - verbose = False - stat = False - output = "-" - - if len(opts) == 0: - usage() - sys.exit() - - for opt, arg in opts: - if opt in ("-h", "--help"): - usage() - sys.exit() - elif opt in ("-f", "--forward"): - R1file = arg - elif opt in ("-r", "--reverse"): - R2file = arg - elif opt in ("-o", "--output"): - output = arg - elif opt in ("-q", "--qual"): - mapq = arg - elif opt in ("-s", "--single"): - report_single = True - elif opt in ("-m", "--multi"): - report_multi = True - elif opt in ("-t", "--stat"): - stat = True - elif opt in ("-v", "--verbose"): - verbose = True - else: - assert False, "unhandled option" + opts = get_args() + inputFile = None + outputFile = None + mapq = None + report_single = False + report_multi = False + verbose = False + stat = False + output = "-" + + if len(opts) == 0: + usage() + sys.exit() + + for opt, arg in opts: + if opt in ("-h", "--help"): + usage() + sys.exit() + elif opt in ("-f", "--forward"): + R1file = arg + elif opt in ("-r", "--reverse"): + R2file = arg + elif opt in ("-o", "--output"): + output = arg + elif opt in ("-q", "--qual"): + mapq = arg + elif opt in ("-s", "--single"): + report_single = True + elif opt in ("-m", "--multi"): + report_multi = True + elif opt in ("-t", "--stat"): + stat = True + elif opt in ("-v", "--verbose"): + verbose = True + else: + assert False, "unhandled option" ## Verbose mode - if verbose: - print("## mergeBAM.py") - print("## forward=", R1file) - print("## reverse=", R2file) - print("## output=", output) - print("## min mapq=", mapq) - print("## report_single=", report_single) - print("## report_multi=", report_multi) - print("## verbose=", verbose) + if verbose: + print("## mergeBAM.py") + print("## forward=", R1file) + print("## reverse=", R2file) + print("## output=", output) + print("## min mapq=", mapq) + print("## report_single=", report_single) + print("## report_multi=", report_multi) + print("## verbose=", verbose) ## Initialize variables - tot_pairs_counter = 0 - multi_pairs_counter = 0 - uniq_pairs_counter = 0 - unmapped_pairs_counter = 0 - lowq_pairs_counter = 0 - multi_singles_counter = 0 - uniq_singles_counter = 0 - lowq_singles_counter = 0 + tot_pairs_counter = 0 + multi_pairs_counter = 0 + uniq_pairs_counter = 0 + unmapped_pairs_counter = 0 + lowq_pairs_counter = 0 + multi_singles_counter = 0 + uniq_singles_counter = 0 + lowq_singles_counter = 0 #local_counter = 0 - paired_reads_counter = 0 - singleton_counter = 0 - reads_counter = 0 - r1 = None - r2 = None + paired_reads_counter = 0 + singleton_counter = 0 + reads_counter = 0 + r1 = None + r2 = None ## Reads are 0-based too (for both SAM and BAM format) ## Loop on all reads - if verbose: - print("## Merging forward and reverse tags ...") - with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2: - if output == "-": - outfile = pysam.AlignmentFile(output, "w", template=hr1) - else: - outfile = pysam.AlignmentFile(output, "wb", 
template=hr1) - for r1, r2 in zip(hr1.fetch(until_eof=True), hr2.fetch(until_eof=True)): - reads_counter +=1 - - #print r1 - #print r2 - #print hr1.getrname(r1.tid) - #print hr2.getrname(r2.tid) - - if (reads_counter % 1000000 == 0 and verbose): - print("##", reads_counter) + if verbose: + print("## Merging forward and reverse tags ...") + + with pysam.Samfile(R1file, "rb") as hr1, pysam.Samfile(R2file, "rb") as hr2: + if output == "-": + outfile = pysam.AlignmentFile(output, "w", template=hr1) + else: + outfile = pysam.AlignmentFile(output, "wb", template=hr1) + + for r1, r2 in zip(hr1.fetch(until_eof=True), hr2.fetch(until_eof=True)): + reads_counter +=1 + if (reads_counter % 1000000 == 0 and verbose): + print("##", reads_counter) - if get_read_name(r1) == get_read_name(r2): + if get_read_name(r1) == get_read_name(r2): + ## both unmapped + if r1.is_unmapped == True and r2.is_unmapped == True: + unmapped_pairs_counter += 1 + continue - ## both unmapped - if r1.is_unmapped == True and r2.is_unmapped == True: - unmapped_pairs_counter += 1 - continue - ## both mapped - elif r1.is_unmapped == False and r2.is_unmapped == False: - ## quality - if mapq != None and (r1.mapping_quality < int(mapq) or r2.mapping_quality < int(mapq)): - lowq_pairs_counter += 1 - continue + elif r1.is_unmapped == False and r2.is_unmapped == False: + ## quality + if mapq != None and (r1.mapping_quality < int(mapq) or r2.mapping_quality < int(mapq)): + lowq_pairs_counter += 1 + continue - ## Unique mapping - if is_unique_bowtie2(r1) == True and is_unique_bowtie2(r2) == True: - uniq_pairs_counter += 1 - else: - multi_pairs_counter += 1 - if report_multi == False: - continue - # one end mapped, other is not - else: - singleton_counter += 1 - if report_single == False: - continue - if r1.is_unmapped == False: ## first end is mapped, second is not - ## quality - if mapq != None and (r1.mapping_quality < int(mapq)): - lowq_singles_counter += 1 - continue - ## Unique mapping - if is_unique_bowtie2(r1) == True: - uniq_singles_counter += 1 - else: - multi_singles_counter += 1 - if report_multi == False: - continue - else: ## second end is mapped, first is not - ## quality - if mapq != None and (r2.mapping_quality < int(mapq)): - lowq_singles_counter += 1 - continue - ## Unique mapping - if is_unique_bowtie2(r2) == True: - uniq_singles_counter += 1 - else: - multi_singles_counter += 1 - if report_multi == False: - continue + ## Unique mapping + if is_unique_bowtie2(r1) == True and is_unique_bowtie2(r2) == True: + uniq_pairs_counter += 1 + else: + multi_pairs_counter += 1 + if report_multi == False: + continue + + ## One mate maped + else: + singleton_counter += 1 + if report_single == False: + continue + if r1.is_unmapped == False: ## first end is mapped, second is not + ## quality + if mapq != None and (r1.mapping_quality < int(mapq)): + lowq_singles_counter += 1 + continue + ## Unique mapping + if is_unique_bowtie2(r1) == True: + uniq_singles_counter += 1 + else: + multi_singles_counter += 1 + if report_multi == False: + continue + else: ## second end is mapped, first is not + ## quality + if mapq != None and (r2.mapping_quality < int(mapq)): + lowq_singles_counter += 1 + continue + ## Unique mapping + if is_unique_bowtie2(r2) == True: + uniq_singles_counter += 1 + else: + multi_singles_counter += 1 + if report_multi == False: + continue + + tot_pairs_counter += 1 + (r1, r2) = sam_flag(r1,r2, hr1, hr2) - tot_pairs_counter += 1 - (r1, r2) = sam_flag(r1,r2, hr1, hr2) - - #print hr1.getrname(r1.tid) - #print hr2.getrname(r2.tid) 
- #print r1 - #print r2 ## Write output - outfile.write(r1) - outfile.write(r2) - - else: - print("Forward and reverse reads not paired. Check that BAM files have the same read names and are sorted.") - sys.exit(1) - - if stat: - if output == '-': - statfile = "pairing.stat" - else: - statfile = re.sub('\.bam$', '.pairstat', output) - with open(statfile, 'w') as handle_stat: - handle_stat.write("Total_pairs_processed\t" + str(reads_counter) + "\t" + str(round(float(reads_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Unmapped_pairs\t" + str(unmapped_pairs_counter) + "\t" + str(round(float(unmapped_pairs_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Low_qual_pairs\t" + str(lowq_pairs_counter) + "\t" + str(round(float(lowq_pairs_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Unique_paired_alignments\t" + str(uniq_pairs_counter) + "\t" + str(round(float(uniq_pairs_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Multiple_pairs_alignments\t" + str(multi_pairs_counter) + "\t" + str(round(float(multi_pairs_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Pairs_with_singleton\t" + str(singleton_counter) + "\t" + str(round(float(singleton_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Low_qual_singleton\t" + str(lowq_singles_counter) + "\t" + str(round(float(lowq_singles_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Unique_singleton_alignments\t" + str(uniq_singles_counter) + "\t" + str(round(float(uniq_singles_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Multiple_singleton_alignments\t" + str(multi_singles_counter) + "\t" + str(round(float(multi_singles_counter)/float(reads_counter)*100,3)) + "\n") - handle_stat.write("Reported_pairs\t" + str(tot_pairs_counter) + "\t" + str(round(float(tot_pairs_counter)/float(reads_counter)*100,3)) + "\n") - hr1.close() - hr2.close() - outfile.close() + outfile.write(r1) + outfile.write(r2) + + else: + print("Forward and reverse reads not paired. 
Check that BAM files have the same read names and are sorted.") + sys.exit(1) + + if stat: + if output == '-': + statfile = "pairing.stat" + else: + statfile = re.sub('\.bam$', '.pairstat', output) + with open(statfile, 'w') as handle_stat: + handle_stat.write("Total_pairs_processed\t" + str(reads_counter) + "\t" + str(round(float(reads_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Unmapped_pairs\t" + str(unmapped_pairs_counter) + "\t" + str(round(float(unmapped_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Low_qual_pairs\t" + str(lowq_pairs_counter) + "\t" + str(round(float(lowq_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Unique_paired_alignments\t" + str(uniq_pairs_counter) + "\t" + str(round(float(uniq_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Multiple_pairs_alignments\t" + str(multi_pairs_counter) + "\t" + str(round(float(multi_pairs_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Pairs_with_singleton\t" + str(singleton_counter) + "\t" + str(round(float(singleton_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Low_qual_singleton\t" + str(lowq_singles_counter) + "\t" + str(round(float(lowq_singles_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Unique_singleton_alignments\t" + str(uniq_singles_counter) + "\t" + str(round(float(uniq_singles_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Multiple_singleton_alignments\t" + str(multi_singles_counter) + "\t" + str(round(float(multi_singles_counter)/float(reads_counter)*100,3)) + "\n") + handle_stat.write("Reported_pairs\t" + str(tot_pairs_counter) + "\t" + str(round(float(tot_pairs_counter)/float(reads_counter)*100,3)) + "\n") + hr1.close() + hr2.close() + outfile.close() diff --git a/bin/scrape_software_versions.py b/bin/scrape_software_versions.py index 9f5650d..5ff3fcf 100755 --- a/bin/scrape_software_versions.py +++ b/bin/scrape_software_versions.py @@ -7,9 +7,9 @@ regexes = { 'nf-core/hic': ['v_pipeline.txt', r"(\S+)"], 'Nextflow': ['v_nextflow.txt', r"(\S+)"], - 'Bowtie2': ['v_bowtie2.txt', r"Bowtie2 v(\S+)"], - 'Python': ['v_python.txt', r"Python v(\S+)"], - 'Samtools': ['v_samtools.txt', r"Samtools v(\S+)"], + 'Bowtie2': ['v_bowtie2.txt', r"bowtie2-align-s version (\S+)"], + 'Python': ['v_python.txt', r"Python (\S+)"], + 'Samtools': ['v_samtools.txt', r"samtools (\S+)"], 'MultiQC': ['v_multiqc.txt', r"multiqc, version (\S+)"], } results = OrderedDict() @@ -36,11 +36,6 @@ if not results[k]: del results[k] -# Remove software set to false in results -for k in results: - if not results[k]: - del(results[k]) - # Dump to YAML print( """ @@ -61,4 +56,3 @@ with open("software_versions.csv", "w") as f: for k, v in results.items(): f.write("{}\t{}\n".format(k, v)) - diff --git a/conf/base.config b/conf/base.config index 157dd95..ddec1a8 100644 --- a/conf/base.config +++ b/conf/base.config @@ -10,7 +10,6 @@ */ process { - // nf-core: Check the defaults for all processes cpus = { check_max( 1 * task.attempt, 'cpus' ) } memory = { check_max( 7.GB * task.attempt, 'memory' ) } time = { check_max( 4.h * task.attempt, 'time' ) } @@ -43,4 +42,5 @@ process { withName:get_software_versions { cache = false } + } diff --git a/conf/hicpro.config b/conf/hicpro.config deleted file mode 100644 index cd0cf0b..0000000 --- a/conf/hicpro.config +++ /dev/null @@ -1,38 +0,0 @@ -/* - * ------------------------------------------------- - * Nextflow config file for Genomes paths - * 
------------------------------------------------- - * Defines reference genomes - * Can be used by any config that customises the base - * path using $params.genomes_base / --genomes_base - */ - -params { - - // Alignment options - bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' - bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' - min_mapq = 10 - - // Digestion Hi-C - restriction_site = 'A^AGCTT' - ligation_site = 'AAGCTAGCTT' - min_restriction_fragment_size = - max_restriction_fragment_size = - min_insert_size = - max_insert_size = - - // Hi-C Processing - min_cis_dist = - rm_singleton = true - rm_multi = true - rm_dup = true - - bin_size = '1000000,500000' - - ice_max_iter = 100 - ice_filer_low_count_perc = 0.02 - ice_filer_high_count_perc = 0 - ice_eps = 0.1 -} - diff --git a/conf/test.config b/conf/test.config index 2ab8e57..5c5fc84 100644 --- a/conf/test.config +++ b/conf/test.config @@ -8,8 +8,7 @@ */ params { - -config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)' + config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)' config_profile_description = 'Minimal test dataset to check pipeline function' // Limit resources so that this can run on Travis @@ -24,19 +23,19 @@ config_profile_name = 'Hi-C test data from Schalbetter et al. (2017)' // Annotations fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa' - restriction_site = 'A^AGCTT' - ligation_site = 'AAGCTAGCTT' - - min_mapq = 2 - rm_dup = true - rm_singleton = true - rm_multi = true - + digestion = 'hindiii' + min_mapq = 10 min_restriction_fragment_size = 100 max_restriction_fragment_size = 100000 min_insert_size = 100 max_insert_size = 600 + + bin_size = '1000' + res_dist_decay = '1000' + res_tads = '1000' + tads_caller = 'insulation,hicexplorer' + res_compartments = '1000' - // Options - skip_cool = true + // Ignore `--input` as otherwise the parameter validation will throw an error + schema_ignore_params = 'genomes,digest,input_paths,input' } diff --git a/conf/test_full.config b/conf/test_full.config index 47d3176..1e793cc 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -18,19 +18,19 @@ params { // Annotations fasta = 'https://github.com/nf-core/test-datasets/raw/hic/reference/W303_SGD_2015_JRIU00000000.fsa' - restriction_site = 'A^AGCTT' - ligation_site = 'AAGCTAGCTT' - - min_mapq = 2 - rm_dup = true - rm_singleton = true - rm_multi = true - + digestion = 'hindiii' + min_mapq = 10 min_restriction_fragment_size = 100 max_restriction_fragment_size = 100000 min_insert_size = 100 max_insert_size = 600 + + bin_size = '1000' + res_dist_decay = '1000' + res_tads = '1000' + tads_caller = 'insulation,hicexplorer' + res_compartments = '1000' - // Options - skip_cool = true + // Ignore `--input` as otherwise the parameter validation will throw an error + schema_ignore_params = 'genomes,digest,input_paths,input' } diff --git a/docs/README.md b/docs/README.md index bdbc92a..a688954 100644 --- a/docs/README.md +++ b/docs/README.md @@ -3,11 +3,8 @@ The nf-core/hic documentation is split into the following pages: * [Usage](usage.md) - * An overview of how the pipeline works, how to run it and a - description of all of the different command-line flags. + * An overview of how the pipeline works, how to run it and a description of all of the different command-line flags. 
* [Output](output.md) - * An overview of the different results produced by the pipeline - and how to interpret them. + * An overview of the different results produced by the pipeline and how to interpret them. -You can find a lot more documentation about installing, configuring -and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) +You can find a lot more documentation about installing, configuring and running nf-core pipelines on the website: [https://nf-co.re](https://nf-co.re) diff --git a/docs/images/nf-core-hic_logo.png b/docs/images/nf-core-hic_logo.png index e5fead3..274eb3d 100644 Binary files a/docs/images/nf-core-hic_logo.png and b/docs/images/nf-core-hic_logo.png differ diff --git a/docs/output.md b/docs/output.md index 95aca42..8b3fd0a 100644 --- a/docs/output.md +++ b/docs/output.md @@ -1,33 +1,38 @@ # nf-core/hic: Output -This document describes the output produced by the pipeline. -Most of the plots are taken from the MultiQC report, which -summarises results at the end of the pipeline. +## Introduction -The directories listed below will be created in the results directory -after the pipeline has finished. All paths are relative to the top-level -results directory. +This document describes the output produced by the pipeline. Most of the plots are taken from the MultiQC report, which summarises results at the end of the pipeline. +The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. ## Pipeline overview The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -* [Reads alignment](#reads-alignment) -* [Valid pairs detection](#valid-pairs-detection) -* [Duplicates removal](#duplicates-removal) -* [Contact maps](#contact-maps) +* [HiC-Pro](#hicpro) + * [Reads alignment](#reads-alignment) + * [Valid pairs detection](#valid-pairs-detection) + * [Duplicates removal](#duplicates-removal) + * [Contact maps](#hicpro-contact-maps) +* [Hi-C contact maps](#hic-contact-maps) +* [Downstream analysis](#downstream-analysis) + * [Distance decay](#distance-decay) + * [Compartments calling](#compartments-calling) + * [TADs calling](#tads-calling) * [MultiQC](#multiqc) - aggregate report and quality controls, describing results of the whole pipeline * [Export](#exprot) - additionnal export for compatibility with downstream analysis tool and visualization +## HiC-Pro + The current version is mainly based on the [HiC-Pro](https://github.com/nservant/HiC-Pro) pipeline. For details about the workflow, see [Servant et al. 2015](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-015-0831-x) -## Reads alignment +### Reads alignment Using Hi-C data, each reads mate has to be independantly aligned on the reference genome. @@ -42,7 +47,7 @@ configuration parameters (`--rm-multi`). Note that if the `--dnase` mode is activated, HiC-Pro will skip the second mapping step. -**Output directory: `results/mapping`** +**Output directory: `results/hicpro/mapping`** * `*bwt2pairs.bam` - final BAM file with aligned paired data * `*.pairstat` - mapping statistics @@ -67,7 +72,7 @@ the fraction of unmapped reads. The fraction of singleton is usually close to the sum of unmapped R1 and R2 reads, as it is unlikely that both mates from the same pair were unmapped. 
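These statistics are written as simple tab-separated text files. A minimal way to inspect them outside of the MultiQC report is to pretty-print one of the `.pairstat` files on the command line (the sample name below is illustrative):

```bash
# Align the label / count / percentage columns of the pairing statistics
column -t -s $'\t' results/hicpro/mapping/SAMPLE.bwt2pairs.pairstat
```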
-## Valid pairs detection +### Valid pairs detection with HiC-Pro Each aligned reads can be assigned to one restriction fragment according to the reference genome and the digestion protocol. @@ -95,6 +100,8 @@ DNase Hi-C or micro Hi-C, the assignment to a restriction is not possible Short range interactions that are likely to be spurious ligation products can thus be discarded using the `--min_cis_dist` parameter. +**Output directory: `results/hicpro/valid_pairs`** + * `*.validPairs` - List of valid ligation products * `*.DEpairs` - List of dangling-end products * `*.SCPairs` - List of self-circle products @@ -121,12 +128,14 @@ is skipped. The aligned pairs are therefore directly used to generate the contact maps. A filter of the short range contact (typically <1kb) is recommanded as this pairs are likely to be self ligation products. -## Duplicates removal +### Duplicates removal Note that validPairs file are generated per reads chunck. These files are then merged in the allValidPairs file, and duplicates are removed if the `--rm_dup` parameter is used. +**Output directory: `results/hicpro/valid_pairs`** + * `*allValidPairs` - combined valid pairs from all read chunks * `*mergestat` - statistics about duplicates removal and valid pairs information @@ -140,24 +149,29 @@ Finaly, an important metric is to look at the fraction of intra and inter-chromosomal interactions, as well as long range (>20kb) versus short range (<20kb) intra-chromosomal interactions. -## Contact maps +### Contact maps Intra et inter-chromosomal contact maps are build for all specified resolutions. The genome is splitted into bins of equal size. Each valid interaction is associated with the genomic bins to generate the raw maps. In addition, Hi-C data can contain several sources of biases which has to be corrected. -The current workflow uses the [Ƭced](https://github.com/hiclib/iced) and +The HiC-Pro workflow uses the [Ƭced](https://github.com/hiclib/iced) and [Varoquaux and Servant, 2018](http://joss.theoj.org/papers/10.21105/joss.01286) python package which proposes a fast implementation of the original ICE normalization algorithm (Imakaev et al. 2012), making the assumption of equal visibility of each fragment. +Importantly, the HiC-Pro maps are generated only if the `--hicpro_maps` option +is specified on the command line. + +**Output directory: `results/hicpro/matrix`** + * `*.matrix` - genome-wide contact maps * `*_iced.matrix` - genome-wide iced contact maps -The contact maps are generated for all specified resolution -(see `--bin_size` argument) +The contact maps are generated for all specified resolutions +(see `--bin_size` argument). A contact map is defined by : * A list of genomic intervals related to the specified resolution (BED format). @@ -179,6 +193,58 @@ files. This format is memory efficient, and is compatible with several software for downstream analysis. +## Hi-C contact maps + +Contact maps are usually stored as simple txt (`HiC-Pro`), .hic (`Juicer/Juicebox`) and .(m)cool (`cooler/Higlass`) formats. +Note that .cool and .hic format are compressed and usually much more efficient that the txt format. +In the current workflow, we propose to use the `cooler` format as a standard to build the raw and normalized maps +after valid pairs detection as it is used by several downstream analysis and visualization tools. + +Raw contact maps are therefore in **`results/contact_maps/raw`** which contains the different maps in `txt` and `cool` formats, at various resolutions. 
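As a quick sanity check, these `cool` files can be explored directly with the `cooler` command-line tool, assuming it is available in your environment (the file name below is illustrative):

```bash
# Show the resolution, bin table and chromosome information stored in the file
cooler info results/contact_maps/raw/SAMPLE_1000000.cool

# Export the raw counts as a text table of bin pairs
cooler dump --join results/contact_maps/raw/SAMPLE_1000000.cool | head
```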
+Normalized contact maps are stored in **`results/contact_maps/norm`** which contains the different maps in `txt`, `cool`, and `mcool` format. + +Note that `txt` contact maps generated with `cooler` are identical to those generated by `HiC-Pro`. +However, differences can be observed on the normalized contact maps as the balancing algorithm is not the same. + +## Downstream analysis + +Downstream analysis are performed from `cool` files at specified resolution. + +### Distance decay + +The distance decay plot shows the relationship between contact frequencies and genomic distance. It gives a good indication of the compaction of the genome. +According to the organism, the slope of the curve should fit the expectation of polymer physics models. + +The results generated with the `HiCExplorer hicPlotDistVsCounts` tool (plot and table) are available in the **`results/dist_decay/`** folder. + +### Compartments calling + +Compartments calling is one of the most common analysis which aims at detecting A (open, active) / B (close, inactive) compartments. +In the first studies on the subject, the compartments were called at high/medium resolution (1000000 to 250000) which is enough to call A/B comparments. +Analysis at higher resolution has shown that these two main types of compartments can be further divided into compartments subtypes. + +Although different methods have been proposed for compartment calling, the standard remains the eigen vector decomposition from the normalized correlation maps. +Here, we use the implementation available in the [`cooltools`](https://cooltools.readthedocs.io/en/lates) package. + +Results are available in **`results/compartments/`** folder and includes : + +* `*cis.vecs.tsv`: eigenvectors decomposition along the genome +* `*cis.lam.txt`: eigenvalues associated with the eigenvectors + +### TADs calling + +TADs has been described as functional units of the genome. +While contacts between genes and regulatority elements can occur within a single TADs, contacts between TADs are much less frequent, mainly due to the presence of insulation protein (such as CTCF) at their boundaries. Looking at Hi-C maps, TADs look like triangles around the diagonal. According to the contact map resolutions, TADs appear as hierarchical structures with a median size around 1Mb (in mammals), as well as smaller structures usually called sub-TADs of smaller size. + +TADs calling remains a challenging task, and even if many methods have been proposed in the last decade, little overlap have been found between their results. + +Currently, the pipeline proposes two approaches : + +* Insulation score using the [`cooltools`](https://cooltools.readthedocs.io/en/latest/cli.html#cooltools-diamond-insulation) package. Results are availabe in **`results/tads/insulation`**. +* [`HiCExplorer TADs calling`](https://hicexplorer.readthedocs.io/en/latest/content/tools/hicFindTADs.html). Results are available at **`results/tads/hicexplorer`**. + +Usually, TADs results are presented as simple BED files, or bigWig files, with the position of boundaries along the genome. + ## MultiQC [MultiQC](http://multiqc.info) is a visualisation tool that generates a single @@ -191,12 +257,16 @@ reported in the MultiQC output for future traceability. 
**Output files:** -* `Project_multiqc_report.html` - * MultiQC report - a standalone HTML file that can be viewed in your -web browser -* `Project_multiqc_data/` - * Directory containing parsed statistics from the different tools used -in the pipeline +* `multiqc/` + * `multiqc_report.html`: a standalone HTML file that can be viewed in your web browser. + * `multiqc_data/`: directory containing parsed statistics from the different tools used in the pipeline. + * `multiqc_plots/`: directory containing static images from the report in various formats. + +## Pipeline information + +[Nextflow](https://www.nextflow.io/docs/latest/tracing.html) provides excellent functionality for generating various reports relevant to the running and execution of the pipeline. This will allow you to troubleshoot errors with the running of the pipeline, and also provide you with other information such as launch commands, run times and resource usage. + +**Output files:** * `pipeline_info/` * Reports generated by Nextflow: `execution_report.html`, `execution_timeline.html`, diff --git a/docs/usage.md b/docs/usage.md index 11b0653..800d447 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -1,5 +1,11 @@ # nf-core/hic: Usage +## :warning: Please read this documentation on the nf-core website: [https://nf-co.re/hic/usage](https://nf-co.re/hic/usage) + +> _Documentation of pipeline parameters is generated automatically from the pipeline schema and can no longer be found in markdown files._ + +## Introduction + ## Running the pipeline The typical command for running the pipeline is as follows: @@ -22,12 +28,7 @@ results # Finished results (configurable, see below) ### Updating the pipeline -When you run the above command, Nextflow automatically pulls the pipeline code -from GitHub and stores it as a cached version. When running the pipeline after -this, it will always use the cached version if available - even if the pipeline -has been updated since. To make sure that you're running the latest version of -the pipeline, make sure that you regularly update the cached version of the -pipeline: +When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since. To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: ```bash nextflow pull nf-core/hic @@ -35,17 +36,7 @@ nextflow pull nf-core/hic ### Reproducibility -It's a good idea to specify a pipeline version when running the pipeline on -your data. This ensures that a specific version of the pipeline code and -software are used when you run your pipeline. If you keep using the same tag, -you'll be running the same version of the pipeline, even if there have been -changes to the code since. - -It's a good idea to specify a pipeline version when running the pipeline on -your data. This ensures that a specific version of the pipeline code and -software are used when you run your pipeline. If you keep using the same tag, -you'll be running the same version of the pipeline, even if there have been -changes to the code since. +It's a good idea to specify a pipeline version when running the pipeline on your data. This ensures that a specific version of the pipeline code and software are used when you run your pipeline. 
If you keep using the same tag, you'll be running the same version of the pipeline, even if there have been changes to the code since. First, go to the [nf-core/hic releases page](https://github.com/nf-core/hic/releases) and find @@ -74,9 +65,7 @@ fails after three times then the pipeline is stopped. Use this parameter to choose a configuration profile. Profiles can give configuration presets for different compute environments. -Several generic profiles are bundled with the pipeline which instruct -the pipeline to use software packaged using different methods -(Docker, Singularity, Conda) - see below. +Several generic profiles are bundled with the pipeline which instruct the pipeline to use software packaged using different methods (Docker, Singularity, Podman, Shifter, Charliecloud, Conda) - see below. > We highly recommend the use of Docker or Singularity containers for full pipeline reproducibility, however when this is not possible, Conda is also supported. @@ -104,9 +93,17 @@ installed and available on the `PATH`. This is _not_ recommended. * `singularity` * A generic configuration profile to be used with [Singularity](https://sylabs.io/docs/) * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) +* `podman` + * A generic configuration profile to be used with [Podman](https://podman.io/) + * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) +* `shifter` + * A generic configuration profile to be used with [Shifter](https://nersc.gitlab.io/development/shifter/how-to-use/) + * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) +* `charliecloud` + * A generic configuration profile to be used with [Charliecloud](https://hpc.github.io/charliecloud/) + * Pulls software from Docker Hub: [`nfcore/hic`](https://hub.docker.com/r/nfcore/hic/) * `conda` - * Please only use Conda as a last resort i.e. when it's not possible to run the - pipeline with Docker or Singularity. + * Please only use Conda as a last resort i.e. when it's not possible to run the pipeline with Docker, Singularity, Podman, Shifter or Charliecloud. * A generic configuration profile to be used with [Conda](https://conda.io/docs/) * Pulls most software from [Bioconda](https://bioconda.github.io/) * `test` @@ -148,18 +145,11 @@ process { } ``` -See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) -for more information. +To find the exact name of a process you wish to modify the compute resources, check the live-status of a nextflow run displayed on your terminal or check the nextflow error for a line like so: `Error executing process > 'bowtie2_end_to_end'`. In this case the name to specify in the custom config file is `bowtie2_end_to_end`. + +See the main [Nextflow documentation](https://www.nextflow.io/docs/latest/config.html) for more information. -If you are likely to be running `nf-core` pipelines regularly it may be a -good idea to request that your custom config file is uploaded to the -`nf-core/configs` git repository. Before you do this please can you test -that the config file works with your pipeline of choice using the `-c` -parameter (see definition below). 
You can then create a pull request to the -`nf-core/configs` repository with the addition of your config file, associated -documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), -and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) -to include your custom profile. +If you are likely to be running `nf-core` pipelines regularly it may be a good idea to request that your custom config file is uploaded to the `nf-core/configs` git repository. Before you do this please can you test that the config file works with your pipeline of choice using the `-c` parameter (see definition above). You can then create a pull request to the `nf-core/configs` repository with the addition of your config file, associated documentation file (see examples in [`nf-core/configs/docs`](https://github.com/nf-core/configs/tree/master/docs)), and amending [`nfcore_custom.config`](https://github.com/nf-core/configs/blob/master/nfcore_custom.config) to include your custom profile. If you have any questions or issues please send us a message on [Slack](https://nf-co.re/join/slack) on the @@ -190,6 +180,30 @@ We recommend adding the following line to your environment to limit this NXF_OPTS='-Xms1g -Xmx4g' ``` +## Use case + +### Hi-C digestion protocol + +Here is an command line example for standard DpnII digestion protocols. +Alignment will be performed on the `mm10` genome with default parameters. +Multi-hits will not be considered and duplicates will be removed. +Note that by default, no filters are applied on DNA and restriction fragment sizes. + +```bash +nextflow run main.nf --input './*_R{1,2}.fastq.gz' --genome 'mm10' --digestion 'dnpii' +``` + +### DNase Hi-C protocol + +Here is an command line example for DNase protocol. +Alignment will be performed on the `mm10` genome with default paramters. +Multi-hits will not be considered and duplicates will be removed. +Contacts involving fragments separated by less than 1000bp will be discarded. + +```bash +nextflow run main.nf --input './*_R{1,2}.fastq.gz' --genome 'mm10' --dnase --min_cis 1000 +``` + ## Inputs ### `--input` @@ -209,16 +223,7 @@ notation to specify read pairs. If left unspecified, a default pattern is used: `data/*{1,2}.fastq.gz` -By default, the pipeline expects paired-end data. If you have single-end data, -you need to specify `--single_end` on the command line when you launch the pipeline. -A normal glob pattern, enclosed in quotation marks, can then be used for `--input`. -For example: - -```bash ---single_end --reads '*.fastq' -``` - -It is not possible to run a mixture of single-end and paired-end files in one run. +Note that the Hi-C data analysis requires paired-end data. ## Reference genomes @@ -245,13 +250,13 @@ run the pipeline: ### `--bwt2_index` -The bowtie2 indexes are required to run the Hi-C pipeline. If the +The bowtie2 indexes are required to align the data with the HiC-Pro workflow. If the `--bwt2_index` is not specified, the pipeline will either use the igenome bowtie2 indexes (see `--genome` option) or build the indexes on-the-fly (see `--fasta` option) ```bash ---bwt2_index '[path to bowtie2 index (with basename)]' +--bwt2_index '[path to bowtie2 index]' ``` ### `--chromosome_size` @@ -300,15 +305,15 @@ file with coordinates of restriction fragments. If not specified, this file will be automatically created by the pipline. In this case, the `--fasta` reference genome will be used. 
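For example, here is a minimal sketch of a command that lets the pipeline build this file on the fly from the reference genome (the input pattern and fasta path are illustrative):

```bash
nextflow run nf-core/hic \
    --input './*_R{1,2}.fastq.gz' \
    --fasta 'genome.fa' \
    --digestion 'hindiii' \
    -profile docker
```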
-Note that the `--restriction_site` parameter is mandatory to create this file. +Note that the `digestion` or `--restriction_site` parameter is mandatory to create this file. ## Hi-C specific options -The following options are defined in the `hicpro.config` file, and can be +The following options are defined in the `nextflow.config` file, and can be updated either using a custom configuration file (see `-c` option) or using command line parameter. -### Reads mapping +### HiC-pro mapping The reads mapping is currently based on the two-steps strategy implemented in the HiC-pro pipeline. The idea is to first align reads from end-to-end. @@ -347,17 +352,28 @@ Minimum mapping quality. Reads with lower quality are discarded. Default: 10 ### Digestion Hi-C +#### `--digestion` + +This parameter allows to automatically set the `--restriction_site` and +`--ligation_site` parameter according to the restriction enzyme you used. +Available keywords are 'hindiii', 'dpnii', 'mboi', 'arima'. + +```bash +--digestion 'hindiii' +``` + #### `--restriction_site` -Restriction motif(s) for Hi-C digestion protocol. The restriction motif(s) -is(are) used to generate the list of restriction fragments. +If the restriction enzyme is not available through the `--digestion` +parameter, you can also defined manually the restriction motif(s) for +Hi-C digestion protocol. +The restriction motif(s) is(are) used to generate the list of restriction fragments. The precise cutting site of the restriction enzyme has to be specified using the '^' character. Default: 'A^AGCTT' Here are a few examples: * MboI: ^GATC * DpnII: ^GATC -* BglII: A^GATCT * HindIII: A^AGCTT * ARIMA kit: ^GATC,G^ANTC @@ -382,10 +398,25 @@ Default: 'AAGCTAGCTT' Exemple of the ARIMA kit: GATCGATC,GANTGATC,GANTANTC,GATCANTC +### DNAse Hi-C + +#### `--dnase` + +In DNAse Hi-C mode, all options related to digestion Hi-C +(see previous section) are ignored. +In this case, it is highly recommanded to use the `--min_cis_dist` parameter +to remove spurious ligation products. + +```bash +--dnase' +``` + +### HiC-pro processing + #### `--min_restriction_fragment_size` Minimum size of restriction fragments to consider for the Hi-C processing. -Default: '' +Default: '0' - no filter ```bash --min_restriction_fragment_size '[numeric]' @@ -394,7 +425,7 @@ Default: '' #### `--max_restriction_fragment_size` Maximum size of restriction fragments to consider for the Hi-C processing. -Default: '' +Default: '0' - no filter ```bash --max_restriction_fragment_size '[numeric]' @@ -403,7 +434,7 @@ Default: '' #### `--min_insert_size` Minimum reads insert size. Shorter 3C products are discarded. -Default: '' +Default: '0' - no filter ```bash --min_insert_size '[numeric]' @@ -412,74 +443,78 @@ Default: '' #### `--max_insert_size` Maximum reads insert size. Longer 3C products are discarded. -Default: '' +Default: '0' - no filter ```bash --max_insert_size '[numeric]' ``` -### DNAse Hi-C - -#### `--dnase` +#### `--min_cis_dist` -In DNAse Hi-C mode, all options related to digestion Hi-C -(see previous section) are ignored. -In this case, it is highly recommanded to use the `--min_cis_dist` parameter -to remove spurious ligation products. +Filter short range contact below the specified distance. +Mainly useful for DNase Hi-C. Default: '0' ```bash ---dnase' +--min_cis_dist '[numeric]' ``` -### Hi-C processing +#### `--keep_dups` -#### `--min_cis_dist` - -Filter short range contact below the specified distance. -Mainly useful for DNase Hi-C. 
Default: '' +If specified, duplicates reads are not discarded before building contact maps. ```bash ---min_cis_dist '[numeric]' +--keep_dups ``` -#### `--rm_singleton` +#### `--keep_multi` -If specified, singleton reads are discarded at the mapping step. +If specified, reads that aligned multiple times on the genome are not discarded. +Note the default mapping options are based on random hit assignment, meaning +that only one position is kept per read. +Note that in this case the `--min_mapq` parameter is ignored. ```bash ---rm_singleton +--keep_multi ``` -#### `--rm_dup` +## Genome-wide contact maps + +Once the list of valid pairs is available, the standard is now to move on the `cooler` +framework to build the raw and balanced contact maps in txt and (m)cool formats. + +### `--bin_size` -If specified, duplicates reads are discarded before building contact maps. +Resolution of contact maps to generate (comma separated). +Default:'1000000,500000' ```bash ---rm_dup +--bins_size '[string]' ``` -#### `--rm_multi` +### `--res_zoomify` -If specified, reads that aligned multiple times on the genome are discarded. -Note the default mapping options are based on random hit assignment, meaning -that only one position is kept per read. +Define the maximum resolution to reach when zoomify the cool contact maps. +Default:'5000' ```bash ---rm_multi +--res_zoomify '[string]' ``` -## Genome-wide contact maps +### HiC-Pro contact maps -### `--bin_size` +By default, the contact maps are now generated with the `cooler` framework. +However, for backward compatibility, the raw and normalized maps can still be generated +by HiC-pro if the `--hicpro_maps` parameter is set. -Resolution of contact maps to generate (space separated). -Default:'1000000,500000' +#### `--hicpro_maps` + +If specified, the raw and ICE normalized contact maps will be generated by HiC-Pro. ```bash ---bins_size '[numeric]' +--hicpro_maps ``` -### `--ice_max_iter` +#### `--ice_max_iter` Maximum number of iteration for ICE normalization. Default: 100 @@ -488,7 +523,7 @@ Default: 100 --ice_max_iter '[numeric]' ``` -### `--ice_filer_low_count_perc` +#### `--ice_filer_low_count_perc` Define which pourcentage of bins with low counts should be force to zero. Default: 0.02 @@ -497,7 +532,7 @@ Default: 0.02 --ice_filter_low_count_perc '[numeric]' ``` -### `--ice_filer_high_count_perc` +#### `--ice_filer_high_count_perc` Define which pourcentage of bins with low counts should be discarded before normalization. Default: 0 @@ -506,7 +541,7 @@ normalization. Default: 0 --ice_filter_high_count_perc '[numeric]' ``` -### `--ice_eps` +#### `--ice_eps` The relative increment in the results before declaring convergence for ICE normalization. Default: 0.1 @@ -515,6 +550,54 @@ normalization. Default: 0.1 --ice_eps '[numeric]' ``` +## Downstream analysis + +### Additional quality controls + +#### `--res_dist_decay` + +Generates distance vs Hi-C counts plots at a given resolution using `HiCExplorer`. +Several resolution can be specified (comma separeted). Default: '250000' + +```bash +--res_dist_decay '[string]' +``` + +### Compartment calling + +Call open/close compartments for each chromosome, using the `cooltools` command. + +#### `--res_compartments` + +Resolution to call the chromosome compartments (comma separated). +Default: '250000' + +```bash +--res_compartments '[string]' +``` + +### TADs calling + +#### `--tads_caller` + +TADs calling can be performed using different approaches. +Currently available options are `insulation` and `hicexplorer`. 
+Note that all options can be specified (comma separated). +Default: 'insulation' + +```bash +--tads_caller '[string]' +``` + +#### `--res_tads` + +Resolution to run the TADs calling analysis (comma separated). +Default: '40000,20000' + +```bash +--res_tads '[string]' +``` + ## Inputs/Outputs ### `--split_fastq` @@ -569,13 +652,13 @@ genome-wide maps are not built. Usefult for capture-C analysis. Default: false --skip_maps ``` -### `--skip_ice` +### `--skip_balancing` -If defined, the ICE normalization is not run on the raw contact maps. +If defined, the contact maps normalization is not run on the raw contact maps. Default: false ```bash ---skip_ice +--skip_balancing ``` ### `--skip_cool` @@ -586,6 +669,30 @@ If defined, cooler files are not generated. Default: false --skip_cool ``` +### `skip_dist_decay` + +Do not run distance decay plots. Default: false + +```bash +--skip_dist_decay +``` + +### `skip_compartments` + +Do not call compartments. Default: false + +```bash +--skip_compartments +``` + +### `skip_tads` + +Do not call TADs. Default: false + +```bash +--skip_tads +``` + ### `--skip_multiQC` If defined, the MultiQC report is not generated. Default: false diff --git a/environment.yml b/environment.yml index ccca9c3..9d35759 100644 --- a/environment.yml +++ b/environment.yml @@ -1,6 +1,6 @@ # You can use this file to create a conda environment for this pipeline: # conda env create -f environment.yml -name: nf-core-hic-1.2.2 +name: nf-core-hic-1.3.0 channels: - conda-forge - bioconda @@ -8,6 +8,7 @@ channels: dependencies: - conda-forge::python=3.7.6 - pip=20.0.1 + - conda-forge::tbb=2020.2=hc9558a2_0 - conda-forge::scipy=1.4.1 - conda-forge::numpy=1.18.1 - bioconda::iced=0.5.6 @@ -26,5 +27,5 @@ dependencies: - bioconda::ucsc-bedgraphtobigwig=357 - conda-forge::cython=0.29.19 - pip: - - cooltools==0.3.2 + - cooltools==0.4.0 - fanc==0.8.30 \ No newline at end of file diff --git a/lib/Headers.groovy b/lib/Headers.groovy new file mode 100644 index 0000000..15d1d38 --- /dev/null +++ b/lib/Headers.groovy @@ -0,0 +1,43 @@ +/* + * This file holds several functions used to render the nf-core ANSI header. + */ + +class Headers { + + private static Map log_colours(Boolean monochrome_logs) { + Map colorcodes = [:] + colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" + colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['yellow_bold'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + colorcodes['red'] = monochrome_logs ? 
'' : "\033[1;91m" + return colorcodes + } + + static String dashed_line(monochrome_logs) { + Map colors = log_colours(monochrome_logs) + return "-${colors.dim}----------------------------------------------------${colors.reset}-" + } + + static String nf_core(workflow, monochrome_logs) { + Map colors = log_colours(monochrome_logs) + String.format( + """\n + ${dashed_line(monochrome_logs)} + ${colors.green},--.${colors.black}/${colors.green},-.${colors.reset} + ${colors.blue} ___ __ __ __ ___ ${colors.green}/,-._.--~\'${colors.reset} + ${colors.blue} |\\ | |__ __ / ` / \\ |__) |__ ${colors.yellow}} {${colors.reset} + ${colors.blue} | \\| | \\__, \\__/ | \\ |___ ${colors.green}\\`-._,-`-,${colors.reset} + ${colors.green}`._,._,\'${colors.reset} + ${colors.purple} ${workflow.manifest.name} v${workflow.manifest.version}${colors.reset} + ${dashed_line(monochrome_logs)} + """.stripIndent() + ) + } +} diff --git a/lib/NfcoreSchema.groovy b/lib/NfcoreSchema.groovy new file mode 100644 index 0000000..52ee730 --- /dev/null +++ b/lib/NfcoreSchema.groovy @@ -0,0 +1,573 @@ +/* + * This file holds several functions used to perform JSON parameter validation, help and summary rendering for the nf-core pipeline template. + */ + +import org.everit.json.schema.Schema +import org.everit.json.schema.loader.SchemaLoader +import org.everit.json.schema.ValidationException +import org.json.JSONObject +import org.json.JSONTokener +import org.json.JSONArray +import groovy.json.JsonSlurper +import groovy.json.JsonBuilder + +class NfcoreSchema { + + /* + * Function to loop over all parameters defined in schema and check + * whether the given paremeters adhere to the specificiations + */ + /* groovylint-disable-next-line UnusedPrivateMethodParameter */ + private static void validateParameters(params, jsonSchema, log) { + def has_error = false + //=====================================================================// + // Check for nextflow core params and unexpected params + def json = new File(jsonSchema).text + def Map schemaParams = (Map) new JsonSlurper().parseText(json).get('definitions') + def nf_params = [ + // Options for base `nextflow` command + 'bg', + 'c', + 'C', + 'config', + 'd', + 'D', + 'dockerize', + 'h', + 'log', + 'q', + 'quiet', + 'syslog', + 'v', + 'version', + + // Options for `nextflow run` command + 'ansi', + 'ansi-log', + 'bg', + 'bucket-dir', + 'c', + 'cache', + 'config', + 'dsl2', + 'dump-channels', + 'dump-hashes', + 'E', + 'entry', + 'latest', + 'lib', + 'main-script', + 'N', + 'name', + 'offline', + 'params-file', + 'pi', + 'plugins', + 'poll-interval', + 'pool-size', + 'profile', + 'ps', + 'qs', + 'queue-size', + 'r', + 'resume', + 'revision', + 'stdin', + 'stub', + 'stub-run', + 'test', + 'w', + 'with-charliecloud', + 'with-conda', + 'with-dag', + 'with-docker', + 'with-mpi', + 'with-notification', + 'with-podman', + 'with-report', + 'with-singularity', + 'with-timeline', + 'with-tower', + 'with-trace', + 'with-weblog', + 'without-docker', + 'without-podman', + 'work-dir' + ] + def unexpectedParams = [] + + // Collect expected parameters from the schema + def expectedParams = [] + for (group in schemaParams) { + for (p in group.value['properties']) { + expectedParams.push(p.key) + } + } + + for (specifiedParam in params.keySet()) { + // nextflow params + if (nf_params.contains(specifiedParam)) { + log.error "ERROR: You used a core Nextflow option with two hyphens: '--${specifiedParam}'. 
Please resubmit with '-${specifiedParam}'" + has_error = true + } + // unexpected params + def params_ignore = params.schema_ignore_params.split(',') + 'schema_ignore_params' + def expectedParamsLowerCase = expectedParams.collect{ it.replace("-", "").toLowerCase() } + def specifiedParamLowerCase = specifiedParam.replace("-", "").toLowerCase() + if (!expectedParams.contains(specifiedParam) && !params_ignore.contains(specifiedParam) && !expectedParamsLowerCase.contains(specifiedParamLowerCase)) { + // Temporarily remove camelCase/camel-case params #1035 + def unexpectedParamsLowerCase = unexpectedParams.collect{ it.replace("-", "").toLowerCase()} + if (!unexpectedParamsLowerCase.contains(specifiedParamLowerCase)){ + unexpectedParams.push(specifiedParam) + } + } + } + + //=====================================================================// + // Validate parameters against the schema + InputStream inputStream = new File(jsonSchema).newInputStream() + JSONObject rawSchema = new JSONObject(new JSONTokener(inputStream)) + + // Remove anything that's in params.schema_ignore_params + rawSchema = removeIgnoredParams(rawSchema, params) + + Schema schema = SchemaLoader.load(rawSchema) + + // Clean the parameters + def cleanedParams = cleanParameters(params) + + // Convert to JSONObject + def jsonParams = new JsonBuilder(cleanedParams) + JSONObject paramsJSON = new JSONObject(jsonParams.toString()) + + // Validate + try { + schema.validate(paramsJSON) + } catch (ValidationException e) { + println '' + log.error 'ERROR: Validation of pipeline parameters failed!' + JSONObject exceptionJSON = e.toJSON() + printExceptions(exceptionJSON, paramsJSON, log) + println '' + has_error = true + } + + // Check for unexpected parameters + if (unexpectedParams.size() > 0) { + Map colors = log_colours(params.monochrome_logs) + println '' + def warn_msg = 'Found unexpected parameters:' + for (unexpectedParam in unexpectedParams) { + warn_msg = warn_msg + "\n* --${unexpectedParam}: ${params[unexpectedParam].toString()}" + } + log.warn warn_msg + log.info "- ${colors.dim}Ignore this warning: params.schema_ignore_params = \"${unexpectedParams.join(',')}\" ${colors.reset}" + println '' + } + + if (has_error) { + System.exit(1) + } + } + + // Loop over nested exceptions and print the causingException + private static void printExceptions(exJSON, paramsJSON, log) { + def causingExceptions = exJSON['causingExceptions'] + if (causingExceptions.length() == 0) { + def m = exJSON['message'] =~ /required key \[([^\]]+)\] not found/ + // Missing required param + if (m.matches()) { + log.error "* Missing required parameter: --${m[0][1]}" + } + // Other base-level error + else if (exJSON['pointerToViolation'] == '#') { + log.error "* ${exJSON['message']}" + } + // Error with specific param + else { + def param = exJSON['pointerToViolation'] - ~/^#\// + def param_val = paramsJSON[param].toString() + log.error "* --${param}: ${exJSON['message']} (${param_val})" + } + } + for (ex in causingExceptions) { + printExceptions(ex, paramsJSON, log) + } + } + + // Remove an element from a JSONArray + private static JSONArray removeElement(jsonArray, element){ + def list = [] + int len = jsonArray.length() + for (int i=0;i + if(rawSchema.keySet().contains('definitions')){ + rawSchema.definitions.each { definition -> + for (key in definition.keySet()){ + if (definition[key].get("properties").keySet().contains(ignore_param)){ + // Remove the param to ignore + definition[key].get("properties").remove(ignore_param) + // If the param was required, 
change this + if (definition[key].has("required")) { + def cleaned_required = removeElement(definition[key].required, ignore_param) + definition[key].put("required", cleaned_required) + } + } + } + } + } + if(rawSchema.keySet().contains('properties') && rawSchema.get('properties').keySet().contains(ignore_param)) { + rawSchema.get("properties").remove(ignore_param) + } + if(rawSchema.keySet().contains('required') && rawSchema.required.contains(ignore_param)) { + def cleaned_required = removeElement(rawSchema.required, ignore_param) + rawSchema.put("required", cleaned_required) + } + } + return rawSchema + } + + private static Map cleanParameters(params) { + def new_params = params.getClass().newInstance(params) + for (p in params) { + // remove anything evaluating to false + if (!p['value']) { + new_params.remove(p.key) + } + // Cast MemoryUnit to String + if (p['value'].getClass() == nextflow.util.MemoryUnit) { + new_params.replace(p.key, p['value'].toString()) + } + // Cast Duration to String + if (p['value'].getClass() == nextflow.util.Duration) { + new_params.replace(p.key, p['value'].toString().replaceFirst(/d(?!\S)/, "day")) + } + // Cast LinkedHashMap to String + if (p['value'].getClass() == LinkedHashMap) { + new_params.replace(p.key, p['value'].toString()) + } + } + return new_params + } + + /* + * This method tries to read a JSON params file + */ + private static LinkedHashMap params_load(String json_schema) { + def params_map = new LinkedHashMap() + try { + params_map = params_read(json_schema) + } catch (Exception e) { + println "Could not read parameters settings from JSON. $e" + params_map = new LinkedHashMap() + } + return params_map + } + + private static Map log_colours(Boolean monochrome_logs) { + Map colorcodes = [:] + + // Reset / Meta + colorcodes['reset'] = monochrome_logs ? '' : "\033[0m" + colorcodes['bold'] = monochrome_logs ? '' : "\033[1m" + colorcodes['dim'] = monochrome_logs ? '' : "\033[2m" + colorcodes['underlined'] = monochrome_logs ? '' : "\033[4m" + colorcodes['blink'] = monochrome_logs ? '' : "\033[5m" + colorcodes['reverse'] = monochrome_logs ? '' : "\033[7m" + colorcodes['hidden'] = monochrome_logs ? '' : "\033[8m" + + // Regular Colors + colorcodes['black'] = monochrome_logs ? '' : "\033[0;30m" + colorcodes['red'] = monochrome_logs ? '' : "\033[0;31m" + colorcodes['green'] = monochrome_logs ? '' : "\033[0;32m" + colorcodes['yellow'] = monochrome_logs ? '' : "\033[0;33m" + colorcodes['blue'] = monochrome_logs ? '' : "\033[0;34m" + colorcodes['purple'] = monochrome_logs ? '' : "\033[0;35m" + colorcodes['cyan'] = monochrome_logs ? '' : "\033[0;36m" + colorcodes['white'] = monochrome_logs ? '' : "\033[0;37m" + + // Bold + colorcodes['bblack'] = monochrome_logs ? '' : "\033[1;30m" + colorcodes['bred'] = monochrome_logs ? '' : "\033[1;31m" + colorcodes['bgreen'] = monochrome_logs ? '' : "\033[1;32m" + colorcodes['byellow'] = monochrome_logs ? '' : "\033[1;33m" + colorcodes['bblue'] = monochrome_logs ? '' : "\033[1;34m" + colorcodes['bpurple'] = monochrome_logs ? '' : "\033[1;35m" + colorcodes['bcyan'] = monochrome_logs ? '' : "\033[1;36m" + colorcodes['bwhite'] = monochrome_logs ? '' : "\033[1;37m" + + // Underline + colorcodes['ublack'] = monochrome_logs ? '' : "\033[4;30m" + colorcodes['ured'] = monochrome_logs ? '' : "\033[4;31m" + colorcodes['ugreen'] = monochrome_logs ? '' : "\033[4;32m" + colorcodes['uyellow'] = monochrome_logs ? '' : "\033[4;33m" + colorcodes['ublue'] = monochrome_logs ? '' : "\033[4;34m" + colorcodes['upurple'] = monochrome_logs ? 
'' : "\033[4;35m" + colorcodes['ucyan'] = monochrome_logs ? '' : "\033[4;36m" + colorcodes['uwhite'] = monochrome_logs ? '' : "\033[4;37m" + + // High Intensity + colorcodes['iblack'] = monochrome_logs ? '' : "\033[0;90m" + colorcodes['ired'] = monochrome_logs ? '' : "\033[0;91m" + colorcodes['igreen'] = monochrome_logs ? '' : "\033[0;92m" + colorcodes['iyellow'] = monochrome_logs ? '' : "\033[0;93m" + colorcodes['iblue'] = monochrome_logs ? '' : "\033[0;94m" + colorcodes['ipurple'] = monochrome_logs ? '' : "\033[0;95m" + colorcodes['icyan'] = monochrome_logs ? '' : "\033[0;96m" + colorcodes['iwhite'] = monochrome_logs ? '' : "\033[0;97m" + + // Bold High Intensity + colorcodes['biblack'] = monochrome_logs ? '' : "\033[1;90m" + colorcodes['bired'] = monochrome_logs ? '' : "\033[1;91m" + colorcodes['bigreen'] = monochrome_logs ? '' : "\033[1;92m" + colorcodes['biyellow'] = monochrome_logs ? '' : "\033[1;93m" + colorcodes['biblue'] = monochrome_logs ? '' : "\033[1;94m" + colorcodes['bipurple'] = monochrome_logs ? '' : "\033[1;95m" + colorcodes['bicyan'] = monochrome_logs ? '' : "\033[1;96m" + colorcodes['biwhite'] = monochrome_logs ? '' : "\033[1;97m" + + return colorcodes + } + + static String dashed_line(monochrome_logs) { + Map colors = log_colours(monochrome_logs) + return "-${colors.dim}----------------------------------------------------${colors.reset}-" + } + + /* + Method to actually read in JSON file using Groovy. + Group (as Key), values are all parameters + - Parameter1 as Key, Description as Value + - Parameter2 as Key, Description as Value + .... + Group + - + */ + private static LinkedHashMap params_read(String json_schema) throws Exception { + def json = new File(json_schema).text + def Map schema_definitions = (Map) new JsonSlurper().parseText(json).get('definitions') + def Map schema_properties = (Map) new JsonSlurper().parseText(json).get('properties') + /* Tree looks like this in nf-core schema + * definitions <- this is what the first get('definitions') gets us + group 1 + title + description + properties + parameter 1 + type + description + parameter 2 + type + description + group 2 + title + description + properties + parameter 1 + type + description + * properties <- parameters can also be ungrouped, outside of definitions + parameter 1 + type + description + */ + + // Grouped params + def params_map = new LinkedHashMap() + schema_definitions.each { key, val -> + def Map group = schema_definitions."$key".properties // Gets the property object of the group + def title = schema_definitions."$key".title + def sub_params = new LinkedHashMap() + group.each { innerkey, value -> + sub_params.put(innerkey, value) + } + params_map.put(title, sub_params) + } + + // Ungrouped params + def ungrouped_params = new LinkedHashMap() + schema_properties.each { innerkey, value -> + ungrouped_params.put(innerkey, value) + } + params_map.put("Other parameters", ungrouped_params) + + return params_map + } + + /* + * Get maximum number of characters across all parameter names + */ + private static Integer params_max_chars(params_map) { + Integer max_chars = 0 + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (param.size() > max_chars) { + max_chars = param.size() + } + } + } + return max_chars + } + + /* + * Beautify parameters for --help + */ + private static String params_help(workflow, params, json_schema, command) { + Map colors = 
log_colours(params.monochrome_logs) + Integer num_hidden = 0 + String output = '' + output += 'Typical pipeline command:\n\n' + output += " ${colors.cyan}${command}${colors.reset}\n\n" + Map params_map = params_load(json_schema) + Integer max_chars = params_max_chars(params_map) + 1 + Integer desc_indent = max_chars + 14 + Integer dec_linewidth = 160 - desc_indent + for (group in params_map.keySet()) { + Integer num_params = 0 + String group_output = colors.underlined + colors.bold + group + colors.reset + '\n' + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (group_params.get(param).hidden && !params.show_hidden_params) { + num_hidden += 1 + continue; + } + def type = '[' + group_params.get(param).type + ']' + def description = group_params.get(param).description + def defaultValue = group_params.get(param).default ? " [default: " + group_params.get(param).default.toString() + "]" : '' + def description_default = description + colors.dim + defaultValue + colors.reset + // Wrap long description texts + // Loosely based on https://dzone.com/articles/groovy-plain-text-word-wrap + if (description_default.length() > dec_linewidth){ + List olines = [] + String oline = "" // " " * indent + description_default.split(" ").each() { wrd -> + if ((oline.size() + wrd.size()) <= dec_linewidth) { + oline += wrd + " " + } else { + olines += oline + oline = wrd + " " + } + } + olines += oline + description_default = olines.join("\n" + " " * desc_indent) + } + group_output += " --" + param.padRight(max_chars) + colors.dim + type.padRight(10) + colors.reset + description_default + '\n' + num_params += 1 + } + group_output += '\n' + if (num_params > 0){ + output += group_output + } + } + output += dashed_line(params.monochrome_logs) + if (num_hidden > 0){ + output += colors.dim + "\n Hiding $num_hidden params, use --show_hidden_params to show.\n" + colors.reset + output += dashed_line(params.monochrome_logs) + } + return output + } + + /* + * Groovy Map summarising parameters/workflow options used by the pipeline + */ + private static LinkedHashMap params_summary_map(workflow, params, json_schema) { + // Get a selection of core Nextflow workflow options + def Map workflow_summary = [:] + if (workflow.revision) { + workflow_summary['revision'] = workflow.revision + } + workflow_summary['runName'] = workflow.runName + if (workflow.containerEngine) { + workflow_summary['containerEngine'] = workflow.containerEngine + } + if (workflow.container) { + workflow_summary['container'] = workflow.container + } + workflow_summary['launchDir'] = workflow.launchDir + workflow_summary['workDir'] = workflow.workDir + workflow_summary['projectDir'] = workflow.projectDir + workflow_summary['userName'] = workflow.userName + workflow_summary['profile'] = workflow.profile + workflow_summary['configFiles'] = workflow.configFiles.join(', ') + + // Get pipeline parameters defined in JSON Schema + def Map params_summary = [:] + def blacklist = ['hostnames'] + def params_map = params_load(json_schema) + for (group in params_map.keySet()) { + def sub_params = new LinkedHashMap() + def group_params = params_map.get(group) // This gets the parameters of that particular group + for (param in group_params.keySet()) { + if (params.containsKey(param) && !blacklist.contains(param)) { + def params_value = params.get(param) + def schema_value = group_params.get(param).default + def param_type = group_params.get(param).type + if (schema_value != null) { 
+ if (param_type == 'string') { + if (schema_value.contains('$projectDir') || schema_value.contains('${projectDir}')) { + def sub_string = schema_value.replace('\$projectDir', '') + sub_string = sub_string.replace('\${projectDir}', '') + if (params_value.contains(sub_string)) { + schema_value = params_value + } + } + if (schema_value.contains('$params.outdir') || schema_value.contains('${params.outdir}')) { + def sub_string = schema_value.replace('\$params.outdir', '') + sub_string = sub_string.replace('\${params.outdir}', '') + if ("${params.outdir}${sub_string}" == params_value) { + schema_value = params_value + } + } + } + } + + // We have a default in the schema, and this isn't it + if (schema_value != null && params_value != schema_value) { + sub_params.put(param, params_value) + } + // No default in the schema, and this isn't empty + else if (schema_value == null && params_value != "" && params_value != null && params_value != false) { + sub_params.put(param, params_value) + } + } + } + params_summary.put(group, sub_params) + } + return [ 'Core Nextflow options' : workflow_summary ] << params_summary + } + + /* + * Beautify parameters for summary and return as string + */ + private static String params_summary_log(workflow, params, json_schema) { + Map colors = log_colours(params.monochrome_logs) + String output = '' + def params_map = params_summary_map(workflow, params, json_schema) + def max_chars = params_max_chars(params_map) + for (group in params_map.keySet()) { + def group_params = params_map.get(group) // This gets the parameters of that particular group + if (group_params) { + output += colors.bold + group + colors.reset + '\n' + for (param in group_params.keySet()) { + output += " " + colors.blue + param.padRight(max_chars) + ": " + colors.green + group_params.get(param) + colors.reset + '\n' + } + output += '\n' + } + } + output += dashed_line(params.monochrome_logs) + output += colors.dim + "\n Only displaying parameters that differ from defaults.\n" + colors.reset + output += dashed_line(params.monochrome_logs) + return output + } + +} diff --git a/lib/nfcore_external_java_deps.jar b/lib/nfcore_external_java_deps.jar new file mode 100644 index 0000000..805c8bb Binary files /dev/null and b/lib/nfcore_external_java_deps.jar differ diff --git a/main.nf b/main.nf index 3528a41..a8611d5 100644 --- a/main.nf +++ b/main.nf @@ -9,137 +9,72 @@ ---------------------------------------------------------------------------------------- */ -def helpMessage() { - // Add to this help message with new command line parameters - log.info nfcoreHeader() - log.info""" - - Usage: - - The typical command for running the pipeline is as follows: - - nextflow run nf-core/hic --input '*_R{1,2}.fastq.gz' -profile docker - - Mandatory arguments: - --input [file] Path to input data (must be surrounded with quotes) - -profile [str] Configuration profile to use. Can use multiple (comma separated) - Available: conda, docker, singularity, awsbatch, test and more. - - References If not specified in the configuration file or you wish to overwrite any of the references. - --genome [str] Name of iGenomes reference - --bwt2_index [file] Path to Bowtie2 index - --fasta [file] Path to Fasta reference - --chromosome_size [file] Path to chromosome size file - --restriction_fragments [file] Path to restriction fragment file (bed) - --save_reference [bool] Save reference genome to output folder. Default: False - - Alignments - --split_fastq [bool] Split fastq files in reads chunks to speed up computation. 
Default: false - --fastq_chunks_size [int] Size of read chunks if split_fastq is true. Default: 20000000 - --save_aligned_intermediates [bool] Save intermediates alignment files. Default: False - --bwt2_opts_end2end [str] Options for bowtie2 end-to-end mappinf (first mapping step). See hic.config for default. - --bwt2_opts_trimmed [str] Options for bowtie2 mapping after ligation site trimming. See hic.config for default. - --min_mapq [int] Minimum mapping quality values to consider. Default: 10 - --restriction_site [str] Cutting motif(s) of restriction enzyme(s) (comma separated). Default: 'A^AGCTT' - --ligation_site [str] Ligation motifs to trim (comma separated). Default: 'AAGCTAGCTT' - --rm_singleton [bool] Remove singleton reads. Default: true - --rm_multi [bool] Remove multi-mapped reads. Default: true - --rm_dup [bool] Remove duplicates. Default: true - - Contacts calling - --min_restriction_fragment_size [int] Minimum size of restriction fragments to consider. Default: 0 - --max_restriction_fragment_size [int] Maximum size of restriction fragments to consider. Default: 0 - --min_insert_size [int] Minimum insert size of mapped reads to consider. Default: 0 - --max_insert_size [int] Maximum insert size of mapped reads to consider. Default: 0 - --save_interaction_bam [bool] Save BAM file with interaction tags (dangling-end, self-circle, etc.). Default: False - - --dnase [bool] Run DNase Hi-C mode. All options related to restriction fragments are not considered. Default: False - --min_cis_dist [int] Minimum intra-chromosomal distance to consider. Default: 0 - - Contact maps - --bin_size [int] Bin size for contact maps (comma separated). Default: '1000000,500000' - --ice_max_iter [int] Maximum number of iteration for ICE normalization. Default: 100 - --ice_filter_low_count_perc [float] Percentage of low counts columns/rows to filter before ICE normalization. Default: 0.02 - --ice_filter_high_count_perc [float] Percentage of high counts columns/rows to filter before ICE normalization. Default: 0 - --ice_eps [float] Convergence criteria for ICE normalization. Default: 0.1 - - - Workflow - --skip_maps [bool] Skip generation of contact maps. Useful for capture-C. Default: False - --skip_ice [bool] Skip ICE normalization. Default: False - --skip_cool [bool] Skip generation of cool files. Default: False - --skip_multiqc [bool] Skip MultiQC. Default: False - - Other options: - --outdir [file] The output directory where the results will be saved - --publish_dir_mode [str] Mode for publishing results in the output directory. Available: symlink, rellink, link, copy, copyNoFollow, move (Default: copy) - --email [email] Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits. Default: None - --email_on_fail [email] Same as --email, except only send mail if the workflow is not successful - --max_multiqc_email_size [str] Theshold size for MultiQC report to be attached in notification email. If file generated by pipeline exceeds the threshold, it will not be attached (Default: 25MB) - -name [str] Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic. 
Default: None - - AWSBatch options: - --awsqueue [str] The AWSBatch JobQueue that needs to be set when running on AWSBatch - --awsregion [str] The AWS Region for your AWS Batch job to run on - --awscli [str] Path to the AWS CLI tool - """.stripIndent() +log.info Headers.nf_core(workflow, params.monochrome_logs) + +//////////////////////////////////////////////////// +/* -- PRINT HELP -- */ +////////////////////////////////////////////////////+ +def json_schema = "$projectDir/nextflow_schema.json" +if (params.help) { + def command = "nextflow run nf-core/hic --input '*_R{1,2}.fastq.gz' -profile docker" + log.info NfcoreSchema.params_help(workflow, params, json_schema, command) + exit 0 } -/********************************************************** - * SET UP CONFIGURATION VARIABLES - */ - -// Show help message -if (params.help){ - helpMessage() - exit 0 +//////////////////////////////////////////////////// +/* -- VALIDATE PARAMETERS -- */ +////////////////////////////////////////////////////+ +if (params.validate_params) { + NfcoreSchema.validateParameters(params, json_schema, log) } // Check if genome exists in the config file if (params.genomes && params.genome && !params.genomes.containsKey(params.genome)) { - exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(", ")}" + exit 1, "The provided genome '${params.genome}' is not available in the iGenomes file. Currently the available genomes are ${params.genomes.keySet().join(', ')}" +} + +if (params.digest && params.digestion && !params.digest.containsKey(params.digestion)) { + exit 1, "Unknown digestion protocol. Currently, the available digestion options are ${params.digest.keySet().join(", ")}. Please set manually the '--restriction_site' and '--ligation_site' parameters." } +params.restriction_site = params.digestion ? params.digest[ params.digestion ].restriction_site ?: false : false +params.ligation_site = params.digestion ? params.digest[ params.digestion ].ligation_site ?: false : false + // Check Digestion or DNase Hi-C mode if (!params.dnase && !params.ligation_site) { - exit 1, "Ligation motif not found. For DNase Hi-C, please use '--dnase' option" + exit 1, "Ligation motif not found. Please either use the `--digestion` parameters or specify the `--restriction_site` and `--ligation_site`. For DNase Hi-C, please use '--dnase' option" } // Reference index path configuration params.bwt2_index = params.genome ? params.genomes[ params.genome ].bowtie2 ?: false : false params.fasta = params.genome ? params.genomes[ params.genome ].fasta ?: false : false -// Has the run name been specified by the user? -// this has the bonus effect of catching both -name and --name -custom_runName = params.name -if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { - custom_runName = workflow.runName -} + +//////////////////////////////////////////////////// +/* -- Collect configuration parameters -- */ +//////////////////////////////////////////////////// // Check AWS batch settings if (workflow.profile.contains('awsbatch')) { // AWSBatch sanity checking - if (!params.awsqueue || !params.awsregion) exit 1, "Specify correct --awsqueue and --awsregion parameters on AWSBatch!" + if (!params.awsqueue || !params.awsregion) exit 1, 'Specify correct --awsqueue and --awsregion parameters on AWSBatch!' 
// Check outdir paths to be S3 buckets if running on AWSBatch // related: https://github.com/nextflow-io/nextflow/issues/813 - if (!params.outdir.startsWith('s3:')) exit 1, "Outdir not on S3 - specify S3 Bucket to run on AWSBatch!" + if (!params.outdir.startsWith('s3:')) exit 1, 'Outdir not on S3 - specify S3 Bucket to run on AWSBatch!' // Prevent trace files to be stored on S3 since S3 does not support rolling files. - if (params.tracedir.startsWith('s3:')) exit 1, "Specify a local tracedir or run without trace! S3 cannot be used for tracefiles." + if (params.tracedir.startsWith('s3:')) exit 1, 'Specify a local tracedir or run without trace! S3 cannot be used for tracefiles.' } // Stage config files -ch_multiqc_config = file("$baseDir/assets/multiqc_config.yaml", checkIfExists: true) +ch_multiqc_config = file("$projectDir/assets/multiqc_config.yaml", checkIfExists: true) ch_multiqc_custom_config = params.multiqc_config ? Channel.fromPath(params.multiqc_config, checkIfExists: true) : Channel.empty() -ch_output_docs = file("$baseDir/docs/output.md", checkIfExists: true) -ch_output_docs_images = file("$baseDir/docs/images/", checkIfExists: true) - -/********************************************************** - * SET UP CHANNELS - */ +ch_output_docs = file("$projectDir/docs/output.md", checkIfExists: true) +ch_output_docs_images = file("$projectDir/docs/images/", checkIfExists: true) /* * input read files */ + if (params.input_paths){ raw_reads = Channel.create() @@ -148,27 +83,37 @@ if (params.input_paths){ Channel .from( params.input_paths ) .map { row -> [ row[0], [file(row[1][0]), file(row[1][1])]] } - .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0], a[1][0]), tuple(a[0], a[1][1])] } - }else{ + .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0] + "_R1", a[1][0]), tuple(a[0] + "_R2", a[1][1])] } +}else{ raw_reads = Channel.create() raw_reads_2 = Channel.create() - Channel - .fromFilePairs( params.input ) - .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0], a[1][0]), tuple(a[0], a[1][1])] } + if ( params.split_fastq ){ + Channel + .fromFilePairs( params.input, flat:true ) + .splitFastq( by: params.fastq_chunks_size, pe:true, file: true, compress:true) + .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0] + "_R1", a[1]), tuple(a[0] + "_R2", a[2])] } + }else{ + Channel + .fromFilePairs( params.input ) + .separate( raw_reads, raw_reads_2 ) { a -> [tuple(a[0] + "_R1", a[1][0]), tuple(a[0] + "_R2", a[1][1])] } + } } -// SPlit fastq files -// https://www.nextflow.io/docs/latest/operator.html#splitfastq - -if ( params.split_fastq ){ - raw_reads_full = raw_reads.concat( raw_reads_2 ) - raw_reads = raw_reads_full.splitFastq( by: params.fastq_chunks_size, file: true) - }else{ - raw_reads = raw_reads.concat( raw_reads_2 ).dump(tag: "data") +// Update sample name if splitFastq is used +def updateSampleName(x) { + if ((matcher = x[1] =~ /\s*(\.[\d]+).fastq.gz/)) { + res = matcher[0][1] + } + return [x[0] + res, x[1]] } +if (params.split_fastq ){ + raw_reads = raw_reads.concat( raw_reads_2 ).map{it -> updateSampleName(it)}.dump(tag:'input') +}else{ + raw_reads = raw_reads.concat( raw_reads_2 ).dump(tag:'input') +} /* * Other input channels @@ -176,23 +121,16 @@ if ( params.split_fastq ){ // Reference genome if ( params.bwt2_index ){ - lastPath = params.bwt2_index.lastIndexOf(File.separator) - bwt2_dir = params.bwt2_index.substring(0,lastPath+1) - bwt2_base = params.bwt2_index.substring(lastPath+1) - Channel.fromPath( bwt2_dir , checkIfExists: true) + Channel.fromPath( params.bwt2_index 
, checkIfExists: true) .ifEmpty { exit 1, "Genome index: Provided index not found: ${params.bwt2_index}" } .into { bwt2_index_end2end; bwt2_index_trim } } else if ( params.fasta ) { - lastPath = params.fasta.lastIndexOf(File.separator) - fasta_base = params.fasta.substring(lastPath+1) - bwt2_base = fasta_base.toString() - ~/(\.fa)?(\.fasta)?(\.fas)?(\.fsa)?$/ - Channel.fromPath( params.fasta ) .ifEmpty { exit 1, "Genome index: Fasta file not found: ${params.fasta}" } - .set { fasta_for_index } + .into { fasta_for_index } } else { exit 1, "No reference genome specified!" @@ -201,7 +139,7 @@ else { // Chromosome size if ( params.chromosome_size ){ Channel.fromPath( params.chromosome_size , checkIfExists: true) - .into {chromosome_size; chromosome_size_cool} + .into {chrsize; chrsize_build; chrsize_raw; chrsize_balance; chrsize_zoom; chrsize_compartments} } else if ( params.fasta ){ Channel.fromPath( params.fasta ) @@ -222,57 +160,108 @@ else if ( params.fasta && params.restriction_site ){ .ifEmpty { exit 1, "Restriction fragments: Fasta file not found: ${params.fasta}" } .set { fasta_for_resfrag } } -else { +else if (! params.dnase) { exit 1, "No restriction fragments file specified!" } // Resolutions for contact maps -map_res = Channel.from( params.bin_size.tokenize(',') ) +map_res = Channel.from( params.bin_size ).splitCsv().flatten() +all_res = params.bin_size +if (params.res_tads && !params.skip_tads){ + Channel.from( "${params.res_tads}" ) + .splitCsv() + .flatten() + .into {tads_bin; tads_res_hicexplorer; tads_res_insulation} + map_res = map_res.concat(tads_bin) + all_res = all_res + ',' + params.res_tads +}else{ + tads_res_hicexplorer=Channel.empty() + tads_res_insulation=Channel.empty() + tads_bin=Channel.empty() + if (!params.skip_tads){ + log.warn "[nf-core/hic] Hi-C resolution for TADs calling not specified. See --res_tads" + } +} -/********************************************************** - * SET UP LOGS - */ +if (params.res_dist_decay && !params.skip_dist_decay){ + Channel.from( "${params.res_dist_decay}" ) + .splitCsv() + .flatten() + .into {ddecay_res; ddecay_bin } + map_res = map_res.concat(ddecay_bin) + all_res = all_res + ',' + params.res_dist_decay +}else{ + ddecay_res = Channel.create() + ddecay_bin = Channel.create() + if (!params.skip_dist_decay){ + log.warn "[nf-core/hic] Hi-C resolution for distance decay not specified. See --res_dist_decay" + } +} + +if (params.res_compartments && !params.skip_compartments){ + Channel.fromPath( params.fasta ) + .ifEmpty { exit 1, "Compartments calling: Fasta file not found: ${params.fasta}" } + .set { fasta_for_compartments } + Channel.from( "${params.res_compartments}" ) + .splitCsv() + .flatten() + .into {comp_bin; comp_res} + map_res = map_res.concat(comp_bin) + all_res = all_res + ',' + params.res_compartments +}else{ + fasta_for_compartments = Channel.empty() + comp_res = Channel.create() + if (!params.skip_compartments){ + log.warn "[nf-core/hic] Hi-C resolution for compartment calling not specified. 
See --res_compartments" + } +} + +map_res + .unique() + .into { map_res_summary; map_res; map_res_cool; map_comp } + + +//////////////////////////////////////////////////// +/* -- PRINT PARAMETER SUMMARY -- */ +//////////////////////////////////////////////////// +log.info NfcoreSchema.params_summary_log(workflow, params, json_schema) // Header log info -log.info nfcoreHeader() def summary = [:] -if(workflow.revision) summary['Pipeline Release'] = workflow.revision -summary['Run Name'] = custom_runName ?: workflow.runName +if (workflow.revision) summary['Pipeline Release'] = workflow.revision +summary['Run Name'] = workflow.runName summary['Input'] = params.input summary['splitFastq'] = params.split_fastq if (params.split_fastq) summary['Read chunks Size'] = params.fastq_chunks_size summary['Fasta Ref'] = params.fasta -summary['Restriction Motif']= params.restriction_site -summary['Ligation Motif'] = params.ligation_site -summary['DNase Mode'] = params.dnase -summary['Remove Dup'] = params.rm_dup -summary['Remove MultiHits'] = params.rm_multi +if (params.restriction_site){ + summary['Digestion'] = params.digestion + summary['Restriction Motif']= params.restriction_site + summary['Ligation Motif'] = params.ligation_site + summary['Min Fragment Size']= params.min_restriction_fragment_size + summary['Max Fragment Size']= params.max_restriction_fragment_size + summary['Min Insert Size'] = params.min_insert_size + summary['Max Insert Size'] = params.max_insert_size +}else{ + summary['DNase Mode'] = params.dnase + summary['Min CIS dist'] = params.min_cis_dist +} summary['Min MAPQ'] = params.min_mapq -summary['Min Fragment Size']= params.min_restriction_fragment_size -summary['Max Fragment Size']= params.max_restriction_fragment_size -summary['Min Insert Size'] = params.min_insert_size -summary['Max Insert Size'] = params.max_insert_size -summary['Min CIS dist'] = params.min_cis_dist -summary['Maps resolution'] = params.bin_size -summary['Max Memory'] = params.max_memory -summary['Max CPUs'] = params.max_cpus -summary['Max Time'] = params.max_time +summary['Keep Duplicates'] = params.keep_dups ? 'Yes' : 'No' +summary['Keep Multihits'] = params.keep_multi ? 
'Yes' : 'No' +summary['Maps resolution'] = all_res +summary['Max Resources'] = "$params.max_memory memory, $params.max_cpus cpus, $params.max_time time per job" +if (workflow.containerEngine) summary['Container'] = "$workflow.containerEngine - $workflow.container" summary['Output dir'] = params.outdir +summary['Launch dir'] = workflow.launchDir summary['Working dir'] = workflow.workDir -summary['Container Engine'] = workflow.containerEngine -if(workflow.containerEngine) - summary['Container'] = workflow.container -summary['Current home'] = "$HOME" -summary['Current user'] = "$USER" -summary['Current path'] = "$PWD" -summary['Working dir'] = workflow.workDir -summary['Output dir'] = params.outdir summary['Script dir'] = workflow.projectDir -summary['Config Profile'] = workflow.profile -if(workflow.profile == 'awsbatch'){ - summary['AWS Region'] = params.awsregion - summary['AWS Queue'] = params.awsqueue +summary['User'] = workflow.userName +if (workflow.profile.contains('awsbatch')) { + summary['AWS Region'] = params.awsregion + summary['AWS Queue'] = params.awsqueue + summary['AWS CLI'] = params.awscli } summary['Config Profile'] = workflow.profile if (params.config_profile_description) summary['Config Profile Description'] = params.config_profile_description @@ -284,8 +273,6 @@ if (params.email || params.email_on_fail) { summary['E-mail on failure'] = params.email_on_fail summary['MultiQC maxsize'] = params.max_multiqc_email_size } -log.info summary.collect { k,v -> "${k.padRight(18)}: $v" }.join("\n") -log.info "-\033[2m--------------------------------------------------\033[0m-" // Check the hostnames against configured profiles checkHostname() @@ -312,14 +299,11 @@ Channel.from(summary.collect{ [it.key, it.value] }) process get_software_versions { publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode, - saveAs: { filename -> - if (filename.indexOf(".csv") > 0) filename - else null - } + saveAs: { filename -> if (filename.indexOf('.csv') > 0) filename else null } - output: - file 'software_versions_mqc.yaml' into software_versions_yaml - file "software_versions.csv" + output: + file 'software_versions_mqc.yaml' into ch_software_versions_yaml + file 'software_versions.csv' script: """ @@ -333,35 +317,15 @@ process get_software_versions { """ } -def create_workflow_summary(summary) { - - def yaml_file = workDir.resolve('workflow_summary_mqc.yaml') - yaml_file.text = """ - id: 'nf-core-chipseq-summary' - description: " - this information is collected when the pipeline is started." - section_name: 'nf-core/chipseq Workflow Summary' - section_href: 'https://github.com/nf-core/chipseq' - plot_type: 'html' - data: | -
-${summary.collect { k,v -> "            <dt>$k</dt><dd><samp>${v ?: '<span style=\"color:#999999;\">N/A</a>'}</samp></dd>" }.join("\n")}
-    </dl>
- """.stripIndent() - - return yaml_file -} - - - /**************************************************** * PRE-PROCESSING */ if(!params.bwt2_index && params.fasta){ process makeBowtie2Index { - tag "$bwt2_base" + tag "$fasta_base" label 'process_highmem' - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, + publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, saveAs: { params.save_reference ? it : null }, mode: params.publish_dir_mode input: @@ -372,9 +336,10 @@ if(!params.bwt2_index && params.fasta){ file "bowtie2_index" into bwt2_index_trim script: + fasta_base = fasta.toString() - ~/(\.fa)?(\.fasta)?(\.fas)?(\.fsa)?$/ """ mkdir bowtie2_index - bowtie2-build ${fasta} bowtie2_index/${bwt2_base} + bowtie2-build ${fasta} bowtie2_index/${fasta_base} """ } } @@ -384,14 +349,14 @@ if(!params.chromosome_size && params.fasta){ process makeChromSize { tag "$fasta" label 'process_low' - publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, + publishDir path: { params.save_reference ? "${params.outdir}/reference_genome" : params.outdir }, saveAs: { params.save_reference ? it : null }, mode: params.publish_dir_mode input: file fasta from fasta_for_chromsize output: - file "*.size" into chromosome_size, chromosome_size_cool + file "*.size" into chrsize, chrsize_build, chrsize_raw, chrsize_balance, chrsize_zoom, chrsize_compartments script: """ @@ -426,42 +391,43 @@ if(!params.restriction_fragments && params.fasta && !params.dnase){ */ /* - * STEP 1 - Two-steps Reads Mapping -*/ + * HiC-pro - Two-steps Reads Mapping + */ process bowtie2_end_to_end { - tag "$prefix" + tag "$sample" label 'process_medium' - publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode + publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping/bwt2_end2end" : params.outdir }, + saveAs: { filename -> if (params.save_aligned_intermediates) filename }, mode: params.publish_dir_mode input: set val(sample), file(reads) from raw_reads file index from bwt2_index_end2end.collect() output: - set val(prefix), file("${prefix}_unmap.fastq") into unmapped_end_to_end - set val(prefix), file("${prefix}.bam") into end_to_end_bam + set val(sample), file("${prefix}_unmap.fastq") into unmapped_end_to_end + set val(sample), file("${prefix}.bam") into end_to_end_bam script: prefix = reads.toString() - ~/(\.fq)?(\.fastq)?(\.gz)?$/ def bwt2_opts = params.bwt2_opts_end2end - if (!params.dnase){ """ + INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'` bowtie2 --rg-id BMG --rg SM:${prefix} \\ ${bwt2_opts} \\ -p ${task.cpus} \\ - -x ${index}/${bwt2_base} \\ + -x \${INDEX} \\ --un ${prefix}_unmap.fastq \\ -U ${reads} | samtools view -F 4 -bS - > ${prefix}.bam """ }else{ """ + INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'` bowtie2 --rg-id BMG --rg SM:${prefix} \\ ${bwt2_opts} \\ -p ${task.cpus} \\ - -x ${index}/${bwt2_base} \\ + -x \${INDEX} \\ --un ${prefix}_unmap.fastq \\ -U ${reads} > ${prefix}.bam """ @@ -469,21 +435,22 @@ process bowtie2_end_to_end { } process trim_reads { - tag "$prefix" + tag "$sample" label 'process_low' - publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? 
it : null }, mode: params.publish_dir_mode - + publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping/bwt2_trimmed" : params.outdir }, + saveAs: { filename -> if (params.save_aligned_intermediates) filename }, mode: params.publish_dir_mode + when: !params.dnase input: - set val(prefix), file(reads) from unmapped_end_to_end + set val(sample), file(reads) from unmapped_end_to_end output: - set val(prefix), file("${prefix}_trimmed.fastq") into trimmed_reads + set val(sample), file("${prefix}_trimmed.fastq") into trimmed_reads script: + prefix = reads.toString() - ~/(\.fq)?(\.fastq)?(\.gz)?$/ """ cutsite_trimming --fastq $reads \\ --cutsite ${params.ligation_site} \\ @@ -492,49 +459,51 @@ process trim_reads { } process bowtie2_on_trimmed_reads { - tag "$prefix" + tag "$sample" label 'process_medium' - publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode + publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping/bwt2_trimmed" : params.outdir }, + saveAs: { filename -> if (params.save_aligned_intermediates) filename }, mode: params.publish_dir_mode when: !params.dnase input: - set val(prefix), file(reads) from trimmed_reads + set val(sample), file(reads) from trimmed_reads file index from bwt2_index_trim.collect() output: - set val(prefix), file("${prefix}_trimmed.bam") into trimmed_bam + set val(sample), file("${prefix}_trimmed.bam") into trimmed_bam script: prefix = reads.toString() - ~/(_trimmed)?(\.fq)?(\.fastq)?(\.gz)?$/ """ + INDEX=`find -L ./ -name "*.rev.1.bt2" | sed 's/.rev.1.bt2//'` bowtie2 --rg-id BMG --rg SM:${prefix} \\ ${params.bwt2_opts_trimmed} \\ -p ${task.cpus} \\ - -x ${index}/${bwt2_base} \\ + -x \${INDEX} \\ -U ${reads} | samtools view -bS - > ${prefix}_trimmed.bam """ } if (!params.dnase){ - process merge_mapping_steps{ - tag "$sample = $bam1 + $bam2" + process bowtie2_merge_mapping_steps{ + tag "$prefix = $bam1 + $bam2" label 'process_medium' - publishDir path: { params.save_aligned_intermediates ? "${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode + publishDir "${params.outdir}/hicpro/mapping", mode: params.publish_dir_mode, + saveAs: { filename -> if (params.save_aligned_intermediates && filename.endsWith("stat")) "stats/$filename" + else if (params.save_aligned_intermediates) filename} input: - set val(prefix), file(bam1), file(bam2) from end_to_end_bam.join( trimmed_bam ) + set val(prefix), file(bam1), file(bam2) from end_to_end_bam.join( trimmed_bam ).dump(tag:'merge') output: set val(sample), file("${prefix}_bwt2merged.bam") into bwt2_merged_bam set val(oname), file("${prefix}.mapstat") into all_mapstat script: - sample = prefix.toString() - ~/(_R1|_R2|_val_1|_val_2|_1$|_2)/ - tag = prefix.toString() =~/_R1|_val_1|_1/ ? "R1" : "R2" + sample = prefix.toString() - ~/(_R1|_R2)/ + tag = prefix.toString() =~/_R1/ ? "R1" : "R2" oname = prefix.toString() - ~/(\.[0-9]+)$/ """ samtools merge -@ ${task.cpus} \\ @@ -542,7 +511,7 @@ if (!params.dnase){ ${bam1} ${bam2} samtools sort -@ ${task.cpus} -m 800M \\ - -n -T /tmp/ \\ + -n \\ -o ${prefix}_bwt2merged.sorted.bam \\ ${prefix}_bwt2merged.bam @@ -561,46 +530,47 @@ if (!params.dnase){ } }else{ process dnase_mapping_stats{ - tag "$sample = $bam1" + tag "$sample = $bam" label 'process_medium' - publishDir path: { params.save_aligned_intermediates ? 
"${params.outdir}/mapping" : params.outdir }, - saveAs: { params.save_aligned_intermediates ? it : null }, mode: params.publish_dir_mode + publishDir "${params.outdir}/hicpro/mapping", mode: params.publish_dir_mode, + saveAs: { filename -> if (params.save_aligned_intermediates && filename.endsWith("stat")) "stats/$filename" + else if (params.save_aligned_intermediates) filename} input: - set val(prefix), file(bam1) from end_to_end_bam + set val(prefix), file(bam) from end_to_end_bam output: - set val(sample), file(bam1) into bwt2_merged_bam + set val(sample), file(bam) into bwt2_merged_bam set val(oname), file("${prefix}.mapstat") into all_mapstat script: - sample = prefix.toString() - ~/(_R1|_R2|_val_1|_val_2|_1|_2)/ - tag = prefix.toString() =~/_R1|_val_1|_1/ ? "R1" : "R2" + sample = prefix.toString() - ~/(_R1|_R2)/ + tag = prefix.toString() =~/_R1/ ? "R1" : "R2" oname = prefix.toString() - ~/(\.[0-9]+)$/ """ echo "## ${prefix}" > ${prefix}.mapstat echo -n "total_${tag}\t" >> ${prefix}.mapstat - samtools view -c ${bam1} >> ${prefix}.mapstat + samtools view -c ${bam} >> ${prefix}.mapstat echo -n "mapped_${tag}\t" >> ${prefix}.mapstat - samtools view -c -F 4 ${bam1} >> ${prefix}.mapstat + samtools view -c -F 4 ${bam} >> ${prefix}.mapstat echo -n "global_${tag}\t" >> ${prefix}.mapstat - samtools view -c -F 4 ${bam1} >> ${prefix}.mapstat + samtools view -c -F 4 ${bam} >> ${prefix}.mapstat echo -n "local_${tag}\t0" >> ${prefix}.mapstat """ } } -process combine_mapped_files{ +process combine_mates{ tag "$sample = $r1_prefix + $r2_prefix" label 'process_low' - publishDir "${params.outdir}/mapping", mode: params.publish_dir_mode, - saveAs: {filename -> filename.indexOf(".pairstat") > 0 ? "stats/$filename" : "$filename"} + publishDir "${params.outdir}/hicpro/mapping", mode: params.publish_dir_mode, + saveAs: {filename -> filename.endsWith(".pairstat") ? "stats/$filename" : "$filename"} input: set val(sample), file(aligned_bam) from bwt2_merged_bam.groupTuple() output: - set val(sample), file("${sample}_bwt2pairs.bam") into paired_bam + set val(oname), file("${sample}_bwt2pairs.bam") into paired_bam set val(oname), file("*.pairstat") into all_pairstat script: @@ -611,25 +581,28 @@ process combine_mapped_files{ oname = sample.toString() - ~/(\.[0-9]+)$/ def opts = "-t" - opts = params.rm_singleton ? "${opts}" : "--single ${opts}" - opts = params.rm_multi ? "${opts}" : "--multi ${opts}" - if ("$params.min_mapq".isInteger()) opts="${opts} -q ${params.min_mapq}" + if (params.keep_multi) { + opts="${opts} --multi" + }else if (params.min_mapq){ + opts="${opts} -q ${params.min_mapq}" + } """ mergeSAM.py -f ${r1_bam} -r ${r2_bam} -o ${sample}_bwt2pairs.bam ${opts} """ } - /* - * STEP2 - DETECT VALID PAIRS -*/ + * HiC-Pro - detect valid interaction from aligned data + */ if (!params.dnase){ process get_valid_interaction{ tag "$sample" label 'process_low' - publishDir "${params.outdir}/hic_results/data", mode: params.publish_dir_mode, - saveAs: {filename -> filename.indexOf("*stat") > 0 ? 
"stats/$filename" : "$filename"} + publishDir "${params.outdir}/hicpro/valid_pairs", mode: params.publish_dir_mode, + saveAs: {filename -> if (filename.endsWith("RSstat")) "stats/$filename" + else if (filename.endsWith(".validPairs")) filename + else if (params.save_nonvalid_pairs) filename} input: set val(sample), file(pe_bam) from paired_bam @@ -659,7 +632,7 @@ if (!params.dnase){ prefix = pe_bam.toString() - ~/.bam/ """ mapped_2hic_fragments.py -f ${frag_file} -r ${pe_bam} --all ${opts} - sort -T /tmp/ -k2,2V -k3,3n -k5,5V -k6,6n -o ${prefix}.validPairs ${prefix}.validPairs + sort -k2,2V -k3,3n -k5,5V -k6,6n -o ${prefix}.validPairs ${prefix}.validPairs """ } } @@ -667,8 +640,9 @@ else{ process get_valid_interaction_dnase{ tag "$sample" label 'process_low' - publishDir "${params.outdir}/hic_results/data", mode: params.publish_dir_mode, - saveAs: {filename -> filename.indexOf("*stat") > 0 ? "stats/$filename" : "$filename"} + publishDir "${params.outdir}/hicpro/valid_pairs", mode: params.publish_dir_mode, + saveAs: {filename -> if (filename.endsWith("RSstat")) "stats/$filename" + else filename} input: set val(sample), file(pe_bam) from paired_bam @@ -687,73 +661,80 @@ else{ prefix = pe_bam.toString() - ~/.bam/ """ mapped_2hic_dnase.py -r ${pe_bam} ${opts} - sort -T /tmp/ -k2,2V -k3,3n -k5,5V -k6,6n -o ${prefix}.validPairs ${prefix}.validPairs + sort -k2,2V -k3,3n -k5,5V -k6,6n -o ${prefix}.validPairs ${prefix}.validPairs """ } } - /* - * STEP3 - BUILD MATRIX -*/ + * Remove duplicates + */ process remove_duplicates { tag "$sample" label 'process_highmem' - publishDir "${params.outdir}/hic_results/data", mode: params.publish_dir_mode, - saveAs: {filename -> filename.indexOf("*stat") > 0 ? "stats/$sample/$filename" : "$filename"} - + publishDir "${params.outdir}/hicpro/valid_pairs", mode: params.publish_dir_mode, + saveAs: {filename -> if (filename.endsWith("mergestat")) "stats/$filename" + else if (filename.endsWith("allValidPairs")) "$filename"} input: set val(sample), file(vpairs) from valid_pairs.groupTuple() output: - set val(sample), file("*.allValidPairs") into all_valid_pairs - set val(sample), file("*.allValidPairs") into all_valid_pairs_4cool - file("stats/") into all_mergestat + set val(sample), file("*.allValidPairs") into ch_vpairs, ch_vpairs_cool + file("stats/") into mqc_mergestat + file("*mergestat") into all_mergestat script: - if ( params.rm_dup ){ + if ( ! 
params.keep_dups ){ """ mkdir -p stats/${sample} ## Sort valid pairs and remove read pairs with same starts (i.e duplicated read pairs) - sort -T /tmp/ -S 50% -k2,2V -k3,3n -k5,5V -k6,6n -m ${vpairs} | \ + sort -S 50% -k2,2V -k3,3n -k5,5V -k6,6n -m ${vpairs} | \\ awk -F"\\t" 'BEGIN{c1=0;c2=0;s1=0;s2=0}(c1!=\$2 || c2!=\$5 || s1!=\$3 || s2!=\$6){print;c1=\$2;c2=\$5;s1=\$3;s2=\$6}' > ${sample}.allValidPairs - echo -n "valid_interaction\t" > stats/${sample}/${sample}_allValidPairs.mergestat - cat ${vpairs} | wc -l >> stats/${sample}/${sample}_allValidPairs.mergestat - echo -n "valid_interaction_rmdup\t" >> stats/${sample}/${sample}_allValidPairs.mergestat - cat ${sample}.allValidPairs | wc -l >> stats/${sample}/${sample}_allValidPairs.mergestat + echo -n "valid_interaction\t" > ${sample}_allValidPairs.mergestat + cat ${vpairs} | wc -l >> ${sample}_allValidPairs.mergestat + echo -n "valid_interaction_rmdup\t" >> ${sample}_allValidPairs.mergestat + cat ${sample}.allValidPairs | wc -l >> ${sample}_allValidPairs.mergestat ## Count short range (<20000) vs long range contacts - awk 'BEGIN{cis=0;trans=0;sr=0;lr=0} \$2 == \$5{cis=cis+1; d=\$6>\$3?\$6-\$3:\$3-\$6; if (d<=20000){sr=sr+1}else{lr=lr+1}} \$2!=\$5{trans=trans+1}END{print "trans_interaction\\t"trans"\\ncis_interaction\\t"cis"\\ncis_shortRange\\t"sr"\\ncis_longRange\\t"lr}' ${sample}.allValidPairs >> stats/${sample}/${sample}_allValidPairs.mergestat - + awk 'BEGIN{cis=0;trans=0;sr=0;lr=0} \$2 == \$5{cis=cis+1; d=\$6>\$3?\$6-\$3:\$3-\$6; if (d<=20000){sr=sr+1}else{lr=lr+1}} \$2!=\$5{trans=trans+1}END{print "trans_interaction\\t"trans"\\ncis_interaction\\t"cis"\\ncis_shortRange\\t"sr"\\ncis_longRange\\t"lr}' ${sample}.allValidPairs >> ${sample}_allValidPairs.mergestat + + ## For MultiQC + mkdir -p stats/${sample} + cp ${sample}_allValidPairs.mergestat stats/${sample}/ """ }else{ """ - mkdir -p stats/${sample} cat ${vpairs} > ${sample}.allValidPairs - echo -n "valid_interaction\t" > stats/${sample}/${sample}_allValidPairs.mergestat - cat ${vpairs} | wc -l >> stats/${sample}/${sample}_allValidPairs.mergestat - echo -n "valid_interaction_rmdup\t" >> stats/${sample}/${sample}_allValidPairs.mergestat - cat ${sample}.allValidPairs | wc -l >> stats/${sample}/${sample}_allValidPairs.mergestat + echo -n "valid_interaction\t" > ${sample}_allValidPairs.mergestat + cat ${vpairs} | wc -l >> ${sample}_allValidPairs.mergestat + echo -n "valid_interaction_rmdup\t" >> ${sample}_allValidPairs.mergestat + cat ${sample}.allValidPairs | wc -l >> ${sample}_allValidPairs.mergestat - ## Count short range (<20000) vs long range contacts - awk 'BEGIN{cis=0;trans=0;sr=0;lr=0} \$2 == \$5{cis=cis+1; d=\$6>\$3?\$6-\$3:\$3-\$6; if (d<=20000){sr=sr+1}else{lr=lr+1}} \$2!=\$5{trans=trans+1}END{print "trans_interaction\\t"trans"\\ncis_interaction\\t"cis"\\ncis_shortRange\\t"sr"\\ncis_longRange\\t"lr}' ${sample}.allValidPairs >> stats/${sample}/${sample}_allValidPairs.mergestat + ## Count short range (<20000) vs long range contacts + awk 'BEGIN{cis=0;trans=0;sr=0;lr=0} \$2 == \$5{cis=cis+1; d=\$6>\$3?\$6-\$3:\$3-\$6; if (d<=20000){sr=sr+1}else{lr=lr+1}} \$2!=\$5{trans=trans+1}END{print "trans_interaction\\t"trans"\\ncis_interaction\\t"cis"\\ncis_shortRange\\t"sr"\\ncis_longRange\\t"lr}' ${sample}.allValidPairs >> ${sample}_allValidPairs.mergestat + + ## For MultiQC + mkdir -p stats/${sample} + cp ${sample}_allValidPairs.mergestat stats/${sample}/ """ } } -process merge_sample { +process merge_stats { tag "$ext" label 'process_low' - publishDir 
"${params.outdir}/hic_results/stats/${sample}", mode: params.publish_dir_mode + publishDir "${params.outdir}/hicpro/", mode: params.publish_dir_mode, + saveAs: {filename -> if (filename.endsWith("stat")) "stats/$filename"} input: set val(prefix), file(fstat) from all_mapstat.groupTuple().concat(all_pairstat.groupTuple(), all_rsstat.groupTuple()) output: - file("mstats/") into all_mstats + file("stats/") into mqc_mstats + file("*stat") into all_mstats script: sample = prefix.toString() - ~/(_R1|_R2|_val_1|_val_2|_1|_2)/ @@ -761,91 +742,305 @@ process merge_sample { if ( (fstat =~ /.pairstat/) ){ ext = "mpairstat" } if ( (fstat =~ /.RSstat/) ){ ext = "mRSstat" } """ - mkdir -p mstats/${sample} - merge_statfiles.py -f ${fstat} > mstats/${sample}/${prefix}.${ext} + merge_statfiles.py -f ${fstat} > ${prefix}.${ext} + mkdir -p stats/${sample} + cp ${prefix}.${ext} stats/${sample}/ """ } +/* + * HiC-Pro build matrix processes + * kept for backward compatibility + */ + + process build_contact_maps{ tag "$sample - $mres" label 'process_highmem' - publishDir "${params.outdir}/hic_results/matrix/raw", mode: params.publish_dir_mode + publishDir "${params.outdir}/hicpro/matrix/raw", mode: params.publish_dir_mode when: - !params.skip_maps + !params.skip_maps && params.hicpro_maps input: - set val(sample), file(vpairs), val(mres) from all_valid_pairs.combine(map_res) - file chrsize from chromosome_size.collect() + set val(sample), file(vpairs), val(mres) from ch_vpairs.combine(map_res) + file chrsize from chrsize.collect() output: - file("*.matrix") into raw_maps - file "*.bed" - + set val(sample), val(mres), file("*.matrix"), file("*.bed") into raw_maps, raw_maps_4cool + script: """ build_matrix --matrix-format upper --binsize ${mres} --chrsizes ${chrsize} --ifile ${vpairs} --oprefix ${sample}_${mres} """ } -/* - * STEP 4 - NORMALIZE MATRIX -*/ - process run_ice{ tag "$rmaps" label 'process_highmem' - publishDir "${params.outdir}/hic_results/matrix/iced", mode: params.publish_dir_mode + publishDir "${params.outdir}/hicpro/matrix/iced", mode: params.publish_dir_mode when: - !params.skip_maps && !params.skip_ice + !params.skip_maps && !params.skip_balancing && params.hicpro_maps input: - file(rmaps) from raw_maps - file "*.biases" + set val(sample), val(res), file(rmaps), file(bed) from raw_maps output: - file("*iced.matrix") into iced_maps + set val(sample), val(res), file("*iced.matrix"), file(bed) into hicpro_iced_maps + file ("*.biases") into hicpro_iced_bias script: prefix = rmaps.toString() - ~/(\.matrix)?$/ """ - ice --filter_low_counts_perc ${params.ice_filer_low_count_perc} \ + ice --filter_low_counts_perc ${params.ice_filter_low_count_perc} \ --results_filename ${prefix}_iced.matrix \ - --filter_high_counts_perc ${params.ice_filer_high_count_perc} \ + --filter_high_counts_perc ${params.ice_filter_high_count_perc} \ --max_iter ${params.ice_max_iter} --eps ${params.ice_eps} --remove-all-zeros-loci --output-bias 1 --verbose 1 ${rmaps} """ } /* - * STEP 5 - COOLER FILE + * Cooler */ -process generate_cool{ + +process convert_to_pairs { tag "$sample" label 'process_medium' - publishDir "${params.outdir}/export/cool", mode: params.publish_dir_mode when: - !params.skip_cool + !params.skip_maps input: - set val(sample), file(vpairs) from all_valid_pairs_4cool - file chrsize from chromosome_size_cool.collect() + set val(sample), file(vpairs) from ch_vpairs_cool + file chrsize from chrsize_build.collect() output: - file("*mcool") into cool_maps + set val(sample), file("*.txt.gz") into cool_build, 
cool_build_zoom script: """ - hicpro2higlass.sh -p ${task.cpus} -i $vpairs -r 5000 -c ${chrsize} -n + ## chr/pos/strand/chr/pos/strand + awk '{OFS="\t";print \$1,\$2,\$3,\$5,\$6,\$4,\$7}' $vpairs > contacts.txt + gzip contacts.txt """ } +process cooler_raw { + tag "$sample - ${res}" + label 'process_medium' + + publishDir "${params.outdir}/contact_maps/", mode: 'copy', + saveAs: {filename -> filename.endsWith(".cool") ? "raw/cool/$filename" : "raw/txt/$filename"} + + input: + set val(sample), file(contacts), val(res) from cool_build.combine(map_res_cool) + file chrsize from chrsize_raw.collect() + + output: + set val(sample), val(res), file("*cool") into raw_cool_maps + set file("*.bed"), file("${sample}_${res}.txt") into raw_txt_maps + + script: + """ + cooler makebins ${chrsize} ${res} > ${sample}_${res}.bed + cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 ${sample}_${res}.bed ${contacts} ${sample}_${res}.cool + cooler dump ${sample}_${res}.cool | awk '{OFS="\t"; print \$1+1,\$2+1,\$3}' > ${sample}_${res}.txt + """ +} + +process cooler_balance { + tag "$sample - ${res}" + label 'process_medium' + + publishDir "${params.outdir}/contact_maps/", mode: 'copy', + saveAs: {filename -> filename.endsWith(".cool") ? "norm/cool/$filename" : "norm/txt/$filename"} + + when: + !params.skip_balancing + + input: + set val(sample), val(res), file(cool) from raw_cool_maps + file chrsize from chrsize_balance.collect() + + output: + set val(sample), val(res), file("${sample}_${res}_norm.cool") into balanced_cool_maps + file("${sample}_${res}_norm.txt") into norm_txt_maps + + script: + """ + cp ${cool} ${sample}_${res}_norm.cool + cooler balance ${sample}_${res}_norm.cool -p ${task.cpus} --force + cooler dump ${sample}_${res}_norm.cool --balanced --na-rep 0 | awk '{OFS="\t"; print \$1+1,\$2+1,\$4}' > ${sample}_${res}_norm.txt + """ +} + +process cooler_zoomify { + tag "$sample" + label 'process_medium' + publishDir "${params.outdir}/contact_maps/norm/mcool", mode: 'copy' + + when: + !params.skip_mcool + + input: + set val(sample), file(contacts) from cool_build_zoom + file chrsize from chrsize_zoom.collect() + + output: + file("*mcool") into mcool_maps + + script: + """ + cooler makebins ${chrsize} ${params.res_zoomify} > bins.bed + cooler cload pairs -c1 2 -p1 3 -c2 4 -p2 5 bins.bed ${contacts} ${sample}.cool + cooler zoomify --nproc ${task.cpus} --balance ${sample}.cool + """ +} + + +/**************************************************** + * DOWNSTREAM ANALYSIS + */ + +(maps_cool_insulation, maps_cool_comp, maps_hicexplorer_ddecay, maps_hicexplorer_tads) = balanced_cool_maps.into(4) + +/* + * Counts vs distance QC + */ + +if (!params.skip_dist_decay){ + chddecay = maps_hicexplorer_ddecay.combine(ddecay_res).filter{ it[1] == it[3] }.dump(tag: "ddecay") +}else{ + chddecay = Channel.empty() +} + +process dist_decay { + tag "$sample" + label 'process_medium' + publishDir "${params.outdir}/dist_decay", mode: 'copy' + + when: + !params.skip_dist_decay + + input: + set val(sample), val(res), file(maps), val(r) from chddecay + + output: + file("*_distcount.txt") + file("*.png") + + + script: + """ + hicPlotDistVsCounts --matrices ${maps} \ + --plotFile ${maps.baseName}_distcount.png \ + --outFileData ${maps.baseName}_distcount.txt + """ +} + +/* + * Compartment calling + */ + +if(!params.skip_compartments){ + chcomp = maps_cool_comp.combine(comp_res).filter{ it[1] == it[3] }.dump(tag: "comp") +}else{ + chcomp = Channel.empty() +} + +process compartment_calling { + tag "$sample - $res" + label 'process_medium' + 
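+  // A/B compartment calling: the script below bins the genome and computes per-bin GC content
+  // (used by cooltools to orient the eigenvectors), then calls compartments in cis and exports
+  // the first eigenvector (E1) as a sorted bedgraph track.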
publishDir "${params.outdir}/compartments", mode: 'copy' + + when: + !params.skip_compartments + + input: + set val(sample), val(res), file(cool), val(r) from chcomp + file(fasta) from fasta_for_compartments.collect() + file(chrsize) from chrsize_compartments.collect() + + output: + file("*compartments*") optional true into out_compartments + + script: + """ + cooltools genome binnify --all-names ${chrsize} ${res} > genome_bins.txt + cooltools genome gc genome_bins.txt ${fasta} > genome_gc.txt + cooltools call-compartments --contact-type cis -o ${sample}_compartments ${cool} + awk -F"\t" 'NR>1{OFS="\t"; if(\$6==""){\$6=0}; print \$1,\$2,\$3,\$6}' ${sample}_compartments.cis.vecs.tsv | sort -k1,1 -k2,2n > ${sample}_compartments.cis.E1.bedgraph + """ +} + + /* - * STEP 6 - MultiQC + * TADs calling */ + +if (!params.skip_tads){ + chtads = maps_hicexplorer_tads.combine(tads_res_hicexplorer).filter{ it[1] == it[3] }.dump(tag: "hicexp") +}else{ + chtads = Channel.empty() +} + +process tads_hicexplorer { + tag "$sample - $res" + label 'process_medium' + publishDir "${params.outdir}/tads/hicexplorer", mode: 'copy' + + when: + !params.skip_tads && params.tads_caller =~ 'hicexplorer' + + input: + set val(sample), val(res), file(cool), val(r) from chtads + + output: + file("*.{bed,bedgraph,gff}") into hicexplorer_tads + + script: + """ + hicFindTADs --matrix ${cool} \ + --outPrefix tad \ + --correctForMultipleTesting fdr \ + --numberOfProcessors ${task.cpus} + """ +} + +if (!params.skip_tads){ + chIS = maps_cool_insulation.combine(tads_res_insulation).filter{ it[1] == it[3] }.dump(tag : "ins") +}else{ + chIS = Channel.empty() +} + +process tads_insulation { + tag "$sample - $res" + label 'process_medium' + publishDir "${params.outdir}/tads/insulation", mode: 'copy' + + when: + !params.skip_tads && params.tads_caller =~ 'insulation' + + input: + set val(sample), val(res), file(cool), val(r) from chIS + + output: + file("*tsv") into insulation_tads + + script: + """ + cooltools diamond-insulation --window-pixels ${cool} 15 25 50 > ${sample}_insulation.tsv + """ +} + + +/* + * MultiQC + */ + process multiqc { label 'process_low' publishDir "${params.outdir}/MultiQC", mode: params.publish_dir_mode @@ -856,24 +1051,29 @@ process multiqc { input: file multiqc_config from ch_multiqc_config file (mqc_custom_config) from ch_multiqc_custom_config.collect().ifEmpty([]) - file ('input_*/*') from all_mstats.concat(all_mergestat).collect() - file ('software_versions/*') from software_versions_yaml - file workflow_summary from create_workflow_summary(summary) + file ('input_*/*') from mqc_mstats.concat(mqc_mergestat).collect() + file ('software_versions/*') from ch_software_versions_yaml + file workflow_summary from ch_workflow_summary.collect() output: file "*multiqc_report.html" into multiqc_report file "*_data" script: - rtitle = custom_runName ? "--title \"$custom_runName\"" : '' - rfilename = custom_runName ? "--filename " + custom_runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" : '' + rtitle = '' + rfilename = '' + if (!(workflow.runName ==~ /[a-z]+_[a-z]+/)) { + rtitle = "--title \"${workflow.runName}\"" + rfilename = "--filename " + workflow.runName.replaceAll('\\W','_').replaceAll('_+','_') + "_multiqc_report" + } + custom_config_file = params.multiqc_config ? "--config $mqc_custom_config" : '' """ - multiqc -f $rtitle $rfilename --config $multiqc_config . + multiqc -f $rtitle $rfilename $custom_config_file . 
""" } /* - * STEP 7 - Output Description HTML + * Output Description HTML */ process output_documentation { publishDir "${params.outdir}/pipeline_info", mode: params.publish_dir_mode @@ -882,13 +1082,13 @@ process output_documentation { file output_docs from ch_output_docs file images from ch_output_docs_images - output: - file "results_description.html" + output: + file 'results_description.html' - script: - """ - markdown_to_html.py $output_docs -o results_description.html - """ + script: + """ + markdown_to_html.py $output_docs -o results_description.html + """ } /* @@ -904,7 +1104,7 @@ workflow.onComplete { } def email_fields = [:] email_fields['version'] = workflow.manifest.version - email_fields['runName'] = custom_runName ?: workflow.runName + email_fields['runName'] = workflow.runName email_fields['success'] = workflow.success email_fields['dateComplete'] = workflow.complete email_fields['duration'] = workflow.duration @@ -925,7 +1125,6 @@ workflow.onComplete { email_fields['summary']['Nextflow Build'] = workflow.nextflow.build email_fields['summary']['Nextflow Compile Timestamp'] = workflow.nextflow.timestamp - // If not using MultiQC, strip out this code (including params.maxMultiqcEmailFileSize) // On success try attach the multiqc report def mqc_report = null try { @@ -948,18 +1147,18 @@ workflow.onComplete { // Render the TXT template def engine = new groovy.text.GStringTemplateEngine() - def tf = new File("$baseDir/assets/email_template.txt") + def tf = new File("$projectDir/assets/email_template.txt") def txt_template = engine.createTemplate(tf).make(email_fields) def email_txt = txt_template.toString() // Render the HTML template - def hf = new File("$baseDir/assets/email_template.html") + def hf = new File("$projectDir/assets/email_template.html") def html_template = engine.createTemplate(hf).make(email_fields) def email_html = html_template.toString() // Render the sendmail template - def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, baseDir: "$baseDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] - def sf = new File("$baseDir/assets/sendmail_template.txt") + def smail_fields = [ email: email_address, subject: subject, email_txt: email_txt, email_html: email_html, projectDir: "$projectDir", mqcFile: mqc_report, mqcMaxSize: params.max_multiqc_email_size.toBytes() ] + def sf = new File("$projectDir/assets/sendmail_template.txt") def sendmail_template = engine.createTemplate(sf).make(smail_fields) def sendmail_html = sendmail_template.toString() @@ -1008,31 +1207,11 @@ workflow.onComplete { checkHostname() log.info "-${c_purple}[nf-core/hic]${c_red} Pipeline completed with errors${c_reset}-" } - } - -def nfcoreHeader() { - // Log colors ANSI codes - c_black = params.monochrome_logs ? '' : "\033[0;30m"; - c_blue = params.monochrome_logs ? '' : "\033[0;34m"; - c_cyan = params.monochrome_logs ? '' : "\033[0;36m"; - c_dim = params.monochrome_logs ? '' : "\033[2m"; - c_green = params.monochrome_logs ? '' : "\033[0;32m"; - c_purple = params.monochrome_logs ? '' : "\033[0;35m"; - c_reset = params.monochrome_logs ? '' : "\033[0m"; - c_white = params.monochrome_logs ? '' : "\033[0;37m"; - c_yellow = params.monochrome_logs ? 
'' : "\033[0;33m"; - - return """ -${c_dim}--------------------------------------------------${c_reset}- - ${c_green},--.${c_black}/${c_green},-.${c_reset} - ${c_blue} ___ __ __ __ ___ ${c_green}/,-._.--~\'${c_reset} - ${c_blue} |\\ | |__ __ / ` / \\ |__) |__ ${c_yellow}} {${c_reset} - ${c_blue} | \\| | \\__, \\__/ | \\ |___ ${c_green}\\`-._,-`-,${c_reset} - ${c_green}`._,._,\'${c_reset} - ${c_purple} nf-core/hic v${workflow.manifest.version}${c_reset} - -${c_dim}--------------------------------------------------${c_reset}- - """.stripIndent() +workflow.onError { + // Print unexpected parameters - easiest is to just rerun validation + NfcoreSchema.validateParameters(params, json_schema, log) } def checkHostname() { @@ -1041,15 +1220,15 @@ def checkHostname() { def c_red = params.monochrome_logs ? '' : "\033[1;91m" def c_yellow_bold = params.monochrome_logs ? '' : "\033[1;93m" if (params.hostnames) { - def hostname = "hostname".execute().text.trim() + def hostname = 'hostname'.execute().text.trim() params.hostnames.each { prof, hnames -> hnames.each { hname -> if (hostname.contains(hname) && !workflow.profile.contains(prof)) { - log.error "====================================================\n" + + log.error "${c_red}====================================================${c_reset}\n" + " ${c_red}WARNING!${c_reset} You are running with `-profile $workflow.profile`\n" + " but your machine hostname is ${c_white}'$hostname'${c_reset}\n" + " ${c_yellow_bold}It's highly recommended that you use `-profile $prof${c_reset}`\n" + - "============================================================" + "${c_red}====================================================${c_reset}\n" } } } diff --git a/nextflow.config b/nextflow.config index c765a4a..7296cc2 100644 --- a/nextflow.config +++ b/nextflow.config @@ -7,70 +7,107 @@ // Global default params, used in configs params { - - // Workflow flags + // Inputs / outputs genome = false - input = "data/*{1,2}.fastq.gz" - single_end = false - + input = null + input_paths = null outdir = './results' genome = false input_paths = false - split_fastq = false - fastq_chunks_size = 20000000 chromosome_size = false restriction_fragments = false - skip_maps = false - skip_ice = false - skip_cool = false - skip_multiqc = false save_reference = false + + // Mapping + split_fastq = false + fastq_chunks_size = 20000000 save_interaction_bam = false save_aligned_intermediates = false - bwt2_opts_end2end = '--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder' bwt2_opts_trimmed = '--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder' + keep_dups = false + keep_multi = false min_mapq = 10 + // Digestion Hi-C - restriction_site = 'A^AGCTT' - ligation_site = 'AAGCTAGCTT' + digestion = false + digest { + 'hindiii'{ + restriction_site='A^AGCTT' + ligation_site='AAGCTAGCTT' + } + 'mboi' { + restriction_site='^GATC' + ligation_site='GATCGATC' + } + 'dpnii' { + restriction_site='^GATC' + ligation_site='GATCGATC' + } + 'arima' { + restriction_site='^GATC,G^ANT' + ligation_site='GATCGATC,GATCGANT,GANTGATC,GANTGANT' + } + } min_restriction_fragment_size = 0 max_restriction_fragment_size = 0 min_insert_size = 0 max_insert_size = 0 + save_nonvalid_pairs = false + + // Dnase Hi-C dnase = false min_cis_dist = 0 - rm_dup = true - rm_singleton = true - rm_multi = true - bin_size = '1000000,500000' + + // Contact maps + bin_size = '1000000' + res_zoomify = '5000' + hicpro_maps = false ice_max_iter = 100 - ice_filer_low_count_perc = 0.02 - ice_filer_high_count_perc = 0 
+ ice_filter_low_count_perc = 0.02 + ice_filter_high_count_perc = 0 ice_eps = 0.1 - - publish_dir_mode = 'copy' + // Downstream Analysis + res_dist_decay = '250000' + tads_caller = 'insulation' + res_tads = '40000' + res_compartments = '250000' + + // Workflow + skip_maps = false + skip_balancing = false + skip_mcool = false + skip_dist_decay = false + skip_compartments = false + skip_tads = false + skip_multiqc = false + // Boilerplate options + publish_dir_mode = 'copy' multiqc_config = false - name = false email = false email_on_fail = false max_multiqc_email_size = 25.MB plaintext_email = false monochrome_logs = false help = false - igenomes_base = 's3://ngi-igenomes/igenomes/' + igenomes_base = 's3://ngi-igenomes/igenomes' tracedir = "${params.outdir}/pipeline_info" igenomes_ignore = false + //Config custom_config_version = 'master' custom_config_base = "https://raw.githubusercontent.com/nf-core/configs/${params.custom_config_version}" hostnames = false + config_profile_name = null config_profile_description = false config_profile_contact = false config_profile_url = false + validate_params = true + show_hidden_params = false + schema_ignore_params = 'genomes,digest,input_paths' // Defaults only, expecting to be overwritten max_memory = 24.GB @@ -80,7 +117,7 @@ params { // Container slug. Stable releases should specify release tag! // Developmental code should specify :dev -process.container = 'nfcore/hic:1.2.2' +process.container = 'nfcore/hic:1.3.0' // Load base.config by default for all pipelines includeConfig 'conf/base.config' @@ -94,10 +131,21 @@ try { // Create profiles profiles { - conda { process.conda = "$baseDir/environment.yml" } + conda { + docker.enabled = false + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false + process.conda = "$projectDir/environment.yml" + } debug { process.beforeScript = 'echo $HOSTNAME' } docker { docker.enabled = true + singularity.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false // Avoid this error: // WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap. 
// Testing this in nf-core after discussion here https://github.com/nf-core/tools/pull/351 @@ -105,10 +153,36 @@ profiles { docker.runOptions = '-u \$(id -u):\$(id -g)' } singularity { + docker.enabled = false singularity.enabled = true + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = false singularity.autoMounts = true } + podman { + singularity.enabled = false + docker.enabled = false + podman.enabled = true + shifter.enabled = false + charliecloud.enabled = false + } + shifter { + singularity.enabled = false + docker.enabled = false + podman.enabled = false + shifter.enabled = true + charliecloud.enabled = false + } + charliecloud { + singularity.enabled = false + docker.enabled = false + podman.enabled = false + shifter.enabled = false + charliecloud.enabled = true + } test { includeConfig 'conf/test.config' } + test_full { includeConfig 'conf/test_full.config' } } // Load igenomes.config if required @@ -126,21 +200,22 @@ env { // Capture exit codes from upstream processes when piping process.shell = ['/bin/bash', '-euo', 'pipefail'] +def trace_timestamp = new java.util.Date().format( 'yyyy-MM-dd_HH-mm-ss') timeline { enabled = true - file = "${params.tracedir}/execution_timeline.html" + file = "${params.tracedir}/execution_timeline_${trace_timestamp}.html" } report { enabled = true - file = "${params.tracedir}/execution_report.html" + file = "${params.tracedir}/execution_report_${trace_timestamp}.html" } trace { enabled = true - file = "${params.tracedir}/execution_trace.txt" + file = "${params.tracedir}/execution_trace_${trace_timestamp}.txt" } dag { enabled = true - file = "${params.tracedir}/pipeline_dag.svg" + file = "${params.tracedir}/pipeline_dag_${trace_timestamp}.svg" } manifest { @@ -149,8 +224,8 @@ manifest { homePage = 'https://github.com/nf-core/hic' description = 'Analysis of Chromosome Conformation Capture data (Hi-C)' mainScript = 'main.nf' - nextflowVersion = '>=19.10.0' - version = '1.2.2' + nextflowVersion = '>=20.04.0' + version = '1.3.0' } // Function to ensure that resource requirements don't go beyond @@ -184,4 +259,4 @@ def check_max(obj, type) { return obj } } -} +} \ No newline at end of file diff --git a/nextflow_schema.json b/nextflow_schema.json index 9071bd2..7fe34b7 100644 --- a/nextflow_schema.json +++ b/nextflow_schema.json @@ -1,5 +1,5 @@ { - "$schema": "https://json-schema.org/draft-07/schema", + "$schema": "http://json-schema.org/draft-07/schema", "$id": "https://raw.githubusercontent.com/nf-core/hic/master/nextflow_schema.json", "title": "nf-core/hic pipeline parameters", "description": "Analysis of Chromosome Conformation Capture data (Hi-C)", @@ -26,23 +26,6 @@ "description": "Input FastQ files for test only", "default": "undefined" }, - "split_fastq": { - "type": "boolean", - "description": "Split the reads into chunks before running the pipelne", - "fa_icon": "fas fa-dna", - "default": "false" - }, - "fastq_chunks_size":{ - "type": "integer", - "description": "Read number per chunks if split_fastq is used", - "default": "20000000" - }, - "single_end": { - "type": "boolean", - "description": "Specifies that the input is single-end reads.", - "fa_icon": "fas fa-align-center", - "help_text": "By default, the pipeline expects paired-end data. If you have single-end data, you need to specify `--single_end` on the command line when you launch the pipeline. A normal glob pattern, enclosed in quotation marks, can then be used for `--input`. 
For example:\n\n```bash\n--single_end --input '*.fastq'\n```\n\nIt is not possible to run a mixture of single-end and paired-end files in one run." - }, "outdir": { "type": "string", "description": "The output directory where the results will be saved.", @@ -79,7 +62,7 @@ "igenomes_base": { "type": "string", "description": "Directory / URL base for iGenomes references.", - "default": "s3://ngi-igenomes/igenomes/", + "default": "s3://ngi-igenomes/igenomes", "fa_icon": "fas fa-cloud-download-alt", "hidden": true }, @@ -94,6 +77,29 @@ "type": "string", "description": "Full path to directory containing Bowtie index including base name. i.e. `/path/to/index/base`.", "fa_icon": "far fa-file-alt" + } + } + }, + "digestion_hi_c": { + "title": "Digestion Hi-C", + "type": "object", + "description": "Parameters for protocols based on restriction enzyme digestion", + "default": "", + "properties": { + "digestion": { + "type": "string", + "default": "hindiii", + "description": "Name of restriction enzyme to automatically set the restriction_site and ligation_site options" + }, + "restriction_site": { + "type": "string", + "default": "'A^AGCTT'", + "description": "Restriction motifs used during digestion. Several motifs (comma separated) can be provided." + }, + "ligation_site": { + "type": "string", + "default": "'AAGCTAGCTT'", + "description": "Expected motif after DNA ligation. Several motifs (comma separated) can be provided." }, "chromosome_size": { "type": "string", @@ -112,63 +118,62 @@ "description": "If generated by the pipeline save the annotation and indexes in the results directory.", "help_text": "Use this parameter to save all annotations to your results folder. These can then be used for future pipeline runs, reducing processing times.", "fa_icon": "fas fa-save" + }, + "save_nonvalid_pairs": { + "type": "boolean", + "description": "Save the non-valid pairs detected by HiC-Pro.", + "help_text": "Use this parameter to save non-valid pairs detected by HiC-Pro (dangling-end, self-circle, re-ligation, filtered).", + "fa_icon": "fas fa-save" + } } }, - "data_processing_options": { - "title": "Data processing", + "dnase_hi_c": { + "title": "DNase Hi-C", "type": "object", - "description": "Parameters for Hi-C data processing", + "description": "Parameters for protocols based on DNase digestion", "default": "", - "fa_icon": "fas fa-bahai", "properties": { "dnase": { "type": "boolean", "description": "For Hi-C protocols which are not based on enzyme digestion such as DNase Hi-C" }, - "restriction_site": { - "type": "string", - "default": "'A^AGCTT'", - "description": "Restriction motifs used during digestion. Several motifs (comma separated) can be provided." - }, - "ligation_site": { - "type": "string", - "default": "'AAGCTAGCTT", - "description": "Expected motif after DNA ligation. Several motifs (comma separated) can be provided." - }, - "rm_dup": { - "type": "boolean", - "description": "Remove duplicates", - "default": true - }, - "rm_multi": { + "min_cis_dist": { + "type": "integer", + "description": "Minimum distance between loci to consider. Useful for --dnase mode to remove spurious ligation products.
Only values > 0 are considered" + } + } + }, + "alignments": { + "title": "Alignments", + "type": "object", + "description": "Parameters for read alignments", + "default": "", + "fa_icon": "fas fa-bahai", + "properties": { + "split_fastq": { "type": "boolean", - "description": "Remove multi-mapped reads", - "default": true + "description": "Split the reads into chunks before running the pipeline", + "fa_icon": "fas fa-dna" }, - "rm_singleton": { - "type": "boolean", - "description": "Remove singleton", - "default": true + "fastq_chunks_size": { + "type": "integer", + "description": "Number of reads per chunk if split_fastq is used", + "default": 20000000 }, "min_mapq": { "type": "integer", - "default": "10", + "default": 10, "description": "Keep aligned reads with a minimum quality value" }, "bwt2_opts_end2end": { "type": "string", "default": "'--very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder'", - "description": "Option for end-to-end bowtie mapping" + "description": "Option for HiC-Pro end-to-end bowtie mapping" }, "bwt2_opts_trimmed": { "type": "string", "default": "'--very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder'", - "description": "Option for trimmed reads mapping" - }, - "save_interaction_bam": { - "type": "boolean", - "description": "Save a BAM file where all reads are flagged by their interaction classes" + "description": "Option for HiC-Pro trimmed reads mapping" }, "save_aligned_intermediates": { "type": "boolean", @@ -176,41 +181,44 @@ } } }, - "contacts_calling_options": { - "title": "Contacts calling", + "valid_pairs_detection": { + "title": "Valid Pairs Detection", "type": "object", "description": "Options to call significant interactions", "default": "", "fa_icon": "fas fa-signature", "properties": { - "min_cis_dist": { - "type": "integer", - "default": "O", - "description": "Minimum distance between loci to consider. Useful for --dnase mode to remove spurious ligation products. Only values > 0 are considered" + "keep_dups": { + "type": "boolean", + "description": "Keep duplicated reads" + }, + "keep_multi": { + "type": "boolean", + "description": "Keep multi-aligned reads" }, "max_insert_size": { "type": "integer", - "default": "0", "description": "Maximum fragment size to consider. Only values > 0 are considered" }, "min_insert_size": { "type": "integer", - "default": "0", "description": "Minimum fragment size to consider. Only values > 0 are considered" }, "max_restriction_fragment_size": { "type": "integer", - "default": "0", "description": "Maximum restriction fragment size to consider. Only values > 0 are considered" }, "min_restriction_fragment_size": { "type": "integer", - "default": "0", "description": "Minimum restriction fragment size to consider.
Only values > 0 are considered" + }, + "save_interaction_bam": { + "type": "boolean", + "description": "Save a BAM file where all reads are flagged by their interaction classes" } } }, - "contact_maps_options": { + "contact_maps": { "title": "Contact maps", "type": "object", "description": "Options to build Hi-C contact maps", @@ -219,28 +227,68 @@ "properties": { "bin_size": { "type": "string", - "default": "'1000000,500000'", + "pattern": "^(\\d+)(,\\d+)*$", + "default": "1000000,500000", "description": "Resolution to build the maps (comma separated)" }, - "ice_filer_low_count_perc": { - "type": "string", + "hicpro_maps": { + "type": "boolean", + "description": "Generate raw and normalized contact maps with HiC-Pro" + }, + "ice_filter_low_count_perc": { + "type": "number", "default": 0.02, - "description": "Filter low counts rows before normalization" + "description": "Filter low-count rows before HiC-Pro normalization" }, - "ice_filer_high_count_perc": { + "ice_filter_high_count_perc": { "type": "integer", - "default": "0", - "description": "Filter high counts rows before normalization" + "description": "Filter high-count rows before HiC-Pro normalization" }, "ice_eps": { - "type": "string", - "default": "0.1", - "description": "Threshold for ICE convergence" + "type": "number", + "default": 0.1, + "description": "Threshold for HiC-Pro ICE convergence" }, "ice_max_iter": { "type": "integer", - "default": "100", - "description": "Maximum number of iteraction for ICE normalization" + "default": 100, + "description": "Maximum number of iterations for HiC-Pro ICE normalization" + }, + "res_zoomify": { + "type": "string", + "default": "5000", + "description": "Maximum resolution to build mcool file" + } + } + }, + "downstream_analysis": { + "title": "Downstream Analysis", + "type": "object", + "description": "Set up downstream analysis from contact maps", + "default": "", + "properties": { + "res_dist_decay": { + "type": "string", + "pattern": "^(\\d+)(,\\d+)*$", + "default": "1000000", + "description": "Resolution to build count/distance plot" + }, + "tads_caller": { + "type": "string", + "default": "hicexplorer,insulation", + "description": "Define methods for TADs calling" + }, + "res_tads": { + "type": "string", + "pattern": "^(\\d+)(,\\d+)*$", + "default": "40000,20000", + "description": "Resolution to run TADs callers (comma separated)" + }, + "res_compartments": { + "type": "string", + "pattern": "^(\\d+)(,\\d+)*$", + "default": "250000", + "description": "Resolution for compartments calling" } } }, @@ -255,13 +303,25 @@ "type": "boolean", "description": "Do not build contact maps" }, - "skip_ice": { + "skip_dist_decay": { + "type": "boolean", + "description": "Do not run distance/decay plot" + }, + "skip_tads": { "type": "boolean", - "description": "Do not normalize contact maps" + "description": "Do not run TADs calling" }, - "skip_cool": { + "skip_compartments": { + "type": "boolean", + "description": "Do not run compartments calling" + }, + "skip_balancing": { + "type": "boolean", + "description": "Do not run cooler balancing normalization" + }, + "skip_mcool": { "type": "boolean", - "description": "Do not generate cooler file" + "description": "Do not generate mcool file for HiGlass visualization" }, "skip_multiqc": { "type": "boolean", @@ -295,15 +355,15 @@ "link", "copy", "copyNoFollow", - "mov" + "move" ] }, - "name": { - "type": "string", - "description": "Workflow name.", - "fa_icon": "fas fa-fingerprint", - "hidden": true, - "help_text": "A custom name for the pipeline
run. Unlike the core nextflow `-name` option with one hyphen this parameter can be reused multiple times, for example if using `-resume`. Passed through to steps such as MultiQC and used for things like report filenames and titles." + "validate_params": { + "type": "boolean", + "description": "Whether to validate parameters against the schema at runtime", + "default": true, + "fa_icon": "fas fa-check-square", + "hidden": true }, "email_on_fail": { "type": "string", @@ -347,6 +407,13 @@ "default": "${params.outdir}/pipeline_info", "fa_icon": "fas fa-cogs", "hidden": true + }, + "show_hidden_params": { + "type": "boolean", + "fa_icon": "far fa-eye-slash", + "description": "Show all params when using `--help`", + "hidden": true, + "help_text": "By default, parameters set as _hidden_ in the schema are not shown on the command line when a user runs with `--help`. Specifying this option will tell the pipeline to show all parameters." } } }, @@ -370,6 +437,7 @@ "description": "Maximum amount of memory that can be requested for any single job.", "default": "128.GB", "fa_icon": "fas fa-memory", + "pattern": "^\\d+(\\.\\d+)?\\.?\\s*(K|M|G|T)?B$", "hidden": true, "help_text": "Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. `--max_memory '8.GB'`" }, @@ -378,6 +446,7 @@ "description": "Maximum amount of time that can be requested for any single job.", "default": "240.h", "fa_icon": "far fa-clock", + "pattern": "^(\\d+\\.?\\s*(s|m|h|day)\\s*)+$", "hidden": true, "help_text": "Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. `--max_time '2.h'`" } @@ -412,6 +481,11 @@ "hidden": true, "fa_icon": "fas fa-users-cog" }, + "config_profile_name": { + "type": "string", + "description": "Institutional config name", + "hidden": true + }, "config_profile_description": { "type": "string", "description": "Institutional config description.", @@ -441,13 +515,22 @@ "$ref": "#/definitions/reference_genome_options" }, { - "$ref": "#/definitions/data_processing_options" + "$ref": "#/definitions/digestion_hi_c" + }, + { + "$ref": "#/definitions/dnase_hi_c" + }, + { + "$ref": "#/definitions/alignments" + }, + { + "$ref": "#/definitions/valid_pairs_detection" }, { - "$ref": "#/definitions/contacts_calling_options" + "$ref": "#/definitions/contact_maps" }, { - "$ref": "#/definitions/contact_maps_options" + "$ref": "#/definitions/downstream_analysis" }, { "$ref": "#/definitions/skip_options"