Cache data for GHA #933

Merged: 12 commits, Jan 27, 2023
130 changes: 125 additions & 5 deletions .github/workflows/ci.yml
@@ -35,9 +35,33 @@ jobs:
         with:
           version: "${{ matrix.NXF_VER }}"

+      - name: Cache test data
Member

I could swear there was a way where we don't have to repeat this big block of code for every test?

Member Author

Probably, but we'll rewrite the tests soon enough, so we'll fix that in a future PR.

+        id: cache-testdata
+        uses: actions/cache@v3
+        with:
+          path: test-datasets/
+          key: rnaseq3_10-test-data
+
+      - name: Check out test data
+        if: steps.cache-testdata.outputs.cache-hit != 'true'
+        uses: actions/checkout@v3
+        with:
+          repository: nf-core/test-datasets
+          ref: rnaseq3
+          path: test-datasets/
+
+      - name: Replace remote paths in samplesheets
+        run: |
+          for f in ${{ github.workspace }}/test-datasets/samplesheet/v3.10/*.csv; do
+            sed -i "s=https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/=${{ github.workspace }}/test-datasets/=g" $f
+            echo "========== $f ============"
+            cat $f
+            echo "========================================"
+          done;
+
       - name: Run pipeline with test data
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results
+          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --outdir ./results --test_data_base ${{ github.workspace }}/test-datasets/
maxulysse marked this conversation as resolved.
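
The "Replace remote paths in samplesheets" step can be tried outside CI. The following is a minimal sketch, assuming a POSIX shell and any standard `sed`. The function name `localize_samplesheets`, the `/work/test-datasets/` path, and the sample CSV row are hypothetical and exist only for illustration. Like the workflow step, it uses `=` as the `sed` delimiter so the slashes in the URL need no escaping. Wrapping the loop in a single function is one possible way to avoid repeating it in every job, which is what the review comment above asks about, although this PR keeps the loop inline.

```shell
set -eu

# Remote prefix that the workflow step rewrites (copied from the PR).
REMOTE_PREFIX="https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/"

# Hypothetical helper: rewrite the remote prefix in every CSV under a
# directory so that samplesheet entries point at a local checkout.
#   $1: directory holding the *.csv samplesheets
#   $2: local checkout path, with a trailing slash
localize_samplesheets() {
    for ls_f in "$1"/*.csv; do
        # Write to a temp file instead of using "sed -i", whose syntax
        # differs between GNU and BSD sed.
        sed "s=${REMOTE_PREFIX}=$2=g" "$ls_f" > "$ls_f.tmp"
        mv "$ls_f.tmp" "$ls_f"
    done
}

# Demo on a throwaway samplesheet (made-up row, not real test data).
demo_dir="$(mktemp -d)"
printf 'sample,fastq_1\nS1,%stestdata/S1_R1.fastq.gz\n' "$REMOTE_PREFIX" > "$demo_dir/sheet.csv"
localize_samplesheets "$demo_dir" "/work/test-datasets/"
cat "$demo_dir/sheet.csv"
```

The sketch assumes the local path contains neither `=` nor `&`; either character would need escaping before being passed to `sed`.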

   star_salmon:
     name: Test STAR Salmon with workflow parameters
@@ -62,14 +86,38 @@ jobs:
       - name: Check out pipeline code
         uses: actions/checkout@v2

+      - name: Cache test data
+        id: cache-testdata
+        uses: actions/cache@v3
+        with:
+          path: test-datasets/
+          key: rnaseq3_10-test-data
+
+      - name: Check out test data
+        if: steps.cache-testdata.outputs.cache-hit != 'true'
+        uses: actions/checkout@v3
+        with:
+          repository: nf-core/test-datasets
+          ref: rnaseq3
+          path: test-datasets/
+
+      - name: Replace remote paths in samplesheets
+        run: |
+          for f in ${{ github.workspace }}/test-datasets/samplesheet/v3.10/*.csv; do
+            sed -i "s=https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/=${{ github.workspace }}/test-datasets/=g" $f
+            echo "========== $f ============"
+            cat $f
+            echo "========================================"
+          done;
+
       - name: Install Nextflow
         run: |
           wget -qO- get.nextflow.io | bash
           sudo mv nextflow /usr/local/bin/

       - name: Run pipeline with STAR and various parameters
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --aligner star_salmon ${{ matrix.parameters }} --outdir ./results
+          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --aligner star_salmon ${{ matrix.parameters }} --outdir ./results --test_data_base ${{ github.workspace }}/test-datasets/

   star_rsem:
     name: Test STAR RSEM with workflow parameters
@@ -84,14 +132,38 @@ jobs:
       - name: Check out pipeline code
         uses: actions/checkout@v2

+      - name: Cache test data
+        id: cache-testdata
+        uses: actions/cache@v3
+        with:
+          path: test-datasets/
+          key: rnaseq3_10-test-data
+
+      - name: Check out test data
+        if: steps.cache-testdata.outputs.cache-hit != 'true'
+        uses: actions/checkout@v3
+        with:
+          repository: nf-core/test-datasets
+          ref: rnaseq3
+          path: test-datasets/
+
+      - name: Replace remote paths in samplesheets
+        run: |
+          for f in ${{ github.workspace }}/test-datasets/samplesheet/v3.10/*.csv; do
+            sed -i "s=https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/=${{ github.workspace }}/test-datasets/=g" $f
+            echo "========== $f ============"
+            cat $f
+            echo "========================================"
+          done;
+
       - name: Install Nextflow
         run: |
           wget -qO- get.nextflow.io | bash
           sudo mv nextflow /usr/local/bin/

       - name: Run pipeline with RSEM STAR and various parameters
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --aligner star_rsem ${{ matrix.parameters }} --outdir ./results
+          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --aligner star_rsem ${{ matrix.parameters }} --outdir ./results --test_data_base ${{ github.workspace }}/test-datasets/

   hisat2:
     name: Test HISAT2 with workflow parameters
@@ -106,14 +178,38 @@ jobs:
       - name: Check out pipeline code
         uses: actions/checkout@v2

+      - name: Cache test data
+        id: cache-testdata
+        uses: actions/cache@v3
+        with:
+          path: test-datasets/
+          key: rnaseq3_10-test-data
+
+      - name: Check out test data
+        if: steps.cache-testdata.outputs.cache-hit != 'true'
+        uses: actions/checkout@v3
+        with:
+          repository: nf-core/test-datasets
+          ref: rnaseq3
+          path: test-datasets/
+
+      - name: Replace remote paths in samplesheets
+        run: |
+          for f in ${{ github.workspace }}/test-datasets/samplesheet/v3.10/*.csv; do
+            sed -i "s=https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/=${{ github.workspace }}/test-datasets/=g" $f
+            echo "========== $f ============"
+            cat $f
+            echo "========================================"
+          done;
+
       - name: Install Nextflow
         run: |
           wget -qO- get.nextflow.io | bash
           sudo mv nextflow /usr/local/bin/

       - name: Run pipeline with HISAT2 and various parameters
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --aligner hisat2 ${{ matrix.parameters }} --outdir ./results
+          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --aligner hisat2 ${{ matrix.parameters }} --outdir ./results --test_data_base ${{ github.workspace }}/test-datasets/

   salmon:
     name: Test Salmon with workflow parameters
@@ -128,11 +224,35 @@ jobs:
       - name: Check out pipeline code
         uses: actions/checkout@v2

+      - name: Cache test data
+        id: cache-testdata
+        uses: actions/cache@v3
+        with:
+          path: test-datasets/
+          key: rnaseq3_10-test-data
+
+      - name: Check out test data
+        if: steps.cache-testdata.outputs.cache-hit != 'true'
+        uses: actions/checkout@v3
+        with:
+          repository: nf-core/test-datasets
+          ref: rnaseq3
+          path: test-datasets/
+
+      - name: Replace remote paths in samplesheets
+        run: |
+          for f in ${{ github.workspace }}/test-datasets/samplesheet/v3.10/*.csv; do
+            sed -i "s=https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/=${{ github.workspace }}/test-datasets/=g" $f
+            echo "========== $f ============"
+            cat $f
+            echo "========================================"
+          done;
+
       - name: Install Nextflow
         run: |
           wget -qO- get.nextflow.io | bash
           sudo mv nextflow /usr/local/bin/

       - name: Run pipeline with Salmon and various parameters
         run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --pseudo_aligner salmon ${{ matrix.parameters }} --outdir ./results
+          nextflow run ${GITHUB_WORKSPACE} -profile test,docker --pseudo_aligner salmon ${{ matrix.parameters }} --outdir ./results --test_data_base ${{ github.workspace }}/test-datasets/
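
In every job, the guard `if: steps.cache-testdata.outputs.cache-hit != 'true'` means the checkout of `nf-core/test-datasets` runs only when the cache step did not restore `test-datasets/`. As a rough local analogy only (this is not how `actions/cache` works internally), the sketch below runs a fetch command only when the target directory is absent. `ensure_test_data` and `fake_fetch` are hypothetical names; a real fetch would be something like a clone of the `rnaseq3` branch.

```shell
set -eu

# Hypothetical helper: run the fetch command only on a "cache miss",
# i.e. when the data directory does not exist yet.
#   $1: data directory; remaining arguments: the fetch command
ensure_test_data() {
    etd_dir="$1"
    shift
    if [ -d "$etd_dir" ]; then
        echo "cache hit: reusing $etd_dir"
    else
        echo "cache miss: fetching into $etd_dir"
        "$@" "$etd_dir"
    fi
}

# Stand-in for the real checkout: it only creates the directory and
# counts how often it was called.
fetch_count=0
fake_fetch() {
    mkdir -p "$1"
    fetch_count=$((fetch_count + 1))
}

work_dir="$(mktemp -d)"
ensure_test_data "$work_dir/test-datasets" fake_fetch   # first call: miss
ensure_test_data "$work_dir/test-datasets" fake_fetch   # second call: hit
echo "fetches performed: $fetch_count"
```

Because the workflow's key is the fixed string `rnaseq3_10-test-data`, an exact hit keeps serving the stored copy; refreshing the data would presumably require changing the key.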
22 changes: 11 additions & 11 deletions conf/test.config
@@ -20,19 +20,19 @@ params {
     max_time = '6.h'

     // Input data
-    input = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv'
+    input = "${params.test_data_base}/samplesheet/v3.10/samplesheet_test.csv"

     // Genome references
-    fasta = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fasta'
-    gtf = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf.gz'
-    gff = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gff.gz'
-    transcript_fasta = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/transcriptome.fasta'
-    additional_fasta = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/gfp.fa.gz'
-
-    bbsplit_fasta_list = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/bbsplit_fasta_list.txt'
-    hisat2_index = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/hisat2.tar.gz'
-    salmon_index = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/salmon.tar.gz'
-    rsem_index = 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/rsem.tar.gz'
+    fasta = "${params.test_data_base}/reference/genome.fasta"
+    gtf = "${params.test_data_base}/reference/genes.gtf.gz"
+    gff = "${params.test_data_base}/reference/genes.gff.gz"
+    transcript_fasta = "${params.test_data_base}/reference/transcriptome.fasta"
+    additional_fasta = "${params.test_data_base}/reference/gfp.fa.gz"
+
+    bbsplit_fasta_list = "${params.test_data_base}/reference/bbsplit_fasta_list.txt"
+    hisat2_index = "${params.test_data_base}/reference/hisat2.tar.gz"
+    salmon_index = "${params.test_data_base}/reference/salmon.tar.gz"
+    rsem_index = "${params.test_data_base}/reference/rsem.tar.gz"

     // Other parameters
     skip_bbsplit = false
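
With this change every test input is derived from a single `test_data_base` value, so the same config resolves to remote URLs by default and to local files when the parameter is overridden. The sketch below mimics that string composition in shell, purely as an illustration; Nextflow performs the interpolation itself in its config language. `resolve_test_path` and the `TEST_DATA_BASE` variable are hypothetical stand-ins for the config expression and the `--test_data_base` option, and `/work/test-datasets/` is a made-up local path.

```shell
set -eu

# Default copied from the nextflow.config change in this PR.
DEFAULT_BASE="https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3"

# Hypothetical helper: join the base (override or default) with a path
# relative to it, e.g. reference/genome.fasta.
resolve_test_path() {
    rtp_base="${TEST_DATA_BASE:-$DEFAULT_BASE}"
    # Strip one trailing slash so that an override such as
    # ".../test-datasets/" does not produce "//" in the result.
    rtp_base="${rtp_base%/}"
    printf '%s/%s\n' "$rtp_base" "$1"
}

# No override: the path resolves against the remote default.
unset TEST_DATA_BASE
remote_fasta="$(resolve_test_path reference/genome.fasta)"

# Override with a local checkout, as the CI jobs do.
TEST_DATA_BASE="/work/test-datasets/"
local_fasta="$(resolve_test_path reference/genome.fasta)"

echo "$remote_fasta"
echo "$local_fasta"
```

Unlike this sketch, the config in the PR concatenates the value as given, so the CI override with its trailing slash yields a double slash in the path, which is typically harmless for local files.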
1 change: 1 addition & 0 deletions nextflow.config
@@ -118,6 +118,7 @@ params {
     config_profile_contact = null
     config_profile_url = null
     config_profile_name = null
+    test_data_base = 'https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3'
maxulysse marked this conversation as resolved.

     // Max resource options
     // Defaults only, expecting to be overwritten
7 changes: 7 additions & 0 deletions nextflow_schema.json
@@ -580,6 +580,13 @@
                     "description": "Institutional config URL link.",
                     "hidden": true,
                     "fa_icon": "fas fa-users-cog"
+                },
+                "test_data_base": {
+                    "type": "string",
+                    "default": "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq3",
Member

Why rnaseq3?

Member Author

Created a subset of rnaseq which contains only the data actually used in the tests

+                    "description": "Base path / URL for data used in the test profiles",
+                    "help_text": "Warning: The `-profile test` samplesheet file itself contains remote paths. Setting this parameter does not alter the contents of that file.",
+                    "hidden": true
                 }
             }
         },
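
The `help_text` warning explains why the CI jobs need a separate rewrite step: overriding `test_data_base` changes where the samplesheet is read from, not the URLs listed inside it. A quick check for leftover remote entries could look like the sketch below. `count_remote_rows` is a hypothetical helper, and the two sample rows are made up for the demonstration.

```shell
set -eu

# Hypothetical check: count samplesheet lines that still contain an
# http(s) URL.
count_remote_rows() {
    # "grep -c" prints the count but exits with status 1 when the count
    # is 0, so "|| true" keeps "set -e" from aborting in that case.
    grep -c -E 'https?://' "$1" || true
}

# Throwaway samplesheet with one remote and one local entry.
sheet_dir="$(mktemp -d)"
sheet="$sheet_dir/samplesheet.csv"
cat > "$sheet" <<'EOF'
sample,fastq_1
S1,https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/testdata/S1_R1.fastq.gz
S2,/work/test-datasets/testdata/S2_R1.fastq.gz
EOF

remaining="$(count_remote_rows "$sheet")"
echo "rows still pointing at remote data: $remaining"
```

A non-zero count after the rewrite step would suggest that some entries use a URL prefix the `sed` expression does not match.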