Publish WBcel235 #56

Merged 8 commits on Dec 10, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -12,5 +12,6 @@ null/
tmp/
.nf-test
.vscode
assets/manifest.txt
assets/*_manifest.txt
assets/tmp_*.txt
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -26,6 +26,9 @@ Initial release of nf-core/references, created with the [nf-core](https://nf-co.
- [47](https://github.com/nf-core/references/pull/47) - Add fasta assets for files in igenomes
- [51](https://github.com/nf-core/references/pull/51) - Add fasta_fai assets for files in igenomes
- [52](https://github.com/nf-core/references/pull/52) - Add abundantsequences_fasta, bismark_index, bowtie1_index, bowtie2_index, bwamem1_index, bwamem2_index, chrom_info, chromosomes_fasta, dragmap_hashtable, fasta_dict, genes_bed, genes_refflat, genes_refgene, genome_size_xml, gtf, hairpin_fasta, mature_fasta, readme, source, source_vcf, species, star_index and vcf assets for files in igenomes
- [56](https://github.com/nf-core/references/pull/56) - Add fields for bowtie1_index, bowtie2_index, bwamem1_index, bwamem2_index, dragmap_hashtable, hisat2_index, kallisto_index, msisensorpro_list, rsem_index, salmon_index, star_index, vcf_tbi in assets
- [56](https://github.com/nf-core/references/pull/56) - Add new param kallisto_make_unique to use the --make-unique option for kallisto
- [56](https://github.com/nf-core/references/pull/56) - New file assets/genomes/Caenorhabditis_elegans/NCBI/WBcel235_updated.yml, built from assets/genomes/Caenorhabditis_elegans/NCBI/WBcel235.yml

### Changed

@@ -44,6 +47,7 @@ Initial release of nf-core/references, created with the [nf-core](https://nf-co.
- [48](https://github.com/nf-core/references/pull/48) - Code refactoring (new subworkflows for each type of operation)
- [49](https://github.com/nf-core/references/pull/49) - Better publishing for all files
- [53](https://github.com/nf-core/references/pull/53) - Better publishing for all aligner indexes
- [56](https://github.com/nf-core/references/pull/56) - Rename reference_version to source_version

### Fixed

@@ -55,6 +59,7 @@ Initial release of nf-core/references, created with the [nf-core](https://nf-co.
- [39](https://github.com/nf-core/references/pull/39) - Fix gtf generation and dependencies
- [50](https://github.com/nf-core/references/pull/50) - Minimum Java version is 17
- [51](https://github.com/nf-core/references/pull/51) - Fix missing fasta assets for GATK build
- [56](https://github.com/nf-core/references/pull/56) - Add new logic to skip creation of existing assets

### Dependencies

78 changes: 48 additions & 30 deletions assets/generate_yaml_asset.sh
@@ -3,8 +3,11 @@
# Download or regenerate manifest file

# Command line flags
while getopts ":dr" opt; do
while getopts ":adr" opt; do
case $opt in
a)
NEW_MANIFEST=true
;;
d)
DOWNLOAD_MANIFEST=true
;;
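Review note: the flag parsing in this hunk can be exercised on its own. The sketch below wraps the same `getopts ":adr"` loop in a function so it can be called repeatedly, and defines a usage helper because `print_usage` is not defined in any visible hunk. The function name `parse_flags` and the usage wording are assumptions made for illustration; they are not part of the PR.

```shell
#!/usr/bin/env bash

# Hypothetical usage helper; the wording is an assumption based on what
# each flag does in the hunks of this diff.
print_usage() {
    {
        echo "Usage: generate_yaml_asset.sh [-a] [-d] [-r]"
        echo "  -a  build the manifest from s3://nf-core-references-scratch/genomes/"
        echo "  -d  download the AWS-iGenomes manifest"
        echo "  -r  regenerate the manifest from s3://ngi-igenomes/igenomes/"
    } 1>&2
}

# Hypothetical wrapper around the getopts loop shown in the diff.
parse_flags() {
    # A local OPTIND restarts option parsing on every call.
    local opt OPTIND=1
    NEW_MANIFEST=""
    DOWNLOAD_MANIFEST=""
    REGENERATE_MANIFEST=""
    while getopts ":adr" opt; do
        case $opt in
            a) NEW_MANIFEST=true ;;
            d) DOWNLOAD_MANIFEST=true ;;
            r) REGENERATE_MANIFEST=true ;;
            \?)
                echo "Invalid option: -$OPTARG" 1>&2
                print_usage
                return 1
                ;;
        esac
    done
}
```

Calling `parse_flags "$@" || exit 1` at the top of a script would reproduce the exit-on-invalid-option behaviour of the original loop.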
@@ -13,38 +16,53 @@ while getopts ":dr" opt; do
;;
\?)
echo "Invalid option: -$OPTARG" 1>&2
print_usage
exit 1;
;;
esac
done

if [[ ${DOWNLOAD_MANIFEST} ]]; then
rm -f igenomes_manifest.txt
rm -f manifest.txt
echo "Downloading manifest"
wget https://raw.githubusercontent.com/ewels/AWS-iGenomes/refs/heads/master/ngi-igenomes_file_manifest.txt -O igenomes_manifest.txt
wget https://raw.githubusercontent.com/ewels/AWS-iGenomes/refs/heads/master/ngi-igenomes_file_manifest.txt -O manifest.txt
fi

if [[ ${REGENERATE_MANIFEST} ]]; then
rm -f igenomes_manifest.txt
rm -f manifest.txt
echo "Regenerating manifest"
# cf https://github.com/ewels/AWS-iGenomes/pull/22
aws s3 --no-sign-request ls --recursive s3://ngi-igenomes/igenomes/ | cut -d "/" -f 2- > tmp
for i in `cat tmp`; do
if [[ ! $i =~ /$ ]]; then
echo s3://ngi-igenomes/igenomes/$i >> manifest
echo s3://ngi-igenomes/igenomes/$i >> tmp_manifest
fi
done
mv tmp_manifest manifest.txt

rm tmp
fi

if [[ ${NEW_MANIFEST} ]]; then
rm -f manifest.txt
echo "Regenerating NEW manifest"
# cf https://github.com/ewels/AWS-iGenomes/pull/22

aws s3 --profile igenomes ls --recursive s3://nf-core-references-scratch/genomes/ | grep -v "pipeline_info" | cut -d "/" -f 2- > tmp
for i in `cat tmp`; do
if [[ ! $i =~ /$ ]]; then
echo s3://nf-core-references-scratch/genomes/$i >> tmp_manifest
fi
done
mv manifest igenomes_manifest.txt
mv tmp_manifest manifest.txt

rm tmp
fi

total_files=$(wc -l igenomes_manifest.txt | cut -d " " -f 1)
total_files=$(wc -l manifest.txt | cut -d " " -f 1)

echo "Number of files in manifest: $total_files"

cp igenomes_manifest.txt leftover_manifest.txt
cp manifest.txt leftover_manifest.txt

# Remove existing assets
rm -rf igenomes/
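Review note: both regeneration branches in this hunk loop over the listing with a backtick `cat`, which splits on any whitespace, so a key containing a space would be broken into several entries. A minimal sketch of the same filtering step using `while read` follows. The function name `build_manifest` is hypothetical, and it assumes the listing has already been reduced to one key per line (as the `cut -d "/" -f 2-` step does).

```shell
#!/usr/bin/env bash

# Hypothetical helper: read one object key per line on stdin and print
# PREFIX + key for every key that is not a directory placeholder.
build_manifest() {
    local prefix="$1"
    local key
    # The "|| [[ -n ... ]]" part keeps a final line without a trailing newline.
    while IFS= read -r key || [[ -n "$key" ]]; do
        # Skip empty lines and keys ending in "/" (directory entries).
        if [[ -z "$key" || "$key" == */ ]]; then
            continue
        fi
        printf '%s%s\n' "$prefix" "$key"
    done
}
```

Under these assumptions, the body of the `-a` branch could be written as `... | cut -d "/" -f 2- | build_manifest s3://nf-core-references-scratch/genomes/ > manifest.txt`, which also removes the need for the `tmp` and `tmp_manifest` intermediates.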
@@ -55,7 +73,7 @@ rm -rf igenomes/
# All fai files come from a fasta file of the same name
# Hence I use them to generate fasta + fai (and thereby catch the fasta files that do not follow the genome.fa naming scheme)

cat igenomes_manifest.txt | grep "\.fai" | grep -v "Bowtie2Index" | grep -v "fai\.gz" > tmp_fai.txt
cat manifest.txt | grep "\.fai" | grep -v "Bowtie2Index" | grep -v "fai\.gz" > tmp_fai.txt

echo "Populating assets for fasta and fai"

@@ -79,7 +97,7 @@ do
done

# All source README
cat igenomes_manifest.txt | grep "README" | grep -v "Archives" | grep -v "beagle" | grep -v "plink" | grep -v "PhiX\/Illumina\/RTA\/Annotation\/README\.txt" > tmp_readme.txt
cat manifest.txt | grep "README" | grep -v "Archives" | grep -v "beagle" | grep -v "plink" | grep -v "PhiX\/Illumina\/RTA\/Annotation\/README\.txt" > tmp_readme.txt

echo "Populating assets for README"

@@ -93,7 +111,7 @@ do
done

# All source gtf (removing the ones coming from gencode)
cat igenomes_manifest.txt | grep "\.gtf" | grep -v "gtf\." | grep -v "STARIndex" | grep -v "Genes\.gencode" > tmp_gtf.txt
cat manifest.txt | grep "\.gtf" | grep -v "gtf\." | grep -v "STARIndex" | grep -v "Genes\.gencode" > tmp_gtf.txt

echo "Populating assets for GTF"

@@ -107,7 +125,7 @@ do
done

# All source fasta.dict
cat igenomes_manifest.txt | grep "\.dict" | grep -v "dict\.gz" | grep -v "dict\.old" > tmp_dict.txt
cat manifest.txt | grep "\.dict" | grep -v "dict\.gz" | grep -v "dict\.old" > tmp_dict.txt

echo "Populating assets for fasta.dict"

@@ -121,7 +139,7 @@ do
done

# All source genes.bed
cat igenomes_manifest.txt | grep "genes\.bed" > tmp_bed.txt
cat manifest.txt | grep "genes\.bed" > tmp_bed.txt

echo "Populating assets for genes.bed"

@@ -135,7 +153,7 @@ do
done

# All source BowtieIndex
cat igenomes_manifest.txt | grep "BowtieIndex" | grep -v "MDSBowtieIndex" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bowtie.txt
cat manifest.txt | grep "BowtieIndex" | grep -v "MDSBowtieIndex" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bowtie.txt

echo "Populating assets for BowtieIndex"

@@ -149,7 +167,7 @@ do
done

# All source Bowtie2Index
cat igenomes_manifest.txt | grep "Bowtie2Index" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bowtie2.txt
cat manifest.txt | grep "Bowtie2Index" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bowtie2.txt

echo "Populating assets for Bowtie2Index"

@@ -163,7 +181,7 @@ do
done

# All source BWAIndex (we have version0.6.0, version0.5.x, and no version specified)
cat igenomes_manifest.txt | grep "BWAIndex" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bwaindex.txt
cat manifest.txt | grep "BWAIndex" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bwaindex.txt

echo "Populating assets for BWAIndex"

@@ -180,7 +198,7 @@ do
done

# All source BWAmem2mem
cat igenomes_manifest.txt | grep "BWAmem2Index" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bwamem2mem.txt
cat manifest.txt | grep "BWAmem2Index" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bwamem2mem.txt

echo "Populating assets for BWAmem2Index"

@@ -194,7 +212,7 @@ do
done

# All source Dragmap
cat igenomes_manifest.txt | grep "dragmap" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_dragmap.txt
cat manifest.txt | grep "dragmap" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_dragmap.txt

echo "Populating assets for DragmapHashtable"

@@ -208,7 +226,7 @@ do
done

# All source BismarkIndex
cat igenomes_manifest.txt | grep "BismarkIndex\/genome\.fa" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bismark.txt
cat manifest.txt | grep "BismarkIndex\/genome\.fa" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_bismark.txt

echo "Populating assets for BismarkIndex"

@@ -222,7 +240,7 @@ do
done

# All source star Index
cat igenomes_manifest.txt | grep "STARIndex" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_star.txt
cat manifest.txt | grep "STARIndex" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_star.txt

echo "Populating assets for STARIndex"

@@ -236,7 +254,7 @@ do
done

# All source Chromosomes fasta
cat igenomes_manifest.txt | grep "Chromosomes" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_chromosomes.txt
cat manifest.txt | grep "Chromosomes" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_chromosomes.txt

echo "Populating assets for Chromosomes fasta"

@@ -250,7 +268,7 @@ do
done

# All source AbundantSequences fasta
cat igenomes_manifest.txt | grep "AbundantSequences" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_abundantsequences.txt
cat manifest.txt | grep "AbundantSequences" | rev | cut -d "/" -f 2- | rev | sort -u > tmp_abundantsequences.txt

echo "Populating assets for AbundantSequences fasta"

@@ -264,7 +282,7 @@ do
done

# All source refFlat (removing the ones coming from gencode)
cat igenomes_manifest.txt | grep "refFlat\.txt" | grep -v "\.gz\.bak" | grep -v "Genes\.gencode" > tmp_refflat.txt
cat manifest.txt | grep "refFlat\.txt" | grep -v "\.gz\.bak" | grep -v "Genes\.gencode" > tmp_refflat.txt

echo "Populating assets for refFlat"

@@ -278,7 +296,7 @@ do
done

# All source refgene
cat igenomes_manifest.txt | grep "refGene\.txt" | grep -v "Archives" | grep -v "\.gz\.bak" > tmp_refgene.txt
cat manifest.txt | grep "refGene\.txt" | grep -v "Archives" | grep -v "\.gz\.bak" > tmp_refgene.txt

echo "Populating assets for refgene"

@@ -292,7 +310,7 @@ do
done

# All source ChromInfo.txt
cat igenomes_manifest.txt | grep "ChromInfo\.txt" | grep -v "Archives" > tmp_chrominfo.txt
cat manifest.txt | grep "ChromInfo\.txt" | grep -v "Archives" > tmp_chrominfo.txt

echo "Populating assets for ChromInfo.txt"

@@ -306,7 +324,7 @@ do
done

# All source GenomeSize.xml (removing the old ones)
cat igenomes_manifest.txt | grep "GenomeSize\.xml" | grep -v "\.old" > tmp_GenomeSize.txt
cat manifest.txt | grep "GenomeSize\.xml" | grep -v "\.old" > tmp_GenomeSize.txt

echo "Populating assets for GenomeSize.xml"

@@ -320,7 +338,7 @@ do
done

# All source hairpin.fa
cat igenomes_manifest.txt | grep "SmallRNA\/hairpin\.fa" > tmp_hairpin.txt
cat manifest.txt | grep "SmallRNA\/hairpin\.fa" > tmp_hairpin.txt

echo "Populating assets for hairpin.fa"

@@ -334,7 +352,7 @@ do
done

# All source mature.fa
cat igenomes_manifest.txt | grep "SmallRNA\/mature\.fa" > tmp_mature.txt
cat manifest.txt | grep "SmallRNA\/mature\.fa" > tmp_mature.txt

echo "Populating assets for mature.fa"

@@ -348,7 +366,7 @@ do
done

# All source vcf
cat igenomes_manifest.txt | grep "\.vcf" | grep -v "\.idx" | grep -v "\.tbi" | grep -v "\.md5" > tmp_vcf.txt
cat manifest.txt | grep "\.vcf" | grep -v "\.idx" | grep -v "\.tbi" | grep -v "\.md5" > tmp_vcf.txt

echo "Populating assets for vcf"
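
Review note: the index sections of this script (BowtieIndex, Bowtie2Index, BWAIndex, STARIndex and so on) all repeat the pipeline `grep PATTERN | rev | cut -d "/" -f 2- | rev | sort -u` to turn file entries into unique parent directories. A possible way to factor that out is sketched below. The function name `index_dirs` is hypothetical, and unlike `grep` it matches a fixed substring rather than a regular expression, so patterns such as `BismarkIndex\/genome\.fa` would need to be passed unescaped.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read manifest lines on stdin, keep those containing
# the given fixed substring, and print their unique parent directories.
index_dirs() {
    local pattern="$1"
    local line
    while IFS= read -r line || [[ -n "$line" ]]; do
        # Fixed-substring match (not a regex, unlike grep in the script).
        [[ "$line" == *"$pattern"* ]] || continue
        # Drop the last path component, like `rev | cut -d "/" -f 2- | rev`.
        printf '%s\n' "${line%/*}"
    done | sort -u
}
```

With such a helper, a section could read `index_dirs "STARIndex" < manifest.txt > tmp_star.txt`.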

4 changes: 2 additions & 2 deletions assets/genomes/Caenorhabditis_elegans/NCBI/WBcel235.yml
@@ -1,13 +1,13 @@
- genome: WBcel235
fasta: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/985/GCF_000002985.6_WBcel235/GCF_000002985.6_WBcel235_genomic.fna.gz
gtf: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/985/GCF_000002985.6_WBcel235/GCF_000002985.6_WBcel235_genomic.gtf.gz
reference_version: GCF_000002985.6
source_version: GCF_000002985.6
site: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002985.6
source: NCBI
species: Caenorhabditis_elegans
- genome: WBcel235
gff: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/002/985/GCF_000002985.6_WBcel235/GCF_000002985.6_WBcel235_genomic.gff.gz
reference_version: GCF_000002985.6
source_version: GCF_000002985.6
site: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000002985.6
source: NCBI
species: Caenorhabditis_elegans
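Review note: the `reference_version` to `source_version` rename is applied by hand to each asset file in this PR. For anyone carrying the same rename over to other asset files, a minimal sketch of a stream filter follows. The function name `rename_version_key` is hypothetical, and the filter assumes the key sits at the start of a line, optionally preceded by indentation or a YAML list dash.

```shell
#!/usr/bin/env bash

# Hypothetical helper: rewrite the YAML key reference_version to
# source_version on stdin, leaving indentation and values untouched.
rename_version_key() {
    # [[:space:]-]* matches leading indentation and an optional list dash.
    sed -E 's/^([[:space:]-]*)reference_version:/\1source_version:/'
}
```

Usage would look like `rename_version_key < WBcel235.yml > WBcel235.new.yml`; values that merely contain the text `reference_version` are left unchanged because the match is anchored to the key position.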
24 changes: 24 additions & 0 deletions assets/genomes/Caenorhabditis_elegans/NCBI/WBcel235_updated.yml
@@ -0,0 +1,24 @@
- genome: WBcel235
source: NCBI
source_version: GCF_000002985.6
species: Caenorhabditis_elegans
bowtie1_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/BowtieIndex/version1.3.1/
bowtie2_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/Bowtie2Index/version2.5.2/
bwamem1_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/BWAIndex/version0.7.18/
bwamem2_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/BWAmem2Index/version2.2.1/
dragmap_hashtable: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/dragmap/version1.2.1/
fasta: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/WholeGenomeFasta/GCF_000002985.6_WBcel235_genomic.fna
fasta_dict: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/WholeGenomeFasta/GCF_000002985.6_WBcel235_genomic.dict
fasta_fai: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/WholeGenomeFasta/GCF_000002985.6_WBcel235_genomic.fna.fai
fasta_sizes: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/WholeGenomeFasta/GCF_000002985.6_WBcel235_genomic.fna.sizes
gff: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Annotation/Genes/GCF_000002985.6_WBcel235_genomic.gff
gtf: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Annotation/Genes/GCF_000002985.6_WBcel235_genomic.gtf
hisat2_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/Hisat2Index/GCF_000002985.6/version2.2.1/
intervals_bed: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Annotation/intervals/WBcel235.bed
kallisto_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/KallistoIndex/GCF_000002985.6/version0.51.1/kallisto
msisensorpro_list: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Annotation/msisensorpro/WBcel235.msisensor_scan.list
rsem_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/RSEMIndex/GCF_000002985.6/version1.3.1/
salmon_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/SalmonIndex/GCF_000002985.6/version1.10.3/
splice_sites: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/SpliceSites/GCF_000002985.6_WBcel235_genomic.splice_sites.txt
star_index: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/STARIndex/GCF_000002985.6/version2.7.11b/
transcript_fasta: s3://nf-core-references-scratch/genomes/Caenorhabditis_elegans/NCBI/WBcel235/Sequence/TranscriptFasta/genome.transcripts.fa
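Review note: the index paths in this new file embed the tool version as a `version<X>` path component (for example `STARIndex/GCF_000002985.6/version2.7.11b/`). A small sketch of how that component could be read back out of a path follows, which may be handy when checking which tool version built a published index. The function name `asset_version` is hypothetical, and the `version<X>` convention is inferred only from the paths shown here.

```shell
#!/usr/bin/env bash

# Hypothetical helper: print the X of the first "/version<X>" path
# component in a URI, or return 1 if there is none.
asset_version() {
    local uri="$1"
    # Capture everything between "/version" and the next "/" (or end of string).
    if [[ "$uri" =~ /version([^/]+)(/|$) ]]; then
        printf '%s\n' "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}
```

Entries without a version component, such as `fasta` or `gtf`, make the function return a non-zero status instead of printing anything.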
2 changes: 1 addition & 1 deletion assets/genomes/Homo_sapiens/Gencode/GRCh38.p14.yml
@@ -3,7 +3,7 @@
gff: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.chr_patch_hapl_scaff.annotation.gff3.gz
gtf: https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_45/gencode.v45.chr_patch_hapl_scaff.annotation.gtf.gz
mito_name: MT
reference_version: GCF_000001405.40
source_version: GCF_000001405.40
site: https://www.ncbi.nlm.nih.gov/datasets/genome/GCF_000001405.40
source: Gencode
species: Homo_sapiens
2 changes: 1 addition & 1 deletion assets/genomes/Homo_sapiens/NCBI/GRCh38.p14.yml
@@ -3,6 +3,6 @@
gff: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.gff.gz
gtf: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/001/405/GCF_000001405.40_GRCh38.p14/GCF_000001405.40_GRCh38.p14_genomic.gtf.gz
mito_name: MT
reference_version: GCF_000001405.40
source_version: GCF_000001405.40
source: NCBI
species: Homo_sapiens