Experimental additions to Assembly Checker
This release adds two new experimental features to the assembly checker.
Neither feature was present in v0.9.1, and their behaviour may change in future releases.
Everything other than these new features is as stable as in the previous release, v0.9.1, so using the latest version is recommended.
1) Checking a VCF against a FASTA file when the two use different chromosome naming systems.
For instance, your VCF uses chromosome numbers:
#CHROM POS...
1 100 ...
but you have a FASTA with chromosome accessions:
>CM000001.3 chromosome 1
ATCG...
Now you can use the -a parameter to provide the path to a file with the mapping. The expected file structure is that of NCBI's assembly reports, such as ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/285/GCA_000002285.2_CanFam3.1/GCA_000002285.2_CanFam3.1_assembly_report.txt
For each chromosome in the VCF, the assembly checker will look in the FASTA for any of its synonyms listed under the columns "Sequence-Name", "GenBank-Accn", "RefSeq-Accn" and "UCSC-style-name".
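The synonym lookup described above can be sketched roughly as follows. This is an illustration only, not the assembly checker's actual code: the function names `load_synonyms` and `resolve` are hypothetical, and the sample report row is made up for the example (it reuses the names from the example above and marks missing accessions as "na", as NCBI reports do). It assumes the report is tab-separated and that its last "#" line holds the column names.

```python
# Columns of the assembly report that are searched for synonyms.
SYNONYM_COLUMNS = ("Sequence-Name", "GenBank-Accn", "RefSeq-Accn", "UCSC-style-name")

def load_synonyms(report_text):
    """Map every contig name in the report to the set of all its synonyms."""
    header = None
    rows = []
    for line in report_text.splitlines():
        if line.startswith("#"):
            # Assumption: the last comment line holds the column names.
            header = line.lstrip("# ").split("\t")
        elif line.strip():
            rows.append(line.split("\t"))
    if header is None:
        raise ValueError("no header line found in assembly report")
    synonyms = {}
    for fields in rows:
        record = dict(zip(header, fields))
        # "na" marks a missing value in the report, so it is skipped.
        names = {record[c] for c in SYNONYM_COLUMNS if record.get(c, "na") != "na"}
        for name in names:
            synonyms[name] = names
    return synonyms

def resolve(contig, fasta_names, synonyms):
    """Return the FASTA sequence name matching the VCF contig, or None."""
    for name in synonyms.get(contig, {contig}):
        if name in fasta_names:
            return name
    return None

# Illustrative report with a single, made-up row.
SAMPLE_REPORT = (
    "# Assembly name: example\n"
    "# Sequence-Name\tSequence-Role\tGenBank-Accn\tRefSeq-Accn\tUCSC-style-name\n"
    "1\tassembled-molecule\tCM000001.3\tna\tchr1\n"
)

synonyms = load_synonyms(SAMPLE_REPORT)
match = resolve("1", {"CM000001.3"}, synonyms)
```

Here the VCF contig "1" resolves to the FASTA sequence "CM000001.3", while a contig that has no synonym in the FASTA resolves to `None`.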
2) Remote sequence retrieval.
If no FASTA file is provided, the assembly checker will query EBI-ENA to download the sequence of each chromosome used in the VCF, and will check every reference allele against the downloaded sequences.
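However the sequence is obtained, checking a reference allele comes down to comparing it with the bases at the variant's position. A minimal sketch of that comparison, assuming 1-based VCF positions and a case-insensitive match (both the function name `ref_allele_matches` and the case handling are assumptions, not the assembly checker's confirmed behaviour):

```python
def ref_allele_matches(sequence, pos, ref):
    """Check that `ref` equals the bases of `sequence` starting at `pos`.

    `pos` is 1-based, as in the VCF POS column.
    """
    start = pos - 1  # convert to a 0-based string index
    # Assumption: soft-masked (lowercase) bases count as a match.
    return sequence[start:start + len(ref)].upper() == ref.upper()

# Toy sequence taken from the FASTA example above.
ok = ref_allele_matches("ATCG", 2, "TC")
bad = ref_allele_matches("ATCG", 2, "A")
```

A position past the end of the sequence yields a shorter (or empty) slice, so it is reported as a mismatch rather than raising an error.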