Releases: EBIvariation/vcf-validator
v0.10.0
Highlights:
- Add support for VCF4.4
- Remove support for vcf-debugulator
What's Changed
- GA4GHTT-232 - vcf4.4 initial changes by @vasudeva8 in #240
- Add VCF 4.4 test files for SVCLAIM by @tcezard in #235
- GA4GHTT-261 - changes for CN, SVLEN, SVCLAIM by @vasudeva8 in #246
- GA4GHTT-261 - Add Test files for STR and new phasing notation by @tcezard in #244
- GA4GHTT-270: changes for CNV:TR and optional leading phasing info by @vasudeva8 in #249
- GA4GHTT-270: Merge changes from master (241, 247, 248) and porting 248 for 4.4 by @vasudeva8 in #251
- GA4GHTT-276 - minor update by @vasudeva8 in #252
- Add static linking for libz and libbz2 for macosx by @tcezard in #253
- GA4GHTT-276: 4.4 changes - optional meta, format CN, CICN, PSL, PSO, PSQ by @vasudeva8 in #254
- GA4GHTT-276: v4.4 test file update and minor fixes by @vasudeva8 in #255
- ragel output with -G2 and warning updates by @vasudeva8 in #258
- EVA-3674 - Use dynamic libraries from boost in OSX by @tcezard in #259
- GA4GHTT-303 - odb dependency removed, debugulator removed from build by @vasudeva8 in #257
- Remove debugulator from build and release by @tcezard in #262
- VCF 4.4 Feature branch by @tcezard in #261
- EVA-3662 - Enable build with dynamically linked dependencies by @tcezard in #263
- Bump version to 0.10.0 and update the README.md by @tcezard in #264
Full Changelog: v0.9.7...v0.10.0
v0.9.7
What's Changed
- EVA-3281 - Allow synonyms to be the same value if they are on the same line by @tcezard in #241
- EVA-3586 - Allow colon in chromosome names by @tcezard in #248
- Add description of conda installation by @tcezard in #239
- Update to generate mac arm binary by @vasudeva8 in #247
Full Changelog: v0.9.6...v0.9.7
v0.9.6
- Fix for the Mac OS-X build
v0.9.5
- Missing data is valid even if multiple values are expected
New flag "--require-evidence" and improved validation of strings and integers
This release includes 2 changes:
- Added a new flag, --require-evidence, to check for the presence of genotypes, allele frequencies or allele counts.
- Fixed a bug where number parsing and validation were not as strict as expected.
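As a rough illustration of what an evidence check like --require-evidence looks for, here is a minimal Python sketch. It is not the validator's implementation: the function name `has_evidence` is hypothetical, and the rule it applies (an AF or AC key in INFO, or a GT field in FORMAT with at least one sample column) is an assumption based only on the description above.

```python
def has_evidence(vcf_line: str) -> bool:
    """Return True if a VCF data line carries genotypes, AF or AC.

    Assumed rule (not taken from the validator's source): evidence means
    an AF or AC key in INFO, or GT in FORMAT plus at least one sample.
    """
    fields = vcf_line.rstrip("\r\n").split("\t")
    if len(fields) < 8:
        raise ValueError("a VCF data line needs at least 8 tab-separated columns")

    # INFO is the 8th column: semicolon-separated KEY=VALUE pairs or bare flags.
    info_keys = {entry.split("=", 1)[0] for entry in fields[7].split(";")}
    if "AF" in info_keys or "AC" in info_keys:
        return True

    # Genotypes need a FORMAT column containing GT and at least one sample column.
    return len(fields) > 9 and "GT" in fields[8].split(":")
```

A line with only `DP=10` in INFO and no sample columns would be reported as lacking evidence under this assumed rule.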
Bgzip and Ubuntu 18 (locale) fixes
This is a patch release that includes just 2 important fixes:
- There was an error about locales when running in Ubuntu 18: #184
- Bgzipped VCFs had a small chance of being read incompletely.
We recommend that everyone use this version instead of earlier ones.
Experimental additions to Assembly Checker
Everything except these new features is as stable as in the previous release, v0.9.1. Using the latest version is recommended.
This release adds 2 new experimental features to the assembly checker.
Neither feature was present in v0.9.1, and their behaviour might change in future releases.
1) Checking a VCF against a FASTA file that uses a different chromosome naming system.
For instance, your VCF uses chromosome numbers:
#CHROM POS...
1 100 ...
but you have a FASTA with chromosome accessions:
>CM000001.3 chromosome 1
ATCG...
Now you can use the -a parameter to provide the path to a file with the mapping. The expected file structure is that of NCBI's assembly reports, such as ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/002/285/GCA_000002285.2_CanFam3.1/GCA_000002285.2_CanFam3.1_assembly_report.txt
For each chromosome, the assembly checker will look in the FASTA for any synonym listed under the columns "Sequence-Name", "GenBank-Accn", "RefSeq-Accn" and "UCSC-style-name".
2) Remote sequence retrieval.
If no FASTA file is provided, EBI-ENA will be queried to download the sequence of each chromosome used in the VCF to check every reference allele.
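The synonym lookup described in feature 1 can be sketched roughly as follows. This is an illustrative Python sketch, not the assembly checker's own code: the function names `load_synonyms` and `find_in_fasta` are hypothetical, and it assumes the report is tab-separated, with the column names on the last line that starts with "#" and "na" marking a missing value.

```python
SYNONYM_COLUMNS = ("Sequence-Name", "GenBank-Accn", "RefSeq-Accn", "UCSC-style-name")


def load_synonyms(report_lines):
    """Map every contig name in an assembly report to its full set of synonyms."""
    header = None
    synonyms = {}
    for line in report_lines:
        line = line.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Assumption: the last comment line before the data holds the column names.
            header = line.lstrip("# ").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before the data rows")
        row = dict(zip(header, line.split("\t")))
        # Collect the usable names from the four synonym columns.
        names = {row[col] for col in SYNONYM_COLUMNS
                 if row.get(col) not in (None, "", "na")}
        for name in names:
            synonyms[name] = names
    return synonyms


def find_in_fasta(vcf_contig, fasta_contigs, synonyms):
    """Return the FASTA contig name that matches vcf_contig, or None."""
    # Contigs absent from the report can only match under their own name.
    for candidate in synonyms.get(vcf_contig, {vcf_contig}):
        if candidate in fasta_contigs:
            return candidate
    return None
```

With a report row that lists "1" as the Sequence-Name and "CM000001.3" as the GenBank-Accn, a VCF contig named "1" would resolve to the FASTA entry "CM000001.3", as in the example above.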
Duplicate sample detection and warnings for unused parameters
This small release contains only minor fixes and the following improvements:
- If the header line in a VCF file contains several samples with the same name, it is now flagged as an error, as recently clarified in the VCF specification.
- Warnings are now logged if there are unused parameters in the command used to run any of the tools.
Thanks @srbcheema1 for the contributions!
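For readers who want to pre-check their own files, duplicate sample names in the header line can be detected along these lines. This is a minimal Python sketch with a hypothetical function name, `duplicate_samples`, not the validator's code; it assumes the standard layout in which sample names follow the nine fixed columns ending in FORMAT.

```python
from collections import Counter


def duplicate_samples(header_line: str) -> list:
    """Return sample names that appear more than once in a #CHROM header line."""
    fields = header_line.rstrip("\r\n").split("\t")
    if fields[0] != "#CHROM":
        raise ValueError("expected the header line that starts with #CHROM")
    # Columns 1-9 are CHROM..FORMAT; everything after that is a sample name.
    counts = Counter(fields[9:])
    return [name for name, seen in counts.items() if seen > 1]
```

An empty list means every sample name is unique.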
New reference checker tool and Windows support
A new tool has been added to the suite! This one checks that the REF column in a VCF matches the sequence contained in a FASTA file, and reports any mismatches in a summary or plain text file, in a similar fashion to the VCF validator reporting. A new report type that only outputs the valid lines is also included in this tool.
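The core comparison such a tool performs can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions (the whole FASTA fits in memory, contig names are the first word of each ">" line, comparison is case-insensitive); the function names `read_fasta` and `ref_matches` are hypothetical and do not come from the tool.

```python
def read_fasta(lines):
    """Parse FASTA text lines into a dict of contig name -> sequence."""
    sequences = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                sequences[name] = "".join(chunks)
            words = line[1:].split()
            name = words[0] if words else ""
            chunks = []
        elif line and name is not None:
            chunks.append(line)
    if name is not None:
        sequences[name] = "".join(chunks)
    return sequences


def ref_matches(sequences, chrom, pos, ref):
    """Check a REF allele against the FASTA; pos is 1-based, as in VCF."""
    seq = sequences.get(chrom)
    if seq is None or pos < 1:
        return False
    start = pos - 1
    return seq[start:start + len(ref)].upper() == ref.upper()
```

A real checker would stream or index the FASTA rather than load it whole, and would report each mismatch instead of returning a single boolean.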
We have also added support for Windows, making the suite compatible with the 3 major operating systems. Please be aware that you will need to decompress your files before validating them on Windows due to a known issue.
You can find the binaries for all versions, ready for direct download, attached to these notes.
New MacOS version and built-in support for compressed files
MacOS users can now run the validation suite natively, without needing Docker or admin permissions. Just copy the linked executable onto your machine and run it exactly as you would on Linux. Please let us know if you find any compatibility issues by creating a bug report.
The validator can also read files compressed in multiple formats directly, without needing a pipe. You can find instructions in the updated README file.
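One common way to read compressed input without a pipe is to inspect the first bytes of the file and pick a decompressor accordingly. The Python sketch below shows that idea for gzip and bzip2 only. It is an assumption about the general technique, not a description of how the validator does it, and the function name `open_maybe_compressed` is hypothetical; the README lists the formats actually supported.

```python
import bz2
import gzip


def open_maybe_compressed(path):
    """Open a text file, transparently decompressing gzip or bzip2 input."""
    with open(path, "rb") as handle:
        magic = handle.read(3)
    if magic[:2] == b"\x1f\x8b":  # gzip (and bgzip) magic number
        return gzip.open(path, "rt")
    if magic == b"BZh":           # bzip2 magic number
        return bz2.open(path, "rt")
    return open(path, "rt")
```

Detecting the format from the content rather than the file extension means a misnamed file is still read correctly.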
Thanks to @srbcheema1 for these contributions!