Releases: EBIvariation/vcf-validator

gVCF support, ploidy fixes and usability improvements

14 Mar 09:29

The validator can now check fields specific to the gVCF extension. This includes <*> alternate alleles and how they relate to the END INFO field and sample genotypes.

Following some user reports (#101, #102) of incorrect counts being expected for FORMAT fields with Number=G, we confirmed with the specification that their cardinality depends on the ploidy of each sample genotype and not on the ALT column. The issue should be solved now, but if you find any problems please open a new ticket!
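For reference, the VCF specification gives the number of possible genotypes for a sample of ploidy P over N alleles (reference included) as the binomial coefficient C(N + P - 1, P), so the same record can expect a different number of values for samples of different ploidy. The sketch below only illustrates that formula; the function name is hypothetical and is not part of the validator.

```python
from math import comb


def expected_number_g(num_alt_alleles: int, ploidy: int) -> int:
    """Number of values a Number=G field should hold for one sample.

    Hypothetical helper for illustration; not the validator's own code.
    """
    num_alleles = num_alt_alleles + 1  # ALT alleles plus the reference allele
    # Multisets of size `ploidy` drawn from `num_alleles` alleles
    return comb(num_alleles + ploidy - 1, ploidy)
```

With one ALT allele, a haploid sample expects 2 values, a diploid sample 3 and a triploid sample 4.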

This version also introduces some usability improvements. The biggest is a summary report in addition to the existing text and database outputs. This is human-readable and lists each type of error detected, the number of times it occurred, and the first line where it was observed.
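The aggregation behind such a summary can be sketched as follows. This is an illustration only, assuming errors arrive as (line number, message) pairs; the function name and the data shape are hypothetical and do not reflect the validator's internals.

```python
def summarize(errors):
    """Group (line_number, message) pairs by message.

    Returns {message: (occurrences, first_line)}.
    Hypothetical helper for illustration only.
    """
    summary = {}
    for line_number, message in errors:
        if message in summary:
            count, first_line = summary[message]
            # Keep the smallest line number seen for this error type
            summary[message] = (count + 1, min(first_line, line_number))
        else:
            summary[message] = (1, line_number)
    return summary
```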

The --version option now reports which version of the validator you are running. Please note that in vcf-validator 0.4 and earlier this option was used to indicate which version of the specification the input file should match.

And finally, the validator now warns the user if the input is compressed, instead of reporting a confusing list of errors.
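One common way to spot compressed input is to look at the first bytes of the file: gzip streams (including bgzip output) start with the bytes 0x1f 0x8b. The sketch below assumes that approach; how the validator itself detects compression is not described here, and the function name is hypothetical.

```python
GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip/bgzip stream


def looks_gzip_compressed(first_bytes: bytes) -> bool:
    """Return True if the given leading bytes look like a gzip stream.

    Hypothetical helper; only covers gzip, not other compression formats.
    """
    return first_bytes.startswith(GZIP_MAGIC)
```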

You can download the Linux binaries using the links below, and visit this page if you are interested in the full list of changes.

Improved structural variation support

11 Sep 09:04

It has been a really productive summer thanks to @Anishka0107, the Google Summer of Code student who has improved the support for structural variants in the validator and the debugulator 😃

She has added new metadata validations to ensure that INFO and FORMAT fields match the header definition, and that said header matches the VCF specification itself. These validations apply not only to short variants but also to structural variation tags, which hadn't been fully supported until now!

She also expanded the checks (added in the previous version) that guarantee no duplicate values in the ID and FORMAT columns of a single line, so that they now cover the FILTER and INFO columns as well. The debugulator can now automatically fix these duplicates, as well as the values assigned to some INFO tags (see #78 for more details).
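A duplicate check of this kind can be sketched in a few lines. The helpers below are hypothetical and assume the usual VCF separators (";" for ID, FILTER and INFO, ":" for FORMAT), with INFO entries compared by key; they are not taken from the validator's code.

```python
def duplicated_values(values):
    """Return the values that occur more than once, in order of first repeat."""
    seen, duplicates = set(), []
    for value in values:
        if value in seen and value not in duplicates:
            duplicates.append(value)
        seen.add(value)
    return duplicates


def duplicated_info_keys(info: str):
    """Return INFO keys that appear more than once in a single INFO column."""
    # Flags have no "=", so the whole entry is treated as the key
    return duplicated_values(entry.split("=", 1)[0] for entry in info.split(";"))
```

For example, `duplicated_values("q10;s50;q10".split(";"))` reports `q10`, and `duplicated_info_keys("AN=1;DP=3;AN=2")` reports `AN`.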

The last phase of GSoC was more focused on the purely technical aspects of the project: cleaning up the code, improving the documentation and slightly simplifying the grammar that detects syntax errors.

Please download the Linux binaries using the links below, and visit this page if you are interested in the full list of changes.

VCF version detected automatically and checks on duplicate fields

04 May 15:48

This version simplifies the integration of the validation tool in automated pipelines, detecting the version of the VCF file before running the validation. This also prevents errors from being raised due to involuntary mismatches between the command line argument and the file.
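The VCF specification requires the first line of a file to be the fileformat declaration, so the version can be read from it directly. The following is a minimal sketch of that idea; the function name and the regular expression are assumptions, not the validator's implementation.

```python
import re

# Matches lines such as "##fileformat=VCFv4.3"
FILEFORMAT = re.compile(r"^##fileformat=VCFv(\d+\.\d+)\s*$")


def detect_vcf_version(first_line: str):
    """Return the version string (e.g. "4.3") or None if it is not declared."""
    match = FILEFORMAT.match(first_line)
    return match.group(1) if match else None
```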

New checks have also been included to guarantee that no duplicate values are present in the ID and FORMAT columns of a single line. These checks are only applicable to version 4.3 of the specification!

The binaries can be downloaded using the links below.

Fixed bug when GT field is not listed in FORMAT column

22 Mar 11:57

The VCF specification allows the GT field to be omitted from the FORMAT column, but if present it must be the first field. This release solves an issue that was making the validator raise a misleading error if GT was not present.
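The rule can be expressed as a small predicate. This is a sketch with a hypothetical function name, assuming the FORMAT column is passed as its raw colon-separated string.

```python
def gt_position_is_valid(format_column: str) -> bool:
    """GT may be absent, but when listed it must be the first FORMAT key."""
    keys = format_column.split(":")
    return "GT" not in keys or keys[0] == "GT"
```

Under this rule `GT:DP` and `DP:GQ` are both accepted, while `DP:GT` is rejected.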

INFO CIGAR field and newline at end of file issues solved

05 Dec 10:03

This maintenance release solves a couple of issues reported for version 0.4.1:

  • Only a single value was considered valid as CIGAR field in the INFO column, when it should be a list as long as the number of alternate alleles. Thanks @sambrightman for your pull request!
  • Errors due to the lack of a newline character at the end of the file were not properly reported.
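The corrected CIGAR rule from the first item amounts to comparing two comma-separated lists. A minimal sketch, with a hypothetical function name and the raw ALT column and CIGAR value as inputs:

```python
def cigar_count_matches(alt_column: str, cigar_value: str) -> bool:
    """INFO CIGAR must list one entry per alternate allele."""
    return len(cigar_value.split(",")) == len(alt_column.split(","))
```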

Memory usage issues solved

03 Nov 10:17

This maintenance release solves memory issues reported for version 0.4.

New dependencies were added to make it possible to detect more complex errors, but as a result the amount of memory consumed kept growing without bound. This has been solved and memory usage now remains constant at less than 10 MB of RAM.

The new executables, compatible with any Linux version, can be downloaded using the links below.

Fixing VCF files, fixing bugs...

18 Oct 14:01

In addition to the removal of duplicate variants introduced in the previous release, errors in the INFO and samples columns can be fixed now by removing the faulty field from the column. For instance, if an INFO value looks like AN=123;AF=not_a_frequency;DP=345, the fix would transform it into AN=123;DP=345.
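The effect of such a fix on an INFO column can be sketched as below. The function name is hypothetical, and falling back to "." (the VCF missing value) when every field is removed is an assumption about the desired behaviour, not something stated in these notes.

```python
def drop_info_fields(info: str, faulty_keys: set) -> str:
    """Remove the INFO entries whose key is listed in `faulty_keys`."""
    kept = [
        entry for entry in info.split(";")
        if entry.split("=", 1)[0] not in faulty_keys
    ]
    # Assumption: an emptied INFO column is written as the missing value "."
    return ";".join(kept) if kept else "."
```

Calling `drop_info_fields("AN=123;AF=not_a_frequency;DP=345", {"AF"})` yields `AN=123;DP=345`, matching the example above.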

Other improvements included in this version are:

  • Support for genomic ploidy different from 2
  • Ensuring all the variants that don't require fixing are written after running the vcf-debugulator
  • Simplified build process using a Docker image (recommended for developers only)

You can download the executables using the links below.

VCF v4.3 support and automatic error fixing

27 Jul 20:14

This release brings many exciting new features! VCF v4.3 is now supported and has been tested against more than 150 VCF files, so you can be sure it will catch a lot of pesky errors.

To make error solving a bit easier, vcf-validator now contains 2 different tools:

  • The validator, which can write reports to plain text and now also to a portable database (SQLite). The user no longer needs to fix every error by hand, because this database can later be processed by an automated tool such as...
  • The "debugulator", which reads the validator reports and automatically corrects as many errors as possible. This version can remove duplicated variants, and we will add more fixes in the future. In this release, the debugulator support is experimental and has some important bugs that were fixed in newer versions.

Compiler compatibility improved

03 Nov 13:31

Support for multiple compilers has been improved and it is automatically checked when committing changes to the repository. The list of fully supported compilers is:

  • Clang 3.5 to 3.7
  • GCC 4.8 to 5.0

Static linking supported during build

24 Sep 14:47

Static linking is now supported during the build process, benefiting those who can't install the dependencies on the machine that will run the validator.

If that is your case, please run the build on a system where you have root permissions, adding the -DBUILD_STATIC=1 option to the cmake command.