Vcfanno does not typecast fields correctly when using the by_alt op #113

Closed
ptn24 opened this issue Jul 18, 2019 · 6 comments

Comments

@ptn24

ptn24 commented Jul 18, 2019

According to https://github.com/brentp/vcfanno#typecasting-values, it should be possible to typecast fields by adding a _float suffix to the field names. However, when the by_alt op is used, the annotated VCF fields do not get the requested type, and the _float suffix is not removed from the field name.
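A minimal Go sketch of what the suffix convention asks for. splitTypeSuffix is a hypothetical helper, not vcfanno's actual code: it strips a recognized suffix from the configured name and returns the VCF Type it requests. Only the _float suffix is taken from the README linked above; the _int entry and the String fallback are assumptions added for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// typeSuffixes maps a name suffix to the VCF INFO Type it requests.
// "_float" comes from the vcfanno README; "_int" is an assumption.
var typeSuffixes = []struct {
	suffix  string
	vcfType string
}{
	{"_float", "Float"},
	{"_int", "Integer"},
}

// splitTypeSuffix (hypothetical, not vcfanno's code) strips a recognized type
// suffix and returns the cleaned ID plus the requested Type. Names without a
// recognized suffix fall back to String in this sketch.
func splitTypeSuffix(name string) (id, vcfType string) {
	for _, ts := range typeSuffixes {
		if strings.HasSuffix(name, ts.suffix) {
			return strings.TrimSuffix(name, ts.suffix), ts.vcfType
		}
	}
	return name, "String"
}

func main() {
	id, vcfType := splitTypeSuffix("CADD_RAW_float")
	fmt.Println(id, vcfType) // CADD_RAW Float
}
```

Under this convention, names = [ "CADD_RAW_float",] should yield ID=CADD_RAW with Type=Float regardless of the op, which is what the self run below produces and the by_alt run does not.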

Op: self
Field name: good
Field number: bad
Field type: float

root@b3cca58b784e:/tmp# cat conf.toml 
[[annotation]]
names = [ "CADD_RAW_float",]
file = "/tmp/annotation.tsv.gz"
columns = [ 5,]
ops = [ "self",]
root@b3cca58b784e:/tmp# vcfanno conf.toml test.vcf.gz

=============================================
vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
api.go:804: WARNING: using op 'self' when with Number='1' for '' from '/tmp/annotation.tsv.gz' can result in out-of-order values when the query is multi-allelic
api.go:805:        : this is not an issue if the query has been decomposed.
##fileformat=VCFv4.2
##contig=<ID=chr2,length=242193529,assembly=GRCh38>
##INFO=<ID=AF,Number=.,Type=Float,Description="">
##INFO=<ID=AQ,Number=.,Type=Integer,Description="">
##INFO=<ID=CADD_RAW,Number=1,Type=Float,Description="calculated by self of overlapping values in column 5 from /tmp/annotation.tsv.gz">
##hailversion=0.2.9-8588a25687af
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
chr2    41647   2_41647_A_G     A       G       1328.0  .       AF=1.56250e-02;AQ=1328;CADD_RAW=0.591814
vcfanno.go:241: annotated 1 variants in 0.00 seconds (2292.6 / second)

Op: by_alt
Field name: bad
Field number: good
Field type: string

root@b3cca58b784e:/tmp# cat conf.toml 
[[annotation]]
names = [ "CADD_RAW_float",]
file = "/tmp/annotation.tsv.gz"
columns = [ 5,]
ops = [ "by_alt",]
root@b3cca58b784e:/tmp# vcfanno conf.toml test.vcf.gz

=============================================
vcfanno version 0.3.1 [built with go1.11]

see: https://github.com/brentp/vcfanno
=============================================
vcfanno.go:115: found 1 sources from 1 files
vcfanno.go:143: using 2 worker threads to decompress query file
##fileformat=VCFv4.2
##contig=<ID=chr2,length=242193529,assembly=GRCh38>
##INFO=<ID=AF,Number=.,Type=Float,Description="">
##INFO=<ID=AQ,Number=.,Type=Integer,Description="">
##INFO=<ID=CADD_RAW_float,Number=A,Type=String,Description="calculated by by_alt of overlapping values in column 5 from /tmp/annotation.tsv.gz">
##hailversion=0.2.9-8588a25687af
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
chr2    41647   2_41647_A_G     A       G       1328.0  .       AF=1.56250e-02;AQ=1328;CADD_RAW_float=0.591814
vcfanno.go:241: annotated 1 variants in 0.00 seconds (3546.0 / second)
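The symptom can also be checked mechanically. The Go sketch below parses an ##INFO header line and reports whether its ID still carries the _float suffix. parseInfo and hasLeftoverSuffix are hypothetical helpers, and the regular expression assumes the attributes appear in the order ID, Number, Type, as they do in the headers printed above. It is a quick diagnostic, not a full VCF header parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// infoRe captures ID, Number and Type from an ##INFO meta-information line,
// assuming the attribute order seen in this report.
var infoRe = regexp.MustCompile(`^##INFO=<ID=([^,]+),Number=([^,]+),Type=([^,]+),`)

type infoHeader struct {
	ID, Number, Type string
}

// parseInfo returns the parsed attributes and true if line is an ##INFO line.
func parseInfo(line string) (infoHeader, bool) {
	m := infoRe.FindStringSubmatch(line)
	if m == nil {
		return infoHeader{}, false
	}
	return infoHeader{ID: m[1], Number: m[2], Type: m[3]}, true
}

// hasLeftoverSuffix reports whether the ID still ends in "_float",
// the symptom described in this issue.
func hasLeftoverSuffix(h infoHeader) bool {
	return strings.HasSuffix(h.ID, "_float")
}

func main() {
	line := `##INFO=<ID=CADD_RAW_float,Number=A,Type=String,Description="calculated by by_alt of overlapping values in column 5 from /tmp/annotation.tsv.gz">`
	h, ok := parseInfo(line)
	fmt.Println(ok, h.ID, h.Number, h.Type, hasLeftoverSuffix(h))
	// true CADD_RAW_float A String true
}
```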

It would be good if the following were true:

  • Op: by_alt
  • Field name: CADD_RAW
  • Field number: A
  • Field type: float
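The expected header can be spelled out with another small Go sketch. expectedInfoHeader is hypothetical, and numberForOp only records the Number values vcfanno printed in the two runs above (1 for self, A for by_alt); it is not a complete table of vcfanno ops. The Description text mirrors the one vcfanno generated in those runs.

```go
package main

import (
	"fmt"
	"strings"
)

// numberForOp holds only the Number values observed in the two runs in this
// report; an op missing from the map yields an empty Number here.
var numberForOp = map[string]string{
	"self":   "1",
	"by_alt": "A",
}

// expectedInfoHeader (hypothetical) builds the ##INFO line the reporter expects:
// a "_float" suffix is removed from the ID and becomes Type=Float, and Number
// follows the op.
func expectedInfoHeader(name, op string, column int, file string) string {
	id, vcfType := name, "String"
	if strings.HasSuffix(name, "_float") {
		id, vcfType = strings.TrimSuffix(name, "_float"), "Float"
	}
	desc := fmt.Sprintf("calculated by %s of overlapping values in column %d from %s", op, column, file)
	return fmt.Sprintf("##INFO=<ID=%s,Number=%s,Type=%s,Description=\"%s\">",
		id, numberForOp[op], vcfType, desc)
}

func main() {
	fmt.Println(expectedInfoHeader("CADD_RAW_float", "by_alt", 5, "/tmp/annotation.tsv.gz"))
	// ##INFO=<ID=CADD_RAW,Number=A,Type=Float,Description="calculated by by_alt of overlapping values in column 5 from /tmp/annotation.tsv.gz">
}
```

For the self op the same helper reproduces the CADD_RAW header vcfanno 0.3.1 already emits, so the only difference the reporter is asking for is Number=A.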

  • minimal conf and Lua files that you are using.
    See above

  • URLs or actual files for annotations in conf file.

root@b3cca58b784e:/tmp# zcat annotation.tsv.gz
## CADD GRCh38-v1.4 (c) University of Washington, Hudson-Alpha Institute for Biotechnology and Berlin Institute of Health 2013-2018. All rights reserved.
#Chrom  Pos     Ref     Alt     RawScore        PHRED
2       41647   A       G       0.591814        8.493

  • minimal query file.
root@b3cca58b784e:/tmp# zcat test.vcf.gz 
##fileformat=VCFv4.2
##contig=<ID=chr2,length=242193529,assembly=GRCh38>
##INFO=<ID=AF,Number=.,Type=Float,Description="">
##INFO=<ID=AQ,Number=.,Type=Integer,Description="">
##hailversion=0.2.9-8588a25687af
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT
chr2    41647   2_41647_A_G     A       G       1328.0  .       AF=1.56250e-02;AQ=1328

  • the command you used to invoke vcfanno
    See above

  • the full error message
    None
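For reference, columns = [ 5,] in conf.toml is a 1-based index into the annotation file, which is why the value 0.591814 (RawScore) rather than 8.493 (PHRED) ends up in the output. The Go sketch below, assuming the rows are tab-separated as the .tsv.gz name suggests, extracts that column and parses it as a float, the conversion the _float suffix requests. columnValue is a hypothetical helper, not part of vcfanno.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// columnValue (hypothetical) returns the field at a 1-based column of a
// tab-separated row, matching how columns = [ 5,] selects RawScore here.
func columnValue(row string, column int) (string, error) {
	fields := strings.Split(row, "\t")
	if column < 1 || column > len(fields) {
		return "", fmt.Errorf("column %d out of range (row has %d fields)", column, len(fields))
	}
	return fields[column-1], nil
}

func main() {
	// The data row from annotation.tsv.gz, assuming tab separators.
	row := "2\t41647\tA\tG\t0.591814\t8.493"
	raw, err := columnValue(row, 5)
	if err != nil {
		panic(err)
	}
	// The conversion a _float suffix requests.
	score, err := strconv.ParseFloat(raw, 64)
	if err != nil {
		panic(err)
	}
	fmt.Println(raw, score) // 0.591814 0.591814
}
```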

@brentp
Owner

brentp commented Jul 19, 2019

Thanks for the clear report. I'll see if I can get a fix in shortly.

brentp closed this as completed in f71c45f Jul 19, 2019
@brentp
Owner

brentp commented Jul 19, 2019

Hi, this was an easy fix. If you want, you can try the (Linux) binary attached here:
vcfanno_dev.gz

I should have a release out before August.

brentp added a commit that referenced this issue Jul 19, 2019
@ptn24
Author

ptn24 commented Jul 19, 2019

Swift response. Thank you, @brentp!

@brentp
Owner

brentp commented Jul 30, 2019

This is out in the new release.

@ptn24
Author

ptn24 commented Jul 30, 2019

Thank you, @brentp. I verified the fix in vcfanno 0.3.2.

@brentp
Owner

brentp commented Jul 30, 2019

Cheers. Thanks for following up and for providing the great test case.
