GenotypeGVCFs is reporting Buffer overflow errors #7976

Open
baoxingsong opened this issue Aug 5, 2022 · 6 comments

Comments

@baoxingsong

Affected tool(s) or class(es)

GenotypeGVCFs is reporting:


[TileDB::ArrayIterator] Error: Cannot advance iterator; Buffer overflow.
terminate called after throwing an instance of 'VariantStorageManagerException'
  what():  VariantStorageManagerException exception : VariantArrayCellIterator increment failed
TileDB error message : [TileDB::ArrayIterator] Error: Cannot advance iterator; Buffer overflow

Affected version(s)

Using GATK jar /home/xuql/miniconda3/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/xuql/miniconda3/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar --version
The Genome Analysis Toolkit (GATK) v4.2.6.1
HTSJDK Version: 2.24.1
Picard Version: 2.27.1

Description

Hi, I developed AnchorWave to call long indels (which can be a couple of Mb long). We are trying to feed the AnchorWave variant calls into GATK to generate VCF files.

We generated whole-genome alignments for 26 maize accessions with AnchorWave and wrote our own code to generate GVCF files from the AnchorWave output. Those GVCF files work well with GATK GenomicsDBImport. However, GenotypeGVCFs reports Buffer overflow errors and cannot generate the complete VCF files.

Here is the command we used:

gatk --java-options "-Xmx100g" GenotypeGVCFs -R Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/NAM_anchorwave_song/NAM_out_gatk9 -O gatk9.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --tmp-dir /home/xuql/NAM_anchorwave_song/temp9 --gcs-max-retries 1000

@nalinigans
Collaborator

Are you able to run gatk SelectVariants on the workspace?

@nalinigans
Collaborator

How large is your machine, memory-wise? Can you reduce the -Xmx100g to something smaller, so the native GenomicsDB process does not get starved out?

@shuaiwang2

Hi @nalinigans,
Using the same data, procedure, and software as @baoxingsong, I found that the super-indels (length 34461688 on chromosome 9 and 10668738 on chromosome 10) lead to the same Buffer overflow error. Could you tell me whether GenotypeGVCFs can handle indels longer than 10 Mb, and which parameters would enable that? Thank you.

@nalinigans
Collaborator

@shuaiwang2, can you please paste your entire command to gatk GenotypeGVCFs? And the error section from running it? Can you also paste your command to gatk GenomicsDBImport? Thanks.

@shuaiwang2

@nalinigans, here is the command for chromosome 9; the one for chromosome 10 is similar. Thanks.

gatk --java-options "-Xmx40g" GenotypeGVCFs -R /home/xuql/copyNAM/B73/Zm-B73-REFERENCE-NAM-5.0.fa -stand-call-conf 0 -ploidy 1 -V gendb:///home/xuql/copyNAM/NAM_out_gatk9_1 -O /home/xuql/copyNAM/gatk93.vcf.gz --cloud-prefetch-buffer 10000 --cloud-index-prefetch-buffer 10000 --genomicsdb-max-alternate-alleles 110 --max-alternate-alleles 100 --gcs-max-retries 1000

GenotypeGVCFs error

20:16:05.938 INFO  ProgressMeter -            9:3448230              8.8               3294000         373171.8
20:16:15.978 INFO  ProgressMeter -            9:3553392              9.0               3386000         376457.9
20:16:26.015 INFO  ProgressMeter -            9:3646052              9.2               3471000         378861.9
[TileDB::ArrayIterator] Error: Cannot advance iterator; Buffer overflow.
terminate called after throwing an instance of 'VariantStorageManagerException'
  what():  VariantStorageManagerException exception : VariantArrayCellIterator increment failed
TileDB error message : [TileDB::ArrayIterator] Error: Cannot advance iterator; Buffer overflow

gatk --java-options "-Xmx50g -Xms5g" GenomicsDBImport \
                -V /home/xuql/copyNAM/B97/B97ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML247/CML247ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML333/CML333ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/HP301/HP301ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki3/Ki3ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M37W/M37WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC350/NC350ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh7B/Oh7BToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tzi8/Tzi8ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML103/CML103ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML277/CML277ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML52/CML52ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Il14H/Il14HToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ky21/Ky21ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Mo18W/Mo18WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/NC358/NC358ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/P39/P39ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML228/CML228ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML322/CML322ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/CML69/CML69ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ki11/Ki11ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/M162W/M162WToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Ms71/Ms71ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Oh43/Oh43ToB73.gvcf.gz \
                -V /home/xuql/copyNAM/Tx303/Tx303ToB73.gvcf.gz \
        --batch-size 1 \
      --genomicsdb-workspace-path /home/xuql/copyNAM/NAM_out_gatk9_1 \
      --genomicsdb-segment-size 1048576 --genomicsdb-vcf-buffer-size 50000000 -L 9

#Elapsed time: 52.78 minutes. Runtime.totalMemory()=6761218048

I ran LeftAlignAndTrimVariants with default parameters and sorted all indels longer than 200 from longest to shortest across all chromosomes. The top of the result is shown below; the position where GenotypeGVCFs aborted is close to the super-indel.

10:56:14.335 INFO  LeftAlignAndTrimVariants - Indel is too long (34461688) at position 9:3695105; skipping that record. Set --max-indel-length >= 34461688
10:56:30.429 INFO  LeftAlignAndTrimVariants - Indel is too long (10668738) at position 10:33212598; skipping that record. Set --max-indel-length >= 10668738
10:56:28.937 INFO  LeftAlignAndTrimVariants - Indel is too long (9101264) at position 10:14179; skipping that record. Set --max-indel-length >= 9101264
10:56:30.038 INFO  LeftAlignAndTrimVariants - Indel is too long (7918835) at position 10:22996027; skipping that record. Set --max-indel-length >= 7918835
11:31:49.968 INFO  LeftAlignAndTrimVariants - Indel is too long (7154442) at position 6:16715313; skipping that record. Set --max-indel-length >= 7154442
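
For anyone who wants to reproduce this ranking, here is a minimal Python sketch of one way to pull the "Indel is too long" messages out of a LeftAlignAndTrimVariants log and sort them by length. The function name `long_indels` and the regular expression are illustrative assumptions based only on the message format shown above; this is not part of GATK.

```python
import re

# Matches the LeftAlignAndTrimVariants message format shown in this thread.
# The pattern is an assumption derived from those lines, not a GATK-defined format.
TOO_LONG = re.compile(r"Indel is too long \((\d+)\) at position ([^:\s]+):(\d+)")


def long_indels(log_lines):
    """Return (length, contig, position) tuples, longest indel first."""
    hits = []
    for line in log_lines:
        match = TOO_LONG.search(line)
        if match:
            length, contig, pos = match.groups()
            hits.append((int(length), contig, int(pos)))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    sample = [
        "10:56:30.429 INFO  LeftAlignAndTrimVariants - Indel is too long (10668738) "
        "at position 10:33212598; skipping that record. Set --max-indel-length >= 10668738",
        "10:56:14.335 INFO  LeftAlignAndTrimVariants - Indel is too long (34461688) "
        "at position 9:3695105; skipping that record. Set --max-indel-length >= 34461688",
    ]
    for length, contig, pos in long_indels(sample):
        print(f"{contig}:{pos}\t{length}")
```

Because the pattern is searched anywhere in the line, it works on raw log lines as well as on lines wrapped in quotes.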

@lbergelson
Member

In general, our germline tools are designed for short variants. I don't think any of them will handle an indel millions of bases long well, or at all. The SV or CNV tools sound like a better fit, although I'm not sure they cover your use case exactly. Typically we process short variants and long variants like this separately.

We should be detecting this variant up front when loading into GenomicsDB if it's going to be problematic to retrieve, and we should be giving a better error message. I don't think we'll be able to handle it through GenotypeGVCFs in any helpful way though. (The best I can imagine it doing is passing it through ungenotyped.)
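
Until such a check exists in the tools, a user-side pre-check is possible. The following is a rough Python sketch, not GATK functionality: it scans a gVCF before import and reports records whose REF or ALT allele exceeds a length threshold, so that very long variants can be split off and handled separately. The function name `find_long_alleles` and the default threshold of 1,000,000 are hypothetical choices for illustration; the thread does not establish what length GenomicsDB can actually retrieve.

```python
import gzip


def find_long_alleles(gvcf_path, max_len=1_000_000):
    """Yield (contig, pos, longest_allele_length) for records with an
    explicit REF or ALT allele longer than max_len.

    max_len is a hypothetical threshold, not a documented GATK limit.
    """
    opener = gzip.open if gvcf_path.endswith(".gz") else open
    with opener(gvcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip header lines
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # not a complete VCF record
            contig, pos, ref, alts = fields[0], int(fields[1]), fields[3], fields[4]
            # Ignore symbolic alleles such as <NON_REF>; they carry no sequence.
            alleles = [ref] + [a for a in alts.split(",") if not a.startswith("<")]
            longest = max(len(a) for a in alleles)
            if longest > max_len:
                yield contig, pos, longest
```

Calling `list(find_long_alleles(path))` on each per-sample gVCF before running GenomicsDBImport would give a list of positions to review, which can be compared against the position where GenotypeGVCFs aborts.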
