[query] Erroneous error on export_vcf when exporting haploid calls #14330

Closed
shengqh opened this issue Feb 21, 2024 · 1 comment · Fixed by #14375

shengqh commented Feb 21, 2024

What happened?

We have a dataset of joint-called multi-sample VCF files (not from imputation). We converted those multi-sample VCFs to Hail MatrixTables with the following code in a WDL task on Terra:

```python
import hail as hl

hl.init(spark_conf={"spark.driver.memory": "~{memory_gb}g"})

callset = hl.import_vcf("~{source_vcf}",
                        array_elements_required=False,
                        force_bgz=True,
                        reference_genome='~{reference_genome}')

callset.write("~{target_prefix}", overwrite=True)
```

After sample filtering, we want to export the MatrixTable back to VCF:

```python
import hail as hl

hl.init(
    spark_conf={
        "spark.driver.memory": "~{memory_gb}g",
        "spark.local.dir": "./tmp",
    },
    tmp_dir="./tmp",
    local_tmpdir="./tmp",
    idempotent=True,
)
hl.default_reference("~{reference_genome}")

mt = hl.read_matrix_table("~{input_hail_mt_path}")
hl.export_vcf(mt, "~{hail_vcf}", tabix=False)
```

This worked for chr1 through chr22, but failed on chrX and chrY with the error: VCF spec does not support phased haploid calls.

What should we do to export chrX and chrY?
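For context: on chrX and chrY, a joint-called VCF may encode calls in haploid regions with a single allele such as `0` or `1`, so the GT field has no `/` or `|` separator and therefore carries no phasing information. The minimal sketch below assumes plain GT strings in the common `/`/`|` forms and uses no Hail code; `parse_gt` is a hypothetical helper for illustration, not a Hail API. It shows that a haploid GT read this way comes out unphased, which is why an error about *phased* haploid calls is unexpected here.

```python
from typing import List, Optional, Tuple


def parse_gt(gt: str) -> Tuple[List[Optional[int]], bool]:
    """Parse a VCF GT string into (allele indices, phased flag).

    Hypothetical helper for illustration only. A missing allele ('.')
    becomes None. A single-allele (haploid) GT has no separator, so it is
    reported as unphased.
    """
    if "|" in gt and "/" in gt:
        # Mixed separators (e.g. in higher-ploidy calls) are out of scope here.
        raise ValueError(f"mixed phasing separators are not handled: {gt!r}")
    phased = "|" in gt
    parts = gt.split("|" if phased else "/")
    alleles = [None if p == "." else int(p) for p in parts]
    return alleles, phased


for gt in ["0/1", "1|0", "1", "0"]:
    alleles, phased = parse_gt(gt)
    print(f"{gt!r}: ploidy={len(alleles)} phased={phased}")
```

With these inputs, `'1'` and `'0'` both report `ploidy=1 phased=False`.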

Version

0.2.127-py3.11

Relevant log output

```
Traceback (most recent call last):
  File "<stdin>", line 14, in <module>
  File "<decorator-gen-1448>", line 2, in export_vcf
  File "/usr/local/lib/python3.10/dist-packages/hail/typecheck/check.py", line 584, in wrapper
    return __original_func(*args_, **kwargs_)
  File "/usr/local/lib/python3.10/dist-packages/hail/methods/impex.py", line 634, in export_vcf
    Env.backend().execute(ir.MatrixWrite(dataset._mir, writer))
  File "/usr/local/lib/python3.10/dist-packages/hail/backend/backend.py", line 190, in execute
    raise e.maybe_user_error(ir) from None
  File "/usr/local/lib/python3.10/dist-packages/hail/backend/backend.py", line 188, in execute
    result, timings = self._rpc(ActionTag.EXECUTE, payload)
  File "/usr/local/lib/python3.10/dist-packages/hail/backend/py4j_backend.py", line 220, in _rpc
    raise fatal_error_from_java_error_triplet(
hail.utils.java.FatalError: HailException: VCF spec does not support phased haploid calls.

Java stack trace:
is.hail.utils.HailException: VCF spec does not support phased haploid calls.
	at __C83collect_distributed_array_matrix_vcf_writer.apply_region154_245(Unknown Source)
	at __C83collect_distributed_array_matrix_vcf_writer.apply_region133_246(Unknown Source)
	at __C83collect_distributed_array_matrix_vcf_writer.apply_region1_250(Unknown Source)
	at __C83collect_distributed_array_matrix_vcf_writer.apply(Unknown Source)
	at __C83collect_distributed_array_matrix_vcf_writer.apply(Unknown Source)
	at is.hail.backend.BackendUtils.$anonfun$collectDArray$19(BackendUtils.scala:142)
	at is.hail.utils.package$.using(package.scala:665)
	at is.hail.annotations.RegionPool.scopedRegion(RegionPool.scala:170)
	at is.hail.backend.BackendUtils.$anonfun$collectDArray$18(BackendUtils.scala:141)
	at is.hail.backend.spark.SparkBackend$$anon$5.compute(SparkBackend.scala:474)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)

Hail version: 0.2.127-bb535cd096c5
Error summary: HailException: VCF spec does not support phased haploid calls.
```
@shengqh shengqh added the needs-triage A brand new issue that needs triaging. label Feb 21, 2024
@chrisvittal chrisvittal added bug and removed needs-triage A brand new issue that needs triaging. labels Feb 28, 2024
@chrisvittal chrisvittal changed the title VCF spec does not support phased haploid calls when export chrX and chrY [query] Erroneous error on export_vcf when exporting haploid calls Feb 28, 2024
chrisvittal (Collaborator) commented Feb 28, 2024

Thanks for the report! This is a bug.

The export_vcf method currently raises this error for every haploid call, instead of first checking whether the haploid call is phased:

```scala
case v: SCallValue =>
  val ploidy = v.ploidy(cb)
  cb.if_(ploidy.ceq(0), cb._fatal("VCF spec does not support 0-ploid calls."))
  // Fires for every haploid call; whether the call is phased is never checked.
  cb.if_(ploidy.ceq(1), cb._fatal("VCF spec does not support phased haploid calls."))
  val c = v.canonicalCall(cb)
  _writeB(cb, Code.invokeScalaObject1[Int, Array[Byte]](Call.getClass, "toUTF8", c))
```
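The intended behaviour described in this comment can be sketched in plain Python. `call_to_vcf_gt` is a hypothetical stand-in for the Scala writer branch, assuming a call is represented as a list of integer allele indices plus a phased flag; it is not Hail's code and not necessarily the shape of the fix merged in #14375. It still rejects 0-ploid calls, but raises the phased-haploid error only when a haploid call is actually phased, and otherwise joins the alleles with `|` or `/`.

```python
from typing import Sequence


def call_to_vcf_gt(alleles: Sequence[int], phased: bool) -> str:
    """Render a call as a VCF GT string (hypothetical sketch of the intended check)."""
    ploidy = len(alleles)
    if ploidy == 0:
        raise ValueError("VCF spec does not support 0-ploid calls.")
    # Reject a haploid call only when it is phased; unphased haploid calls
    # (e.g. single-allele GTs on chrX/chrY) are written as-is.
    if ploidy == 1 and phased:
        raise ValueError("VCF spec does not support phased haploid calls.")
    sep = "|" if phased else "/"
    return sep.join(str(a) for a in alleles)


print(call_to_vcf_gt([1], phased=False))     # 1
print(call_to_vcf_gt([0, 1], phased=True))   # 0|1
print(call_to_vcf_gt([0, 1], phased=False))  # 0/1
```

The only behavioural difference from the snippet above is the added `phased` condition on the haploid branch; a single-allele call is written without any separator either way.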
