Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[query/vds combiner] Change sanity checks on combiner construction #14087

Merged
merged 3 commits into from
Dec 9, 2023

Conversation

chrisvittal
Copy link
Collaborator

@chrisvittal chrisvittal commented Dec 8, 2023

  • Add assertion to load_combiner and new_combiner to fail if the output vds exists
  • Remove assertion that disallows empty gvcfs and vdses in VariantDatasetCombiner.__init__

Resolves #14079

If a finished VariantDatasetCombiner is loaded, it will have no vds
paths or gvcf paths. Before this change, this would cause the loading to
fail, especially in the logic to load extant plans in `new_combiner` for
resuming in progress combines. This will lead to the plan being
regenerated, potentially causing work to be done again. Remove that
sanity check in `__init__` as a nearly identical one exists in
`new_combiner`.

Also, add the string of the exception to the warning about invalid plans
when loading a combiner in `new_combiner`.
…ut exists

We don't want users to overwrite combiner output that already exists. So
in the public constructors, load_combiner and `new_combiner`,
(`VariantDatasetCombiner.__init__` is not a public constructor) add
a check for existance of the `_SUCCESS` files in the two matrix table
components of the output vds.
@danking
Copy link
Contributor

danking commented Dec 8, 2023

@chrisvittal

test/hail/conftest.py:10: in <module>
    from hail import current_backend, init, reset_global_randomness
/usr/local/lib/python3.9/dist-packages/hail/__init__.py:54: in <module>
    from . import vds  # noqa: E402
/usr/local/lib/python3.9/dist-packages/hail/vds/__init__.py:1: in <module>
    from . import combiner
/usr/local/lib/python3.9/dist-packages/hail/vds/combiner/__init__.py:2: in <module>
    from .variant_dataset_combiner import new_combiner, load_combiner, VariantDatasetCombiner, VDSMetadata
/usr/local/lib/python3.9/dist-packages/hail/vds/combiner/variant_dataset_combiner.py:15: in <module>
    from hail.vds import VariantDataset
E   ImportError: cannot import name 'VariantDataset' from partially initialized module 'hail.vds' (most likely due to a circular import) (/usr/local/lib/python3.9/dist-packages/hail/vds/__init__.py)
=========================== short test summary info ============================

@danking
Copy link
Contributor

danking commented Dec 8, 2023

from ..variant_dataset import VariantDataset would probably work.

@danking danking merged commit f355886 into hail-is:main Dec 9, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants