The first problem is that the @typecheck on hl.init, hl.init_spark, etc. only allows a built-in reference genome.
Even if we relax that requirement, we encounter a deeper problem: creating the reference genome initializes Hail. In particular, we call Env.backend() (which calls Env.hc(), which forces initialization) so that we can call add_reference.
What does initialization mean? Historically, it meant connecting to or starting a JVM/Spark process. In QoB/ServiceBackend, initialization just loads configuration; it doesn't really do anything irreversible. Regardless of what it does, we only allow initialization once.
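The interaction can be sketched with a toy model. This is not Hail's implementation: `ToyEnv` and `ToyReferenceGenome` are hypothetical stand-ins, and the only assumptions carried over from the text are that constructing a reference genome asks for the backend, that asking for the backend forces initialization, and that initialization is allowed only once.

```python
class ToyEnv:
    """Hypothetical stand-in for Hail's Env; models init-once only."""

    _initialized = False
    _default_ref_name = None

    @classmethod
    def init(cls, default_reference="GRCh37"):
        # Initialization is permitted exactly once.
        if cls._initialized:
            raise RuntimeError("already initialized")
        cls._initialized = True
        cls._default_ref_name = default_reference

    @classmethod
    def backend(cls):
        # Asking for the backend forces initialization with default settings.
        if not cls._initialized:
            cls.init()
        return cls


class ToyReferenceGenome:
    """Hypothetical stand-in for hl.ReferenceGenome."""

    def __init__(self, name):
        self.name = name
        ToyEnv.backend()  # side effect: initializes before the user can


# The user builds a custom reference first, then tries to initialize with it.
wheat = ToyReferenceGenome("wheat")
try:
    ToyEnv.init(default_reference="wheat")
    outcome = "ok"
except RuntimeError as err:
    outcome = str(err)

print(outcome)
print(ToyEnv._default_ref_name)
```

In this model the explicit `init` call arrives too late: the constructor has already initialized with the built-in default, so the custom reference can never become the default through `init`.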
OK, so, there are two possible routes to fix this problem:
Rewrite ReferenceGenome.__init__ so that it does not initialize Hail. You have to decide how reference genomes are ultimately communicated to the backend. Do you hang a list of all created reference genomes off of the ReferenceGenome class? Do you require explicit registration à la hl.register_reference? The latter seems a bit silly. The former seems OK, but you could also ...
Allow modification of the default reference after initialization. The default reference genome is just a field on the HailContext: _default_ref which is accessed through hl.default_reference(). Just modify hl.default_reference to return the reference with no arguments and set the reference with one argument. Now this works:
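A minimal sketch of that getter/setter behavior, using a toy module-level stand-in rather than Hail itself. `_default_ref` mirrors the field name mentioned above; `ToyReference`, the parameter name `new_default_reference`, and the `"wheat"` name are assumptions made for illustration.

```python
from typing import Optional


class ToyReference:
    """Hypothetical stand-in for hl.ReferenceGenome."""

    def __init__(self, name: str):
        self.name = name


# Stand-in for the _default_ref field on the HailContext.
_default_ref = ToyReference("GRCh37")


def default_reference(
    new_default_reference: Optional[ToyReference] = None,
) -> Optional[ToyReference]:
    """With no argument, return the default; with one argument, set it."""
    global _default_ref
    if new_default_reference is None:
        return _default_ref
    _default_ref = new_default_reference
    return None


wheat = ToyReference("wheat")
default_reference(wheat)       # set the default after "initialization"
current = default_reference()  # read it back
print(current.name)
```

One function serving as both getter and setter keeps the existing zero-argument call sites working unchanged, which is presumably why this route is the less invasive of the two.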
We should add docs that describe how to do this to:
hl.default_reference, obviously
Deprecate the default_reference parameter to hl.init and instruct users to use hl.default_reference instead. Tell them that this parameter has confusing interactions with ReferenceGenome, so we're removing it.
hl.ReferenceGenome.__init__ should refer users to that.
I think we should also make a separate PR that improves the hl.import_vcf error message. If the backend throws an error like
HailException: Invalid locus '1:249367215' found. Position '249367215' is not within the range [1-249250621] for reference genome 'GRCh37'.
import_vcf should catch it and wrap it in another exception that suggests using the reference_genome parameter or hl.default_reference.
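The catch-and-wrap idea might look roughly like the following. This is a sketch only: `ToyHailException`, `toy_backend_import`, and `import_vcf_with_hint` are hypothetical names, and matching on the "Invalid locus" substring is an assumption about how such an error could be detected, not how Hail does it.

```python
class ToyHailException(Exception):
    """Hypothetical stand-in for the backend's HailException."""


def toy_backend_import(path):
    # Simulates the backend rejecting a locus outside the reference's range.
    raise ToyHailException(
        "Invalid locus '1:249367215' found. Position '249367215' is not "
        "within the range [1-249250621] for reference genome 'GRCh37'."
    )


def import_vcf_with_hint(path, backend_import=toy_backend_import):
    """Run the import and add a hint when the locus does not fit the reference."""
    try:
        return backend_import(path)
    except ToyHailException as err:
        if "Invalid locus" in str(err):
            hint = (
                "The VCF may not match the default reference genome. "
                "Pass reference_genome=... to import_vcf or set a new "
                "default with hl.default_reference(...)."
            )
            # Chain the original error so the backend message is preserved.
            raise ValueError(f"{err}\n{hint}") from err
        raise


try:
    import_vcf_with_hint("example.vcf")  # hypothetical path
    message, cause = "", None
except ValueError as wrapped:
    message, cause = str(wrapped), wrapped.__cause__

print(message)
```

Chaining with `from err` keeps the backend's original message in the traceback, so the hint adds information rather than replacing it.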
CHANGELOG: Deprecate the default_reference parameter to hl.init. Users
should call `hl.default_reference` with an argument to set a new default
reference, usually shortly after init.
Resolves #13856
---------
Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
What happened?
Suppose you're working with the Wheat genome. The following is seemingly correct code but it doesn't work:
The first problem is that the @typecheck on hl.init, hl.init_spark, etc. only allows a built-in reference genome.
Even if we relax that requirement, we encounter a deeper problem: creating the reference genome initializes Hail. In particular, we call Env.backend() (which calls Env.hc(), which forces initialization) so that we can call add_reference.
What does initialization mean? Historically, it meant connecting to or starting a JVM/Spark process. In QoB/ServiceBackend, initialization just loads configuration; it doesn't really do anything irreversible. Regardless of what it does, we only allow initialization once.
OK, so, there are two possible routes to fix this problem:
Rewrite ReferenceGenome.__init__ so that it does not initialize Hail. You have to decide how reference genomes are ultimately communicated to the backend. Do you hang a list of all created reference genomes off of the ReferenceGenome class? Do you require explicit registration à la hl.register_reference? The latter seems a bit silly. The former seems OK, but you could also ...
Allow modification of the default reference after initialization. The default reference genome is just a field on the HailContext: _default_ref which is accessed through hl.default_reference(). Just modify hl.default_reference to return the reference with no arguments and set the reference with one argument. Now this works:
Version
0.2.124
Relevant log output
No response