
[qob] Support loading RGs from FASTA files #12736

Merged: 12 commits merged into hail-is:main on Apr 10, 2023

Conversation

@daniel-goldstein (Contributor) commented Feb 27, 2023

I left the changes to Query and Batch in separate commits for ease of review. I put them in the same PR because we don't really have standalone testing for JVM jobs outside of Query-on-Batch, so the FASTA use case serves here as a test that cloudfuse works properly for JVM jobs. It would be great if Jackie could review the Batch commit and Tim could review the Query commit.

Hail Query

  • Added support for the FROM_FASTA_FILE RPC; the service backend now passes sequence-file information from reference genomes in every RPC (see the sketch after this list).
  • Refactored liftover handling in service_backend so liftover maps are no longer stored redundantly; they are taken directly from the ReferenceGenome objects, as is now done for sequence files. This means the add/remove liftover/sequence functions on the Backend exist only to keep the backend in sync with Python, which is a no-op for the service backend.
  • Don't localize the index file in fromFASTAFile/addSequence before creating the index object. FastaSequenceIndex loads the whole file on construction, so we might as well stream it in from whatever storage it lives in.
  • FASTA caching is left alone because those files are mounted into and unmounted from the JVM container over the life of the job. The JVM doesn't have to worry about disk usage because that is handled by Batch XFS quotas, as long as the service backend requests enough storage to fit the FASTA file. Batch makes sure that a given bucket (and therefore a given FASTA file) is mounted once per user on a Batch worker.
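
Here is a minimal sketch of the per-RPC reference-genome payload described above. The function name and the ReferenceGenome attributes (_config, _sequence_files, _liftovers) are illustrative assumptions, not the actual names in service_backend.py:

from typing import Any, Dict


def rpc_reference_genome_payload(reference_genomes) -> Dict[str, Any]:
    # Illustrative sketch: every RPC ships the full reference-genome
    # configuration, including FASTA/index paths and liftover chain files,
    # so the JVM side can be stateless and nothing has to be "kept in sync".
    payload: Dict[str, Any] = {
        'custom_references': [],
        'sequences': {},   # rg name -> (fasta_file, index_file)
        'liftovers': {},   # rg name -> {destination rg name: chain file}
    }
    for rg in reference_genomes:
        payload['custom_references'].append(rg._config)       # assumed attribute
        if rg._sequence_files is not None:                     # assumed attribute
            payload['sequences'][rg.name] = rg._sequence_files
        payload['liftovers'][rg.name] = rg._liftovers          # assumed attribute
    return payload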

Hail Batch

  • Added support for read-only cloudfuse mounts for JVM jobs.
  • These mounts are shared between jobs from the same user on the same machine (see the sketch after this list).
  • I did not change DockerJobs, but they could easily be adapted to use this new mount-sharing code.
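
A rough sketch of the mount-sharing idea for JVM jobs, assuming a reference-counting design. The class name matches the ReadOnlyCloudfuseManager mentioned later in the review, but the structure and method signatures here are guesses, not the actual worker.py implementation:

import asyncio
from collections import defaultdict
from typing import Awaitable, Callable, Dict, Tuple


class ReadOnlyCloudfuseManager:
    """Hypothetical sketch: share one read-only FUSE mount per (user, bucket)."""

    def __init__(self):
        self._refcounts: Dict[Tuple[str, str], int] = defaultdict(int)
        self._locks: Dict[Tuple[str, str], asyncio.Lock] = defaultdict(asyncio.Lock)

    async def mount(self, user: str, bucket: str,
                    do_mount: Callable[[], Awaitable[None]]):
        key = (user, bucket)
        async with self._locks[key]:
            if self._refcounts[key] == 0:
                await do_mount()          # only the first job actually mounts
            self._refcounts[key] += 1

    async def unmount(self, user: str, bucket: str,
                      do_unmount: Callable[[], Awaitable[None]]):
        key = (user, bucket)
        async with self._locks[key]:
            self._refcounts[key] -= 1
            if self._refcounts[key] == 0:
                await do_unmount()        # only the last job actually unmounts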

@jigold (Contributor) previously requested changes Mar 17, 2023 and left a comment:

This looks great! Will give it another look on Monday.

(4 resolved review threads on batch/batch/worker/worker.py)
@daniel-goldstein (Contributor, Author) commented Mar 21, 2023

@tpoterba I just realized I forgot to propagate the FUSE config through to worker jobs. Should I be doing that, and did I just get lucky that the single existing test does everything driver-side? Or is there a test we can write to ensure that worker jobs access the FASTA data?

@danking (Contributor) commented Mar 27, 2023

@daniel-goldstein, collectDArray in BackendUtils.scala has an optimization for single-partition jobs: they are run on the driver.

@danking (Contributor) commented Mar 27, 2023

Admittedly a debatable choice, but worthwhile in the context of tests that generate tiny pipelines dominated by CDA overhead.
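
Schematically, the optimization being described looks like this. The real code is Scala, in collectDArray in BackendUtils.scala; this Python-style sketch with made-up function names only illustrates the dispatch decision:

def collect_d_array(contexts, run_partition_on_driver, submit_batch_jobs):
    # Hypothetical sketch: with a single partition, the cost of submitting a
    # Batch job is pure overhead, so the partition runs in the driver JVM.
    if len(contexts) == 1:
        return [run_partition_on_driver(contexts[0])]
    # Otherwise fan out to worker JVM jobs via Batch.
    return submit_batch_jobs(contexts)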

@daniel-goldstein (Contributor, Author) commented:

@tpoterba I fixed the config issue and changed n_partitions to ensure that workers are scheduled for the FASTA reading. I tested this on a single Batch worker so the jobs overlapped and exercised the shared-mount code, but our test setup gives no real guarantee of that, because Batch has no way to force collocation of jobs (and even then we can't force their runtimes to overlap). I suppose if there's an issue here it will bubble up as a nondeterministic failure. Not great, but perhaps good enough for now?
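
For reference, the kind of test being described looks roughly like the following. The bucket paths are placeholders, and this is not the literal test added in the PR:

import hail as hl

rg = hl.get_reference('GRCh38')
# Hypothetical cloud paths; the real test points at Hail's hosted FASTA files.
rg.add_sequence('gs://some-bucket/GRCh38.fasta.gz',
                'gs://some-bucket/GRCh38.fasta.fai')

# More than one partition, so partitions are dispatched to worker JVM jobs
# instead of being absorbed by the single-partition driver-side optimization.
ht = hl.utils.range_table(100, n_partitions=10)
ht = ht.annotate(
    base=hl.locus('chr1', 100_000 + ht.idx, reference_genome='GRCh38').sequence_context()
)
ht._force_count()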

@tpoterba (Contributor) commented Apr 6, 2023

Right now we run dataproc tests only on release, not on every commit, because they're too expensive/slow. That way we never release a version that can't pass. I wonder if that's also the right strategy here -- adding QoB release tests for things that only go wrong at scale.

That said, I don't want to block on that. Awesome change, thank you!

@jigold (Contributor) left a comment:


Give me a day to look this over one more time.

@jigold (Contributor) previously requested changes Apr 7, 2023 and left a comment:


Just some small things.

@@ -157,6 +159,9 @@ class ServiceBackend(
    if (backendContext.workerMemory != "None") {
      resources = resources.merge(JObject(("memory" -> JString(backendContext.workerMemory))))
    }
    if (backendContext.storageRequirement != "0Gi") {
jigold (Contributor):

Just double-checking -- do we control this number, and are we certain it will be "0Gi" and not "0", "0Mi", etc.?

daniel-goldstein (Contributor, Author):

Yes, this number is set in service_backend.py. There are also no compatibility requirements between service_backend.py and ServiceBackend.scala, so this can change however we like, as long as the two agree with each other in the same commit.
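
A small sketch of the convention described above, with an illustrative helper name (the actual code in service_backend.py may differ):

def storage_requirement_str(gib: int) -> str:
    # Always emit the 'Gi' suffix, even for zero, so the literal sentinel
    # check in ServiceBackend.scala (!= "0Gi") stays in lockstep with the
    # Python side that produced the string.
    return f'{gib}Gi'


assert storage_requirement_str(0) == '0Gi'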

    # and there is no persistent backend to keep in sync.
    # Sequence and liftover information are passed on RPC
    def add_sequence(self, name, fasta_file, index_file):  # pylint: disable=unused-argument
        # FIXME Not only should this be in the cloud, it should be in the *right* cloud
jigold (Contributor):

I thought I had made it easy to check this. Can we modify self.validate_file_scheme or add a new method that asserts certain URI schemes for a particular cloud? My VEP branch exposes a cloud endpoint to the front end so maybe we do this separately....

daniel-goldstein (Contributor, Author):

Ya, I don't think we can do this until the client knows which cloud it is submitting jobs to, so I think it's best to come back to this after your VEP PR.
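
For what it's worth, the kind of check being floated might look like the sketch below. The names (CLOUD_SCHEMES, validate_file_scheme_for_cloud) and the scheme table are hypothetical, and as the reply notes, this only becomes possible once the client knows which cloud it is submitting to:

# Hypothetical mapping; the real set of allowed schemes per cloud may differ.
CLOUD_SCHEMES = {
    'gcp': ('gs',),
    'azure': ('https',),
}


def validate_file_scheme_for_cloud(uri: str, cloud: str) -> None:
    scheme = uri.split('://', 1)[0] if '://' in uri else ''
    allowed = CLOUD_SCHEMES[cloud]
    if scheme not in allowed:
        raise ValueError(
            f'{uri!r} uses scheme {scheme!r}, but jobs submitted to {cloud} '
            f'can only read files with schemes {allowed}'
        )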

if config['mounted']:
    bucket = config['bucket']
    assert bucket
    mount_path = self.cloudfuse_data_path(bucket)
jigold (Contributor):

Can we set config['mounted'] to False here?
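
Rendered as a standalone function, the suggestion is essentially the following (a hypothetical helper, with the actual unmount call passed in); clearing the flag after a successful unmount keeps cleanup idempotent if it is retried:

async def cleanup_cloudfuse_mount(config: dict, unmount) -> None:
    # Sketch of the suggested change: after unmounting, record that the
    # bucket is no longer mounted so a re-run of cleanup is a no-op.
    if config['mounted']:
        bucket = config['bucket']
        assert bucket
        await unmount(bucket)
        config['mounted'] = False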

@@ -2509,6 +2668,8 @@ def __init__(self, client_session: httpx.ClientSession):

        self.headers: Optional[Dict[str, str]] = None

        self.cloudfuse_mount_manager = ReadOnlyCloudfuseManager()
jigold (Contributor):

Do we need a shutdown method for the mount_manager?

daniel-goldstein (Contributor, Author):

I don't think so. When the worker exits, all jobs that were running on the worker should have gone through their cleanup step and unmounted their respective buckets, so there should be no remaining work for the ReadOnlyCloudfuseManager to do.

@danking merged commit 39b1606 into hail-is:main on Apr 10, 2023