
[hailctl] Add hailctl batch submit #12471

Merged: 15 commits, Dec 1, 2022

Conversation

@daniel-goldstein (Contributor) commented Nov 16, 2022

CHANGELOG: Add hailctl batch submit command to run local scripts inside batch jobs.

Can run the following foo.py:

import hail as hl

with open('baz/bar.txt', 'r') as f:
    print(f.read())

hl.init(backend='batch')
print(hl.utils.range_table(10).collect())

with the command hailctl batch submit foo.py --name foobatch --files=baz

@danking (Contributor) commented Nov 18, 2022

I'm inclined to set HAIL_QUERY_BACKEND=batch by default, though I can see how this would also be useful to run a local-mode Spark-Hail.

local_file = file['from']
cloud_file = file['to']
in_file = b.read_input(cloud_file)
j.command(f'ln -s {in_file} {local_file}')
A reviewer (Contributor) commented on the diff:
You might need to defend against users mistakenly providing fully qualified paths to --files. I guess just os.basename is enough. Might want to verify the user didn't specify two files with the same name.
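The defense suggested here could be sketched as follows. This is illustrative, not the PR's actual code: the function name `check_file_args` and the choice of `ValueError` are assumptions (the comment's `os.basename` presumably refers to `os.path.basename`):

```python
import os
from collections import Counter

def check_file_args(files):
    """Reduce each --files argument to its basename and reject
    duplicates, since two inputs with the same name would collide
    in the job's working directory. (Illustrative sketch.)"""
    basenames = [os.path.basename(f.rstrip('/')) for f in files]
    duplicates = [name for name, n in Counter(basenames).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate file names: {', '.join(duplicates)}")
    return basenames

print(check_file_args(['/home/me/baz', 'data.txt']))  # ['baz', 'data.txt']
```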

@daniel-goldstein (author) replied:

Changed to use the relative path from the user's current directory, so as to be compatible with scripts reading from relative paths.
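The relative-path approach described here can be sketched roughly as below. The function name and the rejection of paths outside the working tree are assumptions for illustration, not the PR's exact behavior:

```python
import os

def job_local_path(user_path, cwd):
    """Map a user-supplied path to the path the script will see inside
    the job, preserving the path relative to the current directory so
    that scripts reading relative paths keep working. (Sketch.)"""
    rel = os.path.relpath(os.path.abspath(user_path), cwd)
    if rel.startswith(os.pardir):
        # A path outside the working tree has no stable relative name.
        raise ValueError(f"{user_path} is not under the current directory")
    return rel

print(job_local_path('baz/bar.txt', os.getcwd()))  # baz/bar.txt (on POSIX)
```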

@@ -0,0 +1,58 @@
import asyncio
A reviewer (Contributor) commented on the diff:
Ah this is so delightfully short!

@daniel-goldstein (author) commented:

> I'm inclined to set HAIL_QUERY_BACKEND=batch by default, though I can see how this would also be useful to run a local-mode Spark-Hail.

I'm happy to be overruled here but I like the "just copy their config" approach for configurations like these where both the local and batch backend could make sense, so the behavior is as consistent as is reasonable across environments. It will Just Work in the way you want if the user has the backend set to batch in their config ;)

@danking (Contributor) commented Nov 22, 2022

Let's enumerate the use-cases. In all cases, the point is to run the client script in the cloud.

  1. Submit a QoB pipeline.
  2. Submit a Batch pipeline.
  3. Submit a single-Python-job batch to execute some python code remotely.
  4. Same as (3) but using local-mode Hail (as opposed to (1)).

I think use-case (4) is rare, though useful. If we're pitching hailctl batch submit as the QoB replacement for hailctl dataproc submit, then I think people will be very confused if (4) happens.

I am somewhat frustrated that we have these different deployment strategies. It seems like unnecessary intellectual burden for a user to think about when they're just trying to run some Hail code.

One option is to have different commands. Submitting a QoB job is done like this:

hailctl qob submit ...

And submitting a batch job is done like this:

hailctl batch submit ...

It pains me to think about trying to explain the difference between (1) and (4) to a scientist. At least with two distinct commands we can sort of ignore (4) and say: if you want to run Hail Query on Hail Batch, use hailctl qob; if you want to run a normal batch pipeline, use hailctl batch.

@danking (Contributor) commented Nov 22, 2022

Or even hailctl query submit?

@daniel-goldstein (author) commented:

Hmph, ya this seems annoyingly complicated, and I'd prefer one command with opinionated but configurable defaults over several different commands.

One thing that feels inconsistent here is what we do in the Batch interface. We don't have the equivalent of HAIL_QUERY_BACKEND; a user specifically has to create a ServiceBackend, rather than relying on the environment to dictate which model to use. I feel like it would be OK if we documented hailctl batch submit as "distribute everything on Batch by default".

@daniel-goldstein (author) commented Nov 22, 2022

> Or even hailctl query submit?

I like this better than hailctl qob submit, but I mostly prefer just having one command. It seems needlessly restrictive to market it strictly as a Query thing, but I get the desire to, well, market Query.

@daniel-goldstein daniel-goldstein marked this pull request as ready for review November 22, 2022 19:18
@daniel-goldstein (author) commented:

Re: testing, I wanted to wait on the spawned batch and ensure that it passed, but I had trouble doing that because it looks like the new rich progress bars are printed to stdout so I can't make use of json output and jq. Can we print the progress bars to stderr instead?

@danking (Contributor) commented Nov 22, 2022

As an aside, we should definitely have a HAIL_BATCH_BACKEND and associated config variables. There is no end to my annoyance that hb.Batch() gives me a local backend batch.

It seems to me that, given the precedent of hailctl dataproc submit, hailctl batch submit conveys the intent to use QoB or Batch-in-Batch, not "local mode Batch" or "local mode Hail". It seems very reasonable to have a --local-mode-query override (I think we should ignore local mode Batch as much as possible since container-in-container is fraught).

We need a better name for local mode Spark or Query. I'm slowly realizing that lots of people don't realize you can use Hail on a laptop. Are there other tools that have already settled on terminology here?
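The HAIL_BATCH_BACKEND variable proposed here does not exist yet; a minimal sketch of what its lookup might look like, assuming the fallback stays "local" (matching the current behavior where hb.Batch() picks the local backend) and "service" selects the cloud backend:

```python
import os

def resolve_batch_backend(default="local"):
    """Sketch of the proposed HAIL_BATCH_BACKEND lookup. Today
    hb.Batch() always uses the local backend, so "local" is the
    assumed fallback; "service" would select the cloud backend.
    Variable name and values are the proposal, not shipped code."""
    return os.environ.get("HAIL_BATCH_BACKEND", default)

os.environ["HAIL_BATCH_BACKEND"] = "service"
print(resolve_batch_backend())  # service
```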

@danking (Contributor) commented Nov 22, 2022

> Re: testing, I wanted to wait on the spawned batch and ensure that it passed, but I had trouble doing that because it looks like the new rich progress bars are printed to stdout so I can't make use of json output and jq. Can we print the progress bars to stderr instead?

I think this is possible but I haven't looked into it. I'm opposed to stderr in Notebooks because I don't want it to appear with red background. I've recently realized that a lot of users find it concerning that we print so much red background text. You might also have a quiet mode? If you don't wait on the batch, do you still get a progress bar?

@daniel-goldstein (author) commented Nov 22, 2022

Ya, the progress bars I mentioned were the file upload summary and the batch submission. I can try to add a quiet mode so it cleanly prints just the JSON. My intention was not to wait on it in submit.py but to print out JSON and then use hailctl batch wait.

@danking (Contributor) commented Nov 22, 2022

My inclination is to assume the arg parser works correctly and to just test calling a function that returns the information you need.

@danking (Contributor) left a review:

LGTM

@daniel-goldstein (author) commented Nov 22, 2022

> As an aside, we should definitely have a HAIL_BATCH_BACKEND and associated config variables. There is no end to my annoyance that hb.Batch() gives me a local backend batch.
>
> It seems to me that, given the precedent of hailctl dataproc submit, hailctl batch submit conveys the intent to use QoB or Batch-in-Batch, not "local mode Batch" or "local mode Hail". It seems very reasonable to have a --local-mode-query override (I think we should ignore local mode Batch as much as possible since container-in-container is fraught).
>
> We need a better name for local mode Spark or Query. I'm slowly realizing that lots of people don't realize you can use Hail on a laptop. Are there other tools that have already settled on terminology here?

@danking Wait so shall I set HAIL_QUERY_BACKEND=batch? I can follow up with a PR that adds a similar configuration variable for batch so we don't default to the local backend, but it would be nice to keep this behavior consistent from the start

@danking (Contributor) left a review:

Indeed I was too quick to approve.

Resolved review threads (outdated):
- build.yaml
- hail/python/hailtop/hailctl/batch/submit.py
- hail/python/hailtop/hailctl/batch/wait.py
@daniel-goldstein (author) commented:

@danking bump

@danking (Contributor) left a review:

huzzah

@danking danking merged commit 60b43d4 into hail-is:main Dec 1, 2022