[qob][batch] do not list all jobs on failure (plus: types!) #13500

danking · 2023-08-25T15:20:53Z

CHANGELOG: In Query-on-Batch, the client-side Python code will not try to list every job when a QoB batch fails. This could take hours for long-running pipelines or pipelines with many partitions.

I also added API types to the list jobs endpoint because I have to go hunting for this every time anyway. Seems better to have this information at our digital fingertips.

daniel-goldstein · 2023-08-25T15:34:29Z

hail/python/hailtop/batch_client/aioclient.py

@@ -462,14 +474,21 @@ async def wait(self,
        with BatchProgressBar(disable=disable_progress_bar) as progress2:
            return await self._wait(description, progress2, disable_progress_bar, starting_job)

-    async def debug_info(self):
+    async def debug_info(self,
+                         _job_filter: Optional[Callable[[JobListEntry], bool]] = None,


Just a drive-by comment: you can use the query string q to search for failed jobs instead of filtering client-side.
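To illustrate the suggestion, here is a minimal sketch of building a list-jobs request that pushes the filter to the server through the q query parameter instead of fetching every job and filtering locally. The helper name list_jobs_url, the base URL, the path layout, and the query text 'failed' are all assumptions made for illustration; only the q and last_job_id parameter names come from the diff in this PR.

```python
from typing import Optional
from urllib.parse import urlencode


def list_jobs_url(base_url: str, batch_id: int, q: str,
                  last_job_id: Optional[int] = None) -> str:
    """Build a list-jobs URL that asks the server to do the filtering.

    Hypothetical helper: the path layout below is an assumption, not the
    documented Batch API route.
    """
    params = {'q': q}
    if last_job_id is not None:
        # Pagination cursor: resume listing after this job id.
        params['last_job_id'] = str(last_job_id)
    return f'{base_url}/batches/{batch_id}/jobs?{urlencode(params)}'


# 'failed' is a placeholder query; check the Batch query syntax for real terms.
first_page = list_jobs_url('https://example.invalid/api', 7, 'failed')
next_page = list_jobs_url('https://example.invalid/api', 7, 'failed', last_job_id=100)
```

With server-side filtering the client only pages through matching jobs, which is the point of the comment above: a batch with many partitions no longer requires listing every job to find the failures.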

jigold · 2023-08-25T15:33:22Z

hail/python/hailtop/batch_client/types.py

+from typing_extensions import NotRequired
+
+
+class JobListEntry(TypedDict):


These both need to be versioned. Otherwise, we're going to run into issues with backwards compatibility.

They have been so versioned.

jigold · 2023-08-25T15:52:21Z

I don't understand how the backwards compatibility is going to work in this case:

I have a V1 client, the server is sending me a V2 response with extra fields (or removed fields). What happens here?

danking · 2023-08-25T15:54:49Z

A V1 client cannot hit the V2 endpoint because it doesn't know it exists.

danking · 2023-08-25T16:01:02Z

To be clear, all this does is encode in the types what we already guarantee: if version X returns {'foo': int}, then all future versions of the server must at least return a dict containing the 'foo' key with an integer. If those servers start returning {'bar': int}, that would break all the old clients (and violate the types that we wrote in those clients).
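The guarantee described above can be sketched roughly as follows. FooResponseV1 and satisfies_v1 are hypothetical names invented for this illustration and mirror the {'foo': int} example; they are not types from this PR.

```python
from typing import Any, Dict, TypedDict


class FooResponseV1(TypedDict):
    # What version X promised: a dict containing 'foo' with an integer value.
    foo: int


def satisfies_v1(response: Dict[str, Any]) -> bool:
    """Return True if a server response still honors the V1 promise.

    Extra keys are allowed; a missing or retyped 'foo' is not.
    """
    return isinstance(response.get('foo'), int)


old_server = {'foo': 1}
newer_server = {'foo': 1, 'bar': 2}   # added a field: old clients keep working
breaking_server = {'bar': 2}          # dropped 'foo': old clients break
```

Here satisfies_v1 accepts the first two responses and rejects the third, which is the same distinction the types encode statically for client code.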

jigold

Just so I understand correctly, we can add fields to the V1Alpha type later on, but not remove them or change the type?

jigold · 2023-08-25T16:49:54Z

batch/batch/front_end/front_end.py

@@ -252,7 +253,9 @@ async def _handle_api_error(f: Callable[P, Awaitable[T]], *args: P.args, **kwarg
        raise e.http_response()


-async def _query_batch_jobs(request: web.Request, batch_id: int, version: int, q: str, last_job_id: Optional[int]):
+async def _query_batch_jobs(


You're missing the V1Alpha here.

danking · 2023-08-31T18:55:48Z

I merged main and was forced to fix the pyright errors, which was great! I found two bugs with these new types:

1. A spec file can be missing, in which case the spec is None; that would break our get-job front-end code when it tries to fix up the resources.
2. In the aioclient, we assumed attributes was present and a dict, but that key can be completely missing from the dict.
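Both bugs come down to treating an optional value as always present. A minimal sketch of the defensive pattern, assuming plain dicts for the job record and the spec; the helper names job_attributes and spec_resources are hypothetical and are not the functions changed in this PR:

```python
from typing import Any, Dict, Optional


def job_attributes(job: Dict[str, Any]) -> Dict[str, str]:
    """Return the job's attributes, tolerating a missing 'attributes' key."""
    return job.get('attributes') or {}


def spec_resources(spec: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the spec's resources, or None when the spec file was missing."""
    if spec is None:
        # No spec file: there is nothing to fix up, so do not index into it.
        return None
    return spec.get('resources')
```

Typing the spec as Optional is what lets pyright or mypy flag any code path that indexes into it without first checking for None.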

I also noticed cost_str has had the wrong type since always.

danking · 2023-08-31T18:57:46Z

> we can add fields to the V1Alpha type later on, but not remove them or change the type?

We should not remove or change fields, the types of fields, or the meaning of fields of JSON dicts returned by our API endpoints because that would break backwards compatibility.

The types, on the other hand, can change willy-nilly: mypy and pyright have no way of knowing that we changed the definition of, for example, GetJobResponseV1Alpha. They will, however, report all the places in our codebase where we expected certain fields to exist if we remove those fields from GetJobResponseV1Alpha.
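As a rough illustration of how the type checker helps here, consider the sketch below. The field names job_id, state, and attributes and the describe function are assumptions chosen for the example, not the real definition of GetJobResponseV1Alpha. The optional key is expressed with a total=False subclass, which has the same effect as NotRequired and works without typing_extensions.

```python
from typing import Dict, TypedDict


class _GetJobRequired(TypedDict):
    # Hypothetical required fields: every response must carry these.
    job_id: int
    state: str


class GetJobResponseV1Alpha(_GetJobRequired, total=False):
    # Hypothetical optional field: the key may be absent entirely.
    attributes: Dict[str, str]


def describe(job: GetJobResponseV1Alpha) -> str:
    # .get() is needed because 'attributes' is not guaranteed to exist.
    name = job.get('attributes', {}).get('name', '<unnamed>')
    # If 'state' were deleted from the type, pyright and mypy would flag
    # this access, even though nothing changes at runtime.
    return f"job {job['job_id']} ({name}): {job['state']}"
```

Deleting state from _GetJobRequired would leave this code running exactly as before, but the checker would report the job['state'] access, pointing at every place that relied on the removed field.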

danking · 2023-09-05T17:46:45Z

@jigold bump

danking assigned jigold Aug 25, 2023

rebase cruft

8b1ff3b

daniel-goldstein reviewed Aug 25, 2023

jigold previously requested changes Aug 25, 2023

KoalaQin mentioned this pull request Aug 25, 2023

update Java 8 installation resource and Hail version broadinstitute/gnomad_qc#408

Merged

add v1

4765598

mispelling

b0bd3b5

use query string isntead of client-side filtering

16649c7

match versioning in URLs

902ac13

jigold previously requested changes Aug 25, 2023

danking force-pushed the sb-should-not-list-all-jobs branch from 85381a0 to 902ac13 Compare August 25, 2023 17:36

Dan King added 2 commits August 31, 2023 14:30

also fix front_end.py

c86c4b9

Merge remote-tracking branch 'hi/main' into sb-should-not-list-all-jobs

0418da4

danking mentioned this pull request Aug 31, 2023

[release] 0.2.121 #13529

Merged

Dan King added 6 commits August 31, 2023 18:39

sort import blocks

eec7cb9

Merge remote-tracking branch 'hi/main' into sb-should-not-list-all-jobs

48e8a1d

cleanup merge conflict issues

02295e6

cast away types when creating UI elements

ffdcfad

sort

629c3be

Merge remote-tracking branch 'hi/main' into sb-should-not-list-all-jobs

3b5286a

jigold approved these changes Sep 5, 2023

danking added the prio:high label Sep 6, 2023

danking merged commit 4fb659c into hail-is:main Sep 6, 2023
5 checks passed
