Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[qob][batch] do not list all jobs on failure (plus: types!) #13500

Merged
merged 14 commits into from
Sep 6, 2023

Conversation

danking
Copy link
Contributor

@danking danking commented Aug 25, 2023

CHANGELOG: In Query-on-Batch, the client-side Python code will not try to list every job when a QoB batch fails. This could take hours for long-running pipelines or pipelines with many partitions.

I also added API types to the list jobs end point because I have to go hunting for this every time anyway. Seems better to have this information at our digital fingertips.

CHANGELOG: In Query-on-Batch, the client-side Python code will not try to list every job when a QoB batch fails. This could take hours for long-running pipelines or pipelines with many partitions.

I also added API types to the list jobs end point because I have to go hunting for this every time anyway. Seems better to have this information at our digital fingertips.
@@ -462,14 +474,21 @@ async def wait(self,
with BatchProgressBar(disable=disable_progress_bar) as progress2:
return await self._wait(description, progress2, disable_progress_bar, starting_job)

async def debug_info(self):
async def debug_info(self,
_job_filter: Optional[Callable[[JobListEntry], bool]] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a drive by comment: You can use the query string q to search for failed jobs instead of filtering client-side

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

jigold
jigold previously requested changes Aug 25, 2023
from typing_extensions import NotRequired


class JobListEntry(TypedDict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These both need to be versioned. Otherwise, we're going to run into issues with backwards compatibility.

@danking danking dismissed jigold’s stale review August 25, 2023 15:47

they have been so versioned.

@jigold
Copy link
Contributor

jigold commented Aug 25, 2023

I don't understand how the backwards compatibility is going to work in this case:

  1. I have a V1 client, the server is sending me a V2 response with extra fields (or removed fields). What happens here?

@danking
Copy link
Contributor Author

danking commented Aug 25, 2023

A V1 client cannot hit the V2 endpoint because it doesn't know it exists.

@danking
Copy link
Contributor Author

danking commented Aug 25, 2023

To be clear, all this does is encode in the types what we already guarantee: if version X returns {'foo': int}, then all future versions of the server must at least return a dict containing the 'foo' key with an integer. If those servers start returning {'bar': int}, that would break all the old clients (and violate the types that we wrote in those clients).

jigold
jigold previously requested changes Aug 25, 2023
Copy link
Contributor

@jigold jigold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just so I understand correctly, we can add fields to the V1Alpha type later on, but not remove them or change the type?

@@ -252,7 +253,9 @@ async def _handle_api_error(f: Callable[P, Awaitable[T]], *args: P.args, **kwarg
raise e.http_response()


async def _query_batch_jobs(request: web.Request, batch_id: int, version: int, q: str, last_job_id: Optional[int]):
async def _query_batch_jobs(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're missing the V1Alpha here.

@danking danking force-pushed the sb-should-not-list-all-jobs branch from 85381a0 to 902ac13 Compare August 25, 2023 17:36
@danking
Copy link
Contributor Author

danking commented Aug 31, 2023

I merged main and was forced to fix the pyright errors, which was great! I found two bugs with these new types:

  1. It's possible for a spec file to be missing which means the spec is None which would fail our get job front end code when it tries to fix up the resources.
  2. In the aioclient, we assumed the attributes was present and a dict but that key can be completely missing from the dict.

I also noticed cost_str has had the wrong type since always.

@danking
Copy link
Contributor Author

danking commented Aug 31, 2023

we can add fields to the V1Alpha type later on, but not remove them or change the type?

We should not remove or change fields, the types of fields, or the meaning of fields of JSON dicts returned by our API endpoints because that would break backwards compatibility.

The types on the other hand can change willy nilly, mypy and pyright have no way of knowing that we changed the definition of, for example, GetJobResponseV1Alpha. They will however, go report all the places in our codebase where we expected certain fields to exist if we remove those fields from GetJobResponseV1Alpha.

@danking
Copy link
Contributor Author

danking commented Sep 5, 2023

@jigold bump

@danking danking merged commit 4fb659c into hail-is:main Sep 6, 2023
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants