Upstream 0.2.132 release #344

Merged
merged 88 commits into from
Jul 11, 2024

Conversation

jmarshall

Recent upstream changes include a rewrite of several of the web page templates. Hence this merge drops the functionality locally added by PRs #270, #272, and #311 to make the command <PRE> resizeable and to add job state quick links. We may re-add these improvements later by reimplementing them in the new template code.

daniel-goldstein and others added 30 commits April 10, 2024 17:47
Incorporates grafana which we did not have at the time of previous
revision and removes the nginx config that is no longer used. We now use
Envoy as our load balancer. I'll make a separate dev doc explaining the
gateways.
A long-standing fixme in the LocalBackend was to not rely on HadoopFS,
which we use with the SparkBackend for compatibility with dataproc and
hdfs urls.

By default, the HadoopFS doesn't understand gs urls. Users need to
install the gcs-hadoop-connector (preinstalled in dataproc) to
communicate with google cloud storage. Spark handles supplying
credentials to the connector.

Issue hail-is#13904 is caused by failing to properly supply the
gcs-hadoop-connector with credentials in the LocalBackend. In the
absence of config, the connector hangs while trying to fetch a token
from a non-existent metadata server.

The LocalBackend was designed to be a testing ground for lowered and
compiled code that would eventually be run on batch, where we use the
RouterFS. I propose a pragmatic fix for hail-is#13904 that ditches the HadoopFS
for all but local filesystem access in the LocalBackend instead of
identifying and fixing the root cause.

In doing so, I made a couple of changes to how the RouterFS is
configured: In the absence of the `HAIL_CLOUD` environment variable,
RouterFS can handle gs and az urls iff credentials are not supplied. If
the user supplies credentials, we use `HAIL_CLOUD` to decide which cloud
to route to.

fixes hail-is#13904
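
For intuition, here is a minimal Python sketch of that routing rule (illustrative only; the real logic lives in the Scala `RouterFS`, and the return values here are just labels):

```python
import os

# A sketch (not Hail's Scala implementation) of the routing rule described
# above: with no credentials, route purely by URL scheme; with credentials,
# use HAIL_CLOUD to decide which cloud filesystem receives them.
def route_filesystem(url: str, credentials=None) -> str:
    scheme = url.split("://", 1)[0]
    if credentials is None:
        if scheme in ("gs", "az"):
            return f"cloud FS for {scheme} (no credentials supplied)"
        return "local FS"
    cloud = os.environ.get("HAIL_CLOUD")
    if (cloud, scheme) in {("gcp", "gs"), ("azure", "az")}:
        return f"cloud FS for {scheme} (using supplied credentials)"
    return "local FS"
```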
…ail-is#14451)

Before we can simplify the binding structure, we need to stop
duplicating it all over the place. This PR rewrites `FreeVariables` so
that it no longer needs special logic for particular nodes, hard coding
binding structure (redundantly). To do this, it takes advantage of the
new `Bindings`, which operates on a `GenericBindingEnv` interface. It
adds a new implementation of this interface specifically for computing
free variables, then simply does a generic traversal of the IR using
this custom binding environment.

While I find the new implementation far simpler and more obviously
correct than the old, I do expect it to further simplify once I'm able
to start modifying the core binding structure.
…ail-is#14461)

We can always revert this if there are issues, but the polling that CI
does to refresh its understanding of PRs and batches is pretty light, and
I think polling every minute is still quite reasonable.
…ail-is#14458)

I missed that hail-is#14443 only got one of several `LOCK IN SHARE MODE` lines
in the billing projects query. This gets the rest in the two
`query_billing_projects` functions that do read-only selects.
Fixes hail-is#13971

CHANGELOG: Hail now supports and primarily tests against Dataproc 2.2.5,
Spark 3.5.0, and Java 11. We strongly recommend updating to Spark 3.5.0
and Java 11. You should also update your GCS connector *after installing
Hail*: `curl https://broad.io/install-gcs-connector | python3`. Do not
try to update before installing Hail 0.2.131.


https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2

---------

Co-authored-by: Edmund Higham <edhigham@gmail.com>
…s#14277)

Later versions of IDEA can't run tests with our current testng and
scalatest versions. Update to latest and fix fallout.
Fixes: hail-is#14462
Restores ability to reuse spark contexts across different hail
sessions, e.g. using multiple databricks notebook sessions.
In QoB, the Query Driver and all Query Worker jobs reside in the same
batch. Currently, the Query Driver submits worker jobs and then waits
for them to finish before collecting their results from GCS. The way it
waits is by polling the status of the batch and waiting for the number
of completed jobs to reach `n_total_jobs - 1` (to account for itself).
This is both awkward and prevents a couple of potentially valuable pieces
of functionality: you cannot run multiple concurrent Query Drivers, and
you cannot take advantage of Batch's `cancel_after_n_failures`
functionality because you do not want to cancel the Query Driver itself.

The introduction of job groups addresses both these problems, as the
Query Driver can create a job group for the stage of worker jobs and
then await its completion. This PR makes a job group for the query
driver and then the query driver creates a nested job group per stage of
workers that it creates.

Utilizing `cancel_after_n_failures` to fail a stage early is future
work.
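
Roughly, the driver-side flow this enables looks like the following sketch; `create_job_group`, `create_job`, and `wait` are placeholder names, not necessarily the real hailtop API:

```python
# Illustrative pseudocode for the flow described above; create_job_group(),
# create_job() and wait() are placeholder names, not a documented hailtop API.
def run_stage(driver_job_group, stage_index, partitions):
    # One nested job group per stage of worker jobs.
    stage_group = driver_job_group.create_job_group(name=f"stage-{stage_index}")
    for i in partitions:
        stage_group.create_job(command=["worker", "--partition", str(i)])
    # The driver awaits only this stage's job group rather than the whole
    # batch, so concurrent drivers and cancel_after_n_failures become possible.
    return stage_group.wait()
```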
Use generic environment propagation in PruneDeadFields as much as
possible. This allowed the deletion of many cases in `rebuild`.
For Hail Batch on Terra Azure, the production artifact is a Helm chart
containing the necessary kubernetes resources to run a Hail Batch
deployment in a Terra k8s cluster. This deployment contains slightly
modified containers of the batch front-end, batch driver and a mysql
database. This chart is currently built manually using the targets in
`batch/terra-chart/Makefile`. As this process is not currently
automatically tested, it's very prone to bit rot. This PR is an
amalgamation of fixes that I needed to make to get `main` to build in
the current Terra. A non-exhaustive list of the changes:

- After changing from gradle to mill, some Dockerfiles and make
targets needed to change to account for the new location of the JAR.
- I removed some redundancy in invocations of `docker build` by relying
on the generic targets that we now have in the top level Makefile.
- Terra changed how they handle identity management for the kubernetes
deployment, from `aadpodidentity` to `workloadIdentity`. This changes
the chart to work with the new inputs they provide.

Ultimately, Terra should have a CI system that we can push charts to and
receive feedback on whether it passed our test suite in their test
environment.
)

Our call representation effectively uses unsigned integers. We have to
make sure that we are converting between signed and unsigned when
encoding/decoding them to/from the backend.
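
For example, a 32-bit signed/unsigned round trip of the kind being described (a generic Python illustration, not Hail's encoder):

```python
# Generic illustration of a 32-bit signed/unsigned round trip; not Hail's
# actual call encoder.
def to_unsigned_u32(x: int) -> int:
    # Reinterpret a signed 32-bit value as unsigned, e.g. -1 -> 0xFFFFFFFF.
    return x & 0xFFFFFFFF

def to_signed_i32(x: int) -> int:
    # Reinterpret an unsigned 32-bit value as signed.
    return x - (1 << 32) if x >= (1 << 31) else x

assert to_unsigned_u32(-1) == 0xFFFFFFFF
assert to_signed_i32(to_unsigned_u32(-1)) == -1
```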
`curl` is already installed in `hail_ubuntu_image`, on which
`curl_image` is based.
`orjson` 3.9.15 fixed the rare segfault that we saw in `3.9.11`. Besides
just updating to latest patch and minor versions:

- Removed a redundant requirement of `orjson` in `gear/requirements.txt`
-- it inherits `orjson` from hailtop
- Bokeh `3.4` made a breaking change w.r.t. the `circle` method on
figures. I have restricted the bounds for `bokeh` to avoid this breaking
change but will follow up with a PR that changes our usage of bokeh to
follow the deprecation/upgrade advice and undo the bounds restriction.
setup.py complains awfully when you invoke it directly. Apparently this
sort of thing is unacceptable, and the recommended workflow for building
wheels is to use `build`. See the link below for more information.
https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html
[uv](https://github.com/astral-sh/uv) is a new package resolver by the
same folks who make `ruff`. It's touted as being really fast, which
honestly it is, but mostly it's appealing to me because it supports
generating lockfiles for platforms and python versions other than
the system you run it on, which allows us to delete all this dockerizing
of `pip-compile` to generate lockfiles for linux.

It's a really green project, so I'm open to pushback on incorporating
it, but it seemed like a worthwhile simplification. I also quite like
that it allows for additional strategies in generating lockfiles. By
default, it behaves as would be expected, where it locks packages to the
highest version within the acceptable bounds. But you can also configure
it to generate the *lowest* acceptable pins, so we could actually verify
whether the lower bounds that we have in our requirements files are
acceptable or not.
…ex` (hail-is#14497)

hail-is#14085 fixed a change in semantics whereby
`parallelizeAndComputeWithIndex` no longer failed early.
It also noted that the `SparkTaskContext.partitionId` could be wrong
when used with call caching, but left the fix to another change.
This change takes inspiration from `SparkContext.runJob` in its
`partitions` optional argument.
By supplying this, we can limit the number of partitions that we compute
(useful for call caching) without doing dodgy things that break the
`partitionId`.
This change also reduces code duplication while preserving semantics -
`parallelizeAndComputeWithIndex` still fails early for the local and
spark backends.
This function is an unused relic of the first version of our billing
system, which just computed the cost of a slot on a Batch VM per
mCPU-millisecond based on some hard-coded prices. We long ago migrated to a
system where we track GCP SKUs and price changes through their API and
properly itemize usage by SKU in the database.
This is a long overdue description of how our K8s ingress / config is
set up. I've tried to keep it as concise as possible, so ask for
elaboration if there are unexplained gaps.
As a step towards corralling the specification of the binding structure
of the IR into one place, this rewrites `Bindings` to use only a single
method of the `GenericBindingEnv` interface, `newBlock`, which therefore
captures all possibilities of how a node can modify its parent's
environment in a child. Later work refactors this to return an object
encoding this modification, instead of returning a modified environment,
which allows the caller complete flexibility in how to maintain an
environment appropriately for their use case.

This PR leaves in the old `Bindings` implementation, with an assertion
that they agree. The PR stacked above this, hail-is#14495, deletes the old
implementation. This way CI asserts that this refactoring hasn't changed
any behavior.
Follows up on hail-is#14475 by deleting the old implementation of `Bindings` in
favor of the new one, which has now been checked by CI to not change the
behavior.
Refactors `Bindings` to return an object encoding the change to the
environment (any new bindings, whether the agg/scan env is promoted,
etc). This allows the deletion of `SegregatedBindingEnv`. Follow up work
will use this to replace the other specializations of
`GenericBindingEnv`, and to greatly simplify compiler passes, such as
`NormalizeNames` and `PruneDeadFields`, which currently need to
redundantly encode the binding structure of every node.
The memory that machines use is calculated from the number of cores the
machine has, using a fixed ratio of RAM per core. The code assumed a
ratio of 3.75 GB per core, but this assumption does not hold outside of
the n1 machine family. This PR changes the functions that calculate
memory from the number of cores to account for this by passing the
machine_type as the main parameter rather than the worker type. Then,
logic is added to use the appropriate ratio of 4 GB per core for the g2
family of machines.
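
A simplified sketch of the idea, using the published GCP ratios for n1 and g2 (illustrative only; the real Batch code and its family table differ):

```python
# Simplified sketch: derive memory from the machine type rather than the
# worker type. The family-to-ratio table is illustrative, not Batch's code.
GB_PER_CORE = {'n1': 3.75, 'g2': 4.0}

def memory_gb(machine_type: str) -> float:
    # e.g. 'g2-standard-4' -> family 'g2', 4 cores
    family, _, cores = machine_type.split('-')
    return GB_PER_CORE[family] * int(cores)

assert memory_gb('n1-standard-16') == 60.0
assert memory_gb('g2-standard-4') == 16.0
```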

---------

Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
Co-authored-by: Sophie Parsa <parsa@wm9c6-e4e.broadinstitute.org>
…ts (hail-is#14508)

These docstrings weren't updated since the GCP multi-regional egress
changes. It is no longer free to egress from/to multi-regional buckets,
so the best thing to do would be to put your storage and compute into
the same region.
…is#14520)

We have the type, so we should use it; this saves a round trip, especially
for QoB.
At time of writing dependabot doesn't have a great way to bulk update
dependencies across unrelated lockfiles in the repo, which often require
manual intervention because we assert that we're always using the same
package version everywhere. It's also just a lot of noise and hogs CI
time. We've decided to move to a periodic bulk-update process + using
repo Security alerts.
This PR extends a previous PR that added g2-standard-4 machines into
batch by adding the full machine family. The code adds a new Accelerator
Resource that takes the number of gpus.

---------

Co-authored-by: Sophie Parsa <parsa@wm9c6-e4e.broadinstitute.org>
chrisvittal and others added 26 commits May 31, 2024 15:32
…ail-is#14560)

CHANGELOG: The gvcf import stage of the VDS combiner now preserves the
GT of reference blocks. Some datasets have haploid calls on sex
chromosomes, and the fact that the reference was haploid should be
preserved.
…s#14571)

Fixes: hail-is#14559
`hl.nd.array`s constructed from stream pipelines can cause out of memory
exceptions owing to a limitation in the python CSE algorithm that does
not eliminate partially redundant expressions in if-expressions.
Explicitly `let`-binding the input collection prevents it from being
evaluated twice: once for the flattened data stream and once for the
original shape.
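
The same let-binding trick is available at the Python level via `hl.bind`, shown here as a generic illustration rather than the internal fix itself:

```python
import hail as hl

# Generic illustration of the let-binding trick: bind the expression once so
# both downstream uses share a single evaluation instead of recomputing it.
xs = hl.range(0, 1_000_000).map(lambda i: i * i)
bound = hl.bind(lambda a: hl.struct(n=hl.len(a), total=hl.sum(a)), xs)
```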
Hail's benchmarks were kind of their own thing and a little neglected.
This change moves the benchmarks into the `hail/python` folder and
updates them to use pytest with a custom plugin/set of pytest hooks.
Now, benchmarks can be run from the command line like any pytest.
This change removes the `benchmark-hail` (or `hailbench`) utility.

Benchmarks are marked by `pytest.mark.benchmark` (via the `@benchmark`
decorator).
By convention, benchmarks are python tests whose names are prefixed by
`benchmark_` and are located in files with the same prefix.
Nothing enforces this, however, so you could name your benchmarks
`test_*` and put them in files named `test_*.py`.
Benchmarks may import and use any test code or utilities defined in
`test/`.
The results of each benchmark are output as json lines (`.jsonl`) to
the file specified by the `--output` pytest arg, or to stdout.

The folder structure should be familiar, resembling our `test/`
directory.
I believe this is flexible enough to add `hailtop` benchmarks should we
so wish:
```
pytest.ini - hoisted from `test/` to include benchmark marks
benchmark/
- conftest.py for custom pytest command line args
- hail/
  - conftest.py for custom plugin that runs hail benchmarks
  - benchmark_*.py hail query benchmark code
- tools/
  - shared utilities, including the `@benchmark` decorator
```
Supporting pytest fixtures required writing a custom plugin to run
benchmarks, as using off-the-shelf
solutions like `pytest-benchmark` would forbid method level fixtures
like `tmp_path` etc.
The plugin is designed to run "macro-benchmarks" (i.e. long-running tests)
and fully supports pytest parameterisation.
For each benchmark, the plugin initialises hail and then repeats (for a
number of iterations defined by the pytest mark)
acquiring fixtures, timing the invocation, and tearing down fixtures, finally
stopping hail. It is therefore unsuitable for
microbenchmarks, of which we currently have none in python. If we add
them, we'd need to tweak this to support them, perhaps with an inner loop
or something.

The process of submitting benchmarks to batch is greatly simplified as
the old `Makefile` infrastructure for
building wheels and docker images etc has been replaced with the script
`benchmark_in_batch.py`.
Benchmark images are now based off the `hail-dev` image built in CI (or
via the `hail-dev-image` make target).
Furthermore, you can control the number of "replicate" jobs created for
each benchmark at the benchmark level using
the `@benchmark(batch_jobs=N)` decorator.
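
For example, a benchmark written to these conventions might look roughly like this (the import path and the benchmark body are assumptions for illustration):

```python
# A hypothetical benchmark following the conventions above; the import path
# and the body are assumptions for illustration only.
import hail as hl
from benchmark.tools import benchmark  # assumed home of the @benchmark decorator

@benchmark(batch_jobs=5)
def benchmark_table_range_count(tmp_path):
    ht = hl.utils.range_table(1_000_000, n_partitions=16)
    ht = ht.checkpoint(str(tmp_path / 'range.ht'))
    ht.count()
```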

Limitations/shortcomings:
- Output is currently jsonl only. Some more human-friendly output might
be nice on a per-iteration basis.
- Old `benchmark-hail` utilities are broken. I'll restore these in
subsequent changes.

---------

Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
Co-authored-by: Christopher Vittal <chris@vittal.dev>
This revamps the `batch.front_end` web pages to be a bit less cluttered
and more usable. I brought in tailwindcss instead of using our existing
sass. I mainly find it easier to work with but am fine porting it to
sass if anyone feels strongly against changing css tools. In order to
avoid overhauling all our HTML in one PR, I added a flag to the jinja
context `use_tailwind` that is currently only on for the `batch`
service. `layout.html` then uses this flag to determine which CSS and
header it should use. If this gets to a state we're happy to merge, it
should be a lot less work to bring over auth and CI and delete that
flag.
…4583)

`make_variants_matrix_table` and `make_variant_stream` shared a
function for processing entries, as did `make_reference_matrix_table`
and `make_reference_stream`. This deduplicates that functionality by
lifting out the internal functions to top level definitions.
…file uploads (hail-is#14576)

We use `nest_asyncio` to allow synchronous blocking on a coroutine
execution inside an async context. For example, if a user is using the
synchronous interface of hail inside a jupyter cell (which runs an event
loop). `nest_asyncio` achieves this by patching the `asyncio` event loop
to make it reentrant, and that reentrancy comes with footguns: it allows
code to create 100 tasks that all concurrently invoke
`run_until_complete`, each of which adds stack frames to the event
loop stack that can pile up and trigger a cryptic `RecursionError`.

But internally we never *need* to make concurrent calls to
`run_until_complete`, and more broadly we should never have `async /
sync / async` inside hail code. This change exposes an asynchronous
`validate_file` so that asynchronous methods in `hailtop` can use it
directly instead of inserting a synchronous layer (`async_to_blocking`)
between them.
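
Schematically, the pattern change looks like this self-contained sketch, where `validate_file` and `upload` are stand-ins rather than hailtop's real functions:

```python
import asyncio

# A self-contained sketch of the async/sync layering point above; validate_file
# and upload here are stand-ins, not hailtop's real functions.
async def validate_file(url: str) -> None:
    if not url.startswith(('gs://', 'https://', '/')):
        raise ValueError(f'unsupported url: {url}')

async def upload(url: str) -> None:
    # Async code awaits the async validator directly, instead of wrapping a
    # synchronous validate_file in async_to_blocking (async / sync / async).
    await validate_file(url)
    # ... perform the upload ...

asyncio.run(upload('gs://bucket/object'))
```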
These queries are responsible for getting the paginated list of jobs in
the batch UI. I believe this missing order by went unnoticed because the
queries were naturally getting returned in order based on the jobs
primary key, but we've recently seen a couple page loads of jumbled or
missing jobs. Seems like something around the query stats changed and
the database adjusted its plan slightly. Regardless, we want this ORDER
BY to be explicit.
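
The shape of the fix, as an illustrative query (not the actual Batch SQL), is to make the ordering explicit rather than relying on primary-key order:

```python
# Illustrative only, not the actual Batch query: a keyset-paginated job listing
# with an explicit ORDER BY, rather than relying on rows happening to come back
# in primary-key order.
LIST_JOBS_QUERY = '''
SELECT job_id, state, name
FROM jobs
WHERE batch_id = %s AND job_id > %s
ORDER BY job_id ASC
LIMIT 50
'''
```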
Specify which benchmark to run by nodeid instead of by expression

---------

Co-authored-by: Christopher Vittal <chris@vittal.dev>
Adds tests for sparse block matrices in python, which are currently
failing in backends with bm lowering.
The original UI redesign was intended to make it easier to navigate and
focus on important information. I received very positive feedback on
some aspects like this, for example having the different job steps in
separate tabs. There was also feedback though that there was a lot of
dead space. This is an attempt to find a good middle ground where the UI
is not overwhelmingly cluttered but is more compact and makes better use
of screen real estate.
We seem to be running into spark/yarn scheduling limitations causing
stages to often fail with a large number of partitions. Here, we
implement a very simple chunking strategy to run spark jobs with a
limited number of partitions at a time.

The maximum parallelism is controlled by a new
`spark_max_stage_parallelism` feature flag, which defaults to MAXINT
until we can figure out a good default.

Also, this change corrects a small error in logic for partition indices
for call caching. The `resultHandler` argument of [`runJob`] is called
with the job's partition index, not the index of the partition within
the RDD. So we need to index into the `partitions` sequence when
populating the results buffer.

CHANGELOG: Add 'spark_max_stage_parallelism' flag to allow users to run
pipelines with a large number of partitions in chunks. By default, hail
still attempts to run all partitions in a stage at once.

[`runJob`]:
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/SparkContext.html#runJob[T,U](rdd:org.apache.spark.rdd.RDD[T],func:(org.apache.spark.TaskContext,Iterator[T])=%3EU,partitions:Seq[Int],resultHandler:(Int,U)=%3EUnit)(implicitevidence$11:scala.reflect.ClassTag[U]):Unit
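
A sketch of the chunking idea in Python (the real implementation is in the Scala SparkBackend; `run_partitions` stands in for the per-chunk `runJob` call):

```python
# A sketch of the chunking strategy described above. run_partitions() stands in
# for the per-chunk Spark runJob call and yields (partition_id, result) pairs.
def run_in_chunks(partitions, max_stage_parallelism, run_partitions):
    results = [None] * len(partitions)
    for start in range(0, len(partitions), max_stage_parallelism):
        chunk = partitions[start:start + max_stage_parallelism]
        for partition_id, value in run_partitions(chunk):
            # partition_id is the job's partition index, not its position in
            # the chunk, so index into the full `partitions` sequence.
            results[partitions.index(partition_id)] = value
    return results
```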
…hail-is#14589)

`'/api/v1alpha/batches/create-fast'` is a Batch API endpoint that allows
you to create a batch, submit jobs, and start them running all in one
HTTP request. It makes a few transactions within the endpoint and must
be idempotent. One of the failure modes it must watch for is when a
Batch Update to add some new jobs and/or job groups to a batch has
already successfully been committed, but the request is retried for
whatever reason. In that scenario, we want to just return a 200 to the client. We
do this already, but there was an oversight when creating job groups
where those branches did not include the full response payload, which
results in intermittent errors on the client when it can't retrieve
`resp['last_job_group_id']`. This fixes that so we always return the
same JSON.
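
Schematically, the fix is to build every response, including the already-committed retry branch, from one payload helper so the client always sees the same JSON; a minimal sketch with illustrative field names:

```python
from aiohttp import web

# Sketch of the fix: both the fresh-commit path and the already-committed retry
# path build their response through one helper, so the client always receives
# the same JSON shape (field names here are illustrative).
def update_response(batch_id: int, start_job_id: int, last_job_group_id: int) -> web.Response:
    return web.json_response({
        'id': batch_id,
        'start_job_id': start_job_id,
        'last_job_group_id': last_job_group_id,
    })
```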
…il-is#14579)

I believe hail-is#14547 introduced a bug that broke IR function deserialization
in QoB by changing `value_parameter_names` from `Array[String]` to
`Array[Name]`. This fixes the issue by pushing the introduction of the
`Name` wrapper object to after deserialization. Another option is to
incorporate the `{ str: String }` structure of `Name` into the python ->
scala payload, but I'm not sure I see a use case for that and we can
always do that later (there is no issue of backwards compatibility with
this communication between python and scala).

My main concern here is that clearly this isn't tested. I'd appreciate
guidance on the current advised practice for testing this behavior.
These make targets have been more trouble than they are worth. Since
both the requirements files and the pinned requirements files that they
generate are checked into the repo, updates to these files through PRs
don't play well with how make tracks timestamps, since my local git may
update them in different orders. In general, I've never had a need to
only update a single requirements file, and since these are technically
dependencies of, say, the Batch image, building the batch image can
incidentally trigger a re-generation of these files when I never
actually changed the dependencies. This PR removes those targets and
does a normal refresh of the requirements, as well as updates the
documentation on how we handle pip dependencies.
Recent upstream changes include a rewrite of several of the web page
templates. Hence this merge drops the functionality locally added
by PRs #270, #272, and #311 to make the command <PRE> resizeable
and to add job state quick links. We may re-add these improvements
later by reimplementing them in the new template code.