Merge upstream HEAD(bea04d9, 2024-05-21) [release] 0.2.130 (#14454) #335

Merged: 199 commits into main on May 24, 2024

Conversation

milo-hyben

In the end there was just one conflict.

iris-garden and others added 30 commits January 3, 2024 14:02
Oh Chrome,

How do I call thee? Let me count the ways.
I've already deployed to the cluster to test. Unless master deploys, you
can see it here:
https://grafana.azure.hail.is/datasources/edit/kgzY5Io7k

I also fixed the grafana Makefile.
…4122)

Co-authored-by: Edmund Higham <ehigham@users.noreply.github.com>
See hail-is#14120, which will add the
resultant commit from this PR to `.git-blame-ignore-revs` once this PR
is squashed and merged.
…l-is#13617)

Add staged code generation capabilities by lowering to and emitting
`StreamLeftIntervalJoin`.
The `EmitStream` rule for this node uses a code-generated min heap* of
intervals that contain the current key in the left stream, ordered by
right endpoint. 
The row type of the join result is the left row with an array of
values**
from the right table, ordered by their associated interval's right
endpoint.

Note that this implementation is by no means optimised. There are a
number of
opportunities that I'd like to consider in subsequent changes,
including:
- Laziness in restoring the heap property on push/pop from heap.
- Not deep-copying elements of the right stream when no explicit memory
  management per element is required.


\* This min heap is code generated instead of using an off-the-shelf
implementation because:
- we don't yet have a mapping from `SType` to a Java class or interface to
  parameterise the heap with
- it's not obvious how to handle region memory management in an existing
  solution

\** The value is a row with the key fields omitted.
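
As a rough illustration only (the actual implementation is code-generated Scala inside `EmitStream`), the heap behaviour might be sketched in Python like this; the function name and the closed-on-both-ends interval semantics are assumptions for the sake of the example:

```python
import heapq

def left_interval_join(left_keys, intervals):
    # Illustrative sketch: for each left key (assumed sorted), yield the
    # values of all intervals containing it, ordered by right endpoint.
    # Intervals are (start, end, value) and treated as closed on both ends.
    pending = iter(sorted(intervals))
    nxt = next(pending, None)
    heap = []  # min heap keyed on right endpoint
    for key in left_keys:
        # push every interval whose start has been reached
        while nxt is not None and nxt[0] <= key:
            start, end, value = nxt
            heapq.heappush(heap, (end, value))
            nxt = next(pending, None)
        # pop intervals that no longer contain the current key
        while heap and heap[0][0] < key:
            heapq.heappop(heap)
        yield key, [v for _, v in sorted(heap)]

print(list(left_interval_join([1, 3, 5], [(0, 4, 'a'), (2, 6, 'b')])))
# [(1, ['a']), (3, ['a', 'b']), (5, ['b'])]
```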

Additional Changes:
- Make `this` an explicit argument to `CodeBuilder.invoke()`
  - Gives us control over which object we're dispatching on.
  - Useful for generating more self-contained classes; previously it assumed
    all methods were defined in the same class.
- Removed `referenceGenomes` from `ExecuteContext`; delegate to `Backend`
  - In practice these were being populated from the backend object anyway.
  - The backend is mutable, meaning we can add/remove fake genomes for testing.
Before this change, testing just the n_partitions method of Table takes
nearly a minute, 52s of which is spent in "setup". Admittedly, this
setup is shared across multiple tests, but this is an unacceptable
burden for iterating on one method.

```
52.30s setup    hail/table.py::hail.table.Table.n_partitions
3.07s call     hail/table.py::hail.table.Table.n_partitions
```

After this change, the setup time significantly reduces. The call gets
slower, presumably because the JVM is not warm. I think the setup time
is now dominated by Hail JVM initialization.

```
11.77s call     hail/table.py::hail.table.Table.n_partitions
9.68s setup    hail/table.py::hail.table.Table.n_partitions
```
This reduces the practical runtime of this test by 50%.

This commit adds 72kB to the repository:

```
$ git diff-tree -r -c -M -C --no-commit-id HEAD | awk '{print $4}' | git cat-file --batch-check | awk 'BEGIN {s=0} {s+= $3} END {print s}'
72998
```
We do not yet have pyright running on the Hail Python package or the
tests. Without this change, I get failures on push.
Currently, the k8s namespace field is used both for routing internal
requests inside Kubernetes and for routing external requests over the internet.
It also has special logic based on whether the namespace indicates a
production or dev environment. For example, if `namespace == 'default'`,
then we route external `batch` requests to `batch.<domain>/path`, but if
`namespace == foo_dev`, we route external `batch` requests to
`internal.<domain>/foo_dev/path`.

This PR decouples the namespace field from routing. Aside from being
more straightforward overall in my opinion, this is necessary for Batch
on Azure Terra, where Batch is served out of a subpath it does not
control, one that is unrelated to whatever namespace it might reside in. The
guiding principle for routing is then as follows: if the config has no
subpath, use a subdomain; otherwise, put everything under domain +
subpath.

For example:
- `{'domain': 'hail.is', 'subpath': null}` => `batch.hail.is`
- `{'domain': 'internal.hail.is', 'subpath': '/foo_dev'}` =>
`internal.hail.is/foo_dev/batch`

Since the CI pipeline runs on current production instances, there is a
minor need to stay compatible with old deploy configs (or else hack up
the CI build.yaml). It's quite a simple translation though, because if
there is no subpath provided we can infer one based on the
`default_namespace`.
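
A minimal sketch of that routing rule (the `service_url` helper name and the https scheme are illustrative assumptions; the config shape follows the examples above):

```python
def service_url(config, service):
    # Illustrative routing rule: no subpath => subdomain;
    # otherwise everything lives under domain + subpath.
    domain, subpath = config['domain'], config.get('subpath')
    if subpath is None:
        return f'https://{service}.{domain}'
    return f'https://{domain}{subpath}/{service}'

assert service_url({'domain': 'hail.is', 'subpath': None}, 'batch') == 'https://batch.hail.is'
assert (service_url({'domain': 'internal.hail.is', 'subpath': '/foo_dev'}, 'batch')
        == 'https://internal.hail.is/foo_dev/batch')
```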
Split from hail-is#14103
Co-authored by @patrick-schultz.

This change includes some edits to prevent build failures caused by
using the Scala formatter.

---------

Co-authored-by: patrick-schultz <pschultz@broadinstitute.org>
Run scalafmt on compiler Scala sources.
Split from hail-is#14103.
Co-authored by @patrick-schultz.
…op (hail-is#14097)

Fixes hail-is#14130.

We pervasively assume:
1. That our entire system is used within a single Python thread.
2. That once an event loop is created, it is the only event loop that will
ever exist.

Pytest (and newer versions of IPython, afaict) violate this pretty
liberally.

~~pytest_asyncio has [explicit instructions on how to run every test in
the same event
loop](https://pytest-asyncio.readthedocs.io/en/latest/how-to-guides/run_session_tests_in_same_loop.html).
I've implemented those here.~~ [These instructions don't
work](pytest-dev/pytest-asyncio#744). It seems
that the reliable way to ensure we're using one event loop everywhere is
to use pytest-asyncio < 0.23 and to define an event_loop fixture with
scope `'session'`.

I also switched test_batch.py into pytest-only style. This allows me to
use session-scoped fixtures so that they exist exactly once for the
entire test suite execution.
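
For reference, the pattern being relied on (pytest-asyncio < 0.23 plus an overridden `event_loop` fixture in `conftest.py`) looks roughly like this; this is a sketch of the well-known fixture shape, not a copy of the code in this change:

```python
# conftest.py
import asyncio
import pytest

@pytest.fixture(scope='session')
def event_loop():
    # pytest-asyncio < 0.23 runs every async test on the loop provided by
    # this fixture, so a session-scoped override gives one loop everywhere.
    loop = asyncio.new_event_loop()
    yield loop
    loop.close()
```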

Also:
- `RouterAsyncFS` methods must be either static methods or async
methods. We must not create an FS in a sync method. Neither `parse_url` nor
`copy_part_size` allocates an FS anymore.
- `httpx.py` now eagerly errors if the running event loop in `request`
differs from that at allocation time. Annoying but much better error
message than this nonsense about timeout context managers.
- `hail_event_loop` either gets the current thread's event loop (running
or not, doesn't matter to us) or creates a fresh event loop and sets it
as the current thread's event loop (a sketch follows after this list). The
previous code didn't guarantee we'd get an event loop because
`get_event_loop` fails if `set_event_loop` was previously called.
- `conftest.py` is inherited downward, so I lifted fixtures out of
test_copy.py and friends and into a common `hailtop/conftest.py`
- I added `make -C hail pytest-inter-cloud` for testing the inter cloud
directory. You still need appropriate permissions and authn.
- I removed extraneous pytest.mark.asyncio since we use auto mode
everywhere.
- `FailureInjectingClientSession` creates an `aiohttp.ClientSession` and
therefore must be used while an event loop is running. Easiest fix was
to make the test async.
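
The `hail_event_loop` behaviour described above can be sketched with nothing but the standard library; this is an approximation, not the hailtop implementation:

```python
import asyncio

def hail_event_loop() -> asyncio.AbstractEventLoop:
    # Return the current thread's loop if it has a usable one, otherwise
    # create a fresh loop and install it as the thread's event loop.
    try:
        return asyncio.get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```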
CHANGELOG: Fixed bugs in the identity by descent implementation for
Query on Batch

This PR fixes hail-is#14052. There were two bugs in how we compute IBD. In
addition, the tests weren't running in QoB, and the test dataset we were
using doesn't have enough variability to catch errors. I used
Balding-Nichols-generated data instead. Do we need to set the seed in the
tests here?
Plotly relies on behavior that Pandas has deprecated. See
plotly/plotly.py#4363 and
plotly/plotly.py#4379. The fix
(plotly/plotly.py#4379) was merged into [main and
released in
5.18.0](plotly/plotly.py@57e4d1d).
Here's a diff of `hailctl dataproc start foo --dry-run` on main and on
this branch. Notice that the properties and metadata arguments gain a
leading and trailing single quote. This ensures that things like
`sys_platform!="win32"` are properly transmitted.

In `start.py` we just use exec-style invocation, so there's no
equivalent issue.

```
7c7
<     --properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=36g|||yarn:yarn.nodemanager.resource.memory-mb=29184|||yarn:yarn.scheduler.maximum-allocation-mb=14592|||spark:spark.executor.cores=4|||spark:spark.executor.memory=5837m|||spark:spark.executor.memoryOverhead=8755m|||spark:spark.memory.storageFraction=0.2|||spark:spark.executorEnv.HAIL_WORKER_OFF_HEAP_MEMORY_PER_CORE_MB=3648 \
---
>     '--properties=^|||^spark:spark.task.maxFailures=20|||spark:spark.driver.extraJavaOptions=-Xss4M|||spark:spark.executor.extraJavaOptions=-Xss4M|||spark:spark.speculation=true|||hdfs:dfs.replication=1|||dataproc:dataproc.logging.stackdriver.enable=false|||dataproc:dataproc.monitoring.stackdriver.enable=false|||spark:spark.driver.memory=36g|||yarn:yarn.nodemanager.resource.memory-mb=29184|||yarn:yarn.scheduler.maximum-allocation-mb=14592|||spark:spark.executor.cores=4|||spark:spark.executor.memory=5837m|||spark:spark.executor.memoryOverhead=8755m|||spark:spark.memory.storageFraction=0.2|||spark:spark.executorEnv.HAIL_WORKER_OFF_HEAP_MEMORY_PER_CORE_MB=3648' \
9c9
<     --metadata=^|||^WHEEL=gs://hail-30-day/hailctl/dataproc/dking-dev/0.2.126-a51eabd65859/hail-0.2.126-py3-none-any.whl|||PKGS=aiodns==2.0.0|aiohttp==3.9.1|aiosignal==1.3.1|async-timeout==4.0.3|attrs==23.1.0|avro==1.11.3|azure-common==1.1.28|azure-core==1.29.5|azure-identity==1.15.0|azure-mgmt-core==1.4.0|azure-mgmt-storage==20.1.0|azure-storage-blob==12.19.0|bokeh==3.3.1|boto3==1.33.1|botocore==1.33.1|cachetools==5.3.2|certifi==2023.11.17|cffi==1.16.0|charset-normalizer==3.3.2|click==8.1.7|commonmark==0.9.1|contourpy==1.2.0|cryptography==41.0.7|decorator==4.4.2|deprecated==1.2.14|dill==0.3.7|frozenlist==1.4.0|google-auth==2.23.4|google-auth-oauthlib==0.8.0|humanize==1.1.0|idna==3.6|isodate==0.6.1|janus==1.0.0|jinja2==3.1.2|jmespath==1.0.1|jproperties==2.1.1|markupsafe==2.1.3|msal==1.25.0|msal-extensions==1.0.0|msrest==0.7.1|multidict==6.0.4|nest-asyncio==1.5.8|numpy==1.26.2|oauthlib==3.2.2|orjson==3.9.10|packaging==23.2|pandas==2.1.3|parsimonious==0.10.0|pillow==10.1.0|plotly==5.18.0|portalocker==2.8.2|protobuf==3.20.2|py4j==0.10.9.5|pyasn1==0.5.1|pyasn1-modules==0.3.0|pycares==4.4.0|pycparser==2.21|pygments==2.17.2|pyjwt[crypto]==2.8.0|python-dateutil==2.8.2|python-json-logger==2.0.7|pytz==2023.3.post1|pyyaml==6.0.1|regex==2023.10.3|requests==2.31.0|requests-oauthlib==1.3.1|rich==12.6.0|rsa==4.9|s3transfer==0.8.0|scipy==1.11.4|six==1.16.0|sortedcontainers==2.4.0|tabulate==0.9.0|tenacity==8.2.3|tornado==6.3.3|typer==0.9.0|typing-extensions==4.8.0|tzdata==2023.3|urllib3==1.26.18|uvloop==0.19.0;sys_platform!="win32"|wrapt==1.16.0|xyzservices==2023.10.1|yarl==1.9.3 \
---
>     '--metadata=^|||^WHEEL=gs://hail-30-day/hailctl/dataproc/dking-dev/0.2.126-a51eabd65859/hail-0.2.126-py3-none-any.whl|||PKGS=aiodns==2.0.0|aiohttp==3.9.1|aiosignal==1.3.1|async-timeout==4.0.3|attrs==23.1.0|avro==1.11.3|azure-common==1.1.28|azure-core==1.29.5|azure-identity==1.15.0|azure-mgmt-core==1.4.0|azure-mgmt-storage==20.1.0|azure-storage-blob==12.19.0|bokeh==3.3.1|boto3==1.33.1|botocore==1.33.1|cachetools==5.3.2|certifi==2023.11.17|cffi==1.16.0|charset-normalizer==3.3.2|click==8.1.7|commonmark==0.9.1|contourpy==1.2.0|cryptography==41.0.7|decorator==4.4.2|deprecated==1.2.14|dill==0.3.7|frozenlist==1.4.0|google-auth==2.23.4|google-auth-oauthlib==0.8.0|humanize==1.1.0|idna==3.6|isodate==0.6.1|janus==1.0.0|jinja2==3.1.2|jmespath==1.0.1|jproperties==2.1.1|markupsafe==2.1.3|msal==1.25.0|msal-extensions==1.0.0|msrest==0.7.1|multidict==6.0.4|nest-asyncio==1.5.8|numpy==1.26.2|oauthlib==3.2.2|orjson==3.9.10|packaging==23.2|pandas==2.1.3|parsimonious==0.10.0|pillow==10.1.0|plotly==5.18.0|portalocker==2.8.2|protobuf==3.20.2|py4j==0.10.9.5|pyasn1==0.5.1|pyasn1-modules==0.3.0|pycares==4.4.0|pycparser==2.21|pygments==2.17.2|pyjwt[crypto]==2.8.0|python-dateutil==2.8.2|python-json-logger==2.0.7|pytz==2023.3.post1|pyyaml==6.0.1|regex==2023.10.3|requests==2.31.0|requests-oauthlib==1.3.1|rich==12.6.0|rsa==4.9|s3transfer==0.8.0|scipy==1.11.4|six==1.16.0|sortedcontainers==2.4.0|tabulate==0.9.0|tenacity==8.2.3|tornado==6.3.3|typer==0.9.0|typing-extensions==4.8.0|tzdata==2023.3|urllib3==1.26.18|uvloop==0.19.0;sys_platform!="win32"|wrapt==1.16.0|xyzservices==2023.10.1|yarl==1.9.3' \
```
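
One way to produce that quoting (a sketch, not necessarily what the PR does internally) is to pass each printed argument through `shlex.quote`; the example arguments below are abbreviated:

```python
import shlex

args = [
    '--properties=^|||^spark:spark.task.maxFailures=20',
    'PKGS=uvloop==0.19.0;sys_platform!="win32"',
]
# shlex.quote leaves plain tokens alone and single-quotes anything the
# shell would otherwise mangle (|, ;, ", ...).
print(' \\\n    '.join(shlex.quote(a) for a in args))
```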
As written:

- `make -C hail doctest` runs the normal tests.
- `test_python_docs` only tests the RST files (e.g. [see this
deploy](https://batch.hail.is/batches/8097884/jobs/127))
CHANGELOG: `hailctl dataproc` now creates clusters using Dataproc
version 2.1.33. It previously used version 2.1.2.

Dataproc verifications:
1. Memory available after startup: `mem avail: 42959 of 52223 MiB`. OK.
2. SparkMonitor still shows up.
3. Native code (`hl.identity_by_descent`) still works.


The Dataproc version list only keeps the most recent five, sadly. Luckily
for us, archive.org happens to have a capture with 2.1.2. I also
requested a capture of the current state of the page.

- https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.1

- 2023-03-07:
https://web.archive.org/web/20230307225815/https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.1

- 2023-12-12:
https://web.archive.org/web/20231212001716/https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.1

| Component | 2.1.33-debian11 | 2.1.2-debian11 |
| --- | --- | --- |
| Apache Atlas | 2.2.0 | 2.2.0 |
| Apache Flink | 1.15.4 | 1.15.0 |
| Apache Hadoop | 3.3.6 | 3.3.3 |
| Apache Hive | 3.1.3 | 3.1.3 |
| Apache Hive WebHCat | 3.1.3 | 3.1.3 |
| Apache Hudi | 0.12.3 | 0.12.0 |
| Apache Kafka | 3.1.0 | 3.1.0 |
| Apache Pig | 0.18.0-SNAPSHOT | 0.18.0-SNAPSHOT |
| Apache Spark | 3.3.2 | 3.3.0 |
| Apache Sqoop v1 | 1.5.0-SNAPSHOT | 1.5.0-SNAPSHOT |
| Apache Sqoop v2 | N/A | 1.99.6 |
| Apache Tez | N/A | 0.10.1 |
| Cloud Storage Connector | hadoop3-2.2.18 | hadoop3-2.2.9 |
| Conscrypt | 2.5.2 | 2.5.2 |
| Docker | 20.10 | 20.10 |
| Hue | 4.10.0 | 4.10.0 |
| Java | 11 | 11 |
| JupyterLab Notebook | 3.4 | 3.4 |
| Oozie | 5.2.1 | 5.2.1 |
| Trino | 376 | 376 |
| Python | Python 3.10 | Python 3.10 |
| R | R 4.1 | R 4.1 |
| Ranger | 2.2.0 | 2.2.0 |
| Scala | 2.12.18 | 2.12.14 |
| Solr | 9.0.0 | 9.0.0 |
| Zeppelin Notebook | 0.10.1 | 0.10.1 |
| Zookeeper | 3.8.3 | 3.8.0 |

Here's a diff. I think the things that may affect us are:

1. Scala patch version
2. Cloud Storage Connector patch version (this is the motivation for
this change)
3. Spark patch version
4. Total daemon memory use at start-up.

```diff
diff -u /tmp/b /tmp/a
--- /tmp/b	2023-12-11 19:20:34
+++ /tmp/a	2023-12-11 19:22:42
@@ -1,17 +1,17 @@
-Component	2.1.2-debian11
+Component	2.1.33-debian11
 Apache Atlas	2.2.0
-Apache Flink	1.15.0
-Apache Hadoop	3.3.3
+Apache Flink	1.15.4
+Apache Hadoop	3.3.6
 Apache Hive	3.1.3
 Apache Hive WebHCat	3.1.3
-Apache Hudi	0.12.0
+Apache Hudi	0.12.3
 Apache Kafka	3.1.0
 Apache Pig	0.18.0-SNAPSHOT
-Apache Spark	3.3.0
+Apache Spark	3.3.2
 Apache Sqoop v1	1.5.0-SNAPSHOT
-Apache Sqoop v2	1.99.6
-Apache Tez	0.10.1
-Cloud Storage Connector	hadoop3-2.2.9
+Apache Sqoop v2	N/A
+Apache Tez	N/A
+Cloud Storage Connector	hadoop3-2.2.18
 Conscrypt	2.5.2
 Docker	20.10
 Hue	4.10.0
@@ -22,7 +22,7 @@
 Python	Python 3.10
 R	R 4.1
 Ranger	2.2.0
-Scala	2.12.14
+Scala	2.12.18
 Solr	9.0.0
 Zeppelin Notebook	0.10.1
-Zookeeper	3.8.0
\ No newline at end of file
+Zookeeper	3.8.3
```
CHANGELOG: Hail Query-on-Batch previously used Class A Operations for
all interaction with blobs. This change ensures that QoB only uses Class
A Operations when necessary.

Inspired by @jigold's file system improvement campaign, I pursued the
avoidance of "list" operations. I anticipate this reduces flakiness in
Azure (tracked in hail-is#13351) and cost in Azure.

I enforced aiotools.fs terminology on hail.fs and Scala:

1. `FileStatus`. Metadata about a blob or file. It does not know if a
directory exists at this path.

2. `FileListEntry`. Metadata from a list operation. It knows if a
directory exists at this path.

Variable names were updated to reflect this distinction:

1. `fileStatus` / `fileStatuses`

2. `fle`/ `fles` / `fileListEntry` / `fileListEntries`, respectively.

`listStatus` renamed to `listDirectory` for clarity.

In both Azure and Google, `fileStatus` does not use a list operation.
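
The two notions can be sketched as Python protocols; this is illustrative only, and the method names are assumptions rather than the exact `hailtop.aiotools.fs` API:

```python
from typing import Protocol

class FileStatus(Protocol):
    # Metadata from a get/head on a single blob; cannot say whether a
    # directory also exists at this path.
    def basename(self) -> str: ...
    async def size(self) -> int: ...

class FileListEntry(Protocol):
    # Metadata from a list operation; can distinguish a file from a
    # directory (or a path that is both, as blob stores allow).
    def basename(self) -> str: ...
    async def is_file(self) -> bool: ...
    async def is_dir(self) -> bool: ...
```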

`fileListEntry` can be used when we must know if a directory exists. I
just rewrote this from first principles because:
1. In neither Google nor Azure did it check if the path was both a directory
and a file.
2. In Google, if the directory entry wasn't in the first page, it would
fail (NB: there are fifteen non-control characters in ASCII before `/`;
if the page size is 15 or fewer, we'd miss the first entry ending with
`/`).
3. In Azure, we issued both a get and a list.

There are now unit tests for this method.

---

1. `copyMerge` and `concatenateFiles` previously used `O(N_FILES)` list
operations; they now use `O(N_FILES)` get operations.
2. Writers that used `exists` to check for a _SUCCESS file now use a get
operation.
3. Index readers, import BGEN, and import plink all now check file size
with a get operation.

That said, overall, the bulk of our Class A Operations are probably
writes.
…#13812)

The bug in hail-is#13785 was that we used a `relpath`, but the files to be
included were in the parent directory of the current working directory
of the script. To solve this, I added a new option `--mounts` that uses
the Docker `-v src:dest` syntax to specify where the contents of the
source should be mounted into the container.


Fixes hail-is#13785
This change removes shim credential classes from the batch worker code
and attempts to consolidate credential handling in the worker API
classes.
…ts (hail-is#14105)

Fixes hail-is#13346.

Another user was confused by this:
hail-is#14102. Unfortunately, the world
appears to have embraced missing values in VCF array fields even though
the single element case is ambiguous. In hail-is#13346, I proposed a scheme by
which we can disambiguate many of the cases, but implementing it ran
into challenges because LoadVCF.scala does not expose whether or not an
INFO field was a literal "." or elided entirely from that line.

Anyway, this error message actually points users to the fix. I also
changed some method names so that every method name uses ArrayType and
never TypeArray.
`ruff` is faster than `black` when formatting Python files, so this
change replaces the formatting checks for our files with calls to `ruff`
instead of `black`, and runs `ruff format` on all files that would cause
those checks to fail.
CHANGELOG: Use indexed VEP cache files for GRCh38 on both dataproc and
QoB.

Fixes hail-is#13989

In this PR, I did the following:
1. Installed samtools into the Docker image to get rid of errors in the
log output
2. Added the `--merged` flag so that VEP will use the directory
`homo_sapiens_merged` for the cache

Outstanding Issues:
1. The FASTA files that are in `homo_sapiens/` were not present in the
merged dataset. Do we keep both the `homo_sapiens` and
`homo_sapiens_merged/` directories in our bucket or do we transfer the
FASTA files to the merged directory?
2. Once we decide the answer to (1), then I can fix this in dataproc.
The easiest thing to do is to add the tar file with the `_merged` data
to the dataproc vep folders and use the `--merged` flag. However, that
will double the startup time for VEP on a worker node in dataproc.

Before:
<img width="617" alt="Screenshot 2023-12-05 at 12 42 16 PM"
src="https://github.com/hail-is/hail/assets/1693348/bee7fff5-782c-4f19-aa88-26383ed386b7">


After:
<img width="619" alt="Screenshot 2023-12-05 at 12 46 30 PM"
src="https://github.com/hail-is/hail/assets/1693348/3d731759-6c69-4f1c-9c73-92bfb05c239a">
…ail-is#14139)

The term `name` has long been ambiguous and, indeed, our Azure
implementation differed from S3 and Google. Now both FileStatus and
FileListEntry use the term `basename`. Azure has been changed to have
the same semantics as S3 and Google.

---------

Co-authored-by: jigold <jigold@users.noreply.github.com>
…l-is#14151)

This provides useful feedback we can use to understand why
`test_file_in_current_dir` keeps failing.
ehigham and others added 23 commits April 3, 2024 18:56
This test times out occasionally, as far as we can tell due to
unavailability of g2 machines. We've determined that for the time being
we will allow the test to time out so it does not interfere with the CI
pipeline.
Decided to tackle this as it was causing noise in the logs. There is no
reason this read-only select query should lock any rows.
Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
I sometimes find the term "Cost" on this page confusing, and I think its
inferred meaning differs depending on who's reading the page. This page
shows what users are spending on the Batch service, not what it is
*costing* operators to run the service (since we charge a service fee).
Closes hail-is#14210.

Shows whether a Job is "Always Run" on the Batch page:

<img width="1044" alt="Screenshot 2024-03-28 at 14 10 04"
src="https://github.com/hail-is/hail/assets/84595986/ec0dd8aa-6f58-4b84-a4f5-1ead8a98f13a">

and on the Job page:

<img width="281" alt="Screenshot 2024-03-28 at 14 10 11"
src="https://github.com/hail-is/hail/assets/84595986/b2fbe767-35a9-4ff3-8b9d-c50be7cf4408">
This PR refactors the `Binds.scala` file, which encodes the binding
structure of the IR.

The binding structure controls what variable bindings are in scope in
each location of the IR. The encoding specifies, for each node `x` and
each child `c` of `x`, how does the environment (the set of bindings
which are in scope, and their types) of 'x' differ from the environment
of `c`. For instance, `c`'s environment might add a new bindings (e.g.
in the body of a `StreamMap`), or it might drop all bindings, isolating
the child, so that it cannot refer to anything bound outside of `x`
(e.g. all relational nodes).

The binding structure is complicated by the separation of the
environment into "eval", "agg", "scan", and "relational" scopes. The
child `c`'s environment might add new bindings in any of these scopes,
and it might modify the parent's environment in other ways, like
replacing the "eval" scope with the "agg" scope.

In `Binds.scala`, the encoding of this relationship between parent's and
child's environments was split across many methods, separately encoding
bindings added to each of the scopes, and a prior modification to the
parent scope. This made it difficult to understand the binding behavior
of any particular node. And for nodes with more complicated binding
behavior, these separate encodings became confusingly entangled.

In this PR, `Binds.scala` is rewritten so that the binding behavior of a
node is specified in a single place. Instead of computing a list of new
bindings, it simply takes the parent's environment and returns the
child's.

Some use cases do require knowing what new bindings are added by a node
in a child's environment. To support this, I made the
`BindingEnvironment` interface a trait, and added another implementor
`SegregatedBindingEnv`, which wraps two separate `BindingEnvironment`s,
and puts all new bindings into the second. This satisfies the invariant
that for any function `f(GenericBindingEnv): GenericBindingEnv` which
modifies an environment using only the `GenericBindingEnv` interface,
and any `env: BindingEnv[T]`,
```
f(env) == f(SegregatedBindingEnv(env)).unified
```
i.e. the segregation doesn't have any effect, other than keeping track
of which bindings are new.
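
A toy Python stand-in for that invariant (the real types are Scala traits; the class and method names below are simplified illustrations, not the actual IR types):

```python
class Env:
    def __init__(self, bindings=None):
        self.bindings = dict(bindings or {})

    def bind(self, name, typ):
        return Env({**self.bindings, name: typ})


class SegregatedEnv:
    # Wraps two environments; every new binding goes into `new`, so callers
    # can recover exactly the bindings a node introduced.
    def __init__(self, old, new=None):
        self.old, self.new = old, new if new is not None else Env()

    def bind(self, name, typ):
        return SegregatedEnv(self.old, self.new.bind(name, typ))

    @property
    def unified(self):
        return Env({**self.old.bindings, **self.new.bindings})


env = Env({'x': 'int32'})
f = lambda e: e.bind('y', 'float64')
assert f(env).bindings == f(SegregatedEnv(env)).unified.bindings
```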

Now that all binding structure for a node is encoded in one place, a
future change could move this encoding to live in the node class
directly.
Adds `copy_spark_log_on_error` init configuration option. When true,
driver logs are copied to the remote tmpdir if an error occurs. This is
useful in support cases where users cannot copy logs off the dataproc
server themselves as it has already shut down.
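
Presumably used along these lines; the keyword name is taken from the commit description, and the tmpdir path and exact `init` signature are placeholders rather than verified API:

```python
import hail as hl

# If the driver hits an error, copy the Spark driver log to the remote tmpdir.
hl.init(
    tmp_dir='gs://my-bucket/tmp',
    copy_spark_log_on_error=True,
)
```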
~Stacked on hail-is#14404~

This set of changes started out as refactoring and trying to better
understand `extract.scala`. But the bigger change ended up being getting
rid of AggLet in favor of allowing bindings in `Let` to be agg/scan
bindings.

So each binding in a `Let` now has a binding scope, encoded using the
pre-existing `Scope` enum. I also renamed `Let` to `Block`. While I
think that's a better name for what `Let` has become, and what I hope it
will further become in the future, I initially tried to keep the name
`Let`. I changed it so that I could keep `Let.apply` the way it is,
specialized to bindings all in the eval scope. We can change all the
uses of `Let.apply` separately; I didn't want to pollute this already
large PR with that.
In a89d64a, I modified `build.yaml` to release the wheel we had already
built and tested. Unbeknownst to me, we rebuild the wheel with a
different version of `hail/python/hailtop/hailctl/deploy.yaml`, and
releasing the version used for testing borked `hailctl dataproc`
commands.

To fix this, we'll rebuild the wheel but use the `jar` we've already
built and tested. This is safe to do as far as I know because we don't
bundle any information into the jar that depends on the make flag
`DEPLOY_REMOTE`.

Fixes: hail-is#14452
'duration': duration,
'cost': coalesce(record.get('cost'), 0),
'msec_mcpu': record['msec_mcpu'],
'cost_breakdown': cost_breakdown,
@milo-hyben (Author) commented on May 21, 2024
Conflict message: I have merged the incoming change with our version.

++<<<<<<< HEAD
 +            'cost_breakdown': cost_breakdown,
++||||||| 6a6c38d5a
++            'cost_breakdown': record['cost_breakdown'],
++=======
+             'cost_breakdown': record['cost_breakdown'],
+             'always_run': bool(record['always_run']),
+             'display_state': None,
++>>>>>>> bea04d9


Good one — a nice straight-forward one to resolve

@milo-hyben milo-hyben requested a review from jmarshall May 21, 2024 04:42
@milo-hyben milo-hyben marked this pull request as ready for review May 21, 2024 04:42
@milo-hyben milo-hyben merged commit 6f76e36 into main May 24, 2024
@milo-hyben milo-hyben deleted the upstream-14454 branch May 24, 2024 05:02