Upstream 2023 03 01 1 #271

illusional · 2023-03-01T03:16:57Z

Based on https://hail.zulipchat.com/#narrow/stream/300487-Hail-Batch-Dev/topic/resource.20table.20query.20woes/near/338629272, merge to 9e0081c and future commits (none of which are available now) need to be merged separately.

@lgruen, this the main merge about the memory service that we could potentially reverse the hack, but I don't know much about this, I just merged to the most relevant branch:

 val (open, write) = if (n <= 50) {
  (fs.openCachedNoCompression _, fs.writeCached _)
} else {
  ((x: String) => fs.openNoCompression(x), fs.writePDOS _)
}

Out version was merged to:

val (open, write) = ((x: String) => fs.openNoCompression(x), fs.writePDOS _)

* [query] IBD implemented in terms of block matrices * cleanup * fix * get debugging info * more debugging * fix temp file location * address comments * minor fix * fix? * checkpoints

@danking

* almost works! * MatrixTable.from_parts * spellcheck * self review * format with regret * make docs build * list any -> list of any * restore MatrixTable._from_java * test assertions too * @danking's review * reorder row fields * @danking's review * fix doc * more fix doc

…is#12678) * [query][scala-fs] more debugging information when memory fails * wording * log off the fast path

…12682)

* [batch] Add environment variable for batch id in worker * address comments?

* [batch] Profile more services not just batch-driver * fixes * only add the googlecloudprofiler logger filter if invoked * fix HAIL_SHOULD_PROFILE * sort * add cloud env variable to auth * fix

`gsutil` is one of the most user hostile tools I have ever used. Here are some examples of why. I think what I now have committed is the only way to achieve the behavior we want without assuming anything about which objects are present at the target. ``` $ gsutil -m cp -r baz gs://danking/baz/ $ gsutil ls gs://danking/baz/ gs://danking/baz/baz/ ``` ``` $ gsutil rm -rf gs://danking/baz $ gsutil -m cp -r baz/ gs://danking/baz/ $ gsutil ls gs://danking/baz/ gs://danking/baz/baz/ ``` ``` $ gsutil rm -rf gs://danking/baz $ gsutil -m cp -r baz gs://danking/baz $ gsutil ls gs://danking/baz/ gs://danking/baz/1 gs://danking/baz/2 ``` ``` $ gsutil rm -rf gs://danking/baz $ touch foo $ gsutil cp foo gs://danking/baz/foo $ gsutil -m cp -r baz gs://danking/baz $ gsutil ls gs://danking/baz gs://danking/baz/foo gs://danking/baz/baz/ ``` ``` $ gsutil rm -rf gs://danking/baz $ gsutil cp foo gs://danking/baz/foo $ gsutil -m cp -r baz/\* gs://danking/baz/ $ gsutil ls gs://danking/baz gs://danking/baz/1 gs://danking/baz/2 gs://danking/baz/foo ```

…12688) * [query] improve error message when rng_nonce is unparseable * Update ExecuteContext.scala

Efficiency can wait.

hail-is#12597) * [compiler] Refactor compiled functions to take a HailTaskContext instead of partitionIndex Partition index is now unnecessary due to the completion of the randomness redesign. HailTaskContext will be used in a subsequent PR to add task-level cleanup to permit aggressive caching in generated code. * fixes * oops writeIRs * fix combine * fix other issues in combine * fix * fix combop nonsense * bleh * fix * fix * bump

…-is#12675)

* [batch] Mitigate too many resources with same prices * add equality chcks * fix

…hail-is#12600) * [query] Make caching in PartitionNativeIntervalReader more aggressive Add finalizers to HailTaskContext to clean up open indices. * remove log * comment

…ail-is#12667) * [query] Add RouterFS to scala to permit local file system IO in QoB * erasure * scala is stupid * override open/create cached * fix cast

* [compiler] Iterative `DistinctlyKeyed` Analysis Use iterative tree traversals to prevent exceeding stack size for large IRs. * traverse all ir nodes

…il-is#12295) * [lowering] Rewrite maximal independent set to be its own IR And use this new IR to enable lowered execution of maximal_independent_set * fix * updates * Update Copy.scala * Update __init__.py * Requireness.scala rule * fix bindings * fix * remove bad error * fix requireness * custom element binding for tiebreaker arguments * Add Array[Long] to hail value conversion * Make maximalIndependentSet return an IndexedSeq Functions.unwrapReturn expects scala IndexedSeq not java Array * fix GraphSuite * Add (bad) unwrap rule for arrays of structs * xfails * better uwrapReturn for arrays of structs * fix bad merge * correct Requiredness rules * ChildEnvWithoutBindings rule --------- Co-authored-by: Dan King <daniel.zidan.king@gmail.com>

…#12700) Bumps [com.github.samtools:htsjdk](https://github.com/samtools/htsjdk) from 3.0.2 to 3.0.4. - [Release notes](https://github.com/samtools/htsjdk/releases) - [Commits](samtools/htsjdk@3.0.2...3.0.4) --- updated-dependencies: - dependency-name: com.github.samtools:htsjdk dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [query] add `hl.utils.genomic_range_table` CHANGELOG: In Query on Batch, `hl.balding_nichols_model` is slightly faster. Also added `hl.utils.genomic_range_table` to quickly create a table keyed by locus. It has grated on me for a while that `hl.balding_nichols_models` requires a whole pass to verify it is sorted even though it is plainly so. This change introduces the necessary infrastructure to convince Hail of that fact. * pylint * missing import * fix tests * add to __init__.py * better genomic range test tables

Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.2.2 to 2.2.3. - [Release notes](https://github.com/pallets/werkzeug/releases) - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst) - [Commits](pallets/werkzeug@2.2.2...2.2.3) --- updated-dependencies: - dependency-name: werkzeug dependency-type: direct:production ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

…l-is#12577)

Bumps [de.undercouch.download](https://github.com/michel-kraemer/gradle-download-task) from 5.3.0 to 5.3.1. - [Release notes](https://github.com/michel-kraemer/gradle-download-task/releases) - [Commits](michel-kraemer/gradle-download-task@5.3.0...5.3.1) --- updated-dependencies: - dependency-name: de.undercouch.download dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [query] Upgrade spark to 3.3.0 and dataproc to 2.1 * lint * lint * update zstd-jni * trust pyspark on patch versions * create specific exceptions to appease pylint

* [query] lower logistic SKAT CHANGELOG: `hl.skat(..., logistic=True)` now supported in the Batch backend. * minor doc fix * remove wrong sentence from docs * covariates are required now * support max_iter = 0 in logreg_fit * fix syntax * wip docs * docs iwp * final take on the math * docs fixes

…l-is#12709) * [batch] Mitigate test failures by extending batch client timeout * fix

Not all IOExceptions have a cause.

* [batch] Support job logs that are not UTF-8 compatible * redundant * add test for non-utf-8 log * fix * fix test * linting * address comments * lint

* [query] improve the memory client Currently, the memory client buffers the entire output in memory which is likely to cause OOMs. For reasons that are not entirely clear to me, sometimes these OOMs get muffled by our system and instead lead to non-termination. I vaguely remember this happening before with `using`. I suspect there is something somewhat subtle wrong with that method, but I am not certain. Anyway, there are four big changes here: 1. Do not buffer the entire request body in memory when writing to memory. 2. Because of (1) we have to pull retry behavior all the way up to the top-level where we know how to recreate the body. 3. Because of (2) it is easier to provide a `write(url)(writerFunction)` style API, which I do here. 4. Again, because of (2), and because I want to preserve the file-object-like interface, I added a somewhat funky anonymous class which uses a second thread to facilitate the movement of data written into the OutputStream returned by `create` into the OutputStream of the HTTP connection. Point (4) probably bears more explanation. The root issue is the bad Apache HTTP Client interface. Instead of `request` returning an OutputStream, it takes an "entity". An entity knows how to write itself into the OutputStream of an HTTP request. This works fine if the "writer" code is pased as a function (as in my new `write` method), but that does not work if the control flow looks like: f = create(...) f.write(...) f.close() We avoid this limited API by initiating the request in a second thread which will eventually block waiting to receive data from a PipedInputStream. That PipedInputStream produces the data written to a PipedOutputStream. The `create` call returns a positioned OutputStream which just writes data into the PipedOutputStream and handles cleaning up the thread when it is closed. In a multi-core system, network requests should proceed in parallel to the client code. In a single-core system, the written data will buffer until `close` is called which will definitely yield control to the other thread. * restore the retry * fix * fix * fix oerride

@tpoterba

Flags now use the same user configuration machinery we use for Batch and QoB. I am not certain this is the right choice. Feedback very welcome. The configuration_of function lets us uniformly treat any configuration by checking, in order: explicit argument, envvar, config file, or a fallback. I added a bit of code to allow us to support the envvars which do not conform to the new envvar scheme. I also removed a few flags that are no longer used. I kind of think these flags should actually be under a new section like "query_compiler" or something. @tpoterba, thoughts?

* [hailtop] Keep strong references to tasks on the event loop * isort * linting

…ed (hail-is#12708)

…fo (hail-is#12713) * [batch] Refactor resource billing checks with additional debugging info * delint

…2718)

* [tests] Use token in remote_tmpdir path * fix missing curly brace

…and not with a default feature flag (hail-is#12719) * [qob] Don't use local sort to shuffle... * fix python * use empty dict when flags is None --------- Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>

* [batch] Allow python jobs to have file resources too * add test

* [qob] Enable liftovers for Query-on-Batch * reorganize python side of things * linting * dedup some stuff on the local and spark backends * dedup local spark and py4j backend functionality * linting * address some comments * cache orderings * simplify * fix

* [batch] Make driver parameters configurable in the UI * delint * address comments * address comment

…ith nest_asyncio (hail-is#12732)

…l-is#12728)

This the main merge about the memory service that's debatable, ```scala val (open, write) = if (n <= 50) { (fs.openCachedNoCompression _, fs.writeCached _) } else { ((x: String) => fs.openNoCompression(x), fs.writePDOS _) } ``` Merged to: ```scala val (open, write) = ((x: String) => fs.openNoCompression(x), fs.writePDOS _) ``` Conflicts: batch/batch/driver/instance_collection/pool.py batch/batch/driver/main.py batch/batch/driver/templates/pool.html batch/batch/inst_coll_config.py ci/ci/ci.py hail/src/main/scala/is/hail/backend/service/ServiceBackend.scala hail/src/main/scala/is/hail/backend/service/Worker.scala

lgruen · 2023-03-01T03:40:53Z

That merge looks right if we still want to disable the use of the memory service (which I think we do). There have been some improvements to it recently though. I'll add it to the agenda for our catch-up on Friday.

For now, I think we should just go ahead with the version we had so far.

jigold and others added 30 commits February 8, 2023 16:17

[query] IBD implemented in terms of block matrices (hail-is#12629)

eb0f48b

* [query] IBD implemented in terms of block matrices * cleanup * fix * get debugging info * more debugging * fix temp file location * address comments * minor fix * fix? * checkpoints

[docs] Improve export_bgen docs (hail-is#12661)

186714b

[docs] Fix example in truncate_reference_blocks docs (hail-is#12673)

b666975

[query][scala-fs] more debugging information when memory fails (hail-…

77f4cb0

…is#12678) * [query][scala-fs] more debugging information when memory fails * wording * log off the fast path

>= is not compatible with wildcards (hail-is#12677)

7d84db0

[compiler] Remove unused Interval.contains/overlaps methods (hail-is#…

447ea87

…12682)

[batch] Add environment variable for batch id in worker (hail-is#12662)

fcac149

* [batch] Add environment variable for batch id in worker * address comments?

[batch] Add regions to python jobs (hail-is#12684)

d2476f7

[batch] Profile more services not just batch-driver (hail-is#12681)

451fdcc

* [batch] Profile more services not just batch-driver * fixes * only add the googlecloudprofiler logger filter if invoked * fix HAIL_SHOULD_PROFILE * sort * add cloud env variable to auth * fix

[query] improve error message when rng_nonce is unparseable (hail-is#…

52feb78

…12688) * [query] improve error message when rng_nonce is unparseable * Update ExecuteContext.scala

[query] MatrixBlockMatrixWriter can be lowered (hail-is#12695)

9c0854e

Efficiency can wait.

[docs] Fix description of mean_dp in interval_coverage docs (hail…

8b12ee8

…-is#12675)

[query] improve hl.agg.call_stats error messages (hail-is#12676)

46aba89

[query] improve error message when HAIL_CLOUD is unset (hail-is#12687)

df3d682

[batch] Mitigate too many resources with same prices (hail-is#12654)

d03121f

* [batch] Mitigate too many resources with same prices * add equality chcks * fix

[query] Make caching in PartitionNativeIntervalReader more aggressive (…

2fbf68b

…hail-is#12600) * [query] Make caching in PartitionNativeIntervalReader more aggressive Add finalizers to HailTaskContext to clean up open indices. * remove log * comment

[query] Add RouterFS to scala to permit local file system IO in QoB (h…

0f85414

…ail-is#12667) * [query] Add RouterFS to scala to permit local file system IO in QoB * erasure * scala is stupid * override open/create cached * fix cast

[compiler] Iterative DistinctlyKeyed Analysis (hail-is#12696)

1fe6f2a

* [compiler] Iterative `DistinctlyKeyed` Analysis Use iterative tree traversals to prevent exceeding stack size for large IRs. * traverse all ir nodes

[query] make _blanczos_pca support implicitly transposing matrix (hai…

ae3d0d5

…l-is#12577)

[compiler] fix stack safety bug in TypeCheck (hail-is#12704)

6056c05

[query] Upgrade spark to 3.3.0 and dataproc to 2.1 (hail-is#12701)

75f351d

* [query] Upgrade spark to 3.3.0 and dataproc to 2.1 * lint * lint * update zstd-jni * trust pyspark on patch versions * create specific exceptions to appease pylint

daniel-goldstein and others added 25 commits February 17, 2023 12:47

[batch] Mitigate test failures by extending batch client timeout (hai…

dbfd553

…l-is#12709) * [batch] Mitigate test failures by extending batch client timeout * fix

[query] fix NPE introduced in hail-is#12555 (hail-is#12711)

5f7e4be

Not all IOExceptions have a cause.

[batch] Support job logs that are not UTF-8 compatible (hail-is#12666)

f7b3f10

* [batch] Support job logs that are not UTF-8 compatible * redundant * add test for non-utf-8 log * fix * fix test * linting * address comments * lint

[ci] Run the local backend tests much more in parallel (hail-is#12716)

b255123

[hailtop] Keep strong references to tasks (hail-is#12692)

d867b79

* [hailtop] Keep strong references to tasks on the event loop * isort * linting

[batch] Treat unauthorized in docker error message as permission deni…

2d65cf5

…ed (hail-is#12708)

[batch] Refactor resource billing checks with additional debugging in…

537f8f7

…fo (hail-is#12713) * [batch] Refactor resource billing checks with additional debugging info * delint

[query] Don't persist in table._filter_partitions (hail-is#12705)

669a48b

[query] Prune all fields before calculating new partitions (hail-is#1…

59fbbb9

…2718)

[test] fix CI tests to correctly print java progress (hail-is#12720)

691d4bd

[batch] Fix bug in resource billing updates (hail-is#12721)

60ee155

[tests] Use token in remote_tmpdir path for tests (hail-is#12693)

4b92fb6

* [tests] Use token in remote_tmpdir path * fix missing curly brace

[batch] Allow python jobs to have file resources too (hail-is#12726)

37d1cad

* [batch] Allow python jobs to have file resources too * add test

[ci] Remove unused Spark 3.2 base image (hail-is#12727)

8104a1c

[batch] Make driver parameters configurable in the UI (hail-is#12575)

4350aec

* [batch] Make driver parameters configurable in the UI * delint * address comments * address comment

[ci/test] Write java test progress to System.err (hail-is#12730)

7fb8646

[hailtop.utils] Fix async_to_blocking to support keyboard interrupt w…

60f47ac

…ith nest_asyncio (hail-is#12732)

[batch] Dont install the GCS connector in the batch worker image (hai…

e7945a8

…l-is#12728)

[batch] Cleanup old billing tables (hail-is#12710)

3cd6e4b

[batch] Add deduped id column to resources table (hail-is#12723)

9e0081c

illusional requested a review from lgruen March 1, 2023 03:16

lgruen approved these changes Mar 1, 2023

View reviewed changes

illusional merged commit b8b6004 into main Mar 1, 2023

illusional deleted the upstream-2023-03-01_1 branch March 1, 2023 20:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Upstream 2023 03 01 1 #271

Upstream 2023 03 01 1 #271

illusional commented Mar 1, 2023

lgruen commented Mar 1, 2023

Upstream 2023 03 01 1 #271

Upstream 2023 03 01 1 #271

Conversation

illusional commented Mar 1, 2023

lgruen commented Mar 1, 2023