Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream 2023 03 01 1 #271

Merged
merged 57 commits into from
Mar 1, 2023
Merged

Upstream 2023 03 01 1 #271

merged 57 commits into from
Mar 1, 2023

Conversation

illusional
Copy link

Based on https://hail.zulipchat.com/#narrow/stream/300487-Hail-Batch-Dev/topic/resource.20table.20query.20woes/near/338629272, merge to 9e0081c and future commits (none of which are available now) need to be merged separately.

@lgruen, this the main merge about the memory service that we could potentially reverse the hack, but I don't know much about this, I just merged to the most relevant branch:

 val (open, write) = if (n <= 50) {
  (fs.openCachedNoCompression _, fs.writeCached _)
} else {
  ((x: String) => fs.openNoCompression(x), fs.writePDOS _)
}

Out version was merged to:

val (open, write) = ((x: String) => fs.openNoCompression(x), fs.writePDOS _)

jigold and others added 30 commits February 8, 2023 16:17
* [query] IBD implemented in terms of block matrices

* cleanup

* fix

* get debugging info

* more debugging

* fix temp file location

* address comments

* minor fix

* fix?

* checkpoints
* almost works!

* MatrixTable.from_parts

* spellcheck

* self review

* format with regret

* make docs build

* list any -> list of any

* restore MatrixTable._from_java

* test assertions too

* @danking's review

* reorder row fields

* @danking's review

* fix doc

* more fix doc
…is#12678)

* [query][scala-fs] more debugging information when memory fails

* wording

* log off the fast path
* [batch] Add environment variable for batch id in worker

* address comments?
* [batch] Profile more services not just batch-driver

* fixes

* only add the googlecloudprofiler logger filter if invoked

* fix HAIL_SHOULD_PROFILE

* sort

* add cloud env variable to auth

* fix
`gsutil` is one of the most user hostile tools I have ever used. Here are some examples of why. I
think what I now have committed is the only way to achieve the behavior we want without assuming
anything about which objects are present at the target.

```
$ gsutil -m cp -r baz gs://danking/baz/
$ gsutil ls gs://danking/baz/
gs://danking/baz/baz/
```
```
$ gsutil rm -rf gs://danking/baz
$ gsutil -m cp -r baz/ gs://danking/baz/
$ gsutil ls gs://danking/baz/
gs://danking/baz/baz/
```
```
$ gsutil rm -rf gs://danking/baz
$ gsutil -m cp -r baz gs://danking/baz
$ gsutil ls gs://danking/baz/
gs://danking/baz/1
gs://danking/baz/2
```
```
$ gsutil rm -rf gs://danking/baz
$ touch foo
$ gsutil cp foo gs://danking/baz/foo
$ gsutil -m cp -r baz gs://danking/baz
$ gsutil ls gs://danking/baz
gs://danking/baz/foo
gs://danking/baz/baz/
```
```
$ gsutil rm -rf gs://danking/baz
$ gsutil cp foo gs://danking/baz/foo
$ gsutil -m cp -r baz/\* gs://danking/baz/
$ gsutil ls gs://danking/baz
gs://danking/baz/1
gs://danking/baz/2
gs://danking/baz/foo
```
…12688)

* [query] improve error message when rng_nonce is unparseable

* Update ExecuteContext.scala
hail-is#12597)

* [compiler] Refactor compiled functions to take a HailTaskContext instead of partitionIndex

Partition index is now unnecessary due to the completion of the randomness redesign.

HailTaskContext will be used in a subsequent PR to add task-level cleanup to permit
aggressive caching in generated code.

* fixes

* oops writeIRs

* fix combine

* fix other issues in combine

* fix

* fix combop nonsense

* bleh

* fix

* fix

* bump
* [batch] Mitigate too many resources with same prices

* add equality chcks

* fix
…hail-is#12600)

* [query] Make caching in PartitionNativeIntervalReader more aggressive

Add finalizers to HailTaskContext to clean up open indices.

* remove log

* comment
…ail-is#12667)

* [query] Add RouterFS to scala to permit local file system IO in QoB

* erasure

* scala is stupid

* override open/create cached

* fix cast
* [compiler] Iterative `DistinctlyKeyed` Analysis
Use iterative tree traversals to prevent exceeding stack size for large IRs.

* traverse all ir nodes
…il-is#12295)

* [lowering] Rewrite maximal independent set to be its own IR

And use this new IR to enable lowered execution of
maximal_independent_set

* fix

* updates

* Update Copy.scala

* Update __init__.py

* Requireness.scala rule

* fix bindings

* fix

* remove bad error

* fix requireness

* custom element binding for tiebreaker arguments

* Add Array[Long] to hail value conversion

* Make maximalIndependentSet return an IndexedSeq

Functions.unwrapReturn expects scala IndexedSeq not java Array

* fix GraphSuite

* Add (bad) unwrap rule for arrays of structs

* xfails

* better uwrapReturn for arrays of structs

* fix bad merge

* correct Requiredness rules

* ChildEnvWithoutBindings rule

---------

Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
…#12700)

Bumps [com.github.samtools:htsjdk](https://github.com/samtools/htsjdk) from 3.0.2 to 3.0.4.
- [Release notes](https://github.com/samtools/htsjdk/releases)
- [Commits](samtools/htsjdk@3.0.2...3.0.4)

---
updated-dependencies:
- dependency-name: com.github.samtools:htsjdk
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [query] add `hl.utils.genomic_range_table`

CHANGELOG: In Query on Batch, `hl.balding_nichols_model` is slightly faster. Also added `hl.utils.genomic_range_table` to quickly create a table keyed by locus.

It has grated on me for a while that `hl.balding_nichols_models` requires a whole pass to verify it
is sorted even though it is plainly so. This change introduces the necessary infrastructure to
convince Hail of that fact.

* pylint

* missing import

* fix tests

* add to __init__.py

* better genomic range test tables
Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.2.2 to 2.2.3.
- [Release notes](https://github.com/pallets/werkzeug/releases)
- [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
- [Commits](pallets/werkzeug@2.2.2...2.2.3)

---
updated-dependencies:
- dependency-name: werkzeug
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [de.undercouch.download](https://github.com/michel-kraemer/gradle-download-task) from 5.3.0 to 5.3.1.
- [Release notes](https://github.com/michel-kraemer/gradle-download-task/releases)
- [Commits](michel-kraemer/gradle-download-task@5.3.0...5.3.1)

---
updated-dependencies:
- dependency-name: de.undercouch.download
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [query] Upgrade spark to 3.3.0 and dataproc to 2.1

* lint

* lint

* update zstd-jni

* trust pyspark on patch versions

* create specific exceptions to appease pylint
* [query] lower logistic SKAT

CHANGELOG: `hl.skat(..., logistic=True)` now supported in the Batch backend.

* minor doc fix

* remove wrong sentence from docs

* covariates are required now

* support max_iter = 0 in logreg_fit

* fix syntax

* wip docs

* docs iwp

* final take on the math

* docs fixes
daniel-goldstein and others added 25 commits February 17, 2023 12:47
…l-is#12709)

* [batch] Mitigate test failures by extending batch client timeout

* fix
* [batch] Support job logs that are not UTF-8 compatible

* redundant

* add test for non-utf-8 log

* fix

* fix test

* linting

* address comments

* lint
* [query] improve the memory client

Currently, the memory client buffers the entire output in memory which is likely
to cause OOMs. For reasons that are not entirely clear to me, sometimes these OOMs
get muffled by our system and instead lead to non-termination. I vaguely remember
this happening before with `using`. I suspect there is something somewhat subtle
wrong with that method, but I am not certain.

Anyway, there are four big changes here:
1. Do not buffer the entire request body in memory when writing to memory.
2. Because of (1) we have to pull retry behavior all the way up to the top-level where we know how to recreate the body.
3. Because of (2) it is easier to provide a `write(url)(writerFunction)` style API, which I do here.
4. Again, because of (2), and because I want to preserve the file-object-like interface, I added a somewhat funky anonymous class which uses a second thread to facilitate the movement of data written into the OutputStream returned by `create` into the OutputStream of the HTTP connection.

Point (4) probably bears more explanation. The root issue is the bad Apache HTTP Client interface.
Instead of `request` returning an OutputStream, it takes an "entity". An entity knows how to write
itself into the OutputStream of an HTTP request. This works fine if the "writer" code is pased as
a function (as in my new `write` method), but that does not work if the control flow looks like:

    f = create(...)
    f.write(...)
    f.close()

We avoid this limited API by initiating the request in a second thread which will eventually block
waiting to receive data from a PipedInputStream. That PipedInputStream produces the data written to
a PipedOutputStream. The `create` call returns a positioned OutputStream which just writes data into
the PipedOutputStream and handles cleaning up the thread when it is closed.

In a multi-core system, network requests should proceed in parallel to the client code. In a
single-core system, the written data will buffer until `close` is called which will definitely yield
control to the other thread.

* restore the retry

* fix

* fix

* fix oerride
Flags now use the same user configuration machinery we use for Batch and QoB. I am not certain
this is the right choice. Feedback very welcome. The configuration_of function lets us uniformly
treat any configuration by checking, in order: explicit argument, envvar, config file, or a
fallback.

I added a bit of code to allow us to support the envvars which do not conform to the new envvar
scheme.

I also removed a few flags that are no longer used.

I kind of think these flags should actually be under a new section like "query_compiler" or
something.

@tpoterba, thoughts?
* [hailtop] Keep strong references to tasks on the event loop

* isort

* linting
…fo (hail-is#12713)

* [batch] Refactor resource billing checks with additional debugging info

* delint
* [tests] Use token in remote_tmpdir path

* fix missing curly brace
…and not with a default feature flag (hail-is#12719)

* [qob] Don't use local sort to shuffle...

* fix python

* use empty dict when flags is None

---------

Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
* [batch] Allow python jobs to have file resources too

* add test
* [qob] Enable liftovers for Query-on-Batch

* reorganize python side of things

* linting

* dedup some stuff on the local and spark backends

* dedup local spark and py4j backend functionality

* linting

* address some comments

* cache orderings

* simplify

* fix
* [batch] Make driver parameters configurable in the UI

* delint

* address comments

* address comment
This the main merge about the memory service that's debatable,

```scala
 val (open, write) = if (n <= 50) {
  (fs.openCachedNoCompression _, fs.writeCached _)
} else {
  ((x: String) => fs.openNoCompression(x), fs.writePDOS _)
}
```

Merged to:

```scala
val (open, write) = ((x: String) => fs.openNoCompression(x), fs.writePDOS _)
```

Conflicts:
        batch/batch/driver/instance_collection/pool.py
	batch/batch/driver/main.py
	batch/batch/driver/templates/pool.html
	batch/batch/inst_coll_config.py
	ci/ci/ci.py
	hail/src/main/scala/is/hail/backend/service/ServiceBackend.scala
	hail/src/main/scala/is/hail/backend/service/Worker.scala
@illusional illusional requested a review from lgruen March 1, 2023 03:16
@lgruen
Copy link

lgruen commented Mar 1, 2023

That merge looks right if we still want to disable the use of the memory service (which I think we do). There have been some improvements to it recently though. I'll add it to the agenda for our catch-up on Friday.

For now, I think we should just go ahead with the version we had so far.

@illusional illusional merged commit b8b6004 into main Mar 1, 2023
@illusional illusional deleted the upstream-2023-03-01_1 branch March 1, 2023 20:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants