Upstream 2023 03 01 1 #271

Merged
merged 57 commits into from
Mar 1, 2023

Commits on Feb 8, 2023

  1. [query] IBD implemented in terms of block matrices (hail-is#12629)

    * [query] IBD implemented in terms of block matrices
    
    * cleanup
    
    * fix
    
    * get debugging info
    
    * more debugging
    
    * fix temp file location
    
    * address comments
    
    * minor fix
    
    * fix?
    
    * checkpoints
    jigold authored Feb 8, 2023
    eb0f48b
  2. 186714b

Commits on Feb 9, 2023

  1. b666975

Commits on Feb 10, 2023

  1. [query] Construct MatrixTable from Constituent Parts (hail-is#12659)

    * almost works!
    
    * MatrixTable.from_parts
    
    * spellcheck
    
    * self review
    
    * format with regret
    
    * make docs build
    
    * list any -> list of any
    
    * restore MatrixTable._from_java
    
    * test assertions too
    
    * @danking's review
    
    * reorder row fields
    
    * @danking's review
    
    * fix doc
    
    * more fix doc
    ehigham authored Feb 10, 2023
    6c8901f
  2. [query][scala-fs] more debugging information when memory fails (hail-is#12678)
    
    * [query][scala-fs] more debugging information when memory fails
    
    * wording
    
    * log off the fast path
    danking authored Feb 10, 2023
    77f4cb0
  3. 7d84db0
  4. 447ea87
  5. [batch] Add environment variable for batch id in worker (hail-is#12662)

    * [batch] Add environment variable for batch id in worker
    
    * address comments?
    jigold authored Feb 10, 2023
    fcac149
  6. d2476f7

Commits on Feb 11, 2023

  1. [batch] Profile more services not just batch-driver (hail-is#12681)

    * [batch] Profile more services not just batch-driver
    
    * fixes
    
    * only add the googlecloudprofiler logger filter if invoked
    
    * fix HAIL_SHOULD_PROFILE
    
    * sort
    
    * add cloud env variable to auth
    
    * fix
    daniel-goldstein authored Feb 11, 2023
    451fdcc

Commits on Feb 12, 2023

  1. [query] fix gsutil cp invocation (hail-is#12686)

    `gsutil` is one of the most user-hostile tools I have ever used. Here are some examples of why. I
    think what I now have committed is the only way to achieve the behavior we want without assuming
    anything about which objects are present at the target.
    
    ```
    $ gsutil -m cp -r baz gs://danking/baz/
    $ gsutil ls gs://danking/baz/
    gs://danking/baz/baz/
    ```
    ```
    $ gsutil rm -rf gs://danking/baz
    $ gsutil -m cp -r baz/ gs://danking/baz/
    $ gsutil ls gs://danking/baz/
    gs://danking/baz/baz/
    ```
    ```
    $ gsutil rm -rf gs://danking/baz
    $ gsutil -m cp -r baz gs://danking/baz
    $ gsutil ls gs://danking/baz/
    gs://danking/baz/1
    gs://danking/baz/2
    ```
    ```
    $ gsutil rm -rf gs://danking/baz
    $ touch foo
    $ gsutil cp foo gs://danking/baz/foo
    $ gsutil -m cp -r baz gs://danking/baz
    $ gsutil ls gs://danking/baz
    gs://danking/baz/foo
    gs://danking/baz/baz/
    ```
    ```
    $ gsutil rm -rf gs://danking/baz
    $ gsutil cp foo gs://danking/baz/foo
    $ gsutil -m cp -r baz/\* gs://danking/baz/
    $ gsutil ls gs://danking/baz
    gs://danking/baz/1
    gs://danking/baz/2
    gs://danking/baz/foo
    ```
    danking authored Feb 12, 2023
    2aba588

Commits on Feb 13, 2023

  1. [query] improve error message when rng_nonce is unparseable (hail-is#12688)
    
    * [query] improve error message when rng_nonce is unparseable
    
    * Update ExecuteContext.scala
    danking authored Feb 13, 2023
    52feb78
  2. 9c0854e

Commits on Feb 14, 2023

  1. [compiler] Refactor compiled functions to take a HailTaskContext inst… (hail-is#12597)
    
    * [compiler] Refactor compiled functions to take a HailTaskContext instead of partitionIndex
    
    Partition index is now unnecessary due to the completion of the randomness redesign.
    
    HailTaskContext will be used in a subsequent PR to add task-level cleanup to permit
    aggressive caching in generated code.
    
    * fixes
    
    * oops writeIRs
    
    * fix combine
    
    * fix other issues in combine
    
    * fix
    
    * fix combop nonsense
    
    * bleh
    
    * fix
    
    * fix
    
    * bump
    tpoterba authored Feb 14, 2023
    0b4218d
  2. 8b12ee8
  3. 46aba89
  4. df3d682
  5. [batch] Mitigate too many resources with same prices (hail-is#12654)

    * [batch] Mitigate too many resources with same prices
    
    * add equality checks
    
    * fix
    jigold authored Feb 14, 2023
    d03121f

Commits on Feb 15, 2023

  1. [query] Make caching in PartitionNativeIntervalReader more aggressive (hail-is#12600)
    
    * [query] Make caching in PartitionNativeIntervalReader more aggressive
    
    Add finalizers to HailTaskContext to clean up open indices.
    
    * remove log
    
    * comment
    tpoterba authored Feb 15, 2023
    2fbf68b
  2. [query] Add RouterFS to scala to permit local file system IO in QoB (hail-is#12667)
    
    * [query] Add RouterFS to scala to permit local file system IO in QoB
    
    * erasure
    
    * scala is stupid
    
    * override open/create cached
    
    * fix cast
    tpoterba authored Feb 15, 2023
    0f85414
  3. [compiler] Iterative DistinctlyKeyed Analysis (hail-is#12696)

    * [compiler] Iterative `DistinctlyKeyed` Analysis
    Use iterative tree traversals to prevent exceeding stack size for large IRs.
    
    * traverse all ir nodes
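
The idea of replacing recursion with an explicit stack can be sketched in Python as follows. This is an illustration only, not the Scala code from the commit: `Node` and `all_nodes_satisfy` are hypothetical names, and the predicate stands in for whatever per-node check the real analysis performs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Node:
    """Hypothetical stand-in for an IR node with child nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)


def all_nodes_satisfy(root: Node, predicate: Callable[[Node], bool]) -> bool:
    """Visit every node with an explicit stack instead of recursion.

    The pending work lives in a heap-allocated list, so the depth of the
    tree does not consume call-stack frames.
    """
    stack = [root]
    while stack:
        node = stack.pop()
        if not predicate(node):
            return False
        # Defer the children; they are visited on later loop iterations.
        stack.extend(node.children)
    return True
```

A recursive version of the same check would hit Python's recursion limit on a sufficiently deep tree, while this loop is bounded only by available memory.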
    ehigham authored Feb 15, 2023
    1fe6f2a

Commits on Feb 16, 2023

  1. [lowering] Rewrite maximal independent set to be its own value IR (hail-is#12295)
    
    * [lowering] Rewrite maximal independent set to be its own IR
    
    And use this new IR to enable lowered execution of
    maximal_independent_set
    
    * fix
    
    * updates
    
    * Update Copy.scala
    
    * Update __init__.py
    
    * Requireness.scala rule
    
    * fix bindings
    
    * fix
    
    * remove bad error
    
    * fix requireness
    
    * custom element binding for tiebreaker arguments
    
    * Add Array[Long] to hail value conversion
    
    * Make maximalIndependentSet return an IndexedSeq
    
    Functions.unwrapReturn expects scala IndexedSeq not java Array
    
    * fix GraphSuite
    
    * Add (bad) unwrap rule for arrays of structs
    
    * xfails
    
    * better unwrapReturn for arrays of structs
    
    * fix bad merge
    
    * correct Requiredness rules
    
    * ChildEnvWithoutBindings rule
    
    ---------
    
    Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
    chrisvittal and danking authored Feb 16, 2023
    235d2bc
  2. Bump com.github.samtools:htsjdk from 3.0.2 to 3.0.4 in /hail (hail-is#12700)
    
    Bumps [com.github.samtools:htsjdk](https://github.com/samtools/htsjdk) from 3.0.2 to 3.0.4.
    - [Release notes](https://github.com/samtools/htsjdk/releases)
    - [Commits](samtools/htsjdk@3.0.2...3.0.4)
    
    ---
    updated-dependencies:
    - dependency-name: com.github.samtools:htsjdk
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 16, 2023
    76e92ac
  3. [query] add hl.utils.genomic_range_table (hail-is#12679)

    * [query] add `hl.utils.genomic_range_table`
    
    CHANGELOG: In Query on Batch, `hl.balding_nichols_model` is slightly faster. Also added `hl.utils.genomic_range_table` to quickly create a table keyed by locus.
    
    It has grated on me for a while that `hl.balding_nichols_model` requires a whole pass to verify it
    is sorted even though it is plainly so. This change introduces the necessary infrastructure to
    convince Hail of that fact.
    
    * pylint
    
    * missing import
    
    * fix tests
    
    * add to __init__.py
    
    * better genomic range test tables
    danking authored Feb 16, 2023
    68b89a3
  4. Bump werkzeug from 2.2.2 to 2.2.3 in /docker (hail-is#12703)

    Bumps [werkzeug](https://github.com/pallets/werkzeug) from 2.2.2 to 2.2.3.
    - [Release notes](https://github.com/pallets/werkzeug/releases)
    - [Changelog](https://github.com/pallets/werkzeug/blob/main/CHANGES.rst)
    - [Commits](pallets/werkzeug@2.2.2...2.2.3)
    
    ---
    updated-dependencies:
    - dependency-name: werkzeug
      dependency-type: direct:production
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 16, 2023
    2490ed1
  5. ae3d0d5
  6. 6056c05
  7. Bump de.undercouch.download from 5.3.0 to 5.3.1 in /hail (hail-is#12707)

    Bumps [de.undercouch.download](https://github.com/michel-kraemer/gradle-download-task) from 5.3.0 to 5.3.1.
    - [Release notes](https://github.com/michel-kraemer/gradle-download-task/releases)
    - [Commits](michel-kraemer/gradle-download-task@5.3.0...5.3.1)
    
    ---
    updated-dependencies:
    - dependency-name: de.undercouch.download
      dependency-type: direct:production
      update-type: version-update:semver-patch
    ...
    
    Signed-off-by: dependabot[bot] <support@github.com>
    Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
    dependabot[bot] authored Feb 16, 2023
    17e5568

Commits on Feb 17, 2023

  1. [query] Upgrade spark to 3.3.0 and dataproc to 2.1 (hail-is#12701)

    * [query] Upgrade spark to 3.3.0 and dataproc to 2.1
    
    * lint
    
    * lint
    
    * update zstd-jni
    
    * trust pyspark on patch versions
    
    * create specific exceptions to appease pylint
    daniel-goldstein authored Feb 17, 2023
    75f351d
  2. [query] lower logistic SKAT (hail-is#12643)

    * [query] lower logistic SKAT
    
    CHANGELOG: `hl.skat(..., logistic=True)` now supported in the Batch backend.
    
    * minor doc fix
    
    * remove wrong sentence from docs
    
    * covariates are required now
    
    * support max_iter = 0 in logreg_fit
    
    * fix syntax
    
    * wip docs
    
    * docs iwp
    
    * final take on the math
    
    * docs fixes
    danking authored Feb 17, 2023
    9c5851c
  3. [compiler] refactor BlockMatrix lowering to generate small IR (hail-is#12624)
    
    * wip
    
    * lower BlockMatrixNativeReader
    
    * lower BlockMatrixBinaryReader
    
    * lower BlockMatrixRandom
    
    * refactor BMS2, lower BlockMatrixWrite
    
    * lower GetElement and BlockMatrixMap
    
    * make variable names private
    
    * lower map2
    
    * lower first broadcast cases
    
    * lower diagonal
    
    * lower remaining broadcasts
    
    * runtime bm sparsity abstractions
    
    * take BMType off BMSContexts
    
    * fixes
    
    * try different assertion
    
    * fixes
    
    * fix diagonal and xfails
    
    * fix requiredness test
    
    * fix transpose and enable test
    
    * don't use ExecuteContext.scoped in parser
    
    * lower BlockMatrixFilter
    
    * add withSparsity short-circuit
    
    * refactor letBindings and broadcasts
    
    * fix service xfails
    
    * rename EvalContext -> IRBuilder
    
    * move IRBuilder
    patrick-schultz authored Feb 17, 2023
    87997a7
  4. [batch] Fix adding new resources with same rates (hail-is#12698)

    * [batch] More debugging info for resource rate changes
    
    * fix lint
    
    * fix actual bug
    
    * delint
    
    ---------
    
    Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
    jigold and danking authored Feb 17, 2023
    12b0942
  5. [batch] Mitigate test failures by extending batch client timeout (hail-is#12709)
    
    * [batch] Mitigate test failures by extending batch client timeout
    
    * fix
    daniel-goldstein authored Feb 17, 2023
    dbfd553
  6. [query] fix NPE introduced in hail-is#12555 (hail-is#12711)

    Not all IOExceptions have a cause.
    danking authored Feb 17, 2023
    5f7e4be
  7. [batch] Support job logs that are not UTF-8 compatible (hail-is#12666)

    * [batch] Support job logs that are not UTF-8 compatible
    
    * redundant
    
    * add test for non-utf-8 log
    
    * fix
    
    * fix test
    
    * linting
    
    * address comments
    
    * lint
    daniel-goldstein authored Feb 17, 2023
    f7b3f10
  8. [query] improve the memory client (hail-is#12691)

    * [query] improve the memory client
    
    Currently, the memory client buffers the entire output in memory which is likely
    to cause OOMs. For reasons that are not entirely clear to me, sometimes these OOMs
    get muffled by our system and instead lead to non-termination. I vaguely remember
    this happening before with `using`. I suspect there is something somewhat subtle
    wrong with that method, but I am not certain.
    
    Anyway, there are four big changes here:
    1. Do not buffer the entire request body in memory when writing to memory.
    2. Because of (1) we have to pull retry behavior all the way up to the top-level where we know how to recreate the body.
    3. Because of (2) it is easier to provide a `write(url)(writerFunction)` style API, which I do here.
    4. Again, because of (2), and because I want to preserve the file-object-like interface, I added a somewhat funky anonymous class which uses a second thread to facilitate the movement of data written into the OutputStream returned by `create` into the OutputStream of the HTTP connection.
    
    Point (4) probably bears more explanation. The root issue is the bad Apache HTTP Client interface.
    Instead of `request` returning an OutputStream, it takes an "entity". An entity knows how to write
    itself into the OutputStream of an HTTP request. This works fine if the "writer" code is passed as
    a function (as in my new `write` method), but that does not work if the control flow looks like:
    
        f = create(...)
        f.write(...)
        f.close()
    
    We avoid this limited API by initiating the request in a second thread which will eventually block
    waiting to receive data from a PipedInputStream. That PipedInputStream produces the data written to
    a PipedOutputStream. The `create` call returns a positioned OutputStream which just writes data into
    the PipedOutputStream and handles cleaning up the thread when it is closed.
    
    In a multi-core system, network requests should proceed in parallel to the client code. In a
    single-core system, the written data will buffer until `close` is called which will definitely yield
    control to the other thread.
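
The piped-stream arrangement described above can be illustrated with a small Python analogue. It is a sketch under stated assumptions, not the Scala code from the commit: `PipedWriter` and `fake_send` are hypothetical names, `os.pipe` plus a thread stands in for `PipedInputStream`/`PipedOutputStream`, and `fake_send` stands in for an HTTP client that consumes a request body.

```python
import os
import threading


class PipedWriter:
    """File-like writer whose bytes are consumed by a background thread.

    `send` is any callable that reads a binary file object to EOF, playing
    the role of the HTTP request that writes an "entity" to the connection.
    """

    def __init__(self, send):
        read_fd, write_fd = os.pipe()
        self._reader = os.fdopen(read_fd, "rb")
        self._writer = os.fdopen(write_fd, "wb")
        self._error = None
        self._thread = threading.Thread(target=self._run, args=(send,), daemon=True)
        self._thread.start()

    def _run(self, send):
        try:
            with self._reader as body:
                send(body)  # blocks until the writer side produces data or closes
        except BaseException as exc:  # remember the failure for close()
            self._error = exc

    def write(self, data: bytes) -> int:
        return self._writer.write(data)

    def close(self) -> None:
        self._writer.close()  # flushes and signals EOF to the reader
        self._thread.join()   # wait for the "request" to finish
        if self._error is not None:
            raise self._error


received = []


def fake_send(body):
    # Hypothetical consumer: read the request body in small chunks.
    for chunk in iter(lambda: body.read(4), b""):
        received.append(chunk)


f = PipedWriter(fake_send)
f.write(b"hello ")
f.write(b"world")
f.close()
```

The caller keeps the `create`/`write`/`close` control flow, while the consumer runs concurrently and sees end-of-stream only when `close` is called.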
    
    * restore the retry
    
    * fix
    
    * fix
    
    * fix override
    danking authored Feb 17, 2023
    abd47fa

Commits on Feb 18, 2023

  1. b255123
  2. [query] rework flags and fix QoB flags (hail-is#12423)

    Flags now use the same user configuration machinery we use for Batch and QoB. I am not certain
    this is the right choice. Feedback very welcome. The configuration_of function lets us uniformly
    treat any configuration by checking, in order: explicit argument, envvar, config file, or a
    fallback.
    
    I added a bit of code to allow us to support the envvars which do not conform to the new envvar
    scheme.
    
    I also removed a few flags that are no longer used.
    
    I kind of think these flags should actually be under a new section like "query_compiler" or
    something.
    
    @tpoterba, thoughts?
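
The lookup order described above (explicit argument, envvar, config file, fallback) can be sketched as follows. This is a minimal illustration, not Hail's implementation: the function name `configuration_of_sketch`, its signature, and the `HAIL_<SECTION>_<OPTION>` environment variable naming are all assumptions made for the example.

```python
import configparser
import os
from typing import Mapping, Optional


def configuration_of_sketch(
    section: str,
    option: str,
    explicit_argument: Optional[str],
    fallback: Optional[str],
    config: Optional[configparser.ConfigParser] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Resolve a setting by precedence: argument, env var, config file, fallback."""
    environ = os.environ if environ is None else environ

    # 1. An explicitly passed value always wins.
    if explicit_argument is not None:
        return explicit_argument

    # 2. Environment variable (hypothetical naming scheme).
    from_env = environ.get(f"HAIL_{section}_{option}".upper())
    if from_env is not None:
        return from_env

    # 3. User configuration file, if one was loaded.
    if config is not None and config.has_option(section, option):
        return config.get(section, option)

    # 4. Nothing set anywhere: use the caller's fallback.
    return fallback
```

Routing every setting through one function like this means a flag, a Batch option, and a QoB option all follow the same precedence rules.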
    danking authored Feb 18, 2023
    1b2dba1
  3. [hailtop] Keep strong references to tasks (hail-is#12692)

    * [hailtop] Keep strong references to tasks on the event loop
    
    * isort
    
    * linting
    daniel-goldstein authored Feb 18, 2023
    d867b79
  4. 2d65cf5
  5. [batch] Refactor resource billing checks with additional debugging info (hail-is#12713)
    
    * [batch] Refactor resource billing checks with additional debugging info
    
    * delint
    jigold authored Feb 18, 2023
    537f8f7

Commits on Feb 21, 2023

  1. 669a48b
  2. 59fbbb9

Commits on Feb 22, 2023

  1. 691d4bd
  2. 60ee155

Commits on Feb 23, 2023

  1. [tests] Use token in remote_tmpdir path for tests (hail-is#12693)

    * [tests] Use token in remote_tmpdir path
    
    * fix missing curly brace
    jigold authored Feb 23, 2023
    4b92fb6
  2. [qob] Update ServiceBackend logic to use distributed sort by default and not with a default feature flag (hail-is#12719)
    
    * [qob] Don't use local sort to shuffle...
    
    * fix python
    
    * use empty dict when flags is None
    
    ---------
    
    Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
    tpoterba and daniel-goldstein authored Feb 23, 2023
    a49c76a
  3. [batch] Allow python jobs to have file resources too (hail-is#12726)

    * [batch] Allow python jobs to have file resources too
    
    * add test
    daniel-goldstein authored Feb 23, 2023
    37d1cad
  4. 8104a1c

Commits on Feb 24, 2023

  1. [qob] Enable liftovers for Query-on-Batch (hail-is#12699)

    * [qob] Enable liftovers for Query-on-Batch
    
    * reorganize python side of things
    
    * linting
    
    * dedup some stuff on the local and spark backends
    
    * dedup local spark and py4j backend functionality
    
    * linting
    
    * address some comments
    
    * cache orderings
    
    * simplify
    
    * fix
    daniel-goldstein authored Feb 24, 2023
    995994c
  2. [batch] Make driver parameters configurable in the UI (hail-is#12575)

    * [batch] Make driver parameters configurable in the UI
    
    * delint
    
    * address comments
    
    * address comment
    jigold authored Feb 24, 2023
    4350aec
  3. 7fb8646
  4. 60f47ac

Commits on Feb 25, 2023

  1. e7945a8

Commits on Feb 27, 2023

  1. 3cd6e4b

Commits on Feb 28, 2023

  1. 9e0081c

Commits on Mar 1, 2023

  1. Merge commit '9e0081c' into upstream-2023-03-01_1

    This is the main merge decision about the memory service that is debatable:
    
    ```scala
    val (open, write) = if (n <= 50) {
      (fs.openCachedNoCompression _, fs.writeCached _)
    } else {
      ((x: String) => fs.openNoCompression(x), fs.writePDOS _)
    }
    ```
    
    Merged to:
    
    ```scala
    val (open, write) = ((x: String) => fs.openNoCompression(x), fs.writePDOS _)
    ```
    
    Conflicts:
        batch/batch/driver/instance_collection/pool.py
        batch/batch/driver/main.py
        batch/batch/driver/templates/pool.html
        batch/batch/inst_coll_config.py
        ci/ci/ci.py
        hail/src/main/scala/is/hail/backend/service/ServiceBackend.scala
        hail/src/main/scala/is/hail/backend/service/Worker.scala
    illusional committed Mar 1, 2023
    601daf5