Upstream 0.2.126 and proposed OOM-killer fix (3b38d0b) #317

Merged 96 commits on Nov 10, 2023

Commits on Oct 11, 2023

  1. [linting] Catch unintended errors in check-sql.sh (hail-is#13745)

    I really borked our SQL linting. This PR is short but it catches a few
    critical problems.
    
    
    1. The point of `check-sql.sh` is to detect modifications or deletions
    of SQL files in PRs and fail if such a change occurs. Currently on
    `main` it does not detect modifications. In hail-is#13456, I removed the
    `delete-<service>-tables.sql` files (intentionally), so I added the `^D`
    to the `grep` regex to indicate that it is OK to have a deletion. What I
    inadvertently did though is change the rule from "It's ok to have
    Additions of any file OR Modifications of estimated-current.sql /
    delete-<service>-tables.sql" to "It's ok to have Additions OR
    Modifications OR Deletions of estimated-current.sql /
    delete-<service>-tables.sql". Really this should have been "It's ok to
    have Additions OR Modifications of estimated-current.sql OR Deletions of
    delete-<service>-tables.sql". I've changed it to reflect that rule.
    
    2. Rules currently silently *pass* in CI with an error message that git
    is not installed. In hail-is#13437 I changed the image used to run the linters
    and inadvertently didn't include `git` which `check-sql.sh` needs to
    run. Here's how it failed in a sneaky way:
    - Since `git` is not installed, all calls to `git` fail, but the script
    is not run with `set -e` so every line of the script is executed
    - Since `git` lines fail, `modified_sql_file_list` remains empty
    - Since `modified_sql_file_list` remains empty, it appears to the check
    at the end that everything checked out
    - The if statement runs successfully and the script exits with status
    code 0
    
    To fix this I do a few things:
    - installed `git` in the linting image
    - `set -e` by default and only enable `set +e` later on when necessary
    (because we don't want a failed `git diff` to immediately exit)
    - Do away with the file checking and instead check the error code of the
    grep. If nothing survives the grep filter, which means there were no
    illegal changes made, grep will return with exit code 1. So we treat
    that exit code as a success.
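
The corrected rule can be sketched in Python. This is an illustration only, not the contents of `check-sql.sh`: it assumes each input line looks like `git diff --name-status` output (status letter, tab, path), and the names `ALLOWED` and `illegal_changes` are hypothetical.

```python
import re

# Assumed input: lines shaped like `git diff --name-status` output,
# i.e. a status letter (A, M or D), a tab, then the file path.
ALLOWED = [
    re.compile(r"^A\t"),                                 # additions of any file
    re.compile(r"^M\t(.*/)?estimated-current\.sql$"),    # modifications of estimated-current.sql
    re.compile(r"^D\t(.*/)?delete-[^/]+-tables\.sql$"),  # deletions of delete-<service>-tables.sql
]


def illegal_changes(name_status_lines):
    """Return the lines that no allowed rule matches."""
    return [
        line
        for line in name_status_lines
        if not any(rule.search(line) for rule in ALLOWED)
    ]
```

An empty result plays the role of `grep` exiting with code 1: nothing survived the filter, so the check passes.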
    daniel-goldstein authored Oct 11, 2023
    5cd4eb6
  2. [ci] Generate PR specific logging queries (hail-is#13744)

    Fixes hail-is#13556. I haven't tested these changes -- would like to get
    initial feedback first.
    jigold authored Oct 11, 2023
    b598a78

Commits on Oct 12, 2023

  1. [query] fix local backend for install-editable (hail-is#13799)

    I neglected to include the extra classpath necessary when using a skinny
    JAR.
    danking authored Oct 12, 2023
    d187d52
  2. [query] Dont compile code cache function three times (hail-is#13796)

    `f` is a thunk so it is currently being evaluated thrice before being inserted
    into the code cache. The `compiledFunction` variable was unused so I
    think this is what was originally intended.
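
A small Python analogue shows why a thunk should be bound to a variable once. The Scala code is not reproduced here; `Compiler`, `insert_naive`, and `insert_once` are hypothetical names, and the shape of the naive version is an assumption made for illustration.

```python
class Compiler:
    """Stand-in for an expensive compilation step; counts how often it runs."""

    def __init__(self):
        self.calls = 0

    def compile(self):
        self.calls += 1
        return "compiled function"


def insert_naive(cache, key, f):
    # `f` is a thunk: every f() call repeats the work.
    if f() is not None:   # evaluation 1
        cache[key] = f()  # evaluation 2
    return f()            # evaluation 3


def insert_once(cache, key, f):
    compiled = f()        # evaluate once, then reuse the value
    cache[key] = compiled
    return compiled
```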
    daniel-goldstein authored Oct 12, 2023
    f7be69c
  3. 87398e1

Commits on Oct 13, 2023

  1. [query] fix ndarray concat with size 0 dims (hail-is#13755)

    ndarray concat was broken when the first input has size 0 along the
    concat axis. For example
    ```
    In [3]: hl.eval(hl.nd.hstack([hl.nd.zeros((2, 0)), hl.nd.array([[1.0, 2.0], [3.0, 4.0]])]))
    Out[3]:
    array([[0., 2.],
           [0., 4.]])
    ```
    The zeros matrix is 2 by 0, so horizontal concatenation should just
    return the other matrix.
    (I once saw the first column filled with random numbers, presumably from
    a buffer overflow)
    
    I did some cleaning up in the concat implementation, but the functional
    change is to record the index of the first input which is non-empty
    along the concat axis, and when resetting to the start of the axis,
    reset to that non-empty index. Other size 0 inputs are correctly handled
    when incrementing the index, the problem was that the first read happens
    before an increment.
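
The idea can be modelled in plain Python along a single axis. This is a simplified sketch, not Hail's generated code; `concat_reader` and its variable names are hypothetical.

```python
def concat_reader(inputs):
    """Yield the elements of `inputs` (a list of lists) as one concatenated sequence."""
    # Index of the first input with at least one element, or None if all are empty.
    first_nonempty = next((i for i, xs in enumerate(inputs) if len(xs) > 0), None)
    if first_nonempty is None:
        return
    # "Resetting to the start of the axis" lands on the first non-empty input,
    # not on input 0, because the first read happens before any increment.
    which, offset = first_nonempty, 0
    while True:
        yield inputs[which][offset]
        offset += 1
        # Incrementing skips exhausted inputs, including later empty ones.
        while which < len(inputs) and offset >= len(inputs[which]):
            which += 1
            offset = 0
        if which == len(inputs):
            return
```

Starting at input 0 instead of `first_nonempty` would read from an empty input on the very first step, which is the failure mode described above.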
    patrick-schultz authored Oct 13, 2023
    79c1c82
  2. [compiler] rewrite ExtractIntervalFilters to be more robust (hail-is#13355)
    
    CHANGELOG: make hail's optimization rewriting filters to
    interval-filters smarter and more robust
    
    Completely rewrites ExtractIntervalFilters. Instead of matching against
    very specific patterns, and failing completely for things that don't
    quite match (e.g. an input is let bound, or the fold implementing "locus
    is contained in a set of intervals" is written slightly differently),
    this uses a standard abstract interpretation framework, which is almost
    completely insensitive to the form of the IR, only depending on the
    semantics. It also correctly handles missing key fields, where the
    previous implementation often produced an unsound transformation of the
    IR.
    
    Also adds a much more thorough test suite than we had before.
    
    At the top level, the analysis takes a boolean typed IR `cond` in an
    environment where there is a reference to some `key`, and produces a set
    `intervals`, such that `cond` is equivalent to `cond &
    intervals.contains(key)` (in other words `cond` implies
    `intervals.contains(key)`, or `intervals` contains all rows where `cond`
    is true). This means for instance it is safe to replace `TableFilter(t,
    cond)` with `TableFilter(TableFilterIntervals(t, intervals), cond)`.
    
    Then in a second pass it rewrites `cond` to `cond2`, such that `cond &
    (intervals.contains(key))` is equivalent to `cond2 &
    intervals.contains(key)` (in other words `cond` implies `cond2`, and
    `cond2 & intervals.contains(key)` implies `cond`). This means it is safe
    to replace the `TableFilter(t, cond)` with
    `TableFilter(TableFilterIntervals(t, intervals), cond2)`. A common
    example is when `cond` can be completely captured by the interval
    filter, i.e. `cond` is equivalent to `intervals.contains(key)`, in which
    case we can take `cond2 = True`, and the `TableFilter` can be optimized
    away.
    
    This all happens in the function
    ```scala
      def extractPartitionFilters(ctx: ExecuteContext, cond: IR, ref: Ref, key: IndexedSeq[String]): Option[(IR, IndexedSeq[Interval])] = {
        if (key.isEmpty) None
        else {
          val extract = new ExtractIntervalFilters(ctx, ref.typ.asInstanceOf[TStruct].typeAfterSelectNames(key))
          val trueSet = extract.analyze(cond, ref.name)
          if (trueSet == extract.KeySetLattice.top)
            None
          else {
            val rw = extract.Rewrites(mutable.Set.empty, mutable.Set.empty)
            extract.analyze(cond, ref.name, Some(rw), trueSet)
            Some((extract.rewrite(cond, rw), trueSet))
          }
        }
      }
    ```
    `trueSet` is the set of intervals which contains all rows where `cond`
    is true. This set is passed back into `analyze` in a second pass, which
    asks it to rewrite `cond` to something equivalent, under the assumption
    that all keys are contained in `trueSet`.
    
    The abstraction of runtime values tracks two types of information:
    * Is this value a reference to / copy of one of the key fields of this
    row? We need to know this to be able to recognize comparisons with key
    values, which we want to extract to interval filters.
    * For boolean values (including, ultimately, the filter predicate
    itself), we track three sets of intervals of the key type:
    overapproximations of when the bool is true, false, and missing.
    Overapproximation here means, for example, if the boolean evaluates to
    true in some row with key `k`, then `k` must be contained in the "true"
    set of intervals. But it's completely fine if the set of intervals
    contains keys of rows where the bool is not true. In particular, a
    boolean about which we know nothing (e.g. it's just some non-key boolean
    field in the dataset) is represented by an abstract boolean value where
    all three sets are the set of all keys.
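
The three-set abstraction of booleans can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the Scala implementation: keys are drawn from a small finite set rather than represented as intervals, `&` is assumed to follow three-valued (Kleene) logic, and every name here (`AbstractBool`, `UNIVERSE`, `key_less_than`) is hypothetical.

```python
from dataclasses import dataclass

# Assumption: a tiny finite key space stands in for sets of intervals.
UNIVERSE = frozenset(range(10))


@dataclass(frozen=True)
class AbstractBool:
    true: frozenset     # keys of rows where the boolean may be true
    false: frozenset    # keys of rows where it may be false
    missing: frozenset  # keys of rows where it may be missing

    def __invert__(self):
        # Negation swaps true and false; missing stays missing.
        return AbstractBool(self.false, self.true, self.missing)

    def __and__(self, other):
        return AbstractBool(
            # True only where both sides may be true.
            true=self.true & other.true,
            # False wherever either side may be false.
            false=self.false | other.false,
            # Missing where one side may be missing and the other is not false.
            missing=(self.missing & (other.true | other.missing))
            | (other.missing & (self.true | self.missing)),
        )


# A boolean about which nothing is known: all three sets are the set of all keys.
UNKNOWN = AbstractBool(UNIVERSE, UNIVERSE, UNIVERSE)


def key_less_than(n):
    """Abstract value of the comparison `key < n` for a never-missing key."""
    lt = frozenset(k for k in UNIVERSE if k < n)
    return AbstractBool(true=lt, false=UNIVERSE - lt, missing=frozenset())
```

For example, `(key_less_than(5) & UNKNOWN).true` is the set of keys below 5: the unknown conjunct cannot widen the "true" set, so a filter on keys below 5 remains a sound overapproximation.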
    patrick-schultz authored Oct 13, 2023
    bd6e397

Commits on Oct 16, 2023

  1. 9b54271
  2. [hailtop] yet another transient error (hail-is#13817)

    CHANGELOG: Mitigate new transient error from Google Cloud Storage which
    manifests as `aiohttp.client_exceptions.ClientOSError: [Errno 1] [SSL:
    SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2548)`.
    
    As of around 1500 ET 2023-10-16, this exception happens whenever we
    issue a lot of requests to GCS.
    
    See [Zulip
    thread](https://hail.zulipchat.com/#narrow/stream/300487-Hail-Batch-Dev/topic/cluster.20size/near/396777320).
    danking authored Oct 16, 2023
    fcaafc5
  3. [ci] fix Azure CI PR page (hail-is#13824)

    The `logging_queries` variable is always *defined* but sometimes `None`.
    danking authored Oct 16, 2023
    568858d
  4. [query] if worker or driver raise an exception, fail the Batch job (hail-is#13715)
    
    CHANGELOG: Fixes hail-is#13697, a long standing issue with QoB, in which a
    failing partition job or driver job is not failed in the Batch UI.
    
    I am not sure why we did not do this this way in the first place. If a
    JVMJob raises an exception, Batch will mark the job as failed. Ergo, we
    should raise an exception when a driver or a worker fails!
    
    Here's an example: I used a simple pipeline that writes to a bucket to
    which I have read-only access. You can see an example Batch (where every
    partition fails): https://batch.hail.is/batches/8046901. [1]
    
    ```python3
    import hail as hl
    hl.utils.range_table(3, n_partitions=3).write('gs://neale-bge/foo.ht')
    ```
    
    NB: I removed the `log.error` in `handleForPython` because that log is
    never necessary. That function converts a stack of exceptions into a
    triplet of the short message, the full exception with stack trace, and a
    Hail error id (if present). That triplet is always passed along to
    someone else who logs the exception.
    
    (FWIW, the error id indicates a Python source location that is
    associated with the error. On the Python-side, we can look up that error
    id and provide a better stack trace.)
    
    [1] You'll notice the logs are missing. I noticed this as well, it's a
    new bug. I fixed it in hail-is#13729.
    danking authored Oct 16, 2023
    e32e999

Commits on Oct 17, 2023

  1. [query] faster array decoder (hail-is#13787)

    Picking up where hail-is#13776 left off.
    
    CHANGELOG: improved speed of reading hail format datasets from disk
    
    This PR speeds up decoding arrays in two main ways:
    * instead of calling `arrayType.isElementDefined(array, i)` on every
    single array element, which expands to
      ```scala
      val b = aoff + lengthHeaderBytes + (i >> 3)
      !((Memory.loadByte(b) & (1 << (i & 7).toInt)) != 0)
      ```
    process elements in groups of 64, and load the corresponding long of
    missing bits once
    * once we have a whole long of missing bits, we can be smarter than
    branching on each bit. After flipping to get `presentBits`, we use the
    following pseudocode to extract the positions of the set bits, with time
    proportional to the number of set bits:
      ```
      while (presentBits != 0) {
        val idx = java.lang.Long.numberOfTrailingZeros(presentBits)
        // do something with idx
        presentBits = presentBits & (presentBits - 1) // unsets the rightmost set bit
      }
      ```
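
The same bit trick can be tried out in Python. This is only an illustration of the technique, not the decoder itself; `set_bit_positions`, `present_indices`, and `MASK64` are hypothetical names, and Python integers are masked to 64 bits to imitate a Java `long`.

```python
MASK64 = (1 << 64) - 1  # Python ints are unbounded, so emulate a 64-bit word


def set_bit_positions(present_bits):
    """Return the indices of the set bits of a non-negative int, lowest first."""
    positions = []
    while present_bits != 0:
        lowest = present_bits & -present_bits      # isolates the rightmost set bit
        positions.append(lowest.bit_length() - 1)  # equals the trailing-zero count
        present_bits &= present_bits - 1           # clears the rightmost set bit
    return positions


def present_indices(missing_word, block_start=0):
    """Indices of the present elements in one block of 64 elements."""
    present_bits = ~missing_word & MASK64  # flip the missing bits, keep 64 bits
    return [block_start + i for i in set_bit_positions(present_bits)]
```

The loop body runs once per set bit, so a block in which most elements are missing costs only a few iterations.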
    
    To avoid needing to handle the last block of 64 elements differently,
    this PR changes the layout of `PCanonicalArray` to ensure the missing
    bits are always padded out to a multiple of 64 bits. They were already
    padded to a multiple of 32, and I don't expect this change to have much
    of an effect. But if needed, blocking by 32 elements instead had very
    similar performance in my benchmarks.
    
    I also experimented with unrolling loops. In the non-missing case, this
    is easy. In the missing case, I tried using `if (presentBits.bitCount >=
    8)` to guard an unrolled inner loop. In both cases, unrolling was if
    anything slower.
    
    Dan observed benefit from unrolling, but that was combined with the
    first optimization above (not loading a bit from memory every element),
    which I believe was the real source of improvement.
    patrick-schultz authored Oct 17, 2023
    ecb7d86
  2. [copier] minor clean up to copier (hail-is#13820)

    1. File rate is more interesting for small files.
    2. The source_report controls the progress bar. By updating it eagerly
    while we are listing a directory, the progress bar is more accurate
    sooner. We currently wait until we get a semaphore for a particular file
    to update the progress bar.
    danking authored Oct 17, 2023
    f0328b6
  3. [fs] delete in batches, accept stdin (hail-is#13821)

    Some quality-of-life stuff for `hailtop.aiotools.delete`.
    
    Without deleting in batches, I found it impossible to delete very large
    lists of files because we create too many asyncio tasks.
    danking authored Oct 17, 2023
    c64d880
  4. 9817797
  5. [query] no unnecessary object allocations in RegionMemory.allocate (hail-is#13794)
    
    Consider this:
    
    ```scala
    class Foo {
       def bar(): (Long, Long) = (3, 4)
    
       def destructure(): Unit = {
         val (x, y) = bar()
       }
    
       def accessors(): Unit = {
         val zz = bar()
         val x = zz._1
         val y = zz._2
       }
    }
    ```
    
    
    ![image](https://github.com/hail-is/hail/assets/106194/532dc7ea-8027-461d-8e12-3217f5451713)
    
    These should be exactly equivalent, right? There's no way Scala would
    compile the match into something horrible. Right? Right?
    
    ```
    public void destructure();
      Code:
         0: aload_0
         1: invokevirtual #27                 // Method bar:()Lscala/Tuple2;
         4: astore_3
         5: aload_3
         6: ifnull        35
         9: aload_3
        10: invokevirtual #33                 // Method scala/Tuple2._1$mcJ$sp:()J
        13: lstore        4
        15: aload_3
        16: invokevirtual #36                 // Method scala/Tuple2._2$mcJ$sp:()J
        19: lstore        6
        21: new           #13                 // class scala/Tuple2$mcJJ$sp
        24: dup
        25: lload         4
        27: lload         6
        29: invokespecial #21                 // Method scala/Tuple2$mcJJ$sp."<init>":(JJ)V
        32: goto          47
        35: goto          38
        38: new           #38                 // class scala/MatchError
        41: dup
        42: aload_3
        43: invokespecial #41                 // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
        46: athrow
        47: astore_2
        48: aload_2
        49: invokevirtual #33                 // Method scala/Tuple2._1$mcJ$sp:()J
        52: lstore        8
        54: aload_2
        55: invokevirtual #36                 // Method scala/Tuple2._2$mcJ$sp:()J
        58: lstore        10
        60: return
    
    public void accessors();
      Code:
         0: aload_0
         1: invokevirtual #27                 // Method bar:()Lscala/Tuple2;
         4: astore_1
         5: aload_1
         6: invokevirtual #33                 // Method scala/Tuple2._1$mcJ$sp:()J
         9: lstore_2
        10: aload_1
        11: invokevirtual #36                 // Method scala/Tuple2._2$mcJ$sp:()J
        14: lstore        4
        16: return
    ```
    
    Yeah, so, it extracts the first and second elements of the
    primitive-specialized tuple, ~~constructs a `(java.lang.Long,
    java.lang.Long)` Tuple~~ constructs another primitive-specialized tuple
    (for no reason???), then does the match on that.
    
    sigh.
    danking authored Oct 17, 2023
    30ec661
  6. dc33c6d
  7. [batch] Setup database infrastructure for job groups (hail-is#13475)

    The conceptual change here is we want to parameterize all batch related
    tables to have a new job group ID that I've set to **0** for the root
    job group. We need to make sure all future inserts / updates into the
    batches table are propagated to the new job groups table. When we create
    a batch now, we also create the corresponding entries into the job
    groups and job group parents tables.
    
    I chose the root job group to be 0 as I think conceptually, the client
    should start numbering job groups at 1 and not know there is a hidden
    root job group being created under the hood. I'm not wedded to this.
    
    I tried to check for all the indices that would be needed in my
    prototype. It's possible I missed one or two, but it's not a big deal to
    add it later.
    
    I don't think we need to test this on a populated database (dev deploy
    main, submit jobs, then run the migration), but let me know if you think
    that would be helpful.
    jigold authored Oct 17, 2023
    efc83e7

Commits on Oct 18, 2023

  1. [compiler] Don't append StmtX after ControlX (hail-is#13752)

    This change grew out of hail-is#13674.
    The idea is simple - we shouldn't append code after control
    statements, since such code is unreachable. That idea opened Pandora's
    box, but now we no longer generate and then drop dead code.
    
    Main changes that arose from fixing fallout from adding an assert in
    `Block.append`:
    - Implement basic control-flow structures (if, while, for, switch) in
    `CodeBuilderLike` and remove the older implementations from `Code`.
    - main difference is these are built from sequencing `Code` operations
    rather than being defined from LIR
    - allows for a higher-level implementation that I think is simpler to
    read.
    - Use the type-system to prevent foot-guns like `cb.ifx(cond,
    label.goto)`.
    
    Other changes:
    - rename `ifx`, `forLoop` and `whileLoop` to just `if_`, `for_` and
    `while_`, respectively.
    - Implement loops in terms of one another to remove code duplication.
    - Fix logic for when to write IRs as some default value behaviour was
    broken when `HAIL_WRITE_IR_FILES` was set in tests
    ehigham authored Oct 18, 2023
    3189854
  2. Verify logistic_regression_rows y param is column-indexed (hail-is#13849)
    
    Fixes hail-is#13788:
    - Add `raise_unless_column_indexed` guard and apply to all
    column-indexed parameters in `statgen.py`.
    - Rename `check_row_indexed` and `check_entry_indexed` as I'm allergic
    to functions called "check" - now it's clearer what they do.
    ehigham authored Oct 18, 2023
    7eaf390

Commits on Oct 19, 2023

  1. [fs] rate limit our use of parallelism in copy (hail-is#13833)

    This is the result of some experimentation. With ten-way parallelism,
    the copier very rarely gets rate-limited. With 75-way parallelism (the
    default), we almost always experience tens of transient errors. If we
    start at ten and back off as in this PR, I can get to 75 with just a
    handful of transient errors.
    
    cc: @jigold
    danking authored Oct 19, 2023
    d721005
  2. 4132cd1
  3. [fs] raise exceptions from failed tasks (hail-is#13831)

    Similar to hail-is#13818. We *must*
    retrieve exceptions from any task that is `done` otherwise we'll get a
    warning when the task is freed.
    danking authored Oct 19, 2023
    8ff5933
  4. [batch] no logs on container deleted (hail-is#13857)

    Containers get deleted when a job is cancelled. This is not exceptional
    behavior.
    
    Example: https://cloudlogging.app.goo.gl/punCSPauoM1ZEqZ27
    danking authored Oct 19, 2023
    e107ebe
  5. 35994fb

Commits on Oct 20, 2023

  1. dc3aa8a
  2. cc844ae
  3. [aiogoogle] finally squash an open http session (hail-is#13867)

    🤦
    
    I grepped for other hasattr that do not check `self`
    danking authored Oct 20, 2023
    d5bb0f6
  4. [gear] properly close the aiomysql pools (hail-is#13869)

    We should never have been using `await`. (aiomysql should probably not
    implement `__await__`). `create_pool` returns
    `aiomysql.utils._PoolContextManager` which inherits from
    `aiomysql.utils._ContextManager` which implements `__await__`,
    `__aenter__`, and `__aexit__` thusly:
    
    ```python3
        def __await__(self):
            return self._coro.__await__()
    
        async def __aenter__(self):
            self._obj = await self._coro
            return self._obj
    
        async def __aexit__(self, exc_type, exc, tb):
            await self._obj.close()
            self._obj = None
    ```
    
    `__await__` is a footgun! You should never do that! You should close the
    return value of the coroutine!
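
The difference between the two usages can be reproduced without aiomysql. In this sketch `PoolContextManager` copies the three methods quoted above, while `FakePool`, `make_pool`, and `create_pool` are hypothetical stand-ins, not aiomysql's API.

```python
import asyncio


class FakePool:
    """Stand-in for a connection pool; records whether it was closed."""

    def __init__(self):
        self.closed = False

    async def close(self):
        self.closed = True


class PoolContextManager:
    """Wraps a coroutine using the same three methods quoted above."""

    def __init__(self, coro):
        self._coro = coro
        self._obj = None

    def __await__(self):
        return self._coro.__await__()

    async def __aenter__(self):
        self._obj = await self._coro
        return self._obj

    async def __aexit__(self, exc_type, exc, tb):
        await self._obj.close()
        self._obj = None


async def make_pool():
    return FakePool()


def create_pool():
    return PoolContextManager(make_pool())


async def with_await():
    pool = await create_pool()  # works, but nothing ever closes the pool
    return pool


async def with_async_with():
    async with create_pool() as pool:
        pass  # __aexit__ closes the pool when the block exits
    return pool
```

Running both with `asyncio.run` leaves the pool from `with_await` open and the pool from `with_async_with` closed.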
    danking authored Oct 20, 2023
    85f9e57
  5. [hailtop] types, new exceptions, better task exception handling (hail-is#13818)
    
    Besides the types and some transient exceptions, I think I fixed our
    task exception handling in several spots. Two things:
    
    1. We do not need to wait on a cancelled task. If it was not done, then
    it could not possibly have an exception to retrieve. Moreover, now that
    it is cancelled, there is nothing else to do. Cancellation is immediate.
    
    2. If a task is done, we *must* always retrieve the exception, otherwise
    we might not see an exception.
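
The two rules can be written as one small helper. This is a sketch of the policy as described above, not the code in hailtop; `retrieve_or_cancel` is a hypothetical name.

```python
import asyncio


def retrieve_or_cancel(task):
    """Apply the two rules: retrieve from a done task, cancel a pending one.

    Returns the exception of a finished task, or None if there is none.
    """
    if task.done():
        if task.cancelled():
            return None
        # Calling exception() marks the exception as retrieved, so asyncio
        # does not warn about it when the task is freed.
        return task.exception()
    # Rule 1: a task that is not done has no exception to retrieve; cancel it
    # and move on without waiting.
    task.cancel()
    return None
```

`Task.done()`, `Task.cancelled()`, `Task.exception()`, and `Task.cancel()` are standard asyncio methods; only the policy wrapped around them is illustrative.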
    danking authored Oct 20, 2023
    5029a31
  6. [batch] wait long enough for aiohttp to clean itself up (hail-is#13868)

    We already do this in httpx.ClientSession. I just copied from there.
    danking authored Oct 20, 2023
    209404e
  7. [query] Avoid py4j for python-backend interactions (hail-is#13797)

    CHANGELOG: Fixes hail-is#13756: operations that collect large results such as
    `to_pandas` may require up to 3x less memory.
    
    This turns all "actions", i.e. backend methods supported by QoB into
    HTTP endpoints on the spark and local backends. This intentionally
    avoids py4j because py4j was really designed to pass function names and
    references around and does not handle large payloads well (such as
    results from a `collect`). Specifically, py4j uses a text-based protocol
    on top of TCP that substantially inflates the memory requirement for
    communicating large byte arrays. On the Java side, py4j serializes every
    binary payload as a Base64-encoded `java.lang.String`, which between the
    Base64 encoding and `String`'s use of UTF-16 results in a memory
    footprint of the `String` being `4/3 * 2 = 8/3`, or nearly three times, the
    size of the byte array on either side of the py4j pipe. py4j also
    appears to do an entire copy of this payload, which means nearly a 6x
    memory requirement for sending back bytes. Using our own socket means we
    can directly send back the response bytes to python without any of this
    overhead, even going so far as to encode results directly into the TCP
    output stream. Formalizing the API between python and java also allows
    us to reuse the same payload schema across all three backends.
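
The `8/3` figure can be checked with a few lines of Python. The sketch follows the text's assumption of two bytes per character for the `String`; the 3 MB payload size is arbitrary.

```python
import base64

payload = bytes(3_000_000)                      # 3 MB of raw bytes
encoded_chars = len(base64.b64encode(payload))  # Base64 emits 4 characters per 3 bytes
utf16_bytes = encoded_chars * 2                 # assumption from the text: 2 bytes per char
inflation = utf16_bytes / len(payload)          # 4/3 * 2 = 8/3
```

Doubling that for the extra copy mentioned above gives `16/3`, which is where the "nearly 6x" estimate comes from.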
    daniel-goldstein authored Oct 20, 2023
    c73386f
  8. [query] improve field names and output of _same (hail-is#13825)

    Rori encountered some confusing output. _same is private, but this is
    good both for GVS/AoU and also for us.
    
    Things are indented properly, it uses as much terminal width as is
    available, the names are slightly less confusing, and we see both
    globals & row failures if both fail.
    
    ```python3
    Table._same: rows differ:
      Row mismatch at key=Struct(locus=Locus(contig=1, position=1, reference_genome=GRCh37), alleles=['A', 'C']):
        Left:
            [Struct(ancestral_af=0.381520365258488,
                    af=[0.6482459117152142],
                    __uid_entries_85=[Struct(GT=Call(alleles=[1, 1], phased=False)), Struct(GT=Call(alleles=[1, 1], phased=False))])]
        Right:
            [Struct(ancestral_af=0.381520365258488,
                    af=[0.2510276144176496],
                    __uid_entries_85=[Struct(GT=Call(alleles=[0, 0], phased=False)), Struct(GT=Call(alleles=[0, 1], phased=False))])]
      Row mismatch at key=Struct(locus=Locus(contig=1, position=2, reference_genome=GRCh37), alleles=['A', 'C']):
        Left:
            [Struct(ancestral_af=0.7058845354840656,
                    af=[0.5224710728099119],
                    __uid_entries_85=[Struct(GT=Call(alleles=[0, 0], phased=False)), Struct(GT=Call(alleles=[0, 0], phased=False))])]
        Right:
            [Struct(ancestral_af=0.7058845354840656,
                    af=[0.5042641171983404],
                    __uid_entries_85=[Struct(GT=Call(alleles=[1, 1], phased=False)), Struct(GT=Call(alleles=[1, 1], phased=False))])]
    ```
    
    versus
    
    ```python3
    Table._same: rows differ:
      Row mismatch at key=Struct(_key=Struct(locus=Locus(contig=1, position=1, reference_genome=GRCh37), alleles=['A', 'C'])):
        Left:
    [Struct(ancestral_af=0.381520365258488,
            af=[0.08835032612615329],
            __uid_39=[Struct(GT=Call(alleles=[0, 0], phased=False)),
                      Struct(GT=Call(alleles=[0, 0], phased=False))])]
        Right:
    [Struct(ancestral_af=0.381520365258488,
            af=[0.6631710694002383],
            __uid_39=[Struct(GT=Call(alleles=[0, 1], phased=False)),
                      Struct(GT=Call(alleles=[1, 1], phased=False))])]
      Row mismatch at key=Struct(_key=Struct(locus=Locus(contig=1, position=2, reference_genome=GRCh37), alleles=['A', 'C'])):
        Left:
    [Struct(ancestral_af=0.7058845354840656,
            af=[0.7020078954798737],
            __uid_39=[Struct(GT=Call(alleles=[0, 1], phased=False)),
                      Struct(GT=Call(alleles=[0, 1], phased=False))])]
        Right:
    [Struct(ancestral_af=0.7058845354840656,
            af=[0.6393831798848757],
            __uid_39=[Struct(GT=Call(alleles=[1, 1], phased=False)),
                      Struct(GT=Call(alleles=[0, 1], phased=False))])]
    ```
    danking authored Oct 20, 2023
    3a47bbd
  9. [qob] minor fixes to service backend (hail-is#13872)

    1. If a job errors rather than fails, we still want to see its logs in
    the debug info.
    
    2. The backend from before `hl_stop_for_test` is broken. In particular,
    it does not have an open ClientSession, so it cannot make HTTP requests.
    danking authored Oct 20, 2023
    494197d
  10. [batch] make cpu and mem resource readers similar and avoid race (hail-is#13879)
    
    Fixes hail-is#13861. CPU monitor races with container deletion just like RAM
    monitor. I also switched to catching FileNotFoundError instead of checking `exists`
    since technically the file could disappear between us checking `exists`
    and us `open`ing it.
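
The race-free pattern looks roughly like this. It is a generic sketch, not the Batch worker code; `read_or_none` is a hypothetical helper.

```python
def read_or_none(path):
    """Return the file's text, or None if the file no longer exists."""
    try:
        # Open directly instead of checking os.path.exists first: the file
        # could be deleted between the check and the open.
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        # The container (and its files) may be deleted at any moment.
        return None
```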
    danking authored Oct 20, 2023
    aecbbff

Commits on Oct 23, 2023

  1. [ci] Fix namespace used for logging queries on PR page (hail-is#13813)

    DEFAULT_NAMESPACE will always point to `default` for the production CI
    deployment.
    jigold authored Oct 23, 2023
    affd4cb
  2. 5d7dc2c
  3. [devbin] minor fixes (hail-is#13877)

    A missing import, a deprecated asyncio API, and print keys *with a
    namespace* (rather than just their name).
    danking authored Oct 23, 2023
    Commit 8490f67
  4. Commit 86f2140
  5. [website] add docs/latest/ (which currently redirects to 0.2) (hail-i…

    …s#13855)
    
    I also fixed local/dev docs building. You need Hail installed for the
    docs build to work because it tries to `import` the classes for which
    you're building docs.
    danking authored Oct 23, 2023
    Commit 66c9b01

Commits on Oct 24, 2023

  1. [ci] Bump number of cores to 16 for db deployment jobs (hail-is#13822)

    I wanted to be safe rather than sorry for hail-is#13487 to make sure we have
    enough memory to fit all of the 100 row chunk endpoints in memory. We
    can switch it back to 4 cores after that PR is merged.
    jigold authored Oct 24, 2023
    Commit 914e6d0
  2. [query] Teach hl.default_reference to set the default reference genome (hail-is#13888)
    
    CHANGELOG: hl.default_reference can now be passed an argument to change
    the default reference genome.
    
    Now, when hl.default_reference is passed a reference genome argument, it
    sets the default reference to that reference genome and returns None.
    chrisvittal authored Oct 24, 2023
    Commit 30b461a
  3. Commit 6cc1a8d

Commits on Oct 25, 2023

  1. [hailctl] fix hailctl describe requester pays argument handling (hail…

    …-is#13826)
    
    Fix the overriding of the gcs_requester_pays/project config variable
    through using 'hailctl describe -u'.
    
    Closes hail-is#13793
    chrisvittal authored Oct 25, 2023
    Commit ad852a0
  2. [ci] Set database creation jobs back to 4 cores (hail-is#13898)

    The largest migration took 6.2 GB of memory. I think we'll be fine for
    a while with 15 GB of memory (4 cores).
    jigold authored Oct 25, 2023
    Commit 817ea60
  3. [make] Separate image building from image pushing (hail-is#13890)

    This came to mind yesterday during our pairing. This PR introduces the
    following properties that our image building targets do not currently
    have:
    1. If your intention is only to build images, you shouldn't need
    `kubectl`. When `DOCKER_PREFIX` is used as a docker build arg it is
    because we mirror some dockerhub images inside our registry (for
    reliability/rate limiting reasons). But for local building there's no
    reason you can't use the dockerhub image. Also, other people should be
    able to build the hail image if they want to!
    2. One should *only* need to use `kubectl` if they are intending to use
    an image in a kubernetes deployment. In other words, you should only
    need the private registry `DOCKER_PREFIX` for pushing images.
    3. One should not need to endure image pushing if the only goal is to
    build the image locally
    4. No intermediate tags should end up in the private registry. If we
    push on every image build, the private docker registry will accumulate a
    lot of `hail-ubuntu:dev-xxxxxx` tags that are never used again because
    `hail-ubuntu` is just an intermediate used to build other images. This
    does *not* change the number of layers that end up in the registry, but
    reduces a bit of the work that the registry cleanup job needs to do to
    untag and delete images and just seems cleaner.
    daniel-goldstein authored Oct 25, 2023
    Commit c7ea3dd
  4. [batch] Update gcsfuse version to 1.2.0 (hail-is#13728)

    Let's see if this "just works".
    jigold authored Oct 25, 2023
    Commit e739a95
  5. Commit 2f69f8a
  6. [benchmark] Benchmarks on Ubuntu 22.04 (hail-is#13880)

    Benchmark targets are now invoked directly through the make targets in
    `benchmark/Makefile`. The previous targets in `hail/Makefile` have been
    deleted.
    To submit benchmarks in batch from the root directory, use
    ```
    $ BENCHMARK_LABEL=label NAMESPACE=default BENCHMARK_PROJECT=your-google-project make -C benchmark submit
    ```
    The benchmark docker image is now based on `hail-ubuntu` with a minimal
    set of requirements installed on top.
    The hail wheel is added after the benchmark wheel in an attempt to
    optimise build times for changes to hail code.
    ehigham authored Oct 25, 2023
    Commit b90cdd1
  7. [query] Use EncodedLiteral instead of Literal from python to scala (h…

    …ail-is#13814)
    
    CHANGELOG: Pipelines that are memory-bound by copious use of
    `hl.literal`, such as `vds.filter_intervals`, require substantially less
    memory.
    
    Closes hail-is#13757
    daniel-goldstein authored Oct 25, 2023
    Commit eabff3d
  8. [gradle] exclude scala-reflect (hail-is#13894)

    CHANGELOG: Fixes hail-is#13837 in which Hail could break a Spark installation
    if the Hail JAR appears on the classpath before the Scala JARs.
    
    We and several dependencies of ours are exposing an old version of
    scala-reflect (rude of us and them). If our JAR appears on the classpath
    before the scala JARs and the sub-versions of Scala differ, Spark is
    likely to have incompatible bytecode.
    
    See hail-is#13837 .
    
    In this PR, this command,
    
    ```
    make shadowJar && \
        jar tf build/libs/hail-all-spark.jar | grep scala/reflect | wc -l && \
        jar tf build/libs/hail-all-spark.jar | grep 'MutableSettings'
    ```
    
    prints:
    
    ```
        0
    
    ```
    
    On main (`5d7dc2cab7`) it prints:
    
    ```
        1417
    scala/reflect/internal/settings/MutableSettings$.class
    scala/reflect/internal/settings/MutableSettings$SettingValue.class
    scala/reflect/internal/settings/MutableSettings.class
    ```
    
    ---
    
    A few more details follow for the curious.
    
    `./gradlew dependencies` shows these packages as depending on
    `scala-reflect`:
    
    ```
    +--- org.scalanlp:breeze-natives_2.12:1.1
    |    +--- org.scala-lang:scala-library:2.12.10 -> 2.12.17
    |    +--- org.scalanlp:breeze_2.12:1.1 -> 1.2
    |    |    +--- org.scala-lang:scala-library:2.12.10 -> 2.12.17
    |    |    +--- org.scalanlp:breeze-macros_2.12:1.2
    |    |    |    +--- org.scala-lang:scala-library:2.12.10 -> 2.12.17
    |    |    |    \--- org.scala-lang:scala-reflect:2.12.10 -> 2.12.15
    ...
    +--- org.elasticsearch:elasticsearch-spark-30_2.12:8.4.3
    |    +--- org.scala-lang:scala-library:2.12.8 -> 2.12.17
    |    +--- org.scala-lang:scala-reflect:2.12.8 -> 2.12.15 (*)
    |    \--- org.apache.spark:spark-core_2.12:3.2.1 -> 3.3.0
    ...
    |         +--- org.scala-lang:scala-reflect:2.12.15 (*)
    ...
    +--- org.scala-lang:scala-reflect:2.12.15 (*)
    ...
    +--- org.apache.spark:spark-sql_2.12:3.3.0
    ...
    |    +--- org.apache.spark:spark-catalyst_2.12:3.3.0
    |    |    +--- org.scala-lang:scala-reflect:2.12.15 (*)
    ```
    
    This root (the fourth to last package above):
    
    ```
    +--- org.scala-lang:scala-reflect:2.12.15 (*)
    ```
    
    is coming from this line in build.gradle:
    
    ```gradle
        shadow 'org.scala-lang:scala-reflect:' + scalaVersion
    ```
    
    I think `shadow` means "needed to compile but do not include in shadow
    JAR", so I do not know if we can trust `./gradlew dependencies` to tell
    us why any particular class file is in our shadow JAR.
    danking authored Oct 25, 2023
    Commit 0402aad
  9. [query] ensure nest_asyncio is applied, but only when necessary (hail…

    …-is#13899)
    
    CHANGELOG: Fix `RuntimeError: This event loop is already running` error
    when running hail in a Jupyter Notebook.
    
    Man this is really complicated.
    
    OK, so, things I learned:
    
    1. [asyncio will not create a new event loop if `set_event_loop` has
    been called even if `set_event_loop(None)` has since been
    called.](https://github.com/python/cpython/blob/main/Lib/asyncio/events.py#L676)
    2. [asyncio will not create a new event loop in a thread other than the
    main
    thread.](https://github.com/python/cpython/blob/main/Lib/asyncio/events.py#L677)
    3. `aiohttp.ClientSession` stashes a copy of the event loop present when
    it starts. This can cause all manner of extremely confusing behavior if
    you later change the event loop or use that session from a different
    thread.
    
    The fix, in the end, wasn't that complicated. Anywhere Hail explicitly
    asks for an event loop (so that we can run async code), we apply nest
    asyncio if the event loop is already running. Otherwise we do nothing.
    Nest asyncio appears to [no longer
    require](https://github.com/erdewit/nest_asyncio/tree/master#usage)
    `apply` to be called before the event loop starts running.
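The "apply only when necessary" rule described above can be sketched as follows. The helper names `loop_is_running` and `ensure_nestable` are hypothetical and are not Hail's actual functions; the only third-party call used is `nest_asyncio.apply()`, and it is imported lazily so the module is only needed when a loop is already running.

```python
import asyncio


def loop_is_running() -> bool:
    """Return True if called from code already running inside an event loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return False
    return True


def ensure_nestable() -> bool:
    """Patch asyncio with nest_asyncio only when a loop is already running.

    Returns True if the patch was applied. Hypothetical helper for illustration.
    """
    if not loop_is_running():
        return False
    import nest_asyncio  # third-party; imported lazily, only when needed

    nest_asyncio.apply()
    return True


# From plain synchronous code no loop is running, so nothing is patched.
applied_from_sync = ensure_nestable()


async def probe() -> bool:
    # Inside a coroutine the loop is running, e.g. as in a Jupyter notebook.
    return loop_is_running()


running_inside_coroutine = asyncio.run(probe())
```

In a plain script the patch is never applied, which matches the "otherwise we do nothing" behaviour described in the commit message.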
    
    This PR *does not* address:
    1. Hail nesting async code in sync code in async code. I think we should
    avoid this, but the `hailtop.fs` and `hailtop.batch` APIs, among others,
    need async versions before we can do that.
    2. This `aiohttp.ClientSession` nonsense. We really should take pains to
    ensure we create one `ClientSession` per loop and we never mix loops.
    danking authored Oct 25, 2023
    Commit a53f4ff
  10. [release] 0.2.125 (hail-is#13905)

    Closes hail-is#13806
    
    ---------
    
    Co-authored-by: jigold <jigold@users.noreply.github.com>
    danking and jigold authored Oct 25, 2023
    Commit c4e2880

Commits on Oct 26, 2023

  1. Make SemanticHash Resilient to FileNotFoundExceptions (hail-is#13919)
    
    Fixes hail-is#13915
    `MatrixVCFReader` accepts glob patterns (wildcards in file names). This
    bamboozled `SemanticHash` which had assumed all files had been resolved.
    This change fixes this by adding explicit `FileNotFoundException`
    handling to `SemanticHash` and replacing the `params.files` object of
    `MatrixVCFReader` with the resolved paths.
    ehigham authored Oct 26, 2023
    Commit cfb7323
  2. [query] avoid hanging the JVM in Dataproc (hail-is#13916)

    Non-daemon threads [keep a JVM
    alive](https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html):
    
    > When a Java Virtual Machine starts up, there is usually a single
    non-daemon thread (which typically calls the method named main of some
    designated class). The Java Virtual Machine continues to execute threads
    until either of the following occurs:
    >
    > The exit method of class Runtime has been called and the security
    manager has permitted the exit operation to take place.
    >
    > All threads that are not daemon threads have died, either by returning
    from the call to the run method or by throwing an exception that
    propagates beyond the run method.
    
    Spark appears to wait for the JVM to terminate before it considers a job
    complete.
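The commit message does not say how the hang was resolved. As an illustration of the daemon versus non-daemon distinction only, here is a Python analogue: Python's `threading` module draws the same line, in that a thread created with `daemon=True` does not by itself keep the process alive. The `background_work` function is a hypothetical placeholder for a long-lived helper thread.

```python
import threading
import time


def background_work(stop: threading.Event) -> None:
    # Poll until asked to stop; stands in for a long-lived helper thread.
    while not stop.is_set():
        time.sleep(0.01)


stop = threading.Event()
# daemon=True: this thread will not, by itself, keep the process alive.
worker = threading.Thread(target=background_work, args=(stop,), daemon=True)
worker.start()

is_daemon = worker.daemon
stop.set()
worker.join(timeout=1.0)
finished = not worker.is_alive()
```

With `daemon=False` and no stop signal, the same thread would keep the interpreter from exiting, which is the Python counterpart of the JVM behaviour quoted above.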
    danking authored Oct 26, 2023
    Commit 87a6057
  3. [makefile] use the pushed-private image name not the local image name (

    …hail-is#13926)
    
    Didn't verify but this seems closer to right.
    danking authored Oct 26, 2023
    Commit 0fe50ea
  4. Commit 6e6f467

Commits on Oct 27, 2023

  1. [ci] Use private base image for GPU testing image (hail-is#13750)

    Trying to rely on our private registry instead of dockerhub where we
    can.
    daniel-goldstein authored Oct 27, 2023
    Commit b87500b
  2. [compiler] fix ndarray broadcasting with 0 sizes (hail-is#13751)

    Really bad example:
    ```python
    In [2]: a = hl.nd.array(2)
    
    In [3]: empty = np.array([], np.int32).reshape((0, 2, 2))
    
    In [4]: empty = hl.nd.array(empty)
    
    In [5]: hl.eval(empty)
    Out[5]: array([], shape=(0, 2, 2), dtype=int32)
    
    In [6]: hl.eval(a + empty)
    Out[6]:
    array([[[       2, 55622722],
            [       2,        3]]], dtype=int32)
    ```
    
    Result of sum of shape `()` with shape `(0, 2, 2)` should be shape `(0,
    2, 2)` by standard numpy broadcasting rules. Instead, we try to produce
    a result with shape `(1, 2, 2)`, which contains garbage data.
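The broadcasting rule referred to above can be written down in a few lines. This is a pure-Python sketch of the standard NumPy shape rule, not Hail's compiler code; `broadcast_shape` is a name introduced here for illustration.

```python
from itertools import zip_longest
from typing import Tuple


def broadcast_shape(a: Tuple[int, ...], b: Tuple[int, ...]) -> Tuple[int, ...]:
    """Shape produced by broadcasting shapes `a` and `b` NumPy-style."""
    out = []
    # Align dimensions from the right; missing leading dimensions act as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x == y or y == 1:
            out.append(x)
        elif x == 1:
            # A dimension of size 1 stretches to the other size, even if it is 0.
            out.append(y)
        else:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
    return tuple(reversed(out))


scalar_plus_empty = broadcast_shape((), (0, 2, 2))
one_plus_empty = broadcast_shape((1, 2, 2), (0, 2, 2))
```

The key case is a size-1 dimension meeting a size-0 dimension: the result takes size 0, so the output is empty rather than a single garbage-filled row.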
    patrick-schultz authored Oct 27, 2023
    Commit 75ee439
  3. [fs] improve copy tool progress bars (hail-is#13832)

    I renamed RichProgressBar and SimpleRichProgressBar to
    ...CopyToolProgressBar because that is more accurate. I enhanced both to
    now include a count and a rate with the right units based on the
    description. It is a bit flaky because I need the descriptions to be
    exactly "files" or exactly "bytes" to pick the right units, but this
    seems fine for the specific case of the CopyToolProgressBar.
    
    There is probably a better way to build these UIs. I am sure we will
    start to figure that out as we use rich more.
    
    
    Before:
    <img width="830" alt="Screenshot 2023-10-16 at 18 28 14"
    src="https://github.com/hail-is/hail/assets/106194/95f8828e-beb3-46d2-9403-18ff7aa60256">
    
    After:
    <img width="830" alt="Screenshot 2023-10-16 at 18 27 53"
    src="https://github.com/hail-is/hail/assets/106194/01186b7c-d59f-4a0e-a1f6-9279fb50ae7e">
    danking authored Oct 27, 2023
    Commit 823b28d
  4. [batch] Only update log files if they exist (hail-is#13911)

    If a container is deleted before it ever runs, the log files won't
    exist.
    
    Fixes hail-is#13906
    Fixes hail-is#13907
    daniel-goldstein authored Oct 27, 2023
    Commit 73b91ed
  5. [ci] Dont tar the hail wheel in CI pipelines (hail-is#13912)

    IIUC the reason we `tar` the hail wheel to move it between jobs in the
    CI pipeline is because the wheel name must contain the pip version and
    that is not known statically in `build.yaml`. However, it would be just
    as effective to copy the wheel around inside a directory, and then we
    don't have to do all this tar'ing and untar'ing.
    
    cc @ehigham I'm happy to hold off on this if it would bork your branch
    daniel-goldstein authored Oct 27, 2023
    Commit 1c53068
  6. [query] Delete unused experimental functions in compile.py (hail-is#1…

    …3918)
    
    `compiled_compare` and `compile_comparison_binary` are unused and broken
    as they depend on backend functionality that no longer exists. This
    deletes them and other functions that they depended on that are now
    themselves unused.
    daniel-goldstein authored Oct 27, 2023
    Commit d3936da
  7. [devbin] Local UI development proxy server (hail-is#13632)

    This adds a script that runs an `aiohttp` server on localhost that by
    default proxies all requests to whatever batch namespace `hailctl` is
    pointed to but serves static assets and renders all HTML locally. This
    allows for local editing of HTML / CSS / SCSS files.
    
    To run, this branch needs to be deployed in the relevant namespace (to
    pick up the changes to `web_common.py`), and you need to have the `sass`
    CLI installed (`brew install sass/sass/sass`). Then run in a separate
    terminal `make devserver SERVICE=batch` and open up `localhost:8000`. If
    you want to supply fake data instead of proxying an endpoint to the k8s
    deployment of the service, you can override that endpoint in
    `dev_proxy.py`.
    
    If this looks good, I'll add the above instructions to the dev docs.
    
    Resolves hail-is#13629
    daniel-goldstein authored Oct 27, 2023
    Commit 8ec5361
  8. [ci] Rename the deploy step to release (hail-is#13931)

    "deploy" is a really overloaded term when it comes to CI and release
    much more accurately reflects what this step does.
    daniel-goldstein authored Oct 27, 2023
    Commit 06f7483
  9. [make] Take the rest of docker-build.sh arguments as docker build arg…

    …uments (hail-is#13910)
    
    hail-is#13890 removed an optional argument to `docker-build.sh`. Now that there
    are no optional arguments I think it is cleaner to claim any additional
    arguments as docker args instead of stuffing docker args into an
    environment variable.
    daniel-goldstein authored Oct 27, 2023
    Commit 010ff09
  10. [batch] Switch to using v3 tables for batches and jobs billing (hail-…

    …is#13891)
    
    This PR switches all usages of v2 billing tables on the batch and
    batch-driver server code to use the v3 tables instead. Stacked on
    hail-is#12849.
    jigold authored Oct 27, 2023
    Commit ebf4260
  11. Commit aade5c6

Commits on Oct 30, 2023

  1. [query] fix bug in new dict decoder (hail-is#13939)

    The decoder uses a `StagedArrayBuilder` to hold elements while being
    sorted. The array builder is stored in a class field. When the same
    decoder function is called more than once, that array builder is reused.
    
    Before this fix, the array builder was never cleared, so if the decoder
    function was called more than once, the array builder would still
    contain the elements from previously decoded dicts.
    
    Since it's highly non-obvious that you need to call `clear` immediately
    after `new StagedArrayBuilder`, this PR makes the constructor take a
    CodeBuilder, and always inserts a clear at the call site. I also took
    the opportunity to CodeBuilderify the rest of the interface.
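The failure mode described above (a scratch buffer stored in a field and reused across calls without being cleared) is easy to reproduce outside the compiler. The sketch below is a hypothetical Python analogue, not the Scala `StagedArrayBuilder` code; `DictDecoder` and its `clear_first` flag exist only to show the buggy and fixed behaviour side by side.

```python
from typing import Dict, List, Tuple


class DictDecoder:
    """Hypothetical decoder that reuses one scratch buffer across calls."""

    def __init__(self, clear_first: bool) -> None:
        self._buffer: List[Tuple[str, int]] = []  # reused, like a class field
        self._clear_first = clear_first

    def decode(self, pairs: List[Tuple[str, int]]) -> Dict[str, int]:
        if self._clear_first:
            self._buffer.clear()  # the fix: reset state before each use
        self._buffer.extend(pairs)
        self._buffer.sort()  # elements sit in the buffer while being sorted
        return dict(self._buffer)


buggy = DictDecoder(clear_first=False)
buggy.decode([("b", 2), ("a", 1)])
buggy_second = buggy.decode([("c", 3)])  # still contains "a" and "b"

fixed = DictDecoder(clear_first=True)
fixed.decode([("b", 2), ("a", 1)])
fixed_second = fixed.decode([("c", 3)])  # contains only "c"
```

Building the clear into construction, as the PR does, removes the need for every caller to remember it.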
    patrick-schultz authored Oct 30, 2023
    Commit 28e56f7
  2. [query] Update ReferenceGenome docs (hail-is#13924)

    Add a section on managing the default reference using the
    `default_reference` method.
    chrisvittal authored Oct 30, 2023
    Commit 5bc9a25
  3. [pip] Release 0.2.126 (hail-is#13942)

    Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
    daniel-goldstein and danking authored Oct 30, 2023
    Commit ee77707
  4. [query] refactor approx_cdf to support manual combining (hail-is#13935)

    This PR enables users to combine results from multiple independent
    approx_cdf aggregators, for instance to allow updating quantile
    summaries when adding new samples.
    
    In more detail:
    * Change the result type of the internal aggregator to be a lossless
    representation of the internal state.
    * Add a registered function which exposes the CombOp of the aggregator.
    * Move the function which computes the old result type from the internal
    state to python.
    * Add a flag `_raw` to `approx_cdf` which produces the internal result
    type, which supports combining; otherwise convert to the old result
    type.
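The design in the list above (a lossless raw state, an exposed combine operation, and a conversion to the user-facing result) can be illustrated with a far simpler aggregator. The sketch below uses a running mean as a stand-in; it is not the `approx_cdf` algorithm, and `MeanState`, `combine`, `finalize` and `mean_agg` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Union


@dataclass(frozen=True)
class MeanState:
    """Lossless internal state: enough information to keep combining later."""
    count: int
    total: float


def combine(a: MeanState, b: MeanState) -> MeanState:
    # The aggregator's combine operation, exposed as a standalone function.
    return MeanState(a.count + b.count, a.total + b.total)


def finalize(state: MeanState) -> float:
    # Converts the raw state to the user-facing result; the count is dropped,
    # so finalized results can no longer be combined correctly.
    return state.total / state.count


def mean_agg(values: Iterable[float], raw: bool = False) -> Union[MeanState, float]:
    vals = list(values)
    state = MeanState(len(vals), float(sum(vals)))
    return state if raw else finalize(state)


old_samples = mean_agg([1.0, 2.0], raw=True)
new_samples = mean_agg([6.0], raw=True)
updated_mean = finalize(combine(old_samples, new_samples))
direct_mean = mean_agg([1.0, 2.0, 6.0])
```

Combining the two raw states and then finalizing gives the same answer as aggregating all values at once, which is the property that makes incremental updates possible.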
    patrick-schultz authored Oct 30, 2023
    Commit 63c74fd

Commits on Oct 31, 2023

  1. Commit 5d42835
  2. [query] Add semhash support for Table-to-Table Aggregations (hail-is#…

    …13922)
    
    Namely, TableKeyByAndAggregate and TableAggregateByKey
    ehigham authored Oct 31, 2023
    Commit a950ede
  3. [ci] Delete old CreateDatabaseStep (hail-is#13947)

    Fixes hail-is#13895. This is unused now.
    jigold authored Oct 31, 2023
    Commit 22bcec2
  4. [batch] Turn off autocomplete on billing projects page (hail-is#13946)

    Fixes hail-is#13860
    
    ---------
    
    Co-authored-by: iris <84595986+iris-garden@users.noreply.github.com>
    jigold and iris-garden authored Oct 31, 2023
    Commit 701bce0
  5. Commit 40a3467
  6. Commit 662dc75

Commits on Nov 1, 2023

  1. [hailtop] Remove service_ns method in DeployConfig (hail-is#13959)

    The `DeployConfig.service_ns` method doesn't really do anything; we
    always use the `_default_namespace`. It may date from an earlier age
    when some services might have lived in different namespaces.
    daniel-goldstein authored Nov 1, 2023
    Commit 3f0b115

Commits on Nov 2, 2023

  1. [hailtop] Remove unused blocking httpx classes (hail-is#13965)

    These are not used as far as I can tell.
    daniel-goldstein authored Nov 2, 2023
    Commit 9badedb
  2. [benchmark] fix combiner benchmarks (hail-is#13956)

    The combiner benchmarks broke following the deletion of the
    `experimental.vcf_combiner` python package. Re-implement them in terms
    of the `vds` package.
    ehigham authored Nov 2, 2023
    Commit 5f4508f
  3. [compiler] Add Switch to IR (hail-is#13963)

    We have no high-level IR analogue to `CodeBuilderLike.switch`. Such a
    node is useful for flattening the IR in deeply-nested `If` nodes,
    predicated on integer equality.
    This partially addresses the stack-overflow error in the
    `matrix_multi_write_nothing` benchmark, which currently overflows the
    stack when computing the type of the CDA.
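To see why a switch node flattens the tree, compare the depth of a chain of nested `If` nodes with a single `Switch` over the same cases. This is a toy Python model with hypothetical node classes (`Const`, `If`, `Switch`), not Hail's IR; it only shows how recursion depth grows with the number of cases.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Const:
    value: int


@dataclass
class If:
    equals: int  # branch taken when the scrutinee equals this integer
    then: "Node"
    orelse: "Node"


@dataclass
class Switch:
    cases: List["Node"]  # case i is taken when the scrutinee equals i
    default: "Node"


Node = Union[Const, If, Switch]


def nested_ifs(n: int) -> Node:
    """Build `if x == 0 ... else if x == 1 ... else default` as nested Ifs."""
    node: Node = Const(-1)
    for i in reversed(range(n)):
        node = If(i, Const(i * 10), node)
    return node


def as_switch(n: int) -> Node:
    """The same dispatch expressed as one flat Switch node."""
    return Switch([Const(i * 10) for i in range(n)], Const(-1))


def depth(node: Node) -> int:
    # Any recursive pass (such as type computation) recurses this deep.
    if isinstance(node, Const):
        return 1
    if isinstance(node, If):
        return 1 + max(depth(node.then), depth(node.orelse))
    return 1 + max(depth(c) for c in node.cases + [node.default])


if_depth = depth(nested_ifs(100))
switch_depth = depth(as_switch(100))
```

The nested form grows one level per case, while the switch form stays at a constant depth, so a recursive pass over it does not consume stack in proportion to the number of cases.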
    ehigham authored Nov 2, 2023
    Commit 29c2966
  4. [ci] Remove dead code from CI configs (hail-is#13967)

    - `CreateNamespaceStep.public` was entirely unused
    - `adminServiceAccount` is not used in `build.yaml` so
    `CreateNamespaceStep.admin_service_account` is always `None` meaning it
    has no effect.
    - The three environment variables that I deleted from the
    `deployment.yaml` are as far as I can tell entirely unused (they are now
    grabbed from the global config)
    daniel-goldstein authored Nov 2, 2023
    Commit b4a3d57
  5. [batch] Dont box data_disk_space_remaining (hail-is#13968)

    Very small change, something I noticed while working on something else
    entirely. Given how this is currently used I don't think it needs to be
    boxed anymore.
    daniel-goldstein authored Nov 2, 2023
    Commit 2c1188c
  6. [batch] Make GCP disk attachment idempotent (hail-is#13955)

    This is a fix for an error Ben found.
    
    ```
    Traceback (most recent call last):
      File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1907, in run
        await self.setup_io()
      File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1848, in setup_io
        await self.disk.create(labels=labels)
      File "/usr/local/lib/python3.9/dist-packages/batch/cloud/gcp/worker/disk.py", line 47, in create
        await self._attach()
      File "/usr/local/lib/python3.9/dist-packages/batch/cloud/gcp/worker/disk.py", line 112, in _attach
        self.last_response = await self.compute_client.attach_disk(
      File "/usr/local/lib/python3.9/dist-packages/hailtop/aiocloud/aiogoogle/client/compute_client.py", line 83, in attach_disk
        return await self._request_with_zonal_operations_response(self.post, path, params, **kwargs)
      File "/usr/local/lib/python3.9/dist-packages/hailtop/aiocloud/aiogoogle/client/compute_client.py", line 126, in _request_with_zonal_operations_response
        return await retry_transient_errors(request_and_wait)
      File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 763, in retry_transient_errors
        return await retry_transient_errors_with_debug_string('', 0, f, *args, **kwargs)
      File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 775, in retry_transient_errors_with_debug_string
        return await f(*args, **kwargs)
      File "/usr/local/lib/python3.9/dist-packages/hailtop/aiocloud/aiogoogle/client/compute_client.py", line 116, in request_and_wait
        raise GCPOperationError(result['httpErrorStatusCode'],
    hailtop.aiocloud.aiogoogle.client.compute_client.GCPOperationError: GCPOperationError: 400:BAD REQUEST ['RESOURCE_IN_USE_BY_ANOTHER_RESOURCE'] ["The disk resource 'projects/hail-vdc/zones/us-central1-b/disks/batch-disk-82XXXXX' is already being used by 'projects/hail-vdc/zones/us-central1-b/instances/batch-worker-default-standard-yjXXXX'"]; {'kind': 'compute#operation', 'id': 'XXXXX', 'name': 'operation-XXXXX', 'zone': 'https://www.googleapis.com/compute/v1/projects/hail-vdc/zones/us-central1-b', 'clientOperationId': 'XXXX', 'operationType': 'attachDisk', 'targetLink': 'https://www.googleapis.com/compute/v1/projects/hail-vdc/zones/us-central1-b/instances/batch-worker-default-standard-yjupd', 'targetId': 'XXXX', 'status': 'DONE', 'user': 'batch2-agent@hail-vdc.iam.gserviceaccount.com', 'progress': 100, 'insertTime': '2023-10-30T20:38:40.145-07:00', 'startTime': '2023-10-30T20:38:41.871-07:00', 'endTime': '2023-10-30T20:38:42.367-07:00', 'error': {'errors': [{'code': 'RESOURCE_IN_USE_BY_ANOTHER_RESOURCE', 'message': "The disk resource 'projects/hail-vdc/zones/us-central1-b/disks/batch-disk-82XXXXX' is already being used by 'projects/hail-vdc/zones/us-central1-b/instances/batch-worker-default-standard-yjXXXX'"}]}, 'httpErrorStatusCode': 400, 'httpErrorMessage': 'BAD REQUEST', 'selfLink': 'https://www.googleapis.com/compute/v1/projects/hail-vdc/zones/us-central1-b/operations/operation-XXX'}
    ```
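The commit message shows only the traceback, not the change itself. One common way to make an attach operation idempotent is to treat "already attached to this same instance" as success and re-raise only genuine conflicts. The sketch below assumes that approach and uses an in-memory `FakeComputeClient`; every name in it is hypothetical, and it does not use the real `hailtop` or GCP APIs.

```python
from typing import Dict


class ResourceInUseError(Exception):
    """Raised by the fake API when a disk is already attached somewhere."""


class FakeComputeClient:
    """In-memory stand-in for a cloud compute API (hypothetical)."""

    def __init__(self) -> None:
        self.attached_to: Dict[str, str] = {}  # disk name -> instance name

    def attach_disk(self, disk: str, instance: str) -> None:
        if disk in self.attached_to:
            raise ResourceInUseError(
                f"{disk} is already being used by {self.attached_to[disk]}"
            )
        self.attached_to[disk] = instance


def attach_idempotently(client: FakeComputeClient, disk: str, instance: str) -> None:
    try:
        client.attach_disk(disk, instance)
    except ResourceInUseError:
        # A retried request may find the disk already attached to *this*
        # instance; that is the desired end state, so treat it as success.
        if client.attached_to.get(disk) != instance:
            raise


client = FakeComputeClient()
attach_idempotently(client, "disk-1", "worker-a")
attach_idempotently(client, "disk-1", "worker-a")  # repeat is a no-op

try:
    attach_idempotently(client, "disk-1", "worker-b")
    conflict_raised = False
except ResourceInUseError:
    conflict_raised = True
```

A retry after a transient error then succeeds quietly, while an attempt to attach the disk to a different instance still fails loudly.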
    jigold authored Nov 2, 2023
    Commit d231b40
  7. [batch] Fix async exit stacks (hail-is#13969)

    I couldn't find the best issue for this. Should fix hail-is#13908, but I
    thought there was another issue about reducing noisy grafana alerts
    which this PR also addresses.
    jigold authored Nov 2, 2023
    Commit a5c7a8a

Commits on Nov 6, 2023

  1. Commit 2e536ff
  2. [batch] Remove explicit settings in hailtop tests (hail-is#13973)

    Another small step in going key-less.
    daniel-goldstein authored Nov 6, 2023
    Commit 806baf9
  3. [batch] Populate job groups in database (hail-is#13487)

    This PR just populates the records for older batches into the
    `job_groups` and `job_group_self_and_ancestors` tables.
    
    Stacked on hail-is#13475
    jigold authored Nov 6, 2023
    Commit 7d536c2

Commits on Nov 7, 2023

  1. [batch] Fix enum column state in job groups table (hail-is#13982)

    I forgot that "open" was a valid batches state when I created the job
    groups table state column as an enum. This should fix the failed
    migration from hail-is#13487
    jigold authored Nov 7, 2023
    Commit c4aa1c6
  2. Commit f73d92e

Commits on Nov 8, 2023

  1. [website] add a link to the community feedback survey to website (hai…

    …l-is#13986)
    
    The CSS for the website is a real mess. I initially tried to clean it
    up, but that became a time sink. We should eventually do that, but for
    now I made the minimal edits to get a reasonable looking layout.
    
    # Main Page
    ## Big
    
    <img width="2032" alt="Screenshot 2023-11-07 at 12 19 20"
    src="https://github.com/hail-is/hail/assets/106194/94c5c2d8-6a4d-44a9-888c-61b28d590857">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 19 27"
    src="https://github.com/hail-is/hail/assets/106194/8c35f736-cd56-4d8b-b5d6-3284592ff65a">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 19 29"
    src="https://github.com/hail-is/hail/assets/106194/9b396b45-bae5-469b-9825-b73a5cd8f917">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 19 31"
    src="https://github.com/hail-is/hail/assets/106194/8d27d238-5b3a-4c40-9c32-a7eb691c622b">
    
    ## Phone
    <img width="2032" alt="Screenshot 2023-11-07 at 12 22 42"
    src="https://github.com/hail-is/hail/assets/106194/be32332a-cdba-4f6d-b117-d7e8c163d8c8">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 22 44"
    src="https://github.com/hail-is/hail/assets/106194/ebc4f1d5-c728-4b0d-90f9-adbb9de4fd88">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 22 47"
    src="https://github.com/hail-is/hail/assets/106194/9cffe08a-fdfa-4af4-b060-cfd242c0642c">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 22 48"
    src="https://github.com/hail-is/hail/assets/106194/f5e5b09a-8692-4411-ba13-e7055c17be70">
    
    
    # Docs
    ## Big
    <img width="2032" alt="Screenshot 2023-11-07 at 12 24 09"
    src="https://github.com/hail-is/hail/assets/106194/564a47e5-8036-4e60-a7fc-16e5aeeabd94">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 24 19"
    src="https://github.com/hail-is/hail/assets/106194/0d954da0-8bdb-49e0-aa66-4ac5e0acb1f4">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 24 25"
    src="https://github.com/hail-is/hail/assets/106194/e0466542-90d3-440c-a7a5-b797b88af63c">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 24 40"
    src="https://github.com/hail-is/hail/assets/106194/4d5e5946-b014-484c-b404-3e9bd4389378">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 24 49"
    src="https://github.com/hail-is/hail/assets/106194/5e2e4666-3bac-4560-a831-4e2ea05de0ae">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 24 55"
    src="https://github.com/hail-is/hail/assets/106194/5f103ee1-a168-47ca-a5b2-f1385d4deac9">
    
    ## Phone
    <img width="2032" alt="Screenshot 2023-11-07 at 12 25 21"
    src="https://github.com/hail-is/hail/assets/106194/087b638c-e6f8-4633-9639-9f188b6b2e57">
    <img width="2032" alt="Screenshot 2023-11-07 at 12 25 23"
    src="https://github.com/hail-is/hail/assets/106194/cba530ea-d75c-4609-8307-16b3096a0e8c">
    
    On mobile, with the navbar open, it looks the same as the non-docs
    pages.
    danking authored Nov 8, 2023
    67801df
  2. [query] Use apache commons lang3 instead of lang (hail-is#13970)

    `org.apache.commons.lang` is from the `commons-lang` library, but in
    `build.gradle` we explicitly depend on `commons-lang3`, which puts
    everything under the `lang3` package. We must be picking up
    `commons-lang` as some transitive dependency, but we no longer get it in
    Spark 3.4. Regardless, it is better to use what we explicitly depend on.
    daniel-goldstein authored Nov 8, 2023
    0618348

Commits on Nov 9, 2023

  1. [hailtop] Dont nest event loops unnecessarily (hail-is#13677)

    Removes any occurrences of async / sync / async nesting in the code, i.e.
    a coroutine should not, somewhere deep down, make a synchronous call
    that blocks on the completion of an async task.
    
    ---------
    
    Co-authored-by: Dan King <dking@broadinstitute.org>
    daniel-goldstein and Dan King authored Nov 9, 2023
    ebfbc24
  2. [batch] Prevent the worker container from being killed on OOM event (hail-is#13977)
    
    This PR accounts for crun specifying memory requirements differently
    under cgroups v2 than under cgroups v1. Should fix
    hail-is#13902.
    jigold authored Nov 9, 2023
    3b38d0b
  3. Merge upstream HEAD (3b38d0b, 2023-11-10) for 0.2.126 and OOM fix

    In particular, we need to incorporate and test hail-is#13977
    as the proposed fix for jobs becoming unresponsive due to being
    targeted by the kernel's OOM-killer.
    
    (Our local gcsfuse repo workaround is replaced by upstream's.)
    jmarshall committed Nov 9, 2023
    7e15f72
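
The async / sync / async nesting that hail-is#13677 removes can be illustrated with a minimal sketch. This is not code from the Hail repository: `fetch_value`, `fetch_value_sync`, `nested`, and `flat` are hypothetical names, and plain `asyncio.run` stands in for whatever blocking helper a real code base might use. With the standard library alone, blocking on an async task from inside a running coroutine fails outright, whereas awaiting the coroutine directly works.

```python
import asyncio


async def fetch_value() -> int:
    # Hypothetical stand-in for real asynchronous I/O.
    await asyncio.sleep(0)
    return 42


def fetch_value_sync() -> int:
    # Synchronous wrapper that blocks until the coroutine completes.
    # Fine when called from plain synchronous code.
    coro = fetch_value()
    try:
        return asyncio.run(coro)
    except RuntimeError:
        # asyncio.run refuses to start inside a running event loop;
        # close the unused coroutine so it is not left un-awaited.
        coro.close()
        raise


async def nested() -> int:
    # async -> sync -> async: the pattern to avoid.
    return fetch_value_sync()


async def flat() -> int:
    # Async all the way down: await the coroutine directly.
    return await fetch_value()


if __name__ == "__main__":
    print(asyncio.run(flat()))
    try:
        asyncio.run(nested())
    except RuntimeError as err:
        print(f"nested call failed: {err}")
```

The design point is that the synchronous wrapper belongs only at the outermost boundary of a program; anything already running inside a coroutine should call the async function and `await` it.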