Upstream 0.2.126 and proposed OOM-killer fix (3b38d0b) #317
Commits on Oct 11, 2023
[linting] Catch unintended errors in check-sql.sh (hail-is#13745)

I really borked our SQL linting. This PR is short, but it catches a few critical problems.

1. The point of `check-sql.sh` is to detect modifications or deletions of SQL files in PRs and fail if such a change occurs. Currently on `main` it does not detect modifications. In hail-is#13456, I removed the `delete-<service>-tables.sql` files (intentionally), so I added `^D` to the `grep` regex to indicate that it is OK to have a deletion. What I inadvertently did, though, was change the rule from "it's OK to have Additions of any file OR Modifications of estimated-current.sql / delete-<service>-tables.sql" to "it's OK to have Additions OR Modifications OR Deletions of estimated-current.sql / delete-<service>-tables.sql". Really this should have been "it's OK to have Additions OR Modifications of estimated-current.sql OR Deletions of delete-<service>-tables.sql". I've changed it to reflect that rule.
2. Rules currently silently *pass* in CI with an error message that git is not installed. In hail-is#13437 I changed the image used to run the linters and inadvertently left out `git`, which `check-sql.sh` needs to run. Here's how it failed in a sneaky way:
   - Since `git` is not installed, all calls to `git` fail, but the script is not run with `set -e`, so every line of the script still executes.
   - Since the `git` lines fail, `modified_sql_file_list` remains empty.
   - Since `modified_sql_file_list` remains empty, the check at the end concludes that everything checked out.
   - The if statement runs successfully and the script exits with code 0.

To fix this I did a few things:
- Installed `git` in the linting image.
- `set -e` by default and only enable `set +e` later on when necessary (because we don't want a failed `git diff` to immediately exit).
- Do away with the file checking and instead check the exit code of the grep. If nothing survives the grep filter, which means there were no illegal changes, grep returns with exit code 1, so we treat that exit code as a success.
(commit 5cd4eb6)
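The corrected rule can be sketched in Python (a hypothetical re-implementation, not the actual `check-sql.sh`, assuming `git diff --name-status`-style input lines of the form `<status>\t<path>`):

```python
import re

# Allowed changes per the corrected rule: Additions of any file, Modifications
# of estimated-current.sql, or Deletions of delete-<service>-tables.sql.
ALLOWED = re.compile(
    r"^A\t"                               # any addition
    r"|^M\t.*/estimated-current\.sql$"    # modification of estimated-current.sql
    r"|^D\t.*/delete-[^/]+-tables\.sql$"  # deletion of delete-<service>-tables.sql
)

def illegal_changes(diff_lines):
    """Return the diff lines that violate the rule; empty means the check passes."""
    return [line for line in diff_lines if not ALLOWED.match(line)]
```

The empty-result-means-success shape mirrors the grep exit-code trick described above: when nothing survives the filter, the check passes.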
[ci] Generate PR specific logging queries (hail-is#13744)
Fixes hail-is#13556. I haven't tested these changes -- would like to get initial feedback first.
(commit b598a78)
Commits on Oct 12, 2023
[query] fix local backend for install-editable (hail-is#13799)
I neglected to include the extra classpath necessary when using a skinny JAR.
(commit d187d52)
[query] Don't compile code cache function three times (hail-is#13796)
`f` is a thunk, so it is currently being evaluated three times before being inserted into the code cache. The `compiledFunction` variable was unused, so I think this is what was originally intended.
(commit f7be69c)
(commit 87398e1)
Commits on Oct 13, 2023
[query] fix ndarray concat with size 0 dims (hail-is#13755)
ndarray concat was broken when the first input has size 0 along the concat axis. For example:

```
In [3]: hl.eval(hl.nd.hstack([hl.nd.zeros((2, 0)), hl.nd.array([[1.0, 2.0], [3.0, 4.0]])]))
Out[3]: array([[0., 2.],
               [0., 4.]])
```

The zeros matrix is 2 by 0, so horizontal concatenation should just return the other matrix. (I once saw the first column filled with random numbers, presumably from a buffer overflow.) I did some cleaning up in the concat implementation, but the functional change is to record the index of the first input which is non-empty along the concat axis, and, when resetting to the start of the axis, reset to that non-empty index. Other size 0 inputs are correctly handled when incrementing the index; the problem was that the first read happens before an increment.
(commit 79c1c82)
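The expected semantics can be illustrated with a pure-Python toy (a hypothetical `hstack` over list-of-rows matrices, not Hail's implementation): an input with zero columns contributes nothing, so the result equals the concatenation of the remaining inputs.

```python
def hstack(mats):
    # Toy horizontal concatenation over list-of-rows matrices, illustrating
    # the expected result: a 2-by-0 input is simply skipped over.
    nrows = len(mats[0])
    assert all(len(m) == nrows for m in mats), "row counts must match"
    return [[x for m in mats for x in m[r]] for r in range(nrows)]
```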
[compiler] rewrite ExtractIntervalFilters to be more robust (hail-is#13355)

CHANGELOG: make hail's optimization rewriting filters to interval-filters smarter and more robust.

Completely rewrites ExtractIntervalFilters. Instead of matching against very specific patterns, and failing completely for things that don't quite match (e.g. an input is let bound, or the fold implementing "locus is contained in a set of intervals" is written slightly differently), this uses a standard abstract interpretation framework, which is almost completely insensitive to the form of the IR, depending only on its semantics. It also correctly handles missing key fields, where the previous implementation often produced an unsound transformation of the IR. Also adds a much more thorough test suite than we had before.

At the top level, the analysis takes a boolean-typed IR `cond` in an environment where there is a reference to some `key`, and produces a set `intervals`, such that `cond` is equivalent to `cond & intervals.contains(key)` (in other words, `cond` implies `intervals.contains(key)`, or `intervals` contains all rows where `cond` is true). This means, for instance, it is safe to replace `TableFilter(t, cond)` with `TableFilter(TableFilterIntervals(t, intervals), cond)`.

Then in a second pass it rewrites `cond` to `cond2`, such that `cond & intervals.contains(key)` is equivalent to `cond2 & intervals.contains(key)` (in other words, `cond` implies `cond2`, and `cond2 & intervals.contains(key)` implies `cond`). This means it is safe to replace `TableFilter(t, cond)` with `TableFilter(TableFilterIntervals(t, intervals), cond2)`. A common example is when `cond` can be completely captured by the interval filter, i.e. `cond` is equivalent to `intervals.contains(key)`, in which case we can take `cond2 = True`, and the `TableFilter` can be optimized away.

This all happens in the function

```scala
def extractPartitionFilters(ctx: ExecuteContext, cond: IR, ref: Ref, key: IndexedSeq[String]): Option[(IR, IndexedSeq[Interval])] = {
  if (key.isEmpty) None
  else {
    val extract = new ExtractIntervalFilters(ctx, ref.typ.asInstanceOf[TStruct].typeAfterSelectNames(key))
    val trueSet = extract.analyze(cond, ref.name)
    if (trueSet == extract.KeySetLattice.top)
      None
    else {
      val rw = extract.Rewrites(mutable.Set.empty, mutable.Set.empty)
      extract.analyze(cond, ref.name, Some(rw), trueSet)
      Some((extract.rewrite(cond, rw), trueSet))
    }
  }
}
```

`trueSet` is the set of intervals which contains all rows where `cond` is true. This set is passed back into `analyze` in a second pass, which asks it to rewrite `cond` to something equivalent, under the assumption that all keys are contained in `trueSet`.

The abstraction of runtime values tracks two types of information:
* Is this value a reference to / copy of one of the key fields of this row? We need to know this to be able to recognize comparisons with key values, which we want to extract to interval filters.
* For boolean values (including, ultimately, the filter predicate itself), we track three sets of intervals of the key type: overapproximations of when the bool is true, false, and missing. Overapproximation here means, for example, that if the boolean evaluates to true in some row with key `k`, then `k` must be contained in the "true" set of intervals. But it's completely fine if the set of intervals also contains keys of rows where the bool is not true. In particular, a boolean about which we know nothing (e.g. it's just some non-key boolean field in the dataset) is represented by an abstract boolean value where all three sets are the set of all keys.
(commit bd6e397)
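A toy abstract interpretation in the spirit described above (pure Python over an integer key; all names and the predicate encoding are hypothetical, not Hail's Scala implementation) shows how a predicate maps to an overapproximating set of intervals: conjunction intersects, disjunction unions, and an unknown sub-expression falls back to "all keys".

```python
# Intervals are half-open (lo, hi) pairs over a single integer key.
TOP = [(float("-inf"), float("inf"))]  # "all keys": we know nothing

def true_set(cond):
    """Overapproximate the set of keys for which `cond` may be true."""
    op = cond[0]
    if op == "lt":
        return [(float("-inf"), cond[1])]        # key < c
    if op == "ge":
        return [(cond[1], float("inf"))]         # key >= c
    if op == "and":
        return intersect(true_set(cond[1]), true_set(cond[2]))
    if op == "or":
        return true_set(cond[1]) + true_set(cond[2])
    return TOP  # unknown predicate: soundly overapproximate with all keys

def intersect(a, b):
    out = []
    for lo1, hi1 in a:
        for lo2, hi2 in b:
            lo, hi = max(lo1, lo2), min(hi1, hi2)
            if lo < hi:
                out.append((lo, hi))
    return out
```

Note that `("and", known, unknown)` still yields a useful interval, which is exactly the robustness property the rewrite is after: partial knowledge never poisons the whole analysis.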
Commits on Oct 16, 2023
(commit 9b54271)
[hailtop] yet another transient error (hail-is#13817)
CHANGELOG: Mitigate new transient error from Google Cloud Storage which manifests as `aiohttp.client_exceptions.ClientOSError: [Errno 1] [SSL: SSLV3_ALERT_BAD_RECORD_MAC] sslv3 alert bad record mac (_ssl.c:2548)`. As of around 1500 ET 2023-10-16, this exception happens whenever we issue a lot of requests to GCS. See [Zulip thread](https://hail.zulipchat.com/#narrow/stream/300487-Hail-Batch-Dev/topic/cluster.20size/near/396777320).
(commit fcaafc5)
[ci] fix Azure CI PR page (hail-is#13824)
The `logging_queries` variable is always *defined* but sometimes `None`.
(commit 568858d)
[query] if worker or driver raise an exception, fail the Batch job (hail-is#13715)

CHANGELOG: Fixes hail-is#13697, a long-standing issue with QoB, in which a failing partition job or driver job is not failed in the Batch UI.

I am not sure why we did not do it this way in the first place. If a JVMJob raises an exception, Batch will mark the job as failed. Ergo, we should raise an exception when a driver or a worker fails! Here's an example: I used a simple pipeline that writes to a bucket to which I have read-only access. You can see an example Batch (where every partition fails): https://batch.hail.is/batches/8046901. [1]

```python3
import hail as hl
hl.utils.range_table(3, n_partitions=3).write('gs://neale-bge/foo.ht')
```

NB: I removed the `log.error` in `handleForPython` because that log is never necessary. That function converts a stack of exceptions into a triplet of the short message, the full exception with stack trace, and a Hail error id (if present). That triplet is always passed along to someone else who logs the exception. (FWIW, the error id indicates a Python source location that is associated with the error. On the Python side, we can look up that error id and provide a better stack trace.)

[1] You'll notice the logs are missing. I noticed this as well; it's a new bug. I fixed it in hail-is#13729.
(commit e32e999)
Commits on Oct 17, 2023
[query] faster array decoder (hail-is#13787)
Picking up where hail-is#13776 left off. CHANGELOG: improved speed of reading hail format datasets from disk.

This PR speeds up decoding arrays in two main ways:

* Instead of calling `arrayType.isElementDefined(array, i)` on every single array element, which expands to

  ```scala
  val b = aoff + lengthHeaderBytes + (i >> 3)
  !((Memory.loadByte(b) & (1 << (i & 7).toInt)) != 0)
  ```

  process elements in groups of 64, and load the corresponding long of missing bits once.
* Once we have a whole long of missing bits, we can be smarter than branching on each bit. After flipping to get `presentBits`, we use the following pseudocode to extract the positions of the set bits, with time proportional to the number of set bits:

  ```
  while (presentBits != 0) {
    val idx = java.lang.Long.numberOfTrailingZeros(presentBits)
    // do something with idx
    presentBits = presentBits & (presentBits - 1) // unsets the rightmost set bit
  }
  ```

To avoid needing to handle the last block of 64 elements differently, this PR changes the layout of `PCanonicalArray` to ensure the missing bits are always padded out to a multiple of 64 bits. They were already padded to a multiple of 32, and I don't expect this change to have much of an effect. But if needed, blocking by 32 elements instead had very similar performance in my benchmarks.

I also experimented with unrolling loops. In the non-missing case, this is easy. In the missing case, I tried using `if (presentBits.bitCount >= 8)` to guard an unrolled inner loop. In both cases, unrolling was, if anything, slower. Dan observed a benefit from unrolling, but that was combined with the first optimization above (not loading a bit from memory every element), which I believe was the real source of improvement.
(commit ecb7d86)
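The bit trick in that pseudocode translates directly to Python (a sketch for illustration; `(word & -word).bit_length() - 1` plays the role of Java's `numberOfTrailingZeros`):

```python
def set_bit_indices(word):
    # Extract the positions of the set bits of `word`, spending time
    # proportional to the number of set bits rather than the word width.
    idxs = []
    while word:
        # (word & -word) isolates the lowest set bit; bit_length - 1 is its index.
        idxs.append((word & -word).bit_length() - 1)
        word &= word - 1  # unset the lowest set bit
    return idxs
```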
[copier] minor clean up to copier (hail-is#13820)
1. File rate is more interesting for small files. 2. The source_report controls the progress bar. By updating it eagerly while we are listing a directory, the progress bar is more accurate sooner. We currently wait until we get a semaphore for a particular file to update the progress bar.
(commit f0328b6)
[fs] delete in batches, accept stdin (hail-is#13821)
Some quality-of-life stuff for `hailtop.aiotools.delete`. Without deleting in batches, I found it impossible to delete very large lists of files because we create too many asyncio tasks.
(commit c64d880)
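The batching idea can be sketched as follows (hypothetical names, not hailtop's actual API): cap the number of live asyncio tasks by working through the list in fixed-size chunks instead of creating one task per file up front.

```python
import asyncio

async def delete_all(paths, delete_one, batch_size=1000):
    # Process `paths` in chunks so at most `batch_size` tasks exist at once,
    # avoiding the task explosion described above for very large file lists.
    deleted = 0
    for i in range(0, len(paths), batch_size):
        batch = paths[i:i + batch_size]
        await asyncio.gather(*(delete_one(p) for p in batch))
        deleted += len(batch)
    return deleted
```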
(commit 9817797)
[query] no unnecessary object allocations in RegionMemory.allocate (hail-is#13794)

Consider this:

```scala
class Foo {
  def bar(): (Long, Long) = (3, 4)

  def destructure(): Unit = {
    val (x, y) = bar()
  }

  def accessors(): Unit = {
    val zz = bar()
    val x = zz._1
    val y = zz._2
  }
}
```

![image](https://github.com/hail-is/hail/assets/106194/532dc7ea-8027-461d-8e12-3217f5451713)

These should be exactly equivalent, right? There's no way Scala would compile the match into something horrible. Right? Right?

```
public void destructure();
  Code:
     0: aload_0
     1: invokevirtual #27  // Method bar:()Lscala/Tuple2;
     4: astore_3
     5: aload_3
     6: ifnull        35
     9: aload_3
    10: invokevirtual #33  // Method scala/Tuple2._1$mcJ$sp:()J
    13: lstore        4
    15: aload_3
    16: invokevirtual #36  // Method scala/Tuple2._2$mcJ$sp:()J
    19: lstore        6
    21: new           #13  // class scala/Tuple2$mcJJ$sp
    24: dup
    25: lload         4
    27: lload         6
    29: invokespecial #21  // Method scala/Tuple2$mcJJ$sp."<init>":(JJ)V
    32: goto          47
    35: goto          38
    38: new           #38  // class scala/MatchError
    41: dup
    42: aload_3
    43: invokespecial #41  // Method scala/MatchError."<init>":(Ljava/lang/Object;)V
    46: athrow
    47: astore_2
    48: aload_2
    49: invokevirtual #33  // Method scala/Tuple2._1$mcJ$sp:()J
    52: lstore        8
    54: aload_2
    55: invokevirtual #36  // Method scala/Tuple2._2$mcJ$sp:()J
    58: lstore        10
    60: return

public void accessors();
  Code:
     0: aload_0
     1: invokevirtual #27  // Method bar:()Lscala/Tuple2;
     4: astore_1
     5: aload_1
     6: invokevirtual #33  // Method scala/Tuple2._1$mcJ$sp:()J
     9: lstore_2
    10: aload_1
    11: invokevirtual #36  // Method scala/Tuple2._2$mcJ$sp:()J
    14: lstore        4
    16: return
```

Yeah, so, for the destructuring version, it extracts the first and second elements of the primitive-specialized tuple, constructs another primitive-specialized tuple (for no reason???), then does the match on that. Sigh.
(commit 30ec661)
[ci] Force the installation of node and npm pyright in the linting image (hail-is#13838)
Fixes hail-is#13722
(commit dc33c6d)
[batch] Setup database infrastructure for job groups (hail-is#13475)
The conceptual change here is that we want to parameterize all batch-related tables with a new job group ID, which I've set to **0** for the root job group. We need to make sure all future inserts/updates into the batches table are propagated to the new job groups table. When we create a batch now, we also create the corresponding entries in the job groups and job group parents tables. I chose the root job group to be 0 because, conceptually, the client should start numbering job groups at 1 and not know there is a hidden root job group created under the hood. I'm not wedded to this. I tried to check for all the indices that would be needed in my prototype. It's possible I missed one or two, but it's not a big deal to add them later. I don't think we need to test this on a populated database (dev deploy main, submit jobs, then run the migration), but let me know if you think that would be helpful.
(commit efc83e7)
Commits on Oct 18, 2023
[compiler] Don't append StmtX after ControlX (hail-is#13752)
This change grew out of hail-is#13674. The idea is simple: we shouldn't be appending code after control statements, as such statements are redundant. That idea opened Pandora's box, but now we're no longer generating and dropping dead code.

Main changes that arose from fixing the fallout of adding an assert in `Block.append`:
- Implement basic control-flow structures (if, while, for, switch) in `CodeBuilderLike` and remove the older implementations from `Code`.
  - The main difference is that these are built by sequencing `Code` operations rather than being defined from LIR, allowing for a higher-level implementation that I think is simpler to read.
- Use the type system to prevent foot-guns like `cb.ifx(cond, label.goto)`.

Other changes:
- Rename `ifx`, `forLoop` and `whileLoop` to just `if_`, `for_` and `while_`, respectively.
- Implement loops in terms of one another to remove code duplication.
- Fix logic for when to write IRs, as some default-value behaviour was broken when `HAIL_WRITE_IR_FILES` was set in tests.
(commit 3189854)
Verify `logistic_regression_rows` `y` param is column-indexed (hail-is#13849)
Fixes hail-is#13788:
- Add `raise_unless_column_indexed` guard and apply it to all column-indexed parameters in `statgen.py`.
- Rename `check_row_indexed` and `check_entry_indexed`, as I'm allergic to functions called "check"; now it's clearer what they do.
(commit 7eaf390)
Commits on Oct 19, 2023
[fs] rate limit our use of parallelism in copy (hail-is#13833)
This is the result of some experimentation. With ten-way parallelism, the copier very rarely gets rate-limited. With 75-way parallelism (the default), we almost always experience tens of transient errors. If we start at ten and back off as in this PR, I can get to 75 with just a handful of transient errors. cc: @jigold
(commit d721005)
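The ramp-up-with-backoff idea can be sketched as a tiny policy function (hypothetical constants and name; the actual PR's tuning may differ): start at ten, grow toward 75, and halve on a transient error.

```python
def next_parallelism(current, had_transient_error, start=10, target=75):
    # Grow gently toward the target; back off multiplicatively on errors,
    # never dropping below the known-safe starting level.
    if had_transient_error:
        return max(start, current // 2)
    return min(target, current + 5)
```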
(commit 4132cd1)
[fs] raise exceptions from failed tasks (hail-is#13831)
Similar to hail-is#13818. We *must* retrieve exceptions from any task that is `done`, otherwise we'll get a warning when the task is freed.
(commit 8ff5933)
[batch] no logs on container deleted (hail-is#13857)
Containers get deleted when a job is cancelled. This is not exceptional behavior. Example: https://cloudlogging.app.goo.gl/punCSPauoM1ZEqZ27
(commit e107ebe)
(commit 35994fb)
Commits on Oct 20, 2023
(commit dc3aa8a)
(commit cc844ae)
[aiogoogle] finally squash an open http session (hail-is#13867)
🤦 I grepped for other `hasattr` calls that do not check `self`.
(commit d5bb0f6)
[gear] properly close the aiomysql pools (hail-is#13869)
We should never have been using `await` here. (aiomysql should probably not implement `__await__`.) `create_pool` returns an `aiomysql.utils._PoolContextManager`, which inherits from `aiomysql.utils._ContextManager`, which implements `__await__`, `__aenter__`, and `__aexit__` thusly:

```python3
def __await__(self):
    return self._coro.__await__()

async def __aenter__(self):
    self._obj = await self._coro
    return self._obj

async def __aexit__(self, exc_type, exc, tb):
    await self._obj.close()
    self._obj = None
```

`__await__` is a footgun! You should never do that! You should close the return value of the coroutine!
(commit 85f9e57)
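A self-contained toy mirroring that `_ContextManager` pattern (hypothetical `FakePool` and `create_pool`, not aiomysql itself) makes the footgun concrete: `async with` closes the pool on exit, while `await` hands you the pool and silently makes you responsible for closing it.

```python
import asyncio

class FakePool:
    def __init__(self):
        self.closed = False
    async def close(self):
        self.closed = True

class PoolContextManager:
    # Same three dunders as aiomysql.utils._ContextManager (toy version).
    def __init__(self, coro):
        self._coro = coro
        self._obj = None
    def __await__(self):
        return self._coro.__await__()
    async def __aenter__(self):
        self._obj = await self._coro
        return self._obj
    async def __aexit__(self, exc_type, exc, tb):
        await self._obj.close()

def create_pool():
    async def _make():
        return FakePool()
    return PoolContextManager(_make())

async def with_context_manager():
    async with create_pool() as pool:
        pass
    return pool  # closed by __aexit__

async def with_await():
    pool = await create_pool()
    return pool  # never closed unless the caller remembers pool.close()
```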
[hailtop] types, new exceptions, better task exception handling (hail-is#13818)

Besides the types and some transient exceptions, I think I fixed our task exception handling in several spots. Two things:
1. We do not need to wait on a cancelled task. If it was not done, then it could not possibly have an exception to retrieve. Moreover, now that it is cancelled, there is nothing else to do. Cancellation is immediate.
2. If a task is done, we *must* always retrieve the exception, otherwise we might not see an exception.
(commit 5029a31)
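The rule in point 2 can be sketched with stdlib asyncio (hypothetical `reap` helper): for every task that is done and not cancelled, call `.exception()` so asyncio never logs "exception was never retrieved".

```python
import asyncio

def reap(tasks):
    # Retrieve (and thereby silence) stored exceptions from done tasks;
    # cancelled tasks have nothing to retrieve, matching point 1 above.
    errors = []
    for t in tasks:
        if t.done() and not t.cancelled():
            exc = t.exception()
            if exc is not None:
                errors.append(exc)
    return errors

async def main():
    async def boom():
        raise ValueError("boom")
    async def ok():
        return 1
    tasks = [asyncio.create_task(boom()), asyncio.create_task(ok())]
    await asyncio.wait(tasks)
    return reap(tasks)
```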
[batch] wait long enough for aiohttp to clean itself up (hail-is#13868)
We already do this in httpx.ClientSession. I just copied from there.
(commit 209404e)
[query] Avoid py4j for python-backend interactions (hail-is#13797)
CHANGELOG: Fixes hail-is#13756: operations that collect large results, such as `to_pandas`, may require up to 3x less memory.

This turns all "actions", i.e. backend methods supported by QoB, into HTTP endpoints on the spark and local backends. This intentionally avoids py4j, because py4j was really designed to pass function names and references around and does not handle large payloads well (such as results from a `collect`). Specifically, py4j uses a text-based protocol on top of TCP that substantially inflates the memory requirement for communicating large byte arrays. On the Java side, py4j serializes every binary payload as a Base64-encoded `java.lang.String`; between the Base64 encoding and `String`'s use of UTF-16, the memory footprint of the `String` is `4/3 * 2 = 8/3`, nearly three times the size of the byte array, on either side of the py4j pipe. py4j also appears to make an entire copy of this payload, which means nearly a 6x memory requirement for sending back bytes. Using our own socket means we can directly send the response bytes to python without any of this overhead, even going so far as to encode results directly into the TCP output stream. Formalizing the API between python and java also allows us to reuse the same payload schema across all three backends.
(commit c73386f)
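The `4/3 * 2 = 8/3` arithmetic can be checked concretely with the stdlib `base64` module (the 300-byte payload is an arbitrary example size):

```python
import base64

payload = bytes(300)                 # 300 raw bytes
encoded = base64.b64encode(payload)  # Base64 inflates by 4/3
assert len(encoded) == 400

# A java.lang.String stores this text in UTF-16, two bytes per character,
# doubling it again: 8/3 of the original byte array.
utf16_bytes = len(encoded) * 2
assert utf16_bytes == 800
```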
[query] improve field names and output of _same (hail-is#13825)
Rori encountered some confusing output. `_same` is private, but this is good both for GVS/AoU and also for us. Things are indented properly, it uses as much terminal width as is available, the names are slightly less confusing, and we see both globals and row failures if both fail.

```
Table._same: rows differ:
Row mismatch at key=Struct(locus=Locus(contig=1, position=1, reference_genome=GRCh37), alleles=['A', 'C']):
  Left:  [Struct(ancestral_af=0.381520365258488, af=[0.6482459117152142], __uid_entries_85=[Struct(GT=Call(alleles=[1, 1], phased=False)), Struct(GT=Call(alleles=[1, 1], phased=False))])]
  Right: [Struct(ancestral_af=0.381520365258488, af=[0.2510276144176496], __uid_entries_85=[Struct(GT=Call(alleles=[0, 0], phased=False)), Struct(GT=Call(alleles=[0, 1], phased=False))])]
Row mismatch at key=Struct(locus=Locus(contig=1, position=2, reference_genome=GRCh37), alleles=['A', 'C']):
  Left:  [Struct(ancestral_af=0.7058845354840656, af=[0.5224710728099119], __uid_entries_85=[Struct(GT=Call(alleles=[0, 0], phased=False)), Struct(GT=Call(alleles=[0, 0], phased=False))])]
  Right: [Struct(ancestral_af=0.7058845354840656, af=[0.5042641171983404], __uid_entries_85=[Struct(GT=Call(alleles=[1, 1], phased=False)), Struct(GT=Call(alleles=[1, 1], phased=False))])]
```

versus

```
Table._same: rows differ: Row mismatch at key=Struct(_key=Struct(locus=Locus(contig=1, position=1, reference_genome=GRCh37), alleles=['A', 'C'])): Left: [Struct(ancestral_af=0.381520365258488, af=[0.08835032612615329], __uid_39=[Struct(GT=Call(alleles=[0, 0], phased=False)), Struct(GT=Call(alleles=[0, 0], phased=False))])] Right: [Struct(ancestral_af=0.381520365258488, af=[0.6631710694002383], __uid_39=[Struct(GT=Call(alleles=[0, 1], phased=False)), Struct(GT=Call(alleles=[1, 1], phased=False))])] Row mismatch at key=Struct(_key=Struct(locus=Locus(contig=1, position=2, reference_genome=GRCh37), alleles=['A', 'C'])): Left: [Struct(ancestral_af=0.7058845354840656, af=[0.7020078954798737], __uid_39=[Struct(GT=Call(alleles=[0, 1], phased=False)), Struct(GT=Call(alleles=[0, 1], phased=False))])] Right: [Struct(ancestral_af=0.7058845354840656, af=[0.6393831798848757], __uid_39=[Struct(GT=Call(alleles=[1, 1], phased=False)), Struct(GT=Call(alleles=[0, 1], phased=False))])]
```
(commit 3a47bbd)
[qob] minor fixes to service backend (hail-is#13872)
1. If a job errors rather than fails, we still want to see its logs in the debug info. 2. The backend from before `hl_stop_for_test` is broken. In particular, it does not have an open ClientSession, so it cannot make HTTP requests.
(commit 494197d)
[batch] make cpu and mem resource readers similar and avoid race (hail-is#13879)
Fixes hail-is#13861. The CPU monitor races with container deletion just like the RAM monitor. I also switched to catching FileNotFoundError instead of checking exists, since technically the file could disappear between us checking `exists` and us `open`ing it.
(commit aecbbff)
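The exists-then-open race (a classic TOCTOU) and its fix look like this in Python (`read_cgroup_value` is a hypothetical helper, not Batch's actual code):

```python
def read_cgroup_value(path):
    # Open and catch FileNotFoundError instead of checking os.path.exists
    # first: the file can disappear between the check and the open.
    try:
        with open(path) as f:
            return f.read().strip()
    except FileNotFoundError:
        return None  # e.g. the container was deleted; treat as "no reading"
```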
Commits on Oct 23, 2023
[ci] Fix namespace used for logging queries on PR page (hail-is#13813)
DEFAULT_NAMESPACE will always point to `default` for the production CI deployment.
(commit affd4cb)
(commit 5d7dc2c)
[devbin] minor fixes (hail-is#13877)
A missing import, a deprecated asyncio API, and print keys *with a namespace* (rather than just their name).
(commit 8490f67)
(commit 86f2140)
[website] add docs/latest/ (which currently redirects to 0.2) (hail-is#13855)
I also fixed local/dev docs building. You need Hail installed for the docs build to work because it tries to `import` the classes for which you're building docs.
(commit 66c9b01)
Commits on Oct 24, 2023
[ci] Bump number of cores to 16 for db deployment jobs (hail-is#13822)
I wanted to be safe rather than sorry for hail-is#13487, to make sure we have enough memory to fit all of the 100-row chunk endpoints in memory. We can switch it back to 4 cores after that PR is merged.
(commit 914e6d0)
[query] Teach hl.default_reference to set the default reference genome (hail-is#13888)
CHANGELOG: hl.default_reference can now be passed an argument to change the default reference genome. Now, when hl.default_reference is passed a reference genome argument, it sets the default reference to that reference genome and returns None.
(commit 30b461a)
(commit 6cc1a8d)
Commits on Oct 25, 2023
[hailctl] fix hailctl describe requester pays argument handling (hail-is#13826)
Fix the overriding of the gcs_requester_pays/project config variable when using 'hailctl describe -u'. Closes hail-is#13793
(commit ad852a0)
[ci] Set database creation jobs back to 4 cores (hail-is#13898)
The largest migration took 6.2 GB of memory. I think we'll be fine for a while with 15 GB of memory (4 cores).
(commit 817ea60)
[make] Separate image building from image pushing (hail-is#13890)
This came to mind yesterday during our pairing. This PR introduces the following properties, which our image building targets do not currently have:

1. If your intention is only to build images, you shouldn't need `kubectl`. When `DOCKER_PREFIX` is used as a docker build arg, it is because we mirror some dockerhub images inside our registry (for reliability/rate-limiting reasons). But for local building there's no reason you can't use the dockerhub image. Also, other people should be able to build the hail image if they want to!
2. One should *only* need `kubectl` if they intend to use an image in a kubernetes deployment. In other words, you should only need the private registry `DOCKER_PREFIX` for pushing images.
3. One should not need to endure image pushing if the only goal is to build the image locally.
4. No intermediate tags should end up in the private registry. If we push on every image build, the private docker registry will accumulate a lot of `hail-ubuntu:dev-xxxxxx` tags that are never used again, because `hail-ubuntu` is just an intermediate used to build other images. This does *not* change the number of layers that end up in the registry, but it reduces a bit of the work that the registry cleanup job needs to do to untag and delete images, and it just seems cleaner.
Commit: c7ea3dd
[batch] Update gcsfuse version to 1.2.0 (hail-is#13728)
Let's see if this "just works".
Commit: e739a95
Commit: 2f69f8a
[benchmark] Benchmarks on Ubuntu 22.04 (hail-is#13880)
Benchmark make targets are now invoked directly through `benchmark/Makefile`; the previous targets in `hail/Makefile` have been deleted. To submit benchmarks in batch from the root directory, use ``` $ BENCHMARK_LABEL=label NAMESPACE=default BENCHMARK_PROJECT=your-google-project make -C benchmark submit ``` The benchmark docker image is now based off `hail-ubuntu` with a minimal set of requirements installed on top. The hail wheel is added after the benchmark wheel in an attempt to optimise build times for changes to hail code.
Commit: b90cdd1
[query] Use EncodedLiteral instead of Literal from python to scala (hail-is#13814)
CHANGELOG: Pipelines that are memory-bound by copious use of `hl.literal`, such as `vds.filter_intervals`, require substantially less memory. Closes hail-is#13757
Commit: eabff3d
[gradle] exclude scala-reflect (hail-is#13894)
CHANGELOG: Fixes hail-is#13837, in which Hail could break a Spark installation if the Hail JAR appears on the classpath before the Scala JARs. We and several dependencies of ours are exposing an old version of scala-reflect (rude of us and them). If our JAR appears on the classpath before the Scala JARs and the sub-versions of Scala differ, Spark is likely to hit incompatible bytecode. See hail-is#13837.

In this PR, this command:

```
make shadowJar && \
  jar tf build/libs/hail-all-spark.jar | grep scala/reflect | wc -l && \
  jar tf build/libs/hail-all-spark.jar | grep 'MutableSettings'
```

prints:

```
0
```

On main (`5d7dc2cab7`) it prints:

```
1417
scala/reflect/internal/settings/MutableSettings$.class
scala/reflect/internal/settings/MutableSettings$SettingValue.class
scala/reflect/internal/settings/MutableSettings.class
```

A bit more detail follows for the curious. `./gradlew dependencies` shows these packages as depending on `scala-reflect`:

```
+--- org.scalanlp:breeze-natives_2.12:1.1
|    +--- org.scala-lang:scala-library:2.12.10 -> 2.12.17
|    +--- org.scalanlp:breeze_2.12:1.1 -> 1.2
|    |    +--- org.scala-lang:scala-library:2.12.10 -> 2.12.17
|    |    +--- org.scalanlp:breeze-macros_2.12:1.2
|    |    |    +--- org.scala-lang:scala-library:2.12.10 -> 2.12.17
|    |    |    \--- org.scala-lang:scala-reflect:2.12.10 -> 2.12.15
...
+--- org.elasticsearch:elasticsearch-spark-30_2.12:8.4.3
|    +--- org.scala-lang:scala-library:2.12.8 -> 2.12.17
|    +--- org.scala-lang:scala-reflect:2.12.8 -> 2.12.15 (*)
|    \--- org.apache.spark:spark-core_2.12:3.2.1 -> 3.3.0
...
|    +--- org.scala-lang:scala-reflect:2.12.15 (*)
...
+--- org.scala-lang:scala-reflect:2.12.15 (*)
...
+--- org.apache.spark:spark-sql_2.12:3.3.0
...
|    +--- org.apache.spark:spark-catalyst_2.12:3.3.0
|    |    +--- org.scala-lang:scala-reflect:2.12.15 (*)
```

This root (the fourth-to-last package above):

```
+--- org.scala-lang:scala-reflect:2.12.15 (*)
```

is coming from this line in build.gradle:

```gradle
shadow 'org.scala-lang:scala-reflect:' + scalaVersion
```

I think `shadow` means "needed to compile but do not include in shadow JAR", so I do not know if we can trust `./gradlew dependencies` to tell us why any particular class file is in our shadow JAR.
Commit: 0402aad
[query] ensure nest_asyncio is applied, but only when necessary (hail-is#13899)
CHANGELOG: Fix `RuntimeError: This event loop is already running` error when running hail in a Jupyter Notebook. Man, this is really complicated. OK, so, things I learned:
1. [asyncio will not create a new event loop if `set_event_loop` has been called, even if `set_event_loop(None)` has since been called.](https://github.com/python/cpython/blob/main/Lib/asyncio/events.py#L676)
2. [asyncio will not create a new event loop in a thread other than the main thread.](https://github.com/python/cpython/blob/main/Lib/asyncio/events.py#L677)
3. `aiohttp.ClientSession` stashes a copy of the event loop present when it starts. This can cause all manner of extremely confusing behavior if you later change the event loop or use that session from a different thread.

The fix, in the end, wasn't that complicated. Anywhere Hail explicitly asks for an event loop (so that we can run async code), we apply nest_asyncio if the event loop is already running; otherwise we do nothing. nest_asyncio appears to [no longer require](https://github.com/erdewit/nest_asyncio/tree/master#usage) `apply` to be called before the event loop starts running.

This PR *does not* address:
1. Hail nesting async code in sync code in async code. I think we should avoid this, but the `hailtop.fs` and `hailtop.batch` APIs, among others, need async versions before we can do that.
2. The `aiohttp.ClientSession` nonsense. We really should take pains to ensure we create one `ClientSession` per loop and never mix loops.
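The "only patch when a loop is already running" check can be sketched with plain asyncio. `loop_is_running` and `maybe_patch_loop` are hypothetical names for illustration, not the actual hail helpers, and no real patching is performed here:

```python
import asyncio

def loop_is_running() -> bool:
    # get_running_loop raises RuntimeError when no loop is running in this
    # thread, which is exactly the signal the fix keys on.
    try:
        asyncio.get_running_loop()
        return True
    except RuntimeError:
        return False

def maybe_patch_loop(apply_patch) -> bool:
    # Hypothetical shape of the fix: only patch the loop (as
    # nest_asyncio.apply would) when one is already running, e.g. in Jupyter.
    if loop_is_running():
        apply_patch()
        return True
    return False

async def main() -> bool:
    # Inside a running loop, the patch would be applied.
    return maybe_patch_loop(lambda: None)

print(maybe_patch_loop(lambda: None))  # False: no loop running at top level
print(asyncio.run(main()))             # True: a loop is running inside main()
```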
Commit: a53f4ff
[release] 0.2.125 (hail-is#13905)
Closes hail-is#13806 --------- Co-authored-by: jigold <jigold@users.noreply.github.com>
Commit: c4e2880
Commits on Oct 26, 2023
-
Make `SemanticHash` Resilient to `FileNotFoundException`s (hail-is#13919)
Fixes hail-is#13915. `MatrixVCFReader` accepts glob patterns (wildcards in file names). This bamboozled `SemanticHash`, which had assumed all files had been resolved. This change fixes that by adding explicit `FileNotFoundException` handling to `SemanticHash` and replacing the `params.files` object of `MatrixVCFReader` with the resolved paths.
Commit: cfb7323
[query] avoid hanging the JVM in Dataproc (hail-is#13916)
Non-daemon threads [keep a JVM alive](https://docs.oracle.com/javase/8/docs/api/java/lang/Thread.html):

> When a Java Virtual Machine starts up, there is usually a single non-daemon thread (which typically calls the method named main of some designated class). The Java Virtual Machine continues to execute threads until either of the following occurs:
>
> - The exit method of class Runtime has been called and the security manager has permitted the exit operation to take place.
> - All threads that are not daemon threads have died, either by returning from the call to the run method or by throwing an exception that propagates beyond the run method.

Spark appears to wait for the JVM to terminate before it considers a job complete.
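Python's `threading` module follows the same rule, which makes for a quick illustration of why a lingering non-daemon thread hangs a process (a generic sketch, not the Dataproc fix itself):

```python
import threading
import time

# The interpreter exits when all non-daemon threads have finished; daemon
# threads are abandoned at exit, just like JVM daemon threads.
background = threading.Thread(target=time.sleep, args=(3600,), daemon=True)
background.start()

print(background.daemon)      # True: will not keep the process alive
print(background.is_alive())  # True: running, but exit won't wait for it
```

Had `daemon=False` been used, the process would sit at exit for an hour waiting for the sleep to finish, which is the JVM-side behavior this commit avoids.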
Commit: 87a6057
[makefile] use the pushed-private image name not the local image name (hail-is#13926)
Didn't verify, but this seems closer to right.
Commit: 0fe50ea
Commit: 6e6f467
Commits on Oct 27, 2023
-
[ci] Use private base image for GPU testing image (hail-is#13750)
Trying to rely on our private registry instead of dockerhub where we can.
Commit: b87500b
[compiler] fix ndarray broadcasting with 0 sizes (hail-is#13751)
Really bad example:

```python
In [2]: a = hl.nd.array(2)

In [3]: empty = np.array([], np.int32).reshape((0, 2, 2))

In [4]: empty = hl.nd.array(empty)

In [5]: hl.eval(empty)
Out[5]: array([], shape=(0, 2, 2), dtype=int32)

In [6]: hl.eval(a + empty)
Out[6]: array([[[       2, 55622722],
        [       2,        3]]], dtype=int32)
```

The sum of shape `()` with shape `(0, 2, 2)` should have shape `(0, 2, 2)` by standard numpy broadcasting rules. Instead, we try to produce a result with shape `(1, 2, 2)`, which contains garbage data.
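The broadcasting rule itself is easy to state in pure Python; this standalone sketch (not Hail's compiler code) shows why a 0-sized dimension must survive broadcasting rather than being treated as 1:

```python
def broadcast_shape(a: tuple, b: tuple) -> tuple:
    # Right-align the shorter shape by padding with 1s on the left.
    ndim = max(len(a), len(b))
    a = (1,) * (ndim - len(a)) + a
    b = (1,) * (ndim - len(b)) + b
    out = []
    for da, db in zip(a, b):
        if da != db and 1 not in (da, db):
            raise ValueError(f'shapes {a} and {b} are not broadcastable')
        # A size-1 dimension stretches to the other size -- including 0.
        out.append(db if da == 1 else da)
    return tuple(out)

print(broadcast_shape((), (0, 2, 2)))   # (0, 2, 2): the scalar broadcasts away
print(broadcast_shape((3, 1), (1, 4)))  # (3, 4)
```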
Commit: 75ee439
[fs] improve copy tool progress bars (hail-is#13832)
I renamed RichProgressBar and SimpleRichProgressBar to ...CopyToolProgressBar because that is more accurate. I enhanced both to include a count and a rate with the right units based on the description. It is a bit flaky because I need the descriptions to be exactly "files" or exactly "bytes" to pick the right units, but this seems fine for the specific case of the CopyToolProgressBar. There is probably a better way to build these UIs; I am sure we will start to figure that out as we use rich more. (Before and after screenshots are in the PR.)
Commit: 823b28d
[batch] Only update log files if they exist (hail-is#13911)
If a container is deleted before it ever runs, the log files won't exist. Fixes hail-is#13906 Fixes hail-is#13907
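The guard is simple; here is a sketch of the pattern with a hypothetical function, not the actual worker code:

```python
import os
import tempfile

def append_log(log_path: str, data: str) -> bool:
    # If the container was deleted before it ever ran, the log file was
    # never created; skip the update instead of crashing.
    if not os.path.exists(log_path):
        return False
    with open(log_path, 'a') as f:
        f.write(data)
    return True

with tempfile.TemporaryDirectory() as d:
    missing = os.path.join(d, 'never-ran.log')
    print(append_log(missing, 'x'))   # False: nothing to update

    existing = os.path.join(d, 'ran.log')
    open(existing, 'w').close()
    print(append_log(existing, 'x'))  # True: log updated in place
```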
Commit: 73b91ed
[ci] Dont tar the hail wheel in CI pipelines (hail-is#13912)
IIUC, the reason we `tar` the hail wheel to move it between jobs in the CI pipeline is that the wheel name must contain the pip version, which is not known statically in `build.yaml`. However, it is just as effective to copy the wheel around inside a directory, and then we don't have to do all this tar'ing and untar'ing. cc @ehigham. I'm happy to hold off on this if it would bork your branch.
Commit: 1c53068
[query] Delete unused experimental functions in compile.py (hail-is#13918)
`compiled_compare` and `compile_comparison_binary` are unused and broken, as they depend on backend functionality that no longer exists. This deletes them, along with other functions they depended on that are now themselves unused.
Commit: d3936da
[devbin] Local UI development proxy server (hail-is#13632)
This adds a script that runs an `aiohttp` server on localhost that by default proxies all requests to whatever batch namespace `hailctl` is pointed to but serves static assets and renders all HTML locally. This allows for local editing of HTML / CSS / SCSS files. To run, this branch needs to be deployed in the relevant namespace (to pick up the changes to `web_common.py`), and you need to have the `sass` CLI installed (`brew install sass/sass/sass`). Then run in a separate terminal `make devserver SERVICE=batch` and open up `localhost:8000`. If you want to supply fake data instead of proxying an endpoint to the k8s deployment of the service, you can override that endpoint in `dev_proxy.py`. If this looks good, I'll add the above instructions to the dev docs. Resolves hail-is#13629
Commit: 8ec5361
[ci] Rename the deploy step to release (hail-is#13931)
"deploy" is a really overloaded term when it comes to CI, and "release" much more accurately reflects what this step does.
Commit: 06f7483
[make] Take the rest of docker-build.sh arguments as docker build arguments (hail-is#13910)
hail-is#13890 removed an optional argument to `docker-build.sh`. Now that there are no optional arguments, I think it is cleaner to claim any additional arguments as docker args instead of stuffing docker args into an environment variable.
Commit: 010ff09
[batch] Switch to using v3 tables for batches and jobs billing (hail-is#13891)
This PR switches all usages of the v2 billing tables in the batch and batch-driver server code to use the v3 tables instead. Stacked on hail-is#12849.
Commit: ebf4260
Commit: aade5c6
Commits on Oct 30, 2023
-
[query] fix bug in new dict decoder (hail-is#13939)
The decoder uses a `StagedArrayBuilder` to hold elements while they are sorted. The array builder is stored in a class field; when the same decoder function is called more than once, that array builder is reused. Before this fix, the array builder was never cleared, so if the decoder function was called more than once, the array builder would still contain the elements from previously decoded dicts. Since it's highly non-obvious that you need to call `clear` immediately after `new StagedArrayBuilder`, this PR makes the constructor take a CodeBuilder and always inserts a clear at the call site. I also took the opportunity to CodeBuilderify the rest of the interface.
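The bug pattern is easy to reproduce in plain Python (a hypothetical class, standing in for the generated decoder and its cached `StagedArrayBuilder` field):

```python
class DictDecoder:
    def __init__(self, clear_between_calls: bool):
        self._builder = []  # plays the role of the cached StagedArrayBuilder
        self._clear = clear_between_calls

    def decode(self, pairs):
        if self._clear:
            self._builder.clear()  # the fix: always clear before reuse
        self._builder.extend(pairs)
        return dict(sorted(self._builder))

buggy = DictDecoder(clear_between_calls=False)
buggy.decode([(1, 'a')])
print(buggy.decode([(2, 'b')]))  # {1: 'a', 2: 'b'}: stale element leaks in

fixed = DictDecoder(clear_between_calls=True)
fixed.decode([(1, 'a')])
print(fixed.decode([(2, 'b')]))  # {2: 'b'}
```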
Commit: 28e56f7
[query] Update ReferenceGenome docs (hail-is#13924)
Add a section on managing the default reference using the `default_reference` method.
Commit: 5bc9a25
[pip] Release 0.2.126 (hail-is#13942)
Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
Commit: ee77707
[query] refactor approx_cdf to support manual combining (hail-is#13935)
This PR enables users to combine results from multiple independent approx_cdf aggregators, for instance to allow updating quantile summaries when adding new samples. In more detail:
* Change the result type of the internal aggregator to be a lossless representation of the internal state.
* Add a registered function which exposes the CombOp of the aggregator.
* Move the function which computes the old result type from the internal state to python.
* Add a flag `_raw` to `approx_cdf` which produces the internal result type, which supports combining; otherwise convert to the old result type.
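The comb-op contract can be sketched with a toy mergeable summary. This keeps exact sorted samples rather than Hail's compacting KLL-style sketch, so treat it as a sketch of the API shape only:

```python
def make_state(values):
    # per-partition state: here, just the sorted sample itself
    return sorted(values)

def comb_op(state_a, state_b):
    # merging two states must be commutative and associative so that
    # independently built summaries can be combined in any order
    return sorted(state_a + state_b)

def quantile(state, q: float):
    # finalizer: convert the internal state into a quantile answer
    idx = min(int(q * len(state)), len(state) - 1)
    return state[idx]

left = make_state([1, 3, 5, 7])
right = make_state([2, 4, 6, 8])
print(quantile(comb_op(left, right), 0.5))           # 5
print(comb_op(left, right) == comb_op(right, left))  # True: order-insensitive
```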
Commit: 63c74fd
Commits on Oct 31, 2023
-
Commit: 5d42835
[query] Add semhash support for Table-to-Table Aggregations (hail-is#13922)
Namely, TableKeyByAndAggregate and TableAggregateByKey.
Commit: a950ede
[ci] Delete old CreateDatabaseStep (hail-is#13947)
Fixes hail-is#13895. This is unused now.
Commit: 22bcec2
[batch] Turn off autocomplete on billing projects page (hail-is#13946)
Fixes hail-is#13860 --------- Co-authored-by: iris <84595986+iris-garden@users.noreply.github.com>
Commit: 701bce0
[batch] Check whether user exists before adding to billing project (hail-is#13945)
Fixes hail-is#13858. (Screenshot in the PR.)
Commit: 40a3467
Commit: 662dc75
Commits on Nov 1, 2023
-
[hailtop] Remove service_ns method in DeployConfig (hail-is#13959)
`DeployConfig.service_ns` doesn't really do anything; we always use the `_default_namespace`. This is maybe from an earlier age when some services might have lived in different namespaces.
Commit: 3f0b115
Commits on Nov 2, 2023
-
[hailtop] Remove unused blocking httpx classes (hail-is#13965)
These are not used as far as I can tell.
Commit: 9badedb
[benchmark] fix combiner benchmarks (hail-is#13956)
The combiner benchmarks broke following the deletion of the `experimental.vcf_combiner` python package. Re-implement them in terms of the `vds` package.
Commit: 5f4508f
[compiler] Add `Switch` to IR (hail-is#13963)
We have no high-level IR analogue to `CodeBuilderLike.switch`. Such a node is useful for flattening deeply nested `If` nodes predicated on integer equality. This partially addresses the stack overflow on the `matrix_multi_write_nothing` benchmark, which currently overflows the stack when computing the type of the CDA.
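The motivation can be illustrated outside the compiler (hypothetical helpers, not Hail's IR classes): a chain of integer-equality `If`s grows one nesting level per case, while a `Switch` is a single flat node:

```python
def nested_if_lookup(x, cases, default):
    # Equivalent of If(x == 0, c0, If(x == 1, c1, ...)): depth grows with
    # the number of cases, which is what overwhelms recursive IR traversals.
    for i, value in enumerate(cases):
        if x == i:
            return value
    return default

def switch_lookup(x, cases, default):
    # Equivalent of a Switch node: one node with a flat branch table.
    return cases[x] if 0 <= x < len(cases) else default

cases = ['a', 'b', 'c']
print(nested_if_lookup(1, cases, 'z'))  # b
print(switch_lookup(1, cases, 'z'))     # b
print(switch_lookup(7, cases, 'z'))     # z: out-of-range falls to default
```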
Commit: 29c2966
[ci] Remove dead code from CI configs (hail-is#13967)
- `CreateNamespaceStep.public` was entirely unused.
- `adminServiceAccount` is not used in `build.yaml`, so `CreateNamespaceStep.admin_service_account` is always `None`, meaning it has no effect.
- The three environment variables that I deleted from the `deployment.yaml` are, as far as I can tell, entirely unused (they are now grabbed from the global config).
Commit: b4a3d57
[batch] Dont box data_disk_space_remaining (hail-is#13968)
Very small change, something I noticed while working on something else entirely. Given how this is currently used I don't think it needs to be boxed anymore.
Commit: 2c1188c
[batch] Make GCP disk attachment idempotent (hail-is#13955)
This is a fix for an error Ben found.

```
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1907, in run
    await self.setup_io()
  File "/usr/local/lib/python3.9/dist-packages/batch/worker/worker.py", line 1848, in setup_io
    await self.disk.create(labels=labels)
  File "/usr/local/lib/python3.9/dist-packages/batch/cloud/gcp/worker/disk.py", line 47, in create
    await self._attach()
  File "/usr/local/lib/python3.9/dist-packages/batch/cloud/gcp/worker/disk.py", line 112, in _attach
    self.last_response = await self.compute_client.attach_disk(
  File "/usr/local/lib/python3.9/dist-packages/hailtop/aiocloud/aiogoogle/client/compute_client.py", line 83, in attach_disk
    return await self._request_with_zonal_operations_response(self.post, path, params, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/aiocloud/aiogoogle/client/compute_client.py", line 126, in _request_with_zonal_operations_response
    return await retry_transient_errors(request_and_wait)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 763, in retry_transient_errors
    return await retry_transient_errors_with_debug_string('', 0, f, *args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/utils/utils.py", line 775, in retry_transient_errors_with_debug_string
    return await f(*args, **kwargs)
  File "/usr/local/lib/python3.9/dist-packages/hailtop/aiocloud/aiogoogle/client/compute_client.py", line 116, in request_and_wait
    raise GCPOperationError(result['httpErrorStatusCode'],
hailtop.aiocloud.aiogoogle.client.compute_client.GCPOperationError: GCPOperationError: 400:BAD REQUEST ['RESOURCE_IN_USE_BY_ANOTHER_RESOURCE'] ["The disk resource 'projects/hail-vdc/zones/us-central1-b/disks/batch-disk-82XXXXX' is already being used by 'projects/hail-vdc/zones/us-central1-b/instances/batch-worker-default-standard-yjXXXX'"]; {'kind': 'compute#operation', 'id': 'XXXXX', 'name': 'operation-XXXXX', 'zone': 'https://www.googleapis.com/compute/v1/projects/hail-vdc/zones/us-central1-b', 'clientOperationId': 'XXXX', 'operationType': 'attachDisk', 'targetLink': 'https://www.googleapis.com/compute/v1/projects/hail-vdc/zones/us-central1-b/instances/batch-worker-default-standard-yjupd', 'targetId': 'XXXX', 'status': 'DONE', 'user': 'batch2-agent@hail-vdc.iam.gserviceaccount.com', 'progress': 100, 'insertTime': '2023-10-30T20:38:40.145-07:00', 'startTime': '2023-10-30T20:38:41.871-07:00', 'endTime': '2023-10-30T20:38:42.367-07:00', 'error': {'errors': [{'code': 'RESOURCE_IN_USE_BY_ANOTHER_RESOURCE', 'message': "The disk resource 'projects/hail-vdc/zones/us-central1-b/disks/batch-disk-82XXXXX' is already being used by 'projects/hail-vdc/zones/us-central1-b/instances/batch-worker-default-standard-yjXXXX'"}]}, 'httpErrorStatusCode': 400, 'httpErrorMessage': 'BAD REQUEST', 'selfLink': 'https://www.googleapis.com/compute/v1/projects/hail-vdc/zones/us-central1-b/operations/operation-XXX'}
```
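The idempotency fix can be sketched as error handling around the attach call; the names below are made up for illustration and are not the actual `compute_client` API:

```python
class GCPOperationError(Exception):
    def __init__(self, code, errors):
        super().__init__(code)
        self.errors = errors  # e.g. ['RESOURCE_IN_USE_BY_ANOTHER_RESOURCE']

def attach_disk_idempotently(attach, attached_to_this_instance):
    try:
        attach()
        return 'attached'
    except GCPOperationError as e:
        if ('RESOURCE_IN_USE_BY_ANOTHER_RESOURCE' in e.errors
                and attached_to_this_instance()):
            # An earlier attempt (e.g. one retried after a timeout) already
            # succeeded, so attaching again is a no-op rather than an error.
            return 'already-attached'
        raise

def failing_attach():
    raise GCPOperationError(400, ['RESOURCE_IN_USE_BY_ANOTHER_RESOURCE'])

print(attach_disk_idempotently(failing_attach, lambda: True))
# already-attached
```

If the disk turns out to be attached to some other instance, the error is still raised, since that case is a genuine failure.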
Commit: d231b40
[batch] Fix async exit stacks (hail-is#13969)
I couldn't find the best issue for this. It should fix hail-is#13908, but I thought there was another issue about reducing noisy grafana alerts, which this PR also addresses.
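The exit-stack pattern in question looks roughly like this (a generic `contextlib.AsyncExitStack` sketch, not the batch worker's code):

```python
import asyncio
import contextlib

events = []

class Resource:
    # any async resource: an aiohttp session, a connection, a mount, ...
    def __init__(self, name):
        self.name = name
    async def __aenter__(self):
        events.append(f'open {self.name}')
        return self
    async def __aexit__(self, *exc):
        events.append(f'close {self.name}')

async def main():
    async with contextlib.AsyncExitStack() as stack:
        await stack.enter_async_context(Resource('a'))
        await stack.enter_async_context(Resource('b'))
        # if setup of 'b' had raised, 'a' would still be closed

asyncio.run(main())
print(events)  # ['open a', 'open b', 'close b', 'close a']
```

Resources are released in reverse acquisition order, and exactly once, even when a later setup step fails.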
Commit: a5c7a8a
Commits on Nov 6, 2023
-
[batch] Install the Ops Agent in GCP on instance startup (hail-is#13949)
Fixes hail-is#13903. (Screenshot in the PR.)
Commit: 2e536ff
[batch] Remove explicit settings in hailtop tests (hail-is#13973)
Another small step in going key-less.
Commit: 806baf9
[batch] Populate job groups in database (hail-is#13487)
This PR just populates the records for older batches into the `job_groups` and `job_group_self_and_ancestors` tables. Stacked on hail-is#13475.
Commit: 7d536c2
Commits on Nov 7, 2023
-
[batch] Fix enum column state in job groups table (hail-is#13982)
I forgot that "open" was a valid batch state when I created the job groups table's state column as an enum. This should fix the failed migration from hail-is#13487.
Commit: c4aa1c6
Commit: f73d92e
Commits on Nov 8, 2023
-
[website] add a link to the community feedback survey to website (hail-is#13986)
The CSS for the website is a real mess. I initially tried to clean it up, but that became a time sink. We should eventually do that, but for now I made the minimal edits to get a reasonable-looking layout. (The PR includes screenshots of the main page and the docs pages at both desktop and phone widths. With the navbar open, in mobile, the docs pages look the same as the non-docs pages.)
Commit: 67801df
[query] Use apache commons lang3 instead of lang (hail-is#13970)
`org.apache.commons.lang` is from the `commons-lang` library, but in `build.gradle` we explicitly depend on `commons-lang3` which puts everything under the `lang3` package. We must be picking up `commons-lang` as some transitive dependency but we no longer get it in Spark 3.4. Regardless, better to use what we explicitly depend on.
Commit: 0618348
Commits on Nov 9, 2023
-
[hailtop] Dont nest event loops unnecessarily (hail-is#13677)
Removes any occurrences of async/sync/async nesting in the code; i.e., a coroutine should not, somewhere deep down, make a synchronous call that blocks on the completion of an async task. --------- Co-authored-by: Dan King <dking@broadinstitute.org>
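A minimal demonstration of why this nesting is forbidden: a synchronous facade that drives its own event loop works at top level but fails when called from inside a coroutine:

```python
import asyncio

def sync_wrapper():
    # A typical "sync facade" over async code: fine at top level...
    return asyncio.run(asyncio.sleep(0, result='done'))

async def caller():
    try:
        sync_wrapper()  # ...but from inside a running loop it is rejected
        return 'no error'
    except RuntimeError:
        return 'RuntimeError'

print(sync_wrapper())         # done
print(asyncio.run(caller()))  # RuntimeError
```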
Commit: ebfbc24
[batch] Prevent the worker container from being killed on OOM event (hail-is#13977)
crun specifies memory requirements differently under cgroups v2 than under cgroups v1; this PR accounts for that. Should fix hail-is#13902.
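The v1/v2 difference being worked around boils down to differently named interface files (the file names are the real cgroup interfaces; how any given worker mounts them is an assumption, and this sketch touches no real cgroup):

```python
def memory_limit_file(cgroup_version: int) -> str:
    if cgroup_version == 1:
        return 'memory.limit_in_bytes'  # v1 memory controller
    if cgroup_version == 2:
        return 'memory.max'             # v2 unified hierarchy
    raise ValueError(cgroup_version)

def format_limit(limit_bytes, cgroup_version: int) -> str:
    # "no limit" is spelled 'max' under v2 but -1 under v1
    if limit_bytes is None:
        return 'max' if cgroup_version == 2 else '-1'
    return str(limit_bytes)

print(memory_limit_file(2))      # memory.max
print(format_limit(None, 2))     # max
print(format_limit(8 << 30, 1))  # 8589934592
```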
Commit: 3b38d0b
Merge upstream HEAD (3b38d0b, 2023-11-10) for 0.2.126 and OOM fix
In particular, we need to incorporate and test hail-is#13977 as the proposed fix for jobs becoming unresponsive due to being targeted by the kernel's OOM-killer. (Our local gcsfuse repo workaround is replaced by upstream's.)
Commit: 7e15f72