Merge upstream changes #98

Merged
merged 49 commits into main on Mar 23, 2021
Conversation

@lgruen lgruen commented Mar 23, 2021

No description provided.

jigold and others added 30 commits March 9, 2021 09:54
* [batch] Worker cleanup

* more changes

* wip

* delint

* additions?

* fix
* [query] Add `source_file_field` to `import_table`

CHANGELOG: Add `source_file_field` parameter to `hl.import_table` to allow lines to be associated with their original source file.

* ugh
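Conceptually, the new parameter tags each imported line with the path of the file it came from. The real entry point is `hl.import_table(..., source_file_field=...)`, per the CHANGELOG entry above; the stand-in below uses hypothetical names and only illustrates the resulting row shape:

```python
# Pure-Python sketch of what `source_file_field` adds when importing
# several text files as one table: each row optionally carries the path
# of its source file. Illustrative helper, not Hail's implementation.

def import_lines(files: dict, source_file_field=None):
    """`files` maps path -> list of lines; returns one row dict per line."""
    rows = []
    for path, lines in files.items():
        for line in lines:
            row = {"text": line}
            if source_file_field is not None:
                # The caller picks the field name, exactly like the new
                # `source_file_field` parameter on hl.import_table.
                row[source_file_field] = path
            rows.append(row)
    return rows
```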
* [ci] add authorize sha and action items table to user page (hail-is#10142)

* [ci] add authorize sha and action items table to user page

* [ci] track review requested in addition to assigned for PR reviews
* [batch] add more logs and do not wait for asyncgens

I think there is some unresolved issue with asyncgen shutdown that is keeping
workers alive. This is not an issue in worker because worker calls sys.exit
which forcibly stops execution. cc: @daniel-goldstein @jigold.

* fix lint
* [query-service] maybe fix event loop not initialized

The event loop is supposed to be initialized in the main thread. Sometimes
our tests get placed in the non-main thread (always a thread named Dummy-1).
Hopefully the session-scoped fixture is run in the main thread.

* fix
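A minimal sketch of the main-thread constraint above: asyncio only auto-creates an event loop in the main thread, so code that may run from a non-main test thread (e.g. one named Dummy-1) has to create and install its own loop. Hypothetical helper, not Hail's actual fixture:

```python
import asyncio
import threading

def ensure_event_loop():
    """Return the current thread's event loop, creating one if missing.

    In a non-main thread there is no loop unless one was explicitly set,
    so `get_event_loop` raises RuntimeError and we install a fresh loop.
    """
    try:
        return asyncio.get_event_loop_policy().get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```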
* [prometheus] add prometheus to track SLIs

* add wraps
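The "add wraps" fix matters because a metrics decorator without `functools.wraps` renames every wrapped handler to `wrapper`, which breaks anything keyed on `__name__` (routing, metric labels). A dependency-free sketch, with a plain dict standing in for a `prometheus_client` metric:

```python
import functools
import time

# Stand-in metric registry; the real change records into prometheus.
METRICS = {}

def track_sli(fn):
    @functools.wraps(fn)  # preserve the handler's __name__ and docstring
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            entry = METRICS.setdefault(fn.__name__, {"count": 0, "seconds": 0.0})
            entry["count"] += 1
            entry["seconds"] += time.monotonic() - start
    return wrapper

@track_sli
def handler():
    return "ok"
```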
* [query] apply nest-asyncio as early as possible

* fix
* [query] Remove verbose print (hail-is#10167)

Looks like this got added in some dndarray work
* [query-service] fix receive logic

Only one coro waits on receive now. We still error if a message is sent before
we make our first response.

* fix

* fix
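The single-receiver discipline above can be sketched as follows: exactly one coroutine drains the incoming messages, and a message that arrives before we have sent our first response is a protocol error. All names are illustrative, not the query-service code:

```python
import asyncio

class Receiver:
    """One coroutine owns `run_once`; nothing else reads `incoming`."""

    def __init__(self, incoming: asyncio.Queue):
        self.incoming = incoming
        self.first_response_sent = False
        self.inbox = asyncio.Queue()

    async def run_once(self):
        msg = await self.incoming.get()
        if not self.first_response_sent:
            # Still error if the peer talks before our first response.
            raise RuntimeError("message received before first response")
        await self.inbox.put(msg)

    async def respond(self, payload):
        self.first_response_sent = True
        return payload
```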
* [linting] add curlylint check for any service that renders jinja2 (hail-is#10172)

* [linting] add curlylint check for any service that renders jinja2 templates

* [linting] spaces not tabs
* [website] fix website

I build old versions of the docs and use them in new websites. This does not
work for versions of the docs before I introduced the new system. In particular
versions 0.2.63 and before generate old-style docs.

* tutorials are templated
* [query-service] remove service backend tests (hail-is#10180)

They are too flaky currently due to the version issue.
* Bump version number

* Updated changelog
* [query-service] teach query service to read MTs and Ts created by Spark (hail-is#10184)

* [query-service] teach query service to read MTs and Ts created by Spark

Hail-on-Spark uses HadoopFS which emulates directories by creating size-zero files with
the name `gs://bucket/dirname/`. Note: the object name literally ends in a slash. Such files
should not be included in `listStatus` (they should always be empty anyway). Unfortunately,
my fix in hail-is#9914 was wrong because `GoogleStorageFileStatus` removes
the trailing slash. This prevented the path from matching `path`, which always ends in a `/`.

* fix
* [googlestoragefs] ignore the directory check entirely

If a file exists with the *same name as the directory we are listing*,
then it must be a directory marker. It does not matter if that file is
a directory or not.

* Update GoogleStorageFS.scala
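The directory-marker rule from these two commits, as a sketch: when listing a prefix, an object named exactly like the listed prefix is HadoopFS's size-zero marker and must be dropped, regardless of whether the client reports it as a directory. Hypothetical helper in Python, not the Scala implementation:

```python
def list_status(prefix: str, objects: dict) -> list:
    """Return object names directly under `prefix`, minus the marker.

    `objects` maps full object names to sizes; `prefix` ends in "/".
    HadoopFS emulates directories with zero-size objects whose names
    literally end in "/", e.g. "gs://bucket/dirname/".
    """
    assert prefix.endswith("/")
    out = []
    for name in objects:
        if not name.startswith(prefix):
            continue
        if name == prefix:
            continue  # the size-zero directory marker itself: skip
        rest = name[len(prefix):]
        if "/" not in rest.rstrip("/"):  # keep direct children only
            out.append(name)
    return sorted(out)
```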
* Attempt implementing the sum rule in Emit

* Connected the python code, but not working yet

* NDArrayExpression.sum is working now

* Add default arg when no axis is provided

* More comprehensive test

* Unused imports

* Use sum appropriately in linear_regression_rows_nd

* Deleted extra blank line

* Don't use typeToTypeInfo, make NumericPrimitives the source of these decisions

* Better assertions, with tests

* Got the summation index correct

* Add documentation
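For the 2-D case, the semantics of the new sum (default axis reduces everything, matching numpy) can be sketched in plain Python; the real implementation lowers to Hail's Emit IR rather than running in Python:

```python
def nd_sum(rows, axis=None):
    """Sum a 2-D list-of-lists the way NDArrayExpression.sum behaves.

    axis=None (the default arg) reduces all elements to a scalar;
    axis=0 sums down columns; axis=1 sums across rows.
    """
    if axis is None:
        return sum(v for row in rows for v in row)
    if axis == 0:
        return [sum(col) for col in zip(*rows)]
    if axis == 1:
        return [sum(row) for row in rows]
    raise ValueError("axis out of bounds for a 2-D array")
```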
daniel-goldstein and others added 18 commits March 16, 2021 15:45
* Add map, but protect users of the spark backend from writing arbitrary maps

* If densify would have been a no-op, that should work

* Densify and Sparsify are no-ops for now

* Rename map to map_dense and map_sparse. Give better implementations for add, multiply, divide, subtract of a scalar

* Make the maps underscore methods
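The dense/sparse split above, sketched with a dict of nonzeros standing in for a block matrix: scalar multiply maps zero to zero, so it may use the sparsity-preserving map; scalar add does not, so it must take the dense path. Illustrative class, not Hail's BlockMatrix:

```python
class ToyBlockMatrix:
    def __init__(self, shape, nonzero):
        self.shape = shape            # (n_rows, n_cols)
        self.nonzero = dict(nonzero)  # {(i, j): value}, zeros omitted

    def _map_sparse(self, f):
        # Safe only when f(0) == 0: touches stored entries alone.
        return ToyBlockMatrix(self.shape,
                              {k: f(v) for k, v in self.nonzero.items()})

    def _map_dense(self, f):
        # Applies f at every position, including structural zeros.
        n, m = self.shape
        dense = {(i, j): f(self.nonzero.get((i, j), 0))
                 for i in range(n) for j in range(m)}
        return ToyBlockMatrix(self.shape,
                              {k: v for k, v in dense.items() if v != 0})

    def __mul__(self, s):   # scalar multiply preserves sparsity
        return self._map_sparse(lambda v: v * s)

    def __add__(self, s):   # scalar add fills in the zeros
        return self._map_dense(lambda v: v + s)
```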
* [tls] make hello use tls

* change pylint ignore message
* Use the checkpointed table from mt_to_table_of_ndarray to avoid recomputing mt

* Keep extra row fields from being included
* Merge pull request #35 from populationgenomics/add-query-graceful-shutdown

Add query graceful shutdown

* Remove unused argument from query:on_shutdown
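A framework-agnostic sketch of the graceful-shutdown hook: during a rolling update the service stops accepting new work, drains in-flight tasks, then exits. The real handler is registered with the web app's shutdown signal; the names here are illustrative:

```python
import asyncio

class GracefulService:
    def __init__(self):
        self.accepting = True
        self.inflight = set()

    async def handle(self, coro):
        if not self.accepting:
            raise RuntimeError("shutting down")
        task = asyncio.ensure_future(coro)
        self.inflight.add(task)
        task.add_done_callback(self.inflight.discard)
        return await task

    async def on_shutdown(self):
        # Refuse new work, then let everything in flight finish.
        self.accepting = False
        if self.inflight:
            await asyncio.gather(*self.inflight, return_exceptions=True)
```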
* [auth] add more options for obtaining session id for dev credentials (hail-is#10203)

* [auth] add more options for obtaining session id for dev credentials

* [auth] extract userinfo query for use in both userinfo and verify_dev_credentials

* remove unused import
* Change hail to use spark3 and scala 2.12 by default, change build_hail_spark3 to instead test spark2 for backwards support

* Update Makefile

* Update dataproc image version

* Scale down the dataproc version, since latest dataproc is using Spark release candidate

* Update pyspark version in requirements.txt

* Bump scala/spark patch versions

* We want to use the newer py4j jar when using spark 3

* Upgrade json4s

* I now want Spark 3.1.1, since it's been released

* Upgrade to 3.1.1 in the Makefile, fix a deprecated IOUtils method

* Update pyspark as well

* Don't update json4s

* Try upgrading version

* Fixed issue for constructing bufferspecs

* Should at least be using newest one

* Remove abstracts from type hints

* Revert "Remove abstracts from type hints"

This reverts commit 1e0d194.

* Things don't go well if I don't use the same json4s version as Spark

* Mixed a typeHintFieldName

* See if this fixes my BlockMatrixSparsity issue

* json4s can't handle a curried apply method

* This works so long as the jar file is included in the libs directory

* Makefile changes to support pulling elasticsearch

* Use dataproc image for Spark 3.1.1

* Update patch version of dataproc image, no longer uses Spark RC

* Fixed up Makefile, now correctly depends on copying the jar

* Now we just check that the specified version is 7, as that's all we support

* Delete build_hail_spark2, we can't support spark2

* Version checks for Scala and Spark

* Updated installation docs

* Spark versions warning

* Update some old pysparks
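The "version checks for Scala and Spark" item can be sketched as a fail-fast guard comparing the installed pyspark against the version this build targets (3.1.1 per this PR); the helper name and exact policy are hypothetical:

```python
EXPECTED_SPARK = "3.1.1"  # the version this PR pins

def check_spark_version(installed: str, expected: str = EXPECTED_SPARK):
    """Raise if the installed Spark's major.minor differs from expected."""
    if installed.split(".")[:2] != expected.split(".")[:2]:
        raise RuntimeError(
            f"pyspark {installed} found, but this build expects Spark {expected}")
```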
* [batch] Add more info to UI pages

* fixes

* addr comment

* addr comments
* Bump jinja2 from 2.10.1 to 2.11.3 in /docker (hail-is#10209)

Bumps [jinja2](https://github.com/pallets/jinja) from 2.10.1 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](pallets/jinja@2.10.1...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [docker][hail] update to latest pytest

Issues like this https://ci.hail.is/batches/221291/jobs/112 do not appear locally for me,
I suspect this is due to my using a much newer pytest.

* fix many tests incorrectly using pytest

* another one

* remove unnecessary pip installs in service test dockerfiles

* fix
* [gateway] Cut out router and router-resolver from gateway internal routing (hail-is#10207)

* [gateway] cut out router-resolver from internal auth flow

* [gateway] cut out router from internal
* add available pan-ukb datasets

* add rst files for schemas

* reference associated variant indices HT in the block matrix descriptions
* [query] Add json warn context to `parse_json` (hail-is#10160)

We don't test the logs, but I did test this manually, it works as
expected.
* [query] fix tmp_dir default in init(), which doesn't work for the service backend (hail-is#10199)

* Fix tmp_dir default, which doesn't work for the service backend.

* Fix type for tmp_dir.
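The fix above boils down to making the tmp_dir default backend-dependent: a local /tmp is fine for Spark or local runs but unusable for the service backend, which needs an object-store path. Hypothetical helper, not init()'s actual signature:

```python
def choose_tmp_dir(backend: str, tmp_dir=None) -> str:
    """Pick a temporary directory appropriate for the backend."""
    if tmp_dir is not None:
        return tmp_dir
    if backend == "service":
        # The service backend cannot use a worker-local filesystem path.
        raise ValueError(
            "service backend needs a remote tmp_dir, e.g. gs://bucket/tmp")
    return "/tmp"
```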
@lgruen lgruen requested a review from illusional March 23, 2021 22:44
@@ -10,7 +10,7 @@ RUN hail-apt-get-install \
 COPY docker/hail-ubuntu/pip.conf /root/.config/pip/pip.conf
 COPY docker/hail-ubuntu/hail-pip-install /bin/hail-pip-install
 COPY docker/requirements.txt .
-RUN hail-pip-install -r requirements.txt pyspark==2.4.0
+RUN hail-pip-install -r requirements.txt pyspark==3.1.1

Juicy, I like this!

@illusional illusional left a comment

I fixed the query definition issue in: 3ad57ea

query/query/query.py (outdated, resolved)
@illusional illusional merged commit 317f9a3 into main Mar 23, 2021
@illusional illusional deleted the upstream branch March 23, 2021 22:58
vladsavelyev pushed a commit that referenced this pull request Mar 26, 2021
* [batch] Worker cleanup (hail-is#10155)

* [batch] Worker cleanup

* more changes

* wip

* delint

* additions?

* fix

* [query] Add `source_file_field` to `import_table` (hail-is#10164)

* [query] Add `source_file_field` to `import_table`

CHANGELOG: Add `source_file_field` parameter to `hl.import_table` to allow lines to be associated with their original source file.

* ugh

* [ci] add authorize sha and action items table to user page (hail-is#10142)

* [ci] add authorize sha and action items table to user page

* [ci] track review requested in addition to assigned for PR reviews

* [ci] add CI dropdown with link to user page (hail-is#10163)

* [batch] add more logs and do not wait for asyncgens (hail-is#10136)

* [batch] add more logs and do not wait for asyncgens

I think there is some unresolved issue with asyncgen shutdown that is keeping
workers alive. This is not an issue in worker because worker calls sys.exit
which forcibly stops execution. cc: @daniel-goldstein @jigold.

* fix lint

* [query-service] maybe fix event loop not initialized (hail-is#10153)

* [query-service] maybe fix event loop not initialized

The event loop is supposed to be initialized in the main thread. Sometimes
our tests get placed in the non-main thread (always a thread named Dummy-1).
Hopefully the session-scoped fixture is run in the main thread.

* fix

* [prometheus] add prometheus to track SLIs (hail-is#10165)

* [prometheus] add prometheus to track SLIs

* add wraps

* [query] apply nest-asyncio as early as possible (hail-is#10158)

* [query] apply nest-asyncio as early as possible

* fix

* [grafana] set pod fsGroup to grafana user (hail-is#10162)

* fix linting errors (hail-is#10171)

* [query] Remove verbose print (hail-is#10167)

Looks like this got added in some dndarray work

* [ci] update assignees and reviewers on PR github update (hail-is#10168)

* [query-service] fix receive logic (hail-is#10159)

* [query-service] fix receive logic

Only one coro waits on receive now. We still error if a message is sent before
we make our first response.

* fix

* fix

* CHANGELOG: Fixed incorrect error message when incorrect type specified with hl.loop (hail-is#10174)

* [linting] add curlylint check for any service that renders jinja2 (hail-is#10172)

* [linting] add curlylint check for any service that renders jinja2 templates

* [linting] spaces not tabs

* [website] fix website (hail-is#10173)

* [website] fix website

I build old versions of the docs and use them in new websites. This does not
work for versions of the docs before I introduced the new system. In particular
versions 0.2.63 and before generate old-style docs.

* tutorials are templated

* [ci] change mention for deploy failure (hail-is#10178)

* [gateway] move ukbb routing into gateway (hail-is#10179)

* [query] Fix filter intervals (keep=False) memory leak (hail-is#10182)

* [query-service] remove service backend tests (hail-is#10180)

They are too flaky currently due to the version issue.

* [website] pass response body as kwarg (hail-is#10176)

* Release 0.2.64 (hail-is#10183)

* Bump version number

* Updated changelog

* [nginx] ensure nginx configs dont overwrite each other in build.yaml (hail-is#10181)

* [query-service] teach query service to read MTs and Ts created by Spark (hail-is#10184)

* [query-service] teach query service to read MTs and Ts created by Spark

Hail-on-Spark uses HadoopFS which emulates directories by creating size-zero files with
the name `gs://bucket/dirname/`. Note: the object name literally ends in a slash. Such files
should not be included in `listStatus` (they should always be empty anyway). Unfortunately,
my fix in hail-is#9914 was wrong because `GoogleStorageFileStatus` removes
the trailing slash. This prevented the path from matching `path`, which always ends in a `/`.

* fix

* [website] dont jinja render any of the batch docs (hail-is#10190)

* [googlestoragefs] ignore the directory check entirely (hail-is#10185)

* [googlestoragefs] ignore the directory check entirely

If a file exists with the *same name as the directory we are listing*,
then it must be a directory marker. It does not matter if that file is
a directory or not.

* Update GoogleStorageFS.scala

* [ci] fix focus on slash and search job page for PRs (hail-is#10194)

* [query] Improve file compatibility error (hail-is#10191)

* Call init_service from init based on HAIL_QUERY_BACKEND value. (hail-is#10189)

* [query] NDArray Sum (hail-is#10187)

* Attempt implementing the sum rule in Emit

* Connected the python code, but not working yet

* NDArrayExpression.sum is working now

* Add default arg when no axis is provided

* More comprehensive test

* Unused imports

* Use sum appropriately in linear_regression_rows_nd

* Deleted extra blank line

* Don't use typeToTypeInfo, make NumericPrimitives the source of these decisions

* Better assertions, with tests

* Got the summation index correct

* Add documentation

* [website] fix resource path for non-html files in the docs (hail-is#10196)

* [query] Remove tcode from primitive orderings (hail-is#10193)

* [query] BlockMatrix map (hail-is#10195)

* Add map, but protect users of the spark backend from writing arbitrary maps

* If densify would have been a no-op, that should work

* Densify and Sparsify are no-ops for now

* Rename map to map_dense and map_sparse. Give better implementations for add, multiply, divide, subtract of a scalar

* Make the maps underscore methods

* [query] Remove all uses of .tcode[Boolean] (hail-is#10198)

* [ci] make test hello speak https (hail-is#10192)

* [tls] make hello use tls

* change pylint ignore message

* [query] blanczos_pca dont do extra loading work (hail-is#10201)

* Use the checkpointed table from mt_to_table_of_ndarray to avoid recomputing mt

* Keep extra row fields from being included

* Add query graceful shutdown for rolling updates (hail-is#10106)

* Merge pull request #35 from populationgenomics/add-query-graceful-shutdown

Add query graceful shutdown

* Remove unused argument from query:on_shutdown

* [auth] add more options for obtaining session id for dev credentials (hail-is#10203)

* [auth] add more options for obtaining session id for dev credentials

* [auth] extract userinfo query for use in both userinfo and verify_dev_credentials

* remove unused import

* [query] Default to Spark 3 (hail-is#10054)

* Change hail to use spark3 and scala 2.12 by default, change build_hail_spark3 to instead test spark2 for backwards support

* Update Makefile

* Update dataproc image version

* Scale down the dataproc version, since latest dataproc is using Spark release candidate

* Update pyspark version in requirements.txt

* Bump scala/spark patch versions

* We want to use the newer py4j jar when using spark 3

* Upgrade json4s

* I now want Spark 3.1.1, since it's been released

* Upgrade to 3.1.1 in the Makefile, fix a deprecated IOUtils method

* Update pyspark as well

* Don't update json4s

* Try upgrading version

* Fixed issue for constructing bufferspecs

* Should at least be using newest one

* Remove abstracts from type hints

* Revert "Remove abstracts from type hints"

This reverts commit 1e0d194.

* Things don't go well if I don't use the same json4s version as Spark

* Mixed a typeHintFieldName

* See if this fixes my BlockMatrixSparsity issue

* json4s can't handle a curried apply method

* This works so long as the jar file is included in the libs directory

* Makefile changes to support pulling elasticsearch

* Use dataproc image for Spark 3.1.1

* Update patch version of dataproc image, no longer uses Spark RC

* Fixed up Makefile, now correctly depends on copying the jar

* Now we just check that the specified version is 7, as that's all we support

* Delete build_hail_spark2, we can't support spark2

* Version checks for Scala and Spark

* Updated installation docs

* Spark versions warning

* Update some old pysparks

* [batch] Add more info to UI pages (hail-is#10070)

* [batch] Add more info to UI pages

* fixes

* addr comment

* addr comments

* Bump jinja2 from 2.10.1 to 2.11.3 in /docker (hail-is#10209)

Bumps [jinja2](https://github.com/pallets/jinja) from 2.10.1 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](pallets/jinja@2.10.1...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [docker][hail] update to latest pytest (hail-is#10177)

* [docker][hail] update to latest pytest

Issues like this https://ci.hail.is/batches/221291/jobs/112 do not appear locally for me,
I suspect this is due to my using a much newer pytest.

* fix many tests incorrectly using pytest

* another one

* remove unnecessary pip installs in service test dockerfiles

* fix

* [gateway] Cut out router and router-resolver from gateway internal routing (hail-is#10207)

* [gateway] cut out router-resolver from internal auth flow

* [gateway] cut out router from internal

* [datasets] add pan-ukb datasets (hail-is#10186)

* add available pan-ukb datasets

* add rst files for schemas

* reference associated variant indices HT in the block matrix descriptions

* [query] Add json warn context to `parse_json` (hail-is#10160)

We don't test the logs, but I did test this manually, it works as
expected.

* [query] fix tmp_dir default in init(), which doesn't work for the service backend (hail-is#10199)

* Fix tmp_dir default, which doesn't work for the service backend.

* Fix type for tmp_dir.

* [gitignore]ignore website and doc files (hail-is#10214)

* Remove duplicate on_shutdown in query service

Co-authored-by: jigold <jigold@users.noreply.github.com>
Co-authored-by: Tim Poterba <tpoterba@broadinstitute.org>
Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
Co-authored-by: John Compitello <johnc@broadinstitute.org>
Co-authored-by: Christopher Vittal <cvittal@broadinstitute.org>
Co-authored-by: Michael Franklin <michael@illusional.net>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Patrick Cummings <42842025+pwc2@users.noreply.github.com>
Co-authored-by: Carolin Diaz <63973811+CDiaz96@users.noreply.github.com>
10 participants