Merge upstream changes #98

Merged
merged 49 commits into main on Mar 23, 2021
Conversation

@lgruen lgruen commented Mar 23, 2021

No description provided.

jigold and others added 30 commits March 9, 2021 09:54
* [batch] Worker cleanup

* more changes

* wip

* delint

* additions?

* fix
* [query] Add `source_file_field` to `import_table`

CHANGELOG: Add `source_file_field` parameter to `hl.import_table` to allow lines to be associated with their original source file.

* ugh
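Conceptually, the new parameter tags each imported line with the path of the file it came from. The real entry point is `hl.import_table(..., source_file_field=...)`, per the CHANGELOG entry above; the stand-in below uses hypothetical names and only illustrates the resulting row shape:

```python
# Pure-Python sketch of what `source_file_field` adds when importing
# several text files as one table: each row optionally carries the path
# of its source file. Illustrative helper, not Hail's implementation.

def import_lines(files: dict, source_file_field=None):
    """`files` maps path -> list of lines; returns one row dict per line."""
    rows = []
    for path, lines in files.items():
        for line in lines:
            row = {"text": line}
            if source_file_field is not None:
                # The caller picks the field name, exactly like the new
                # `source_file_field` parameter on hl.import_table.
                row[source_file_field] = path
            rows.append(row)
    return rows
```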
* [ci] add authorize sha and action items table to user page (hail-is#10142)

* [ci] add authorize sha and action items table to user page

* [ci] track review requested in addition to assigned for PR reviews
* [batch] add more logs and do not wait for asyncgens

I think there is some unresolved issue with asyncgen shutdown that is keeping
workers alive. This is not an issue in worker because worker calls sys.exit
which forcibly stops execution. cc: @daniel-goldstein @jigold.

* fix lint
* [query-service] maybe fix event loop not initialized

The event loop is supposed to be initialized in the main thread. Sometimes
our tests get placed in the non-main thread (always a thread named Dummy-1).
Hopefully the session-scoped fixture is run in the main thread.

* fix
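A minimal sketch of the main-thread constraint above: asyncio only auto-creates an event loop in the main thread, so code that may run from a non-main test thread (e.g. one named Dummy-1) has to create and install its own loop. Hypothetical helper, not Hail's actual fixture:

```python
import asyncio
import threading

def ensure_event_loop():
    """Return the current thread's event loop, creating one if missing.

    In a non-main thread there is no loop unless one was explicitly set,
    so `get_event_loop` raises RuntimeError and we install a fresh loop.
    """
    try:
        return asyncio.get_event_loop_policy().get_event_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
        asyncio.set_event_loop(loop)
        return loop
```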
* [prometheus] add prometheus to track SLIs

* add wraps
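The "add wraps" fix matters because a metrics decorator without `functools.wraps` renames every wrapped handler to `wrapper`, which breaks anything keyed on `__name__` (routing, metric labels). A dependency-free sketch, with a plain dict standing in for a `prometheus_client` metric:

```python
import functools
import time

# Stand-in metric registry; the real change records into prometheus.
METRICS = {}

def track_sli(fn):
    @functools.wraps(fn)  # preserve the handler's __name__ and docstring
    def wrapper(*args, **kwargs):
        start = time.monotonic()
        try:
            return fn(*args, **kwargs)
        finally:
            entry = METRICS.setdefault(fn.__name__, {"count": 0, "seconds": 0.0})
            entry["count"] += 1
            entry["seconds"] += time.monotonic() - start
    return wrapper

@track_sli
def handler():
    return "ok"
```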
* [query] apply nest-asyncio as early as possible

* fix
* [query] Remove verbose print (hail-is#10167)

Looks like this got added in some dndarray work
* [query-service] fix receive logic

Only one coro waits on receive now. We still error if a message is sent before
we make our first response.

* fix

* fix
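The single-receiver discipline above can be sketched as follows: exactly one coroutine drains the incoming messages, and a message that arrives before we have sent our first response is a protocol error. All names are illustrative, not the query-service code:

```python
import asyncio

class Receiver:
    """One coroutine owns `run_once`; nothing else reads `incoming`."""

    def __init__(self, incoming: asyncio.Queue):
        self.incoming = incoming
        self.first_response_sent = False
        self.inbox = asyncio.Queue()

    async def run_once(self):
        msg = await self.incoming.get()
        if not self.first_response_sent:
            # Still error if the peer talks before our first response.
            raise RuntimeError("message received before first response")
        await self.inbox.put(msg)

    async def respond(self, payload):
        self.first_response_sent = True
        return payload
```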
* [linting] add curlylint check for any service that renders jinja2 (hail-is#10172)

* [linting] add curlylint check for any service that renders jinja2 templates

* [linting] spaces not tabs
* [website] fix website

I build old versions of the docs and use them in new websites. This does not
work for versions of the docs before I introduced the new system. In particular
versions 0.2.63 and before generate old-style docs.

* tutorials are templated
* [query-service] remove service backend tests (hail-is#10180)

They are too flaky currently due to the version issue.
* Bump version number

* Updated changelog
* [query-service] teach query service to read MTs and Ts created by Spark (hail-is#10184)

* [query-service] teach query service to read MTs and Ts created by Spark

Hail-on-Spark uses HadoopFS which emulates directories by creating size-zero files with
the name `gs://bucket/dirname/`. Note: the object name literally ends in a slash. Such files
should not be included in `listStatus` (they should always be empty anyway). Unfortunately,
my fix in hail-is#9914 was wrong because `GoogleStorageFileStatus` removes
the trailing slash. This prevented the path from matching `path`, which always ends in a `/`.

* fix
* [googlestoragefs] ignore the directory check entirely

If a file exists with the *same name as the directory we are listing*,
then it must be a directory marker. It does not matter if that file is
a directory or not.

* Update GoogleStorageFS.scala
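The directory-marker rule from these two commits, as a sketch: when listing a prefix, an object named exactly like the listed prefix is HadoopFS's size-zero marker and must be dropped, regardless of whether the client reports it as a directory. Hypothetical helper in Python, not the Scala implementation:

```python
def list_status(prefix: str, objects: dict) -> list:
    """Return object names directly under `prefix`, minus the marker.

    `objects` maps full object names to sizes; `prefix` ends in "/".
    HadoopFS emulates directories with zero-size objects whose names
    literally end in "/", e.g. "gs://bucket/dirname/".
    """
    assert prefix.endswith("/")
    out = []
    for name in objects:
        if not name.startswith(prefix):
            continue
        if name == prefix:
            continue  # the size-zero directory marker itself: skip
        rest = name[len(prefix):]
        if "/" not in rest.rstrip("/"):  # keep direct children only
            out.append(name)
    return sorted(out)
```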
* Attempt implementing the sum rule in Emit

* Connected the python code, but not working yet

* NDArrayExpression.sum is working now

* Add default arg when no axis is provided

* More comprehensive test

* Unused imports

* Use sum appropriately in linear_regression_rows_nd

* Deleted extra blank line

* Don't use typeToTypeInfo, make NumericPrimitives the source of these decisions

* Better assertions, with tests

* Got the summation index correct

* Add documentation
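For the 2-D case, the semantics of the new sum (default axis reduces everything, matching numpy) can be sketched in plain Python; the real implementation lowers to Hail's Emit IR rather than running in Python:

```python
def nd_sum(rows, axis=None):
    """Sum a 2-D list-of-lists the way NDArrayExpression.sum behaves.

    axis=None (the default arg) reduces all elements to a scalar;
    axis=0 sums down columns; axis=1 sums across rows.
    """
    if axis is None:
        return sum(v for row in rows for v in row)
    if axis == 0:
        return [sum(col) for col in zip(*rows)]
    if axis == 1:
        return [sum(row) for row in rows]
    raise ValueError("axis out of bounds for a 2-D array")
```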
daniel-goldstein and others added 18 commits March 16, 2021 15:45
* Add map, but protect users of the spark backend from writing arbitrary maps

* If densify would have been a no-op, that should work

* Densify and Sparsify are no-ops for now

* Rename map to map_dense and map_sparse. Give better implementations for add, multiply, divide, subtract of a scalar

* Make the maps underscore methods
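The dense/sparse split above, sketched with a dict of nonzeros standing in for a block matrix: scalar multiply maps zero to zero, so it may use the sparsity-preserving map; scalar add does not, so it must take the dense path. Illustrative class, not Hail's BlockMatrix:

```python
class ToyBlockMatrix:
    def __init__(self, shape, nonzero):
        self.shape = shape            # (n_rows, n_cols)
        self.nonzero = dict(nonzero)  # {(i, j): value}, zeros omitted

    def _map_sparse(self, f):
        # Safe only when f(0) == 0: touches stored entries alone.
        return ToyBlockMatrix(self.shape,
                              {k: f(v) for k, v in self.nonzero.items()})

    def _map_dense(self, f):
        # Applies f at every position, including structural zeros.
        n, m = self.shape
        dense = {(i, j): f(self.nonzero.get((i, j), 0))
                 for i in range(n) for j in range(m)}
        return ToyBlockMatrix(self.shape,
                              {k: v for k, v in dense.items() if v != 0})

    def __mul__(self, s):   # scalar multiply preserves sparsity
        return self._map_sparse(lambda v: v * s)

    def __add__(self, s):   # scalar add fills in the zeros
        return self._map_dense(lambda v: v + s)
```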
* [tls] make hello use tls

* change pylint ignore message
* Use the checkpointed table from mt_to_table_of_ndarray to avoid recomputing mt

* Keep extra row fields from being included
* Merge pull request #35 from populationgenomics/add-query-graceful-shutdown

Add query graceful shutdown

* Remove unused argument from query:on_shutdown
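A framework-agnostic sketch of the graceful-shutdown hook: during a rolling update the service stops accepting new work, drains in-flight tasks, then exits. The real handler is registered with the web app's shutdown signal; the names here are illustrative:

```python
import asyncio

class GracefulService:
    def __init__(self):
        self.accepting = True
        self.inflight = set()

    async def handle(self, coro):
        if not self.accepting:
            raise RuntimeError("shutting down")
        task = asyncio.ensure_future(coro)
        self.inflight.add(task)
        task.add_done_callback(self.inflight.discard)
        return await task

    async def on_shutdown(self):
        # Refuse new work, then let everything in flight finish.
        self.accepting = False
        if self.inflight:
            await asyncio.gather(*self.inflight, return_exceptions=True)
```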
* [auth] add more options for obtaining session id for dev credentials (hail-is#10203)

* [auth] add more options for obtaining session id for dev credentials

* [auth] extract userinfo query for use in both userinfo and verify_dev_credentials

* remove unused import
* Change hail to use spark3 and scala 2.12 by default, change build_hail_spark3 to instead test spark2 for backwards support

* Update Makefile

* Update dataproc image version

* Scale down the dataproc version, since latest dataproc is using Spark release candidate

* Update pyspark version in requirements.txt

* Bump scala/spark patch versions

* We want to use the newer py4j jar when using spark 3

* Upgrade json4s

* I now want Spark 3.1.1, since it's been released

* Upgrade to 3.1.1 in the Makefile, fix a deprecated IOUtils method

* Update pyspark as well

* Don't update json4s

* Try upgrading version

* Fixed issue for constructing bufferspecs

* Should at least be using newest one

* Remove abstracts from type hints

* Revert "Remove abstracts from type hints"

This reverts commit 1e0d194.

* Things don't go well if I don't use the same json4s version as Spark

* Mixed a typeHintFieldName

* See if this fixes my BlockMatrixSparsity issue

* json4s can't handle a curried apply method

* This works so long as the jar file is included in the libs directory

* Makefile changes to support pulling elasticsearch

* Use dataproc image for Spark 3.1.1

* Update patch version of dataproc image, no longer uses Spark RC

* Fixed up Makefile, now correctly depends on copying the jar

* Now we just check that the specified version is 7, as that's all we support

* Delete build_hail_spark2, we can't support spark2

* Version checks for Scala and Spark

* Updated installation docs

* Spark versions warning

* Update some old pysparks
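The "version checks for Scala and Spark" item can be sketched as a fail-fast guard comparing the installed pyspark against the version this build targets (3.1.1 per this PR); the helper name and exact policy are hypothetical:

```python
EXPECTED_SPARK = "3.1.1"  # the version this PR pins

def check_spark_version(installed: str, expected: str = EXPECTED_SPARK):
    """Raise if the installed Spark's major.minor differs from expected."""
    if installed.split(".")[:2] != expected.split(".")[:2]:
        raise RuntimeError(
            f"pyspark {installed} found, but this build expects Spark {expected}")
```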
* [batch] Add more info to UI pages

* fixes

* addr comment

* addr comments
* Bump jinja2 from 2.10.1 to 2.11.3 in /docker (hail-is#10209)

Bumps [jinja2](https://github.com/pallets/jinja) from 2.10.1 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](pallets/jinja@2.10.1...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* [docker][hail] update to latest pytest

Issues like this https://ci.hail.is/batches/221291/jobs/112 do not appear locally for me,
I suspect this is due to my using a much newer pytest.

* fix many tests incorrectly using pytest

* another one

* remove unnecessary pip installs in service test dockerfiles

* fix
* [gateway] Cut out router and router-resolver from gateway internal routing (hail-is#10207)

* [gateway] cut out router-resolver from internal auth flow

* [gateway] cut out router from internal
* add available pan-ukb datasets

* add rst files for schemas

* reference associated variant indices HT in the block matrix descriptions
* [query] Add json warn context to `parse_json` (hail-is#10160)

We don't test the logs, but I did test this manually, it works as
expected.
* [query] fix tmp_dir default in init(), which doesn't work for the service backend (hail-is#10199)

* Fix tmp_dir default, which doesn't work for the service backend.

* Fix type for tmp_dir.
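The fix above boils down to making the tmp_dir default backend-dependent: a local /tmp is fine for Spark or local runs but unusable for the service backend, which needs an object-store path. Hypothetical helper, not init()'s actual signature:

```python
def choose_tmp_dir(backend: str, tmp_dir=None) -> str:
    """Pick a temporary directory appropriate for the backend."""
    if tmp_dir is not None:
        return tmp_dir
    if backend == "service":
        # The service backend cannot use a worker-local filesystem path.
        raise ValueError(
            "service backend needs a remote tmp_dir, e.g. gs://bucket/tmp")
    return "/tmp"
```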
@lgruen lgruen requested a review from illusional March 23, 2021 22:44
@@ -10,7 +10,7 @@ RUN hail-apt-get-install \
 COPY docker/hail-ubuntu/pip.conf /root/.config/pip/pip.conf
 COPY docker/hail-ubuntu/hail-pip-install /bin/hail-pip-install
 COPY docker/requirements.txt .
-RUN hail-pip-install -r requirements.txt pyspark==2.4.0
+RUN hail-pip-install -r requirements.txt pyspark==3.1.1

Juicy, I like this!

@illusional illusional left a comment

I fixed the query definition issue in: 3ad57ea

query/query/query.py (outdated, resolved)
@illusional illusional merged commit 317f9a3 into main Mar 23, 2021
@illusional illusional deleted the upstream branch March 23, 2021 22:58
vladsavelyev pushed a commit that referenced this pull request Mar 26, 2021
* [batch] Worker cleanup (hail-is#10155)

* [batch] Worker cleanup

* more changes

* wip

* delint

* additions?

* fix

* [query] Add `source_file_field` to `import_table` (hail-is#10164)

* [query] Add `source_file_field` to `import_table`

CHANGELOG: Add `source_file_field` parameter to `hl.import_table` to allow lines to be associated with their original source file.

* ugh

* [ci] add authorize sha and action items table to user page (hail-is#10142)

* [ci] add authorize sha and action items table to user page

* [ci] track review requested in addition to assigned for PR reviews

* [ci] add CI dropdown with link to user page (hail-is#10163)

* [batch] add more logs and do not wait for asyncgens (hail-is#10136)

* [batch] add more logs and do not wait for asyncgens

I think there is some unresolved issue with asyncgen shutdown that is keeping
workers alive. This is not an issue in worker because worker calls sys.exit
which forcibly stops execution. cc: @daniel-goldstein @jigold.

* fix lint

* [query-service] maybe fix event loop not initialized (hail-is#10153)

* [query-service] maybe fix event loop not initialized

The event loop is supposed to be initialized in the main thread. Sometimes
our tests get placed in the non-main thread (always a thread named Dummy-1).
Hopefully the session-scoped fixture is run in the main thread.

* fix

* [prometheus] add prometheus to track SLIs (hail-is#10165)

* [prometheus] add prometheus to track SLIs

* add wraps

* [query] apply nest-asyncio as early as possible (hail-is#10158)

* [query] apply nest-asyncio as early as possible

* fix

* [grafana] set pod fsGroup to grafana user (hail-is#10162)

* fix linting errors (hail-is#10171)

* [query] Remove verbose print (hail-is#10167)

Looks like this got added in some dndarray work

* [ci] update assignees and reviewers on PR github update (hail-is#10168)

* [query-service] fix receive logic (hail-is#10159)

* [query-service] fix receive logic

Only one coro waits on receive now. We still error if a message is sent before
we make our first response.

* fix

* fix

* CHANGELOG: Fixed incorrect error message when incorrect type specified with hl.loop (hail-is#10174)

* [linting] add curlylint check for any service that renders jinja2 (hail-is#10172)

* [linting] add curlylint check for any service that renders jinja2 templates

* [linting] spaces not tabs

* [website] fix website (hail-is#10173)

* [website] fix website

I build old versions of the docs and use them in new websites. This does not
work for versions of the docs before I introduced the new system. In particular
versions 0.2.63 and before generate old-style docs.

* tutorials are templated

* [ci] change mention for deploy failure (hail-is#10178)

* [gateway] move ukbb routing into gateway (hail-is#10179)

* [query] Fix filter intervals (keep=False) memory leak (hail-is#10182)

* [query-service] remove service backend tests (hail-is#10180)

They are too flaky currently due to the version issue.

* [website] pass response body as kwarg (hail-is#10176)

* Release 0.2.64 (hail-is#10183)

* Bump version number

* Updated changelog

* [nginx] ensure nginx configs dont overwrite each other in build.yaml (hail-is#10181)

* [query-service] teach query service to read MTs and Ts created by Spark (hail-is#10184)

* [query-service] teach query service to read MTs and Ts created by Spark

Hail-on-Spark uses HadoopFS which emulates directories by creating size-zero files with
the name `gs://bucket/dirname/`. Note: the object name literally ends in a slash. Such files
should not be included in `listStatus` (they should always be empty anyway). Unfortunately,
my fix in hail-is#9914 was wrong because `GoogleStorageFileStatus` removes
the trailing slash. This prevented the path from matching `path`, which always ends in a `/`.

* fix

* [website] dont jinja render any of the batch docs (hail-is#10190)

* [googlestoragefs] ignore the directory check entirely (hail-is#10185)

* [googlestoragefs] ignore the directory check entirely

If a file exists with the *same name as the directory we are listing*,
then it must be a directory marker. It does not matter if that file is
a directory or not.

* Update GoogleStorageFS.scala

* [ci] fix focus on slash and search job page for PRs (hail-is#10194)

* [query] Improve file compatibility error (hail-is#10191)

* Call init_service from init based on HAIL_QUERY_BACKEND value. (hail-is#10189)

* [query] NDArray Sum (hail-is#10187)

* Attempt implementing the sum rule in Emit

* Connected the python code, but not working yet

* NDArrayExpression.sum is working now

* Add default arg when no axis is provided

* More comprehensive test

* Unused imports

* Use sum appropriately in linear_regression_rows_nd

* Deleted extra blank line

* Don't use typeToTypeInfo, make NumericPrimitives the source of these decisions

* Better assertions, with tests

* Got the summation index correct

* Add documentation

* [website] fix resource path for non-html files in the docs (hail-is#10196)

* [query] Remove tcode from primitive orderings (hail-is#10193)

* [query] BlockMatrix map (hail-is#10195)

* Add map, but protect users of the spark backend from writing arbitrary maps

* If densify would have been a no-op, that should work

* Densify and Sparsify are no-ops for now

* Rename map to map_dense and map_sparse. Give better implementations for add, multiply, divide, subtract of a scalar

* Make the maps underscore methods

* [query] Remove all uses of .tcode[Boolean] (hail-is#10198)

* [ci] make test hello speak https (hail-is#10192)

* [tls] make hello use tls

* change pylint ignore message

* [query] blanczos_pca dont do extra loading work (hail-is#10201)

* Use the checkpointed table from mt_to_table_of_ndarray to avoid recomputing mt

* Keep extra row fields from being included

* Add query graceful shutdown for rolling updates (hail-is#10106)

* Merge pull request #35 from populationgenomics/add-query-graceful-shutdown

Add query graceful shutdown

* Remove unused argument from query:on_shutdown

* [auth] add more options for obtaining session id for dev credentials (hail-is#10203)

* [auth] add more options for obtaining session id for dev credentials

* [auth] extract userinfo query for use in both userinfo and verify_dev_credentials

* remove unused import

* [query] Default to Spark 3 (hail-is#10054)

* Change hail to use spark3 and scala 2.12 by default, change build_hail_spark3 to instead test spark2 for backwards support

* Update Makefile

* Update dataproc image version

* Scale down the dataproc version, since latest dataproc is using Spark release candidate

* Update pyspark version in requirements.txt

* Bump scala/spark patch versions

* We want to use the newer py4j jar when using spark 3

* Upgrade json4s

* I now want Spark 3.1.1, since it's been released

* Upgrade to 3.1.1 in the Makefile, fix a deprecated IOUtils method

* Update pyspark as well

* Don't update json4s

* Try upgrading version

* Fixed issue for constructing bufferspecs

* Should at least be using newest one

* Remove abstracts from type hints

* Revert "Remove abstracts from type hints"

This reverts commit 1e0d194.

* Things don't go well if I don't use the same json4s version as Spark

* Mixed a typeHintFieldName

* See if this fixes my BlockMatrixSparsity issue

* json4s can't handle a curried apply method

* This works so long as the jar file is included in the libs directory

* Makefile changes to support pulling elasticsearch

* Use dataproc image for Spark 3.1.1

* Update patch version of dataproc image, no longer uses Spark RC

* Fixed up Makefile, now correctly depends on copying the jar

* Now we just check that the specified version is 7, as that's all we support

* Delete build_hail_spark2, we can't support spark2

* Version checks for Scala and Spark

* Updated installation docs

* Spark versions warning

* Update some old pysparks

* [batch] Add more info to UI pages (hail-is#10070)

* [batch] Add more info to UI pages

* fixes

* addr comment

* addr comments

* Bump jinja2 from 2.10.1 to 2.11.3 in /docker (hail-is#10209)

Bumps [jinja2](https://github.com/pallets/jinja) from 2.10.1 to 2.11.3.
- [Release notes](https://github.com/pallets/jinja/releases)
- [Changelog](https://github.com/pallets/jinja/blob/master/CHANGES.rst)
- [Commits](pallets/jinja@2.10.1...2.11.3)

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* [docker][hail] update to latest pytest (hail-is#10177)

* [docker][hail] update to latest pytest

Issues like this https://ci.hail.is/batches/221291/jobs/112 do not appear locally for me,
I suspect this is due to my using a much newer pytest.

* fix many tests incorrectly using pytest

* another one

* remove unnecessary pip installs in service test dockerfiles

* fix

* [gateway] Cut out router and router-resolver from gateway internal routing (hail-is#10207)

* [gateway] cut out router-resolver from internal auth flow

* [gateway] cut out router from internal

* [datasets] add pan-ukb datasets (hail-is#10186)

* add available pan-ukb datasets

* add rst files for schemas

* reference associated variant indices HT in the block matrix descriptions

* [query] Add json warn context to `parse_json` (hail-is#10160)

We don't test the logs, but I did test this manually, it works as
expected.

* [query] fix tmp_dir default in init(), which doesn't work for the service backend (hail-is#10199)

* Fix tmp_dir default, which doesn't work for the service backend.

* Fix type for tmp_dir.

* [gitignore]ignore website and doc files (hail-is#10214)

* Remove duplicate on_shutdown in query service

Co-authored-by: jigold <jigold@users.noreply.github.com>
Co-authored-by: Tim Poterba <tpoterba@broadinstitute.org>
Co-authored-by: Daniel Goldstein <danielgold95@gmail.com>
Co-authored-by: Dan King <daniel.zidan.king@gmail.com>
Co-authored-by: John Compitello <johnc@broadinstitute.org>
Co-authored-by: Christopher Vittal <cvittal@broadinstitute.org>
Co-authored-by: Michael Franklin <michael@illusional.net>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Patrick Cummings <42842025+pwc2@users.noreply.github.com>
Co-authored-by: Carolin Diaz <63973811+CDiaz96@users.noreply.github.com>
10 participants