From 35da6f47ddcd1345ac2411b921c46f825fc31ae2 Mon Sep 17 00:00:00 2001
From: Dan King
Date: Wed, 6 Sep 2023 18:59:28 -0400
Subject: [PATCH] [test-dataproc] make test-dataproc-37 and -38 not race (#13573)

Consider, for example, this deploy: https://ci.hail.is/batches/7956812. `test-dataproc-37` succeeded but `test-dataproc-38` failed (it timed out because the master failed to come online). You can see the error logs for the cluster here: https://cloudlogging.app.goo.gl/t1ux8oqy11Ba2dih7

The logs state that a certain file either did not exist or we did not have permission to access it.

[`test_dataproc-37`](https://batch.hail.is/batches/7956812/jobs/193) and [`test_dataproc-38`](https://batch.hail.is/batches/7956812/jobs/194) started around the same time, and both uploaded four files into:

gs://hail-30-day/hailctl/dataproc/ci_test_dataproc/0.2.121-7343e9c368dc/

They then set those files to public read/write. Because the files are publicly readable and writable, permissions are not the issue. Instead, there must be some sort of race condition in GCS: if you "patch" (i.e., overwrite) an existing file, a concurrent reader may see the file as not existing.

Unfortunately, I cannot confirm this with audit logs of the writes and reads because [public objects do not generate audit logs](https://cloud.google.com/logging/docs/audit#data-access).

> Publicly available resources that have the Identity and Access Management policies [allAuthenticatedUsers](https://cloud.google.com/iam/docs/overview#allauthenticatedusers) or [allUsers](https://cloud.google.com/iam/docs/overview#allusers) don't generate audit logs. Resources that can be accessed without logging into a Google Cloud, Google Workspace, Cloud Identity, or Drive Enterprise account don't generate audit logs. This helps protect end-user identities and information.

This change gives each job its own `DEV_CLARIFIER` (`ci_test_dataproc-37` and `ci_test_dataproc-38`), so the two jobs should no longer share the `ci_test_dataproc` upload path.
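
The effect of giving each job its own `DEV_CLARIFIER` can be sketched in shell. This is illustrative only: the `upload_prefix` helper is hypothetical, and it assumes the upload path is built as `gs://hail-30-day/hailctl/dataproc/<DEV_CLARIFIER>/<version>/`, which is inferred from the path observed in the failing deploy rather than taken from the Makefile.

```shell
#!/usr/bin/env bash
# Sketch only: shows why a shared DEV_CLARIFIER makes two CI jobs write to
# the same GCS prefix, and why per-job clarifiers keep them apart.
# ASSUMPTION: the prefix layout below is inferred, not copied from the Makefile.
set -euo pipefail

# Hypothetical helper: compose the staging prefix for one CI job.
upload_prefix() {
  local clarifier="$1" version="$2"
  echo "gs://hail-30-day/hailctl/dataproc/${clarifier}/${version}/"
}

version="0.2.121-7343e9c368dc"

# Before: both jobs used DEV_CLARIFIER=ci_test_dataproc, so both
# overwrote the same objects at roughly the same time.
before_37="$(upload_prefix ci_test_dataproc "$version")"
before_38="$(upload_prefix ci_test_dataproc "$version")"

# After: each job gets its own clarifier, so the prefixes differ.
after_37="$(upload_prefix ci_test_dataproc-37 "$version")"
after_38="$(upload_prefix ci_test_dataproc-38 "$version")"

if [ "$before_37" = "$before_38" ]; then
  echo "before: both jobs write to $before_37"
fi
if [ "$after_37" != "$after_38" ]; then
  echo "after: $after_37 and $after_38 are distinct"
fi
```

With a shared clarifier both jobs resolve to one prefix, so one job can read an object while the other is overwriting it; with per-job clarifiers the prefixes are disjoint and that overlap cannot occur.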
---
 build.yaml                                          | 4 ++--
 hail/python/hail/docs/change_log.md                 | 5 ++++-
 .../hailtop/hailctl/dataproc/resources/vep-GRCh37.sh | 6 ++++++
 .../hailtop/hailctl/dataproc/resources/vep-GRCh38.sh | 8 ++++++--
 4 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/build.yaml b/build.yaml
index d909a3b3ac2..388921695d0 100644
--- a/build.yaml
+++ b/build.yaml
@@ -3210,7 +3210,7 @@ steps:
 cd hail
 chmod 755 ./gradlew
 time retry ./gradlew --version
-make test-dataproc-37 DEV_CLARIFIER=ci_test_dataproc
+make test-dataproc-37 DEV_CLARIFIER=ci_test_dataproc-37
 dependsOn:
 - ci_utils_image
 - default_ns
@@ -3252,7 +3252,7 @@ steps:
 cd hail
 chmod 755 ./gradlew
 time retry ./gradlew --version
-make test-dataproc-38 DEV_CLARIFIER=ci_test_dataproc
+make test-dataproc-38 DEV_CLARIFIER=ci_test_dataproc-38
 dependsOn:
 - ci_utils_image
 - default_ns
diff --git a/hail/python/hail/docs/change_log.md b/hail/python/hail/docs/change_log.md
index f6e693d5b45..ad76c6fe136 100644
--- a/hail/python/hail/docs/change_log.md
+++ b/hail/python/hail/docs/change_log.md
@@ -67,7 +67,7 @@ Released 2023-08-31
   Query-on-Batch and Batch use.
 
 ### Bug Fixes
-- (hail#13327) Fix (hail#12936) in which VEP frequently failed (due to Docker not starting up) on
+- (hail#13573) Fix (hail#12936) in which VEP frequently failed (due to Docker not starting up) on
   clusters with a non-trivial number of workers.
 - (hail#13485) Fix (hail#13479) in which `hl.vds.local_to_global` could produce invalid values when
   the LA field is too short. There were and are no issues when the LA field has the correct length.
@@ -109,6 +109,9 @@ Released 2023-08-31
 ### Deprecations
 
 - (hail#13275) Hail no longer officially supports Python 3.8.
+- (hail#13508) The `n` parameter of `MatrixTable.tail` is deprecated in favor of a new `n_rows`
+  parameter.
+
 
 ## Version 0.2.120
 
diff --git a/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh37.sh b/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh37.sh
index e8c55a927e8..c46d66fb947 100644
--- a/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh37.sh
+++ b/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh37.sh
@@ -1,5 +1,7 @@
 #!/bin/bash
 
+set -x
+
 export PROJECT="$(gcloud config get-value project)"
 export ASSEMBLY=GRCh37
 export VEP_CONFIG_PATH="$(/usr/share/google/get_metadata_value attributes/VEP_CONFIG_PATH)"
@@ -24,6 +26,10 @@ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debi
 apt-get update
 apt-get install -y --allow-unauthenticated docker-ce
 
+# https://github.com/hail-is/hail/issues/12936
+sleep 60
+sudo service docker restart
+
 # Get VEP cache and LOFTEE data
 gcloud storage cp --billing-project $PROJECT gs://hail-us-vep/vep85-loftee-gcloud.json /vep_data/vep85-gcloud.json
 ln -s /vep_data/vep85-gcloud.json $VEP_CONFIG_PATH
diff --git a/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh38.sh b/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh38.sh
index 8fd7022f56e..c6711157de9 100644
--- a/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh38.sh
+++ b/hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh38.sh
@@ -1,5 +1,7 @@
 #!/bin/bash
 
+set -x
+
 export PROJECT="$(gcloud config get-value project)"
 export VEP_CONFIG_PATH="$(/usr/share/google/get_metadata_value attributes/VEP_CONFIG_PATH)"
 export VEP_REPLICATE="$(/usr/share/google/get_metadata_value attributes/VEP_REPLICATE)"
@@ -24,6 +26,10 @@ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debi
 apt-get update
 apt-get install -y --allow-unauthenticated docker-ce
 
+# https://github.com/hail-is/hail/issues/12936
+sleep 60
+sudo service docker restart
+
 # Get VEP cache and LOFTEE data
 gcloud storage cp --billing-project $PROJECT gs://hail-us-vep/vep95-GRCh38-loftee-gcloud.json /vep_data/vep95-GRCh38-gcloud.json
 ln -s /vep_data/vep95-GRCh38-gcloud.json $VEP_CONFIG_PATH
@@ -56,5 +62,3 @@ docker run -i -v /vep_data/:/opt/vep/.vep/:ro ${VEP_DOCKER_IMAGE} \
 /opt/vep/src/ensembl-vep/vep "\$@"
 EOF
 chmod +x /vep.sh
-
-sudo service docker restart