Skip to content

Commit

Permalink
[test-dataproc] make test-dataproc-37 and -38 not race (#13573)
Browse files Browse the repository at this point in the history
Consider, for example, this deploy: https://ci.hail.is/batches/7956812.
`test-dataproc-37` succeeded but `test-dataproc-38` failed (it timed out
b/c the master failed to come online).

You can see the error logs for the cluster here:
https://cloudlogging.app.goo.gl/t1ux8oqy11Ba2dih7

It states a certain file either did not exist or we did not have
permission to access it.

[`test_dataproc-37`](https://batch.hail.is/batches/7956812/jobs/193) and
[`test_dataproc-38`](https://batch.hail.is/batches/7956812/jobs/194)
started around the same time and both uploaded four files into:


gs://hail-30-day/hailctl/dataproc/ci_test_dataproc/0.2.121-7343e9c368dc/

And then set it to public read/write. The public read/write means that
permissions are not the issue.

Instead, the issue is that there must be some sort of race condition in
GCS which means that if you "patch" (aka overwrite) an existing file, it
is possible that a concurrent reader will see the file as not existing.

Unfortunately, I cannot confirm this with audit logs of the writes and
read because [public objects do not generate audit
logs](https://cloud.google.com/logging/docs/audit#data-access).
> Publicly available resources that have the Identity and Access
Management policies
[allAuthenticatedUsers](https://cloud.google.com/iam/docs/overview#allauthenticatedusers)
or [allUsers](https://cloud.google.com/iam/docs/overview#allusers) don't
generate audit logs. Resources that can be accessed without logging into
a Google Cloud, Google Workspace, Cloud Identity, or Drive Enterprise
account don't generate audit logs. This helps protect end-user
identities and information.
  • Loading branch information
danking authored Sep 6, 2023
1 parent 56f6231 commit 35da6f4
Show file tree
Hide file tree
Showing 4 changed files with 18 additions and 5 deletions.
4 changes: 2 additions & 2 deletions build.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3210,7 +3210,7 @@ steps:
cd hail
chmod 755 ./gradlew
time retry ./gradlew --version
make test-dataproc-37 DEV_CLARIFIER=ci_test_dataproc
make test-dataproc-37 DEV_CLARIFIER=ci_test_dataproc-37
dependsOn:
- ci_utils_image
- default_ns
Expand Down Expand Up @@ -3252,7 +3252,7 @@ steps:
cd hail
chmod 755 ./gradlew
time retry ./gradlew --version
make test-dataproc-38 DEV_CLARIFIER=ci_test_dataproc
make test-dataproc-38 DEV_CLARIFIER=ci_test_dataproc-38
dependsOn:
- ci_utils_image
- default_ns
Expand Down
5 changes: 4 additions & 1 deletion hail/python/hail/docs/change_log.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,7 +67,7 @@ Released 2023-08-31
Query-on-Batch and Batch use.

### Bug Fixes
- (hail#13327) Fix (hail#12936) in which VEP frequently failed (due to Docker not starting up) on
- (hail#13573) Fix (hail#12936) in which VEP frequently failed (due to Docker not starting up) on
clusters with a non-trivial number of workers.
- (hail#13485) Fix (hail#13479) in which `hl.vds.local_to_global` could produce invalid values when
the LA field is too short. There were and are no issues when the LA field has the correct length.
Expand Down Expand Up @@ -109,6 +109,9 @@ Released 2023-08-31
### Deprecations

- (hail#13275) Hail no longer officially supports Python 3.8.
- (hail#13508) The `n` parameter of `MatrixTable.tail` is deprecated in favor of a new `n_rows`
parameter.


## Version 0.2.120

Expand Down
6 changes: 6 additions & 0 deletions hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh37.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/bin/bash

set -x

export PROJECT="$(gcloud config get-value project)"
export ASSEMBLY=GRCh37
export VEP_CONFIG_PATH="$(/usr/share/google/get_metadata_value attributes/VEP_CONFIG_PATH)"
Expand All @@ -24,6 +26,10 @@ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debi
apt-get update
apt-get install -y --allow-unauthenticated docker-ce

# https://github.com/hail-is/hail/issues/12936
sleep 60
sudo service docker restart

# Get VEP cache and LOFTEE data
gcloud storage cp --billing-project $PROJECT gs://hail-us-vep/vep85-loftee-gcloud.json /vep_data/vep85-gcloud.json
ln -s /vep_data/vep85-gcloud.json $VEP_CONFIG_PATH
Expand Down
8 changes: 6 additions & 2 deletions hail/python/hailtop/hailctl/dataproc/resources/vep-GRCh38.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
#!/bin/bash

set -x

export PROJECT="$(gcloud config get-value project)"
export VEP_CONFIG_PATH="$(/usr/share/google/get_metadata_value attributes/VEP_CONFIG_PATH)"
export VEP_REPLICATE="$(/usr/share/google/get_metadata_value attributes/VEP_REPLICATE)"
Expand All @@ -24,6 +26,10 @@ sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/debi
apt-get update
apt-get install -y --allow-unauthenticated docker-ce

# https://github.com/hail-is/hail/issues/12936
sleep 60
sudo service docker restart

# Get VEP cache and LOFTEE data
gcloud storage cp --billing-project $PROJECT gs://hail-us-vep/vep95-GRCh38-loftee-gcloud.json /vep_data/vep95-GRCh38-gcloud.json
ln -s /vep_data/vep95-GRCh38-gcloud.json $VEP_CONFIG_PATH
Expand Down Expand Up @@ -56,5 +62,3 @@ docker run -i -v /vep_data/:/opt/vep/.vep/:ro ${VEP_DOCKER_IMAGE} \
/opt/vep/src/ensembl-vep/vep "\$@"
EOF
chmod +x /vep.sh

sudo service docker restart

0 comments on commit 35da6f4

Please sign in to comment.