Reference disks in GCP Batch [WX-1819] #7502

Merged: mcovarr merged 15 commits into develop from wx_1819_reference_disks_in_batch on Sep 11, 2024

mcovarr
Contributor

@mcovarr mcovarr commented Aug 20, 2024

Description

Reference disks now working in the GCP Batch backend 🎉

Release Notes Confirmation

CHANGELOG.md

  • I updated CHANGELOG.md in this PR
  • I assert that this change shouldn't be included in CHANGELOG.md because it doesn't impact community users

Terra Release Notes

  • I added a suggested release notes entry in this Jira ticket
  • I assert that this change doesn't need Jira release notes because it doesn't impact Terra users

filesystems.http {}

# Cromwell 57+ reference disk manifest specification included here.
include "papi_v2_reference_image_manifest.conf"

@mcovarr
Contributor Author

we'll want to rename this manifest file once we have things working

@mcovarr
Contributor Author

mcovarr commented Sep 9, 2024

With the current code I see in the logs:

"docker: Error response from daemon: error while creating mount source path '/mnt/11a4324d4472f639f3fc558b00afeacd': mkdir /mnt/11a4324d4472f639f3fc558b00afeacd: read-only file system."

and in the job metadata the following. Note the lack of ro specifiers in the docker command line invocation, and the presence of ro specifiers in the volumes:

            "container": {
              "commands": [
                "-c",
                "printf '%s %s\\n' \"$(date -u '+%Y/%m/%d %H:%M:%S')\" Running\\ user\\ runnable:\\ docker\\ run\\ -v\\ /mnt/disks/cromwell_root:/mnt/disks/cromwell_root\\ -v\\ /mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd\\ -v\\ /mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661\\ --entrypoint\\=/bin/bash\\ ubuntu@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee\\ /mnt/disks/cromwell_root/script"
              ],
              "entrypoint": "/bin/sh",
              "imageUri": "gcr.io/google.com/cloudsdktool/cloud-sdk:461.0.0-alpine",
              "volumes": [
                "/mnt/disks/cromwell_root:/mnt/disks/cromwell_root:rw",
                "/mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:ro",
                "/mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661:ro"
              ]
            }
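
The observation above (read-only specifiers present in "volumes" but absent from the logged docker run command) can be checked mechanically. The following is only a minimal Python sketch, not part of Cromwell or GCP Batch: the helper names are hypothetical, and the docker command is copied from the metadata above with the shell backslash-escaping removed for readability.

```python
import shlex


def parse_mount(spec):
    """Split 'host:container[:mode]' into (host_path, mode); mode is '' when absent."""
    parts = spec.split(":")
    return parts[0], (parts[2] if len(parts) > 2 else "")


def docker_mounts(command):
    """Map each host path passed with `-v` in a docker command line to its mode."""
    tokens = shlex.split(command)
    return dict(
        parse_mount(spec)
        for flag, spec in zip(tokens, tokens[1:])
        if flag == "-v"
    )


def read_only_mismatches(command, volumes):
    """Host paths declared ':ro' in `volumes` but not ':ro' in the docker command."""
    from_command = docker_mounts(command)
    declared = dict(parse_mount(v) for v in volumes)
    return sorted(
        path
        for path, mode in declared.items()
        if mode == "ro" and from_command.get(path) != "ro"
    )


# Values copied from the job metadata in this comment (escaping removed).
DOCKER_COMMAND = (
    "docker run"
    " -v /mnt/disks/cromwell_root:/mnt/disks/cromwell_root"
    " -v /mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd"
    " -v /mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661"
    " --entrypoint=/bin/bash"
    " ubuntu@sha256:8a37d68f4f73ebf3d4efafbcf66379bf3728902a8038616808f04e34a9ab63ee"
    " /mnt/disks/cromwell_root/script"
)
VOLUMES = [
    "/mnt/disks/cromwell_root:/mnt/disks/cromwell_root:rw",
    "/mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:ro",
    "/mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661:ro",
]

if __name__ == "__main__":
    for path in read_only_mismatches(DOCKER_COMMAND, VOLUMES):
        print(f"declared read-only but mounted without ':ro': {path}")
```

For the metadata above, the sketch flags the two hash-named reference disk mounts and leaves cromwell_root alone. It only surfaces the discrepancy; it does not establish that the discrepancy caused the mkdir error.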

@dspeck1
Collaborator

dspeck1 commented Sep 9, 2024

Thanks! I was just about to ask you for the logs from GCP. Looks like the issue is the mount path. Batch only allows mounting to specific locations. I will push an update to the mount path.
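
To illustrate the mount path point, here is a minimal Python sketch that flags host paths falling outside an allowed prefix. The thread does not say which locations Batch permits; the /mnt/disks prefix used here is an assumption inferred from the cromwell_root mount in the metadata above, and the function name is hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: mounts are expected under /mnt/disks. This prefix is inferred
# from the cromwell_root mount in the job metadata, not stated in this thread.
ASSUMED_ALLOWED_PREFIX = PurePosixPath("/mnt/disks")


def mounts_outside_prefix(volumes, prefix=ASSUMED_ALLOWED_PREFIX):
    """Return host paths from 'host:container[:mode]' specs not located under prefix."""
    outside = []
    for spec in volumes:
        host = PurePosixPath(spec.split(":")[0])
        # `parents` lists every ancestor directory of the host path.
        if prefix not in host.parents:
            outside.append(str(host))
    return outside


# Volume specs copied from the job metadata earlier in the conversation.
VOLUMES = [
    "/mnt/disks/cromwell_root:/mnt/disks/cromwell_root:rw",
    "/mnt/11a4324d4472f639f3fc558b00afeacd:/mnt/11a4324d4472f639f3fc558b00afeacd:ro",
    "/mnt/d9e025138b28caa42dd4006fc3636661:/mnt/d9e025138b28caa42dd4006fc3636661:ro",
]

if __name__ == "__main__":
    for path in mounts_outside_prefix(VOLUMES):
        print(f"mount path outside {ASSUMED_ALLOWED_PREFIX}: {path}")
```

Under that assumption, the two reference disk mounts directly under /mnt are the ones flagged, which is consistent with the failing mkdir path in the log.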

@mcovarr
Contributor Author

mcovarr commented Sep 9, 2024

That did the trick for the one test currently in the PR! There are a few other PAPI v2 reference disk tests for which I'll make GCP Batch versions and add those here.

@mcovarr mcovarr changed the title from "Code to repro broken reference disks in GCP Batch [WX-1819]" to "Reference disks in GCP Batch [WX-1819]" Sep 10, 2024
@mcovarr mcovarr marked this pull request as ready for review September 10, 2024 18:48
@mcovarr mcovarr requested a review from a team as a code owner September 10, 2024 18:48
@mcovarr mcovarr merged commit 7176a53 into develop Sep 11, 2024
37 checks passed
@mcovarr mcovarr deleted the wx_1819_reference_disks_in_batch branch September 11, 2024 14:56
@jbakerpmc

Question: Is including the reference-disk-localization manifest in the cromwell.config out of date when using GCP Batch as the backend? Should the config instead point to a separate file, for example with include "papi_v2_reference_image_manifest.conf"?

@mcovarr
Contributor Author

mcovarr commented Sep 25, 2024

@jbakerpmc there have been no changes to reference disk localization configuration between GCP Batch and PAPI v2 beta that I'm aware of. You can use include if you prefer to keep the reference disk config in a separate file, or just inline it in your main config.

@jbakerpmc

jbakerpmc commented Sep 26, 2024

@mcovarr Thanks for your response. The documentation can be somewhat unclear. I've updated the localization and have kept this inline with my main config for GCP Batch. I am using Cromwell v87.

However, while running a job, I’m encountering issues when Cromwell attempts to mount my files to a local mount. I have been monitoring the VM and the job, and it seems Cromwell is unsure how to handle this. For instance:

Error 1:

severity: "DEFAULT"
textPayload: "umount: /mnt/2d49bcb009113835140d638a10b535af: no mount point specified."
timestamp: "2024-09-26T14:07:54.88114

Error 2:

severity: "ERROR"
textPayload: "Copying gs://test-cromwell-genomics-resources/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai to file:///mnt/disks/cromwell_root/test-cromwell-genomics-resources/references/hg38/v0/Homo_sapiens_assembly38.fasta.fai"

@mcovarr
Contributor Author

mcovarr commented Sep 26, 2024

@jbakerpmc GCP Batch reference disks are broken in Cromwell 87; you'll need to run from the develop branch at least until Cromwell 88 is released.

@jbakerpmc

@mcovarr I continue to get the same error after creating a build from the develop branch, so it seems that it is not ready yet. Thanks for letting me know :)
