Releases: broadinstitute/cromwell
88
88 Release Notes
Important Upgrade Note: Database Schema Change
Cromwell 88 includes a number of database schema changes to support new functionality and improve performance. Users should expect a longer-than-usual database migration, due primarily to the `IX_METADATA_ENTRY_WEU_MK` index added to `METADATA_ENTRY`. In pre-release testing, this migration proceeded at about 3 million rows per minute. Please plan downtime accordingly.
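As a rough planning aid, the quoted rate can be turned into a downtime estimate. The sketch below is only arithmetic on the 3 million rows per minute figure; the row count is a hypothetical example, `estimate_migration_minutes` is an illustrative helper, and real migration speed will depend on your database hardware and load.

```python
import math

# Rate reported in pre-release testing; your database may be faster or slower.
ROWS_PER_MINUTE = 3_000_000


def estimate_migration_minutes(metadata_rows: int,
                               rows_per_minute: int = ROWS_PER_MINUTE) -> int:
    """Return a rounded-up estimate of the index build time in minutes."""
    if metadata_rows < 0:
        raise ValueError("metadata_rows must be non-negative")
    return math.ceil(metadata_rows / rows_per_minute)


# Hypothetical table size, e.g. obtained by counting rows in METADATA_ENTRY.
rows = 1_500_000_000
minutes = estimate_migration_minutes(rows)
print(f"{rows:,} rows: about {minutes} minutes ({minutes / 60:.1f} hours)")
```

Padding the result generously is sensible, since the figure above comes from a single test environment.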
GCP Batch Updates
- The `genomics` configuration entry was renamed to `batch`; see ReadTheDocs for more information.
- Fixed a bug that prevented jobs from being recovered on Cromwell restart.
- Fixed machine type selection to match the Google Cloud Life Sciences backend, including default n1 non-shared-core machine types and correct handling of `cpuPlatform` to select n2 or n2d machine types as appropriate.
- Fixed preemption and maxRetries behavior. In particular, once a task has exhausted its allowed preemptible attempts, the task will be scheduled again on a non-preemptible VM.
- Fixed error message reporting for failed jobs.
- Fixed the "retry with more memory" feature.
- Fixed the reference disk feature.
- Fixed pulling Docker image metadata from private GCR repositories.
- Fixed `google_project` and `google_compute_service_account` workflow options not taking effect when using the GCP Batch backend.
- Added a way to use a custom LogsPolicy for job execution: setting `backend.providers.batch.config.batch.logs-policy` to "CLOUD_LOGGING" (default) keeps the current behavior, while setting it to "PATH" streams the logs to Google Cloud Storage.
- When "CLOUD_LOGGING" is used, many more Cromwell / WDL labels for workflow, root workflow, call, shard etc. are now assigned to GCP Batch log entries.
- Fixed subnet selection for networks that use custom subnet creation.
- Updated runtime attributes documentation to clarify that the `nvidiaDriverVersion` key is ignored on GCP Batch.
Improvements
- A new optional feature prevents Cromwell from starting new jobs in a group that is currently experiencing cloud quota exhaustion. Jobs will be started once the group's quota becomes available. To enable this feature, set `quota-exhaustion-job-start-control.enabled` to true.
- Users can now configure which algorithm is used to hash files for call caching purposes. See the Configuring page in ReadTheDocs for details. Default behavior is unchanged.
- Cromwell now allows opting into configured soft links on shared file systems such as HPC environments. More details can be found here.
- Users reported cases where Life Sciences jobs failed due to insufficient quota, instead of queueing and waiting until quota is available (which is the expected behavior). Cromwell will now retry under these conditions, which present with errors such as "PAPI error code 9", "no available zones", and/or "quota too low".
- If Cromwell can't determine the size of the user command Docker image, it will increase the Life Sciences API boot disk size by 30 GB rather than 0. This should reduce the incidence of tasks failing due to the boot disk filling up.
- Resolved a hotspot in Cromwell to make the `size()` engine function perform much faster on file arrays. Common examples of file arrays include globs and scatter-gather results.
- The `IX_WORKFLOW_STORE_ENTRY_WS` index is removed from `WORKFLOW_STORE_ENTRY`. The index had low cardinality and workflow pickup is faster without it.
- When Cromwell restarts during a workflow that is failing, it no longer reports pending tasks as a reason for that failure.
- As outlined in the WDL Spec, concatenating a string with an empty optional now correctly evaluates to the empty string.
Other Changes
- As of this version, a distribution of Java 17 is required to run Cromwell. Cromwell is developed, tested, and containerized using Eclipse Temurin.
- `RESTAPI.md` docs have been discontinued. Due to deprecation of the underlying library, Markdown docs will no longer be generated from the Cromwell API Swagger. The recommended alternative is starting a server and viewing the Swagger directly.
- Removed obsolete health checks
  - Docker Hub: Cromwell's healthcheck requests to Docker Hub were not authenticated, and thus became subject to rate limiting. To eliminate these false alarms, this functionality has been removed. The config key `services.HealthMonitor.config.check-dockerhub` is therefore obsolete.
  - GCS: Cromwell's health check of GCS has been removed. GCS does not have availability issues of note, and in typical configurations the check does not meaningfully test Cromwell's permissions. The config keys `services.HealthMonitor.config.check-gcs` and `.gcs-bucket-to-check` are therefore obsolete.
- Code relating to the Google Genomics API (aka `v1Alpha`) has been removed since Google has entirely disabled that service. Cloud Life Sciences (aka `v2Beta`, deprecated) and Google Batch (aka `batch`, recommended) remain the two viable GCP backends. Cloud Life Sciences is expected to be unavailable starting in July 2025 and `v2Beta` support will be removed in a future Cromwell release.
- Removed support for Nvidia K80 "Kepler" GPUs, which were discontinued by GCP in May 2024.
  - Default GPU on Life Sciences is now Nvidia P100
  - Default GPU on GCP Batch is now Nvidia T4
87
87 Release Notes
GCP Batch
- Added Nvidia driver install (default 418) (#7235)
- Fixed Docker mounting volumes with extra colon (#7240)
- Fixed issue with multiple zones defined in config (#7240)
- Fixed Batch label regex (#7355)
Progress toward WDL 1.1 Support
WDL 1.1 support is in progress. Users who would like to try out the current partial support can do so by using WDL version `development-1.1`. As of Cromwell 87, `development-1.1` includes:
- Engine functions:
- Struct literals can be included in WDLs (#7391) (#7402)
- Added `returnCodes` runtime attribute (#7389)
`upgrade` command removed from Womtool
Womtool previously supported a `womtool upgrade` command for upgrading draft-2 WDLs to 1.0. With WDL 1.1 soon to become the latest supported version, this functionality is retiring. (#7382)
Replacement of `gsutil` with `gcloud storage`
In this release (#7359), all localization functionality on the GCP backend migrates to use the more modern and performant `gcloud storage`. With sufficiently powerful worker VMs, Cromwell can now localize at up to 1200 MB/s [0][1][2].
In a future release, delocalization will also migrate to `gcloud storage`. As part of that upcoming change, we are considering turning on parallel composite uploads by default to maximize performance. Delocalized composite objects will no longer have an md5 checksum in their metadata; refer to the matrix below [3]. If you have compatibility concerns for your workflow, please submit an issue.
Delocalization Strategy | Performance | crc32c | md5 |
---|---|---|---|
Classic | Baseline/slow | ✅ | ✅ |
Parallel Composite | Fast | ✅ | ❌ |
[0] Tested with Intel Ice Lake CPU platform, 16 vCPU, 32 GB RAM, 2500 GB SSD
[1] Throughput scales with vCPU count with a plateau at 16 vCPUs.
[2] Throughput scales with disk size and type, with a plateau at 2.5 TB SSD. Worked example: 1200 MB/s ÷ 0.48 MB/s per GB = 2500 GB.
[3] Cromwell itself uses crc32c hashes for call caching and is not affected
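The worked example in footnote [2] can be generalized to other disk sizes. The sketch below assumes throughput grows linearly at 0.48 MB/s per GB of SSD until the 1200 MB/s plateau, and that the VM has enough vCPUs (16 or more, per footnote [1]) not to be the bottleneck. The helper names are illustrative, and the linear model is an assumption drawn from the footnotes rather than a measured curve.

```python
import math

MB_PER_S_PER_GB = 0.48      # SSD throughput per GB, from footnote [2]
PLATEAU_MB_PER_S = 1200.0   # maximum localization rate quoted above


def disk_limited_throughput(disk_gb: float) -> float:
    """Estimated localization throughput (MB/s) for a given SSD size."""
    return min(disk_gb * MB_PER_S_PER_GB, PLATEAU_MB_PER_S)


def disk_gb_needed(target_mb_per_s: float) -> float:
    """Smallest SSD size (GB) expected to sustain the target throughput."""
    capped_target = min(target_mb_per_s, PLATEAU_MB_PER_S)
    return capped_target / MB_PER_S_PER_GB


for size_gb in (500, 1000, 2500, 5000):
    print(f"{size_gb} GB SSD: ~{disk_limited_throughput(size_gb):.0f} MB/s")
```

Under this model, disks larger than 2500 GB bring no further localization speedup.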
Other Improvements
- In certain cases DRS downloads have been found to hang forever. Cromwell will now time these out. (#7416)
- Increased default Akka `client.parsing.max-response-reason-length` to 1024 (#7406)
- Workflow Completion Callback bodies now include fully-qualified output names (#7234)
- Improved workflow abort error handling (#7245)
- Improved logging for troubleshooting (#7246) (#7253) (#7388)
- Support for Intel Ice Lake chips in Life Sciences backend (#7252)
- Fix workflows getting stuck in Aborting when WDL has a type error (#7385)
- Updates to dependencies to fix security vulnerabilities.
86
86 Release Notes
GCP Batch
Cromwell now supports the GCP Batch backend for running workflows. See Backend in ReadTheDocs for more information.
Workflow Completion Callback
Cromwell can be configured to send a POST request to a specified URL when a workflow completes. The request body includes the workflow ID, terminal state, and (if applicable) final outputs or error message. See WorkflowCallback in ReadTheDocs for more information.
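A service receiving the callback might process the body along these lines. This is a minimal sketch: the JSON field names `workflowId`, `state`, `outputs`, and `failures` are assumptions made for illustration and should be checked against the WorkflowCallback documentation, and `summarize_callback` is a hypothetical helper.

```python
import json


def summarize_callback(body: str) -> str:
    """Build a one-line summary from a callback request body.

    Field names here are assumed, not confirmed by the release notes.
    """
    payload = json.loads(body)
    workflow_id = payload["workflowId"]
    state = payload["state"]
    if state == "Succeeded":
        detail = f"{len(payload.get('outputs', {}))} output(s)"
    else:
        detail = "; ".join(payload.get("failures", [])) or "no error message"
    return f"{workflow_id}: {state} ({detail})"


# Hypothetical request body for a successful workflow.
example_body = json.dumps({
    "workflowId": "example-workflow-id",
    "state": "Succeeded",
    "outputs": {"my_workflow.my_task.result": "gs://bucket/result.txt"},
})
print(summarize_callback(example_body))
```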
Other Improvements
- Cromwell will now parallelize the downloads of DRS files that resolve to signed URLs. This significantly reduces the time localization takes in certain situations.
- WDL size engine function now works for HTTP files
- Improved Cromwell's handling of docker manifests. Additional logging information is emitted, and Cromwell will fall back to using OCI manifests if it encounters an error with a Docker Image Manifest V2.
85
85 Release Notes
Migration of PKs to BIGINT
The primary keys (PKs) of the tables below will be migrated from INT to BIGINT. In addition, since `ROOT_WORKFLOW_ID` in `SUB_WORKFLOW_STORE_ENTRY` is a foreign key (FK) to `WORKFLOW_STORE_ENTRY_ID` in `WORKFLOW_STORE_ENTRY`, it is also being migrated from INT to BIGINT.
- DOCKER_HASH_STORE_ENTRY
- WORKFLOW_STORE_ENTRY
- SUB_WORKFLOW_STORE_ENTRY
Improvement to "retry with more memory" behavior
Cromwell will now retry a task with more memory after it fails with return code 137, provided all
the other requirements for retrying with more memory are met.
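Return code 137 follows the common shell convention of 128 plus a signal number: 137 = 128 + 9, and signal 9 is SIGKILL, which is what the Linux out-of-memory killer sends. The sketch below only decodes that arithmetic. It is an illustration, not Cromwell's retry logic, and `describe_return_code` is a hypothetical helper.

```python
import signal


def describe_return_code(return_code: int) -> str:
    """Decode a shell-style return code.

    By convention, values above 128 mean the process was terminated
    by signal number (return_code - 128).
    """
    if return_code == 0:
        return "success"
    if return_code > 128:
        signal_number = return_code - 128
        try:
            signal_name = signal.Signals(signal_number).name
        except ValueError:
            # Signal number not defined on this platform.
            signal_name = f"signal {signal_number}"
        return f"killed by {signal_name}"
    return f"exited with error code {return_code}"


print(describe_return_code(137))
```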
DRS Improvements
Support for invoking `CromwellDRSLocalizer` with manifest file
`CromwellDRSLocalizer` can now handle multiple file localizations in a single invocation. Users can provide a manifest file containing multiple (DRS id, local container path) pairs in CSV format, and they will be localized in sequence, with the program exiting if any fail.
java -jar /path/to/localizer.jar [options] -m /local/path/to/manifest/file.txt
The previous method of passing in a single DRS file and container destination using positional arguments is still
supported.
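A manifest of this shape could be generated as follows. The sketch assumes one pair per row, in the column order given above, with no header row; the DRS ids and container paths are made-up examples, and `build_manifest` is an illustrative helper rather than part of the localizer.

```python
import csv
import io


def build_manifest(pairs):
    """Render (DRS id, local container path) pairs as CSV text, one pair per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for drs_id, container_path in pairs:
        writer.writerow([drs_id, container_path])
    return buffer.getvalue()


# Hypothetical inputs; real DRS ids and paths will differ.
example_pairs = [
    ("drs://example.org/object-1", "/cromwell_root/inputs/sample1.bam"),
    ("drs://example.org/object-2", "/cromwell_root/inputs/sample2.bam"),
]
manifest_text = build_manifest(example_pairs)
print(manifest_text, end="")
```

The resulting text can be written to a file and passed to the localizer with the `-m` option shown above.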
Improvement to DRS localization in GCP papiv2beta backend
All DRS inputs to a task are now localized in a single PAPI action, which should improve speed and resolve
failures observed when attempting to localize a large number of DRS files.
Allow list for HTTP WDL resolution
Administrators can now configure Cromwell with an allow list that limits the domains from which WDLs can be resolved and imported.
Default behavior is unchanged (Cromwell attempts to resolve WDL files from any URI). Example configuration:
languages {
  WDL {
    http-allow-list {
      enabled: true
      allowed-http-hosts: [
        "my.wdl.repo.org",
        "raw.githubusercontent.com"
      ]
    }
  }
}
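The effect of the configuration above can be modeled as a host check on each import URL. The sketch below assumes exact, case-insensitive host matching, which is an assumption made for illustration; Cromwell's actual matching rules may differ, and `is_import_allowed` is a hypothetical helper.

```python
from urllib.parse import urlparse

# Mirrors allowed-http-hosts in the example configuration.
ALLOWED_HTTP_HOSTS = frozenset({"my.wdl.repo.org", "raw.githubusercontent.com"})


def is_import_allowed(url: str,
                      allowed_hosts=ALLOWED_HTTP_HOSTS,
                      enabled: bool = True) -> bool:
    """Return True if a WDL import URL passes the allow list."""
    if not enabled:
        # Default behavior: any URI may be resolved.
        return True
    host = urlparse(url).hostname  # already lowercased by urlparse
    return host is not None and host in allowed_hosts


print(is_import_allowed("https://raw.githubusercontent.com/org/repo/main/tasks.wdl"))
print(is_import_allowed("https://untrusted.example.com/tasks.wdl"))
```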
CWL implementation removed
This release removes the `cwl` top-level artifact. Some nonfunctional references may remain, and will be addressed over time.
For more information, see the Cromwell 79 release notes.
TES Improvements
- TES system errors are now reported in Cromwell execution logs when the TES backend returns a task error.
- Cromwell now attempts to translate `disks` attributes written for GCP into valid `disk` attributes for TES. For information on supported conversions, refer to the TES documentation.
Bug Fixes
- Reference disks are only mounted if configured in the workflow options.
- Recent Docker images of Ubuntu use a new manifest format. Cromwell now ensures that these newer image versions can be pulled from Docker Registry without issue.
- When converting ValueStore objects to strings for logging, long values are now truncated to limit memory usage.
Security Patching
Updates to dependencies to fix security vulnerabilities.
84
84 Release Notes
CromIAM enabled user checks
For Cromwell instances utilizing the optional CromIAM identity and access management component, the following endpoints now verify that the calling user is enabled before forwarding the request.
- `/api/workflows/v1/backends`
- `/api/womtool/v1/describe`

This change makes the above endpoints consistent with the existing behavior of all the other endpoints in the `/api/` path of CromIAM.
83
83 Release Notes
- Changes the type of several primary key columns in call caching tables from int to bigint. The database migration may be lengthy if your database contains a large amount of call caching data.
82
82 Release Notes
- Restored missing example configuration file
- Upgraded to latest version of the Google Cloud Storage NIO library (0.124.8)
- Cromwell will now finitely retry the following Google Cloud Storage I/O error.
  - Response code `400` bad request, message `User project specified in the request is invalid`
  - The default retry count is `5` and may be customized with `system.io.number-of-attempts`.
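The idea of a finite retry can be sketched generically. The code below is not Cromwell's implementation: `InvalidUserProjectError` and `with_finite_retries` are hypothetical names, and treating the configured value as a total attempt count (rather than retries after the first attempt) is an assumption based on the key name.

```python
class InvalidUserProjectError(Exception):
    """Stand-in for the 400 'User project specified in the request is invalid' error."""


def with_finite_retries(operation, number_of_attempts=5,
                        retryable=(InvalidUserProjectError,)):
    """Call operation() up to number_of_attempts times, then re-raise the last error."""
    if number_of_attempts < 1:
        raise ValueError("number_of_attempts must be at least 1")
    last_error = None
    for _ in range(number_of_attempts):
        try:
            return operation()
        except retryable as error:
            last_error = error
    raise last_error


# Usage: a hypothetical read that fails twice before succeeding.
calls = {"count": 0}


def flaky_read():
    calls["count"] += 1
    if calls["count"] < 3:
        raise InvalidUserProjectError("User project specified in the request is invalid")
    return "file contents"


result = with_finite_retries(flaky_read)
print(result, "after", calls["count"], "attempts")
```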
81
81 Release Notes
Workflow labels in TES tasks
Beginning in Cromwell 81 we will populate the `tags` field of tasks created by the TES backend with the labels applied to the workflow at creation time. No guarantee is made about labels added while the workflow is running.
Alibaba BCS backend and OSS filesystem removed
The BCS backend and OSS filesystem (both of which support Alibaba Cloud) have been removed.
80
80 Release Notes
Direct WES support in Cromwell
Cromwell 80 no longer supports the wes2cromwell project within the Cromwell repository.
In the previous release, three Wes2Cromwell endpoints were implemented in the Cromwell project and documented in the Swagger API. In this release, the three remaining endpoints from the wes2cromwell project are likewise moved into Cromwell, implemented, and documented. As a result, the wes2cromwell project can be safely deprecated and removed from the repo.
Previous endpoints:
HTTP verb | Endpoint path | Description |
---|---|---|
GET | /api/ga4gh/wes/v1/service-info | Server info |
POST | /api/ga4gh/wes/v1/runs/{run_id}/cancel | Abort workflow |
GET | /api/ga4gh/wes/v1/runs/{run_id}/status | Workflow status |
Newly implemented endpoints:
HTTP verb | Endpoint path | Description |
---|---|---|
GET | /api/ga4gh/wes/v1/runs | List workflows |
POST | /api/ga4gh/wes/v1/runs | Submit workflow |
GET | /api/ga4gh/wes/v1/runs/{run_id} | Workflow details |
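The endpoint tables above map onto HTTP requests as sketched below. The base URL is a hypothetical local server, the helper names are illustrative, and the requests are only constructed, not sent. Workflow submission is left out because its request body is not described here.

```python
from urllib.parse import quote
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # hypothetical Cromwell server address
WES_PREFIX = "/api/ga4gh/wes/v1"


def wes_request(method: str, path: str) -> Request:
    """Build (but do not send) a request against a WES endpoint."""
    return Request(f"{BASE_URL}{WES_PREFIX}{path}", method=method)


def service_info() -> Request:
    return wes_request("GET", "/service-info")


def list_runs() -> Request:
    return wes_request("GET", "/runs")


def run_details(run_id: str) -> Request:
    return wes_request("GET", f"/runs/{quote(run_id, safe='')}")


def run_status(run_id: str) -> Request:
    return wes_request("GET", f"/runs/{quote(run_id, safe='')}/status")


def cancel_run(run_id: str) -> Request:
    return wes_request("POST", f"/runs/{quote(run_id, safe='')}/cancel")


request = cancel_run("example-run-id")
print(request.get_method(), request.full_url)
```

Passing any of these objects to `urllib.request.urlopen` would send the request to a running server.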
79
79 Release Notes
Last release with CWL support
Cromwell 79 is the last release with CWL. Support will be removed in Cromwell 80 and above.
CWL will be re-introduced at a later date in the Terra platform, using a solution other than Cromwell. See the blog post "Terra’s roadmap to supporting more workflow languages" for details.
Product | Language | Support |
---|---|---|
Cromwell standalone | WDL | ✅ |
Cromwell standalone | CWL | ❌ |
Terra SaaS platform | WDL | ✅ |
Terra SaaS platform | CWL | Future support planned |
Last release with Alibaba Cloud
The BCS backend and OSS filesystem (both of which support Alibaba Cloud) will be removed in version 80.
WES endpoints preview
As a means to stay on top of endpoints within our repo, 3 new Workflow Execution Service (WES) endpoints are now documented in the Cromwell Swagger (others to follow as part of later work):
HTTP verb | Endpoint path | Description |
---|---|---|
GET | /api/ga4gh/wes/v1/service-info | Server info |
POST | /api/ga4gh/wes/v1/runs/{run_id}/cancel | Abort workflow |
GET | /api/ga4gh/wes/v1/runs/{run_id}/status | Workflow status |
Scala 2.13
Cromwell is now built with Scala version 2.13. This change should not be noticeable to users but may be of interest to developers of Cromwell backend implementations.
Bug Fixes
- Fixed a call caching bug in which an invalid cache entry could cause a valid cache entry to be ignored.