Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement workflow evaluation. #292

Merged
merged 6 commits into from
Jan 15, 2025

Conversation

peterhuene
Copy link
Collaborator

@peterhuene peterhuene commented Jan 14, 2025

This PR implements workflow evaluation in wdl-engine and in the wdl run command.

Workflow evaluation works by taking the workflow evaluation graph and "splitting" the graph into subgraphs at each entry into a conditional and scatter statement.

For conditional statements, their subgraphs are only evaluated if the conditional expression evaluates to true.

For scatter statements, their subgraphs are evaluated for each element in the scatter array.

Workflow evaluation is currently single-threaded. Progress is made on the evaluation graph as much as possible until no more progress can be made, at which point evaluation will wait for outstanding tasks to complete.

Currently local task execution is limited to the amount of available parallelism supported on the host system; in the future, we'll want that to be configurable.

Before submitting this PR, please make sure:

  • You have added a few sentences describing the PR here.
  • You have added yourself or the appropriate individual as the assignee.
  • You have added at least one relevant code reviewer to the PR.
  • Your code builds clean without any errors or warnings.
  • You have added tests (when appropriate).
  • You have updated the README or other documentation to account for these
    changes (when appropriate).
  • You have added an entry to the relevant CHANGELOG.md (see
    "keep a changelog" for more information).
  • Your commit messages follow the conventional commit style.

This commit implements workflow evaluation in `wdl-engine` and in the `wdl run`
command.

Workflow evaluation works by taking the workflow evaluation graph and
"splitting" the graph into subgraphs at each entry into a conditional and
scatter statement.

For conditional statements, their subgraphs are only evaluated if the
conditional expression evaluates to true.

For scatter statements, their subgraphs are evaluated for each element in the
scatter array.

Workflow evaluation is currently single-threaded. Progress is made on the
evaluation graph as much as possible until no more progress can be made, at
which point evaluation will wait for outstanding tasks to complete.

Currently local task execution is limited to the amount of available
parallelism supported on the host system; in the future, we'll want that to be
configurable.
@peterhuene
Copy link
Collaborator Author

The test error is:

error: call to function `read_int` failed: file `calls/count_lines/stdout` does not contain an integer value on a single line
   ┌─ tests/workflows/task-outputs/source.wdl:27:22
   │
27 │     Int line_count = read_int(stdout())
   │                      ^^^^^^^^

Looking into it (probably yet another line ending difference issue).

@peterhuene
Copy link
Collaborator Author

Not a bug; the WDL test source isn't quoting the path in the task's command, causing the backslashes in the path on Windows to be interpreted as escape sequences.

The task was failing in a piped command (also wasn't setting pipefail), which is why the task succeeded, but the read_int call failed since the output file was empty.

I'll push up a fix momentarily.

The fix is to properly quote the file that is being line counted in the task so
that the backslashes in the path aren't misinterpreted.
Copy link
Member

@claymcleod claymcleod left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

peterhuene and others added 4 commits January 14, 2025 14:41
Co-authored-by: Clay McLeod <3411613+claymcleod@users.noreply.github.com>
Co-authored-by: Andrew Frantz <andrew.frantz@stjude.org>
# The expression in this string results in an error (calling `select_first` on an array
# containing no non-`None` values) and so the placeholder evaluates to the empty string and
# `s` evalutes to: "Foo is "
String s = "Foo is ~{select_first([foo])}"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expect this to fail at runtime, but because the failure is nested inside a placeholder it instead just creates an empty string? That's an interesting side effect that could lead to hard to find bugs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right, if a failure occurs due to an evaluation of a None, the placeholder doesn't fail but evaluates to empty.

If an expression within a placeholder evaluates to None, and either causes the entire placeholder to evaluate to None or causes an error, then the placeholder is replaced by the empty string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was aware of the of the first part but not the error clause. That seems problematic to me.

@peterhuene peterhuene merged commit 120ecc0 into stjude-rust-labs:main Jan 15, 2025
16 checks passed
@peterhuene peterhuene deleted the workflow-eval branch January 15, 2025 21:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants