Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: requirements and hints #595

Merged
merged 20 commits into from
Feb 6, 2024
Merged

feat: requirements and hints #595

merged 20 commits into from
Feb 6, 2024

Conversation

jdidion
Copy link
Collaborator

@jdidion jdidion commented Dec 11, 2023

This PR adds the new requirements and hints sections, and deprecates runtime.

There are a few additions here that were not in the original spec for this feature but which I think are needed to make it complete:

  • Added the fpga requirement - similar to gpu, it is a Boolean that specifies whether the task requires a FPGA. The popularity of DRAGEN, and the increasing number of other FPGA accelerated bioinformatics tools, make this essential.
  • Added the gpu and fpga hints to enable more detailed requests for hardware accelerators (e.g., number and type).
  • Added the disks hint to enable more detailed requests for disk volumes (e.g., class and permissions).
  • Stipulated compute environment-specific hints can override task-specific hints
  • Deprecated the previously allowed behavior implied by setting allowNestedInputs: true where required task/subworkflow inputs could be left unsatisfied. Now all inputs either need to have a default value or have their value specified in the call inputs. Only optional task/subworkflow inputs that are not explicitly set in the call inputs may have their value set at runtime.

Checklist

  • Pull request details were added to CHANGELOG.md

Closes issues: #540, #541 #543

Copy link
Contributor

@rhpvorderman rhpvorderman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent write-up. As a general discussion point, I strongly disagree with not allowing expressions in the hints section.

The SLURM cluster for instance requires jobs to be given a maximum runtime. This can be dynamically derived from the filesize when it is an expression. Currently we use time_minutes in the runtime section. Under the new scheme this would classify as a hint and thus expressions cannot be used.
Another solution is to add time to the requirements section, but I think this strategy is untenable. There are quite a lot of execution backends and as a consequence there will be unforeseen requirements that need to be solved in the hints section. Therefore allowing expressions in the hints section is preferable, so hints can be tuned to the input data. Since I work with widely disparate filesizes, that would make my life a lot easier. Furthermore, the machinery for expression evaluation is already present in the execution engines, so why not use it?

@jdidion
Copy link
Collaborator Author

jdidion commented Dec 11, 2023

@rhpvorderman I agree with you. I don't really remember the argument for hints being like meta rather than runtime but from an implementation perspective I don't feel like it would be adding that much additional complexity, and as you say it would make hints much more useful.

It would be great if others can weigh in on this PR, especially @mlin and someone from the Cromwell/Terra team (@cjllanwarne?). Hopefully we can discuss at the next meeting if there is disagreement.

@jdidion jdidion self-assigned this Dec 17, 2023
@jdidion
Copy link
Collaborator Author

jdidion commented Dec 20, 2023

Two challenges with allowing expressions in the hints section:

  • Unlike requirements, some hints are expected to have object values. The inputs and outputs hints are the best example. If we are deprecating the Object type, then how does the user actually create these? Maybe we allow object literals only within hints?
  • Hints can be specified at the workflow level. Do we allow expressions there? Are expressions evaluated in the context of the workflow, or the task that inherits them?

@jdidion
Copy link
Collaborator Author

jdidion commented Dec 21, 2023

I addressed this by introducing a (yet another) new concept: scoped types. These are similar to hidden types in that they may only be declared by the execution engine, but different in that a) they are only declared in a specific scope and b) they may be instantiated by the user.

I introduced three types scoped to the hints section: input, output, and hints. The use of reserved words for naming is intentional, so that they don't conflict with existing user-defined types.

Copy link
Contributor

@markjschreiber markjschreiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Might be some places where we can be more explicit that I have mentioned in comments.

Copy link
Contributor

@markjschreiber markjschreiber left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Request for minor rewording around the maxCPu andMaxMemory hints.

@jdidion jdidion force-pushed the 540/requirements-and-hints branch from 966983a to 74b1ca0 Compare February 3, 2024 00:56
@jdidion jdidion merged commit 735eb59 into wdl-1.2 Feb 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants