-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clarified Runtime and Added Hints #315
Conversation
versions/development/SPEC.md
Outdated
|
||
#### Reserved Keys | ||
|
||
Although `hints` are arbitrary, there are a number of keys which are reserved by the language in or oder to try and encourage interoperability of tasks and workflows between different execution engines. The list of Reserved keys is likely to grow, and in order to prevent name space collisions, engines should follow the [best practices](#conventions-and-best-practices) for specifying hints. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo or oder
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed a few other typos.. You are eagle eyed
A few other attributes exist which change behavior (and though "runtime" is a weird description for them, they're certainly not optional):
|
* **maxCpu**: Specify the maximum CPU to be provisioned for a task. It is up to the engine, whether or not it enforces the `maxCpu` hint. The type of this hint should be the same as `runtime.cpu` | ||
* **maxMemory**: Specify the maximum memory provisioned for a task. It is up to the engine, whether or not it enforces the `maxMemory` hint. the type of this hint should be the same as `runtime.memory` | ||
* **shortTask**: Tell the execution engine that this task is not expected to take long to execute, and therefore the engine can attempt to optimize the execution in whatever way it defines. This flag can be used to tell an engine to use the cost optimized instance types that many clouds provide (but are available only for a limited time). An example of this would be `preemptible` instances on `gcp` and `spot` instances on `aws`. | ||
* **localizationOptional**: Tell the execution engine that whenever possible it does not need to localize the defined `File` type inputs for this task. Important to note, is this directive should not have any impact on the success or failure of a task (ie it should run with or without localization). The type of this hint is a boolean value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note to anyone else who almost makes the same mistake as me:
Athough this global localizationOptional
is like a less useful version of the Cromwell parameter_meta
hint, keep reading because there's a separate inputs
hints section which covers the specific-per-input version of this hint.
@cjllanwarne to the runtime section I added a Additionally, I personally find the "failOnStdErr" a bit of an odd feature even in cromwell, which is why I did not include it in the current runtime section. I can add it here of course, however I just wanted to make the point that if a task ALWAYS returns with a 200 status, there is no reason why the user could not capture the stdout and exit if the file is nonempty. |
versions/development/SPEC.md
Outdated
|
||
The container key must accept one ore more locations which inform the engine where to retrieve a container image to execute the task. It is expected (but not enforced) that each container image provided are identical, and will provide the same final results when the task is run. It is the responsibility of the individual execution engine to define the specific image sources which it supports, and to determine which image is the "best" one to use at runtime. Defining multiple images enables greater portability across a broad range of execution environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should 100% be considered invalid if different images lead to different results
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the only way to guarantee that though is 1, execute the WF under different images, or 2, assert checksums of the different images (in which case each engine might not know the actual access method). I will add stronger language here though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn’t it be sufficient to just compare image SHA in the engine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinvlad the image SHA is not required to be passed in by the user, and I think would actually be to cumbersome a restriction. The best approach IMO is just making it a non enforced requirment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@patmagee sorry, I guess what I meant is more on the implementation side (so perhaps it doesn't belong to this discussion). IIRC, engine implementations like Cromwell use image SHA in cache comparisons anyway. So this would be a low-hanging fruit to just compare SHAs that are already collected for these images anyway (I'm not talking about having to specify SHA inside the image URL, but rather the native caching mechanism already retrieves it from the Docker repo, if I understand that correctly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dinvlad Keep in mind that the period for comments. At this point the PR is set in stone and we're voting on it
versions/development/SPEC.md
Outdated
|
||
#### docker | ||
#### cpu (**Required**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm nervous about increasing the amount of boilerplate this means adding into every. single. WDL. task. definition.
Can we instead say "if a cpu
value is not specified then the task must work with exactly '1' CPU"? And something similar for memory
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, that is a good suggestion, hardcoding the default in the spec. my worry though, is that people will just ignore that entirely and continue on using things like the default-runtime
attributes, and relying on things that are very cromwell or other engine specific. This is what lead to alot of the motivation for redefining the runtime block. I suppose we could state:
If a
cpu
key is not defined, then then it must be interpreted that the task will run with >exactly 1 CPU. engines should not provide a mechanism to set their own default values >which circumvents this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjllanwarne the other idea I had, was to allow for a runtime section in the 'workflow' definition. This would allow for the definition of runtime defaults which are explicitly stated and it Also provides a mechanism for documenting them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cjllanwarne on second thought, I Think I prefer enforcing an implicit definition of a default within the WDL as opposed to allowing defaults at the workflow
level. Not every task will have a workflow, therefore people will likely abuse this approach for tasks and just assunme a user will define the default value
python process.py ${arg} | ||
#### gpu (**Optional**) | ||
|
||
The `gpu` key provides a way to accommodate modern workflows which are increasingly becoming reliant on GPU computations. This key is a `true` or `false` value and simply indicates to the engine that a task requires a GPU to run to completion. A task with this flag set to `true` is guaranteed to only run if a GPU is a available within the runtime environment. It is the responsibility of the engine to check prior to execution whether a GPU is provisionable, and if not, preemptively fail the task. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe someone with more experience using GPUs can chime in:
- Is "I just need a GPU, I don't care what type" enough information? (ie is it possible to be agnostic about the number, type and driver version required?)
- Is having a GPU a genuine requirement for running such a task
- ... or is it more a case of "if there isn't a GPU the task will still succeed but will be inefficient?"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is what I am a little naive about, so I will rely on someone else to have more information. I was trying to capture the requirement for a GPU without throwing in all of the provider specific terminology in this part of the block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to provide some color here.
The GPU instance types DNAnexus supports today, on AWS, are the g2 and p3 families. The p3 is the current generation. To allow users to actually make use of these, we install an NVIDIA driver. An alternative is nvidia-docker.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@orodeh does the user need to know specifics about this? or is it enough for them to know that a GPU was added, and then allow additional config in the hints section?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my motivatin here was to abstract as much of the specific vendor terminology into the hints section, then have an engine guarantee a gpu was added. the flavor and size of the gpu would need to be either configured in the hints or the engine would need to make a resonable default value here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good start to say have a GPU flag in the runtime section, and leave the rest for hints.
|
||
The following is a basic set of guidelines that engines and authors should follow when writing hints: | ||
|
||
1. A hint should never be required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should
=> must
?
Really excited that docker has been renamed to container (while still defaulting to Dockerhub), I think this is a super valuable step towards agnosticism of containers. |
versions/development/SPEC.md
Outdated
@@ -925,6 +968,7 @@ task test { | |||
command <<< | |||
ps ~{flags} | |||
>>> | |||
..... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the sake of consistency:
..... | |
.... |
versions/development/SPEC.md
Outdated
|
||
The container key must accept one ore more locations which inform the engine where to retrieve a container image to execute the task. It is expected that each container image provided are identical, and will MUST the same final results when the task is run. It is the responsibility of the individual execution engine to define the specific image sources which it supports, and to determine which image is the "best" one to use at runtime. Defining multiple images enables greater portability across a broad range of execution environments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's one sentence here which reads a bit strangely to me:
It is expected that each container image provided are identical, and will MUST the same final results when the task is run.
Perhaps it should be:
It is expected that all container images provided are identical. They MUST produce the same final results when the task is run.
versions/development/SPEC.md
Outdated
There are several ways to specify the `disks` attribute: | ||
|
||
* `Int` - GB of disk space to request, eg: `100`, `200`. | ||
* `String` (`"<size>" || "<mount-point> <size>"`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should some system similar to the one for memory be allowed here? ie. specifying whether the size is in KBs or MBs or TBs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semantically that makes sense. A disk is a a size, and sizes can be specified in blocks. In practice however I feel like we will only ever see GB
or TB
definitions (since specifying a disk in MB seems archaic). And from that we will NORMALLY only see GB
. So I figured the simplification makes sense. Wheras with memory, people can and regularily do specify both GB and MB for the size
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To clarify, are disk specifications like "100GB
, or "3TB"
valid?
In my personal opinion, they should be legal.
versions/development/SPEC.md
Outdated
|
||
Memory requirements for this task. Two kinds of values are supported for this attributes: | ||
The `memory` key defines the _minimum_ memory required for this task which must be available prior to the engine starting execution. The engine does not need to provide the exact amount of memory requested, however it may ONLY provision more memory then requested and not less. For example, if the wdl requested `1 GB` but only blocks of `4 GB` were available, the engine might choose to provision `4.0 GB` instead. Two kinds of values are supported for this attribute: | ||
|
||
* `Int` - Interpreted as bytes | ||
* `String` - This should be a decimal value with suffixes like `B`, `KB`, `MB` or binary suffixes `KiB`, `MiB`. For example: `6.2 GB`, `5MB`, `2GiB`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there or should there not be a space between the number and the suffix? Or does it not matter?
Wow! Great PR! |
} | ||
``` | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest having a default flag of false
for this field. Otherwise, we would be a making tasks more verbose than necessary.
I think this is good. My only comment is that I like Cromwell's default runtime attributes, and they are supported by dxWDL. I would prefer to avoid adding required boilerplate to WDL tasks, if it isn't strictly necessary. |
@orodeh As a language, this is one thing that we ultimately need to resolve. the |
@patmagee. Very well, can we then make defaults for cpu, memory, and disk, so that the only thing you absolutely must have in the runtime section is the container image? |
@orodeh yes, I am open to suggestions on what those values should be. for CPU I would think that 1 is fine, disk I would so 10GB but am a bit hazy on memory |
How about 2GiB as a default number for memory. |
that sounds reasonable to me |
This very much depends on the size of the input file. If I am running on a whole-genome sample and my input fastqs are 100 GB, then I need 100GB of disk space as well for the output when using cutadapt. But If I run a test workflow with 200 kb files I don't want cromwell to crash by default because there is no 10GB available. All my test workflows run in /tmp by default (pytest-workflow's default). I do have some spare gigabytes on Why not set it at 1GiB ? We have a |
@orodeh I have added the default values with restrictions on execution engines providing their own defaults. WDYT |
To be clear, the flag and the semantics suggested for it (job < 24h
runtime) only make sense on GCP.
On AWS, spot instances can (potentially) run indefinitely, given a high
enough bid.
…On Fri, Sep 13, 2019 at 06:39 Patrick Magee ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In versions/development/SPEC.md
<#315 (comment)>:
> +task foo {
+ ...
+ hints {
+ some-key: "some-value"
+
+ }
+}
+```
+
+#### Reserved Keys
+
+Although `hints` are arbitrary, there are a number of keys which are reserved by the language in or order to try and encourage interoperability of tasks and workflows between different execution engines. The list of Reserved keys is likely to grow, and in order to prevent name space collisions, engines should follow the [best practices](#conventions-and-best-practices) for specifying hints.
+
+* **maxCpu**: Specify the maximum CPU to be provisioned for a task. It is up to the engine, whether or not it enforces the `maxCpu` hint. The type of this hint should be the same as `runtime.cpu`
+* **maxMemory**: Specify the maximum memory provisioned for a task. It is up to the engine, whether or not it enforces the `maxMemory` hint. the type of this hint should be the same as `runtime.memory`
+* **shortTask**: Tell the execution engine that this task is not expected to take long to execute, and therefore the engine can attempt to optimize the execution in whatever way it defines. This flag can be used to tell an engine to use the cost optimized instance types that many clouds provide (but are available only for a limited time). An example of this would be `preemptible` instances on `gcp` and `spot` instances on `aws`. "Short" is a bit relative, but should generally be interpreted as << 24h.
@DavyCats <https://github.com/DavyCats> part of me wanted to avoid that
on the first pass of this, mainly becuase thats a level of granularity
that, although useful, might not be available when programatically running
jobs with diverse inputs. Additionally, it is not really applicable except
in the case of Some grid/clusters. The runtime block is the "Required"
section and I really dont want to have a required minTime and maxTime for
EVERY execution on every platform since its only required in maybe 5% of
situaations. Its harder to set sensible defaults for those as well. I would
rather minTime and maxTime be part of the hints section then then the
runtime section
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#315?email_source=notifications&email_token=AAMGHK7ZSFGWEHQZ3GW2HBTQJNUXVA5CNFSM4HR5L7Z2YY3PNVWWK3TUL52HS4DFWFIHK3DMKJSXC5LFON2FEZLWNFSXPKTDN5WW2ZLOORPWSZGOCEUZSSY#discussion_r324134923>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAMGHKY4PBWLNIVGKSZX4BTQJNUXVANCNFSM4HR5L7ZQ>
.
|
@@ -2647,6 +3026,39 @@ In JSON, the inputs to the workflow in the previous section might be: | |||
|
|||
It's important to note that the type in JSON must be coercible to the WDL type. For example `wf.int_val` expects an integer, but if we specified it in JSON as `"wf.int_val": "three"`, this coercion from string to integer is not valid and would result in a coercion error. See the section on [Type Coercion](#type-coercion) for more details. | |||
|
|||
## Specifying / Overriding Runtime Attributes in JSON |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 👍
This is now implemented in wdlTools |
@jdidion incredible news!! I can coordinate merging this and resolving any conflicts |
Hi all, belated discussion point here. The PR is written with the On the other hand, if we want to allow general expressions for the hints values, there's a corner-case grammar collision between map literals and the nested runtime section, e.g. if we wrote
it's grammatically ambiguous whether the right-hand side of inputs: is a nested hints section or a |
Also importing a discussion point from #406 (comment) with @patmagee @jdidion @illusional.
The concern IMO is introducing |
One solution could be that the runtime {
container: {protocol: "docker", image: "biowdl/chunked-scatter:0.1.0"}
} The last solution also allows for several fields if the need arises, but since I can not think of any use cases right now, I prefer the simpler version with the protocol. |
I like the simple solution. For better backward compatibility, if there is no protocol, assume Docker. |
(I was worried about having to escape URIs with colons in them, but apparently colons are allowed without escaping https://stackoverflow.com/questions/1737575/are-colons-allowed-in-urls#:~:text=Yes%2C%20unless%20it's%20in%20the,org%2Fwiki%2FTemplate%3AWelcome) |
I agree with @jdidion, in this case simple is better. If we allow a protocol to be defined, and by default use docker that probably captures the vast majority of use cases |
|
@mlin I agree on your point 2 - this is explicitly stated in the RFC (container is an alias, not a replacement; arbitrary keys are deprecated but still allowed) |
|
||
Since values are expressions, they can also reference variables in the task: | ||
#### container (**Required**) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jdidion this is the "Required" I was referring to
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah - I agree that it shouldn't be required in 1.1, and I think there's a good argument to not ever make it required, for example if I just want to run my workflow locally in an environment where I already have all the dependencies installed.
Here are several ideas for how to allow expressions in hints. This is a tedious issue obviously, but probably much less that it would be ex post facto!
|
@mlin I actually think #3 is the best solution. In fact, I'd be in favor of also disallowing map literals in runtime and the meta sections as well - it just makes parsing more difficult and confusing to the end user for very little benefit. My second choices would be 0 or 5, and I'm pretty opposed to 1 and 2. |
Replace development spec with updated version of 1.1 spec. It is the 1.1 spec with all deprecated items removed, and with Directory and PR #315 added in.
Merged as part of #435 |
The Problem
One of the key motivators for developing WDL has always been the dream of interopable workflows. That is Workflows that can run in any execution engine, on any platform, in the exact same way and resaonably expect the same outputs.
Recently several community members have tested this theory to see just how far we are from that target. Unfortunately, WDL did not get a passing grade as it currently is. One of the primary reasons for this is the
runtime
section within a task. We have always had the desire to rework this section, moving it away from something that is a grab bag of parameters and more into minimum execution requirements. However, this was never done, and because of that, a lot of properties have crept in there which are not part of theWDL
specification but have largely become used within the community to execute a workflow on a specific platform or infrastructure. This does not help interopability or sharing.My Solution
To me the best solution moving forward is to start to be a bit opinionated about what goes into the
runtime
section and ensure that there will ALWAYS be enough information present there to run a task. To this end, I have codified the keys supported within the runtime section, and have made certain properties required for every task (even if the engine supports runtime defaults). While some people may groan at the idea of this, since it means more lines to write, I think its important to remember what atask
is meant to do. It is essentially a definition of a wrappedcommand
and the execution environment that command is supposed to run in. Previously, we only required thecommand
part, and allowed the engine to specify defaults if it sees fit for theenvironment
. This unfortunately meant that for some tasks, there was no concrete definition of what was required to run it, immediately making the task bound to the engine and infrastructure the author intended it for. Requiring a set of minimal attributes in the runtime gets around this, without being TOO obtrusive IMO.I have additionally removed the ability to specify arbitrary KV pairs within the
runtime
block, and have introduced a new section calledhints
. There is a very important distinction between the two sections: The runtime defines what is REQUIRED to run the task (and engine that can provide those resources should be able to run the task), and hints are optimizations that an engine can CHOOSE to apply if they understand the meaning of the hint.I have opted for making hints a semi-structured section similar to how the runtime used to operate in V1.0 and earlier. There are a set of reserved keys which should carry similar meaning between engines (whether the engine chooses to adopt them is completely optional), and then additionally I have added a strong emphasis on convention and reuse of arbitrary properties.