A proposal of PAI protocol (Preview) #2007
Adding related issue: #1575
Comment inline:
protocolVersion: String, required # Protocol version, current version is 2
name: String, required
type: String, required # The type of the component. Must be one of the following: job, data, script, dockerimage, or output
version: String, optional # Component version. Default is latest
Can we use
contributor: String, optional
description: String, optional
Maybe a
prerequisites: # Optional
  - protocolVersion: String, optional # If omitted, follow the protocolVersion in root
    name: String, required
    type: String, required # Component type. Must be one of the following: data, script, dockerimage, or output. Prerequisites.type cannot be "job"
    version: String, optional # Component version. Default is latest
    contributor: String, optional
    description: String, optional
In my opinion,
    auth: Object, optional # Only available when the type is dockerimage
      username: String, optional
      password: String, optional
Should we avoid storing the password in plain text? We could use its credential ("username:password" encoded in base64) instead of
      registryuri: String, optional
    uri: String or list, required # Only when the type is data can the uri be a list
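The sketch below makes the schema concrete. It shows a job header with two `prerequisites` entries, one `dockerimage` component and one `data` component. All names, the image URI, and the HDFS paths are hypothetical placeholders. `auth` is left out because this thread has not yet settled how credentials should be stored.

```yaml
# Hypothetical job header written against the proposed schema (illustrative values only).
protocolVersion: "2"                          # the schema types this field as a String
name: mnist-example                           # hypothetical job name
type: job
prerequisites:
  - name: tf-example-image                    # hypothetical component name
    type: dockerimage
    version: latest
    uri: example.registry/tensorflow:latest   # hypothetical image URI; a single string, not a list
  - name: mnist-data                          # protocolVersion omitted, so the root value applies
    type: data
    uri:                                      # a list is allowed here only because type is "data"
      - hdfs://namenode:9000/datasets/mnist/train
      - hdfs://namenode:9000/datasets/mnist/test
```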
parameters: # Optional, can be omitted
  <param1>: value1Type # Specify the name and value of every referenceable parameter used in the job template. Parameters can be referenced as $$paramName$$.
  <param2>: value2Type
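The fragment below sketches how parameters might be declared and then referenced with the `$$paramName$$` syntax. It defines two hypothetical parameters and uses them in a task role command. The parameter names, the `train.py` script, and its flags are invented for illustration.

```yaml
# Hypothetical parameters and a task role command that references them.
parameters:
  batchSize: 64
  dataDir: /data/mnist
taskRoles:
  - name: worker
    dockerImage: tf-example-image     # hypothetical dockerimage from prerequisites
    resourcePerInstance:
      cpu: 4
      memoryMB: 8192
      gpu: 1
    commands:
      # $$batchSize$$ and $$dataDir$$ refer to the parameter values declared above
      - python train.py --batch_size $$batchSize$$ --data_dir $$dataDir$$
```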
jobRetryCount: Integer, optional # Default is 0
taskRoles:
  - protocolVersion: String, optional # Protocol version, default is 2
    name: String, required # Name of the taskRole
    instances: Integer, optional # Number of instances of the taskRole. Default is 1
    completion:
      minFailedInstances: Integer or null, optional # Default 1
      minSucceededInstances: Integer or null, optional # Default null
    taskRetryCount: Integer, optional # Default is 0
    dockerImage: String, required # Should reference a dockerimage defined in prerequisites.
    data: Object, optional # Default is None
    output: Object, optional # Default is None
    script: Object, optional # Default is None
In the distributed TensorFlow example, we can simply use
    extraContainerOptions:
      shmMB: Integer, optional # Configures /dev/shm in the Docker container, https://docs.docker.com/compose/compose-file/#shm_size
    resourcePerInstance:
      cpu: Integer, required
      memoryMB: Integer, required
      gpu: Integer, required
    ports:
      <portLabel1>: Integer, optional, default is 0 # Only for host network
    commands:
      - String, required
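Here is a sketch of a two-role distributed TensorFlow job under this schema. The role names, instance counts, resource sizes, and commands are all hypothetical. The schema does not spell out the meaning of `completion`, so the comments on it are assumptions: they treat `minFailedInstances` and `minSucceededInstances` as counts of failed and succeeded instances of the role.

```yaml
# Hypothetical parameter-server / worker layout (illustrative values only).
taskRoles:
  - name: ps
    instances: 2
    dockerImage: tf-example-image       # must name a dockerimage defined in prerequisites
    resourcePerInstance:
      cpu: 2
      memoryMB: 8192
      gpu: 0
    commands:
      - python train.py --job_name ps   # hypothetical training script
  - name: worker
    instances: 4
    completion:
      minFailedInstances: 1             # assumed: the role fails once 1 instance fails (the default)
      minSucceededInstances: 4          # assumed: the role succeeds once all 4 instances succeed
    taskRetryCount: 1
    dockerImage: tf-example-image
    extraContainerOptions:
      shmMB: 8192                       # size of /dev/shm in MB
    resourcePerInstance:
      cpu: 4
      memoryMB: 16384
      gpu: 1
    commands:
      - python train.py --job_name worker
```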
# To handle a component that interacts with different components in different ways, users are encouraged to place the code handling such differences in the "deployments" field.
# E.g., a job may get input data through wget, hdfs dfs -cp, copy, or by reading directly from remote storage. This logic can be placed here.
# In summary, the deployments field is responsible for making sure the job runs properly in a deployment-specific runtime environment.
# One could have many deployments, but only one deployment can be activated at runtime. The user chooses the deployment at job submission time.
deployments:
  - protocolVersion: String, optional # If omitted, follow the root protocolVersion
    name: String, required
    taskRoles:
      - name: String, required # Should be the same as taskRoles.name
        preCommands:
          - String, required # Executed before $$commands$$
        postCommands:
          - String, required # Executed after $$commands$$
The
attachments: # Optional, cluster-specific parameters
  - protocolVersion: String, optional
    virtualCluster: String, optional
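This sketch shows how a deployment might stage data for a task role named `worker`. It copies input from HDFS before the role's commands run and uploads results afterwards. It also shows how `attachments` might pin the job to a virtual cluster. The deployment name, HDFS paths, and virtual cluster name are placeholders. `hdfs dfs -get` and `hdfs dfs -put` are standard Hadoop shell commands, but whether this exact layout matches the final protocol is an assumption.

```yaml
# Hypothetical deployment for an HDFS-backed cluster (illustrative values only).
deployments:
  - name: hdfs-deployment                # hypothetical deployment name
    taskRoles:
      - name: worker                     # must match a name under taskRoles
        preCommands:                     # run before the task role's commands
          - hdfs dfs -get hdfs://namenode:9000/datasets/mnist /data/mnist
        postCommands:                    # run after the task role's commands
          - hdfs dfs -put /output hdfs://namenode:9000/results/mnist
attachments:
  - virtualCluster: default              # hypothetical virtual cluster name
```

Keeping the HDFS-specific copy steps in the deployment leaves the task role's `commands` unchanged. The same job could then run on a cluster that fetches its data another way, by choosing a different deployment at submission time.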
@abuccts, thanks for the comments. Can you submit a new version to the "pai-proto" branch for reference?
Should I edit the existing
How about submitting a PR that merges into the pai-proto branch?
A related PR: #2141
pai-protocol.yaml