1. Major features in near iterations
1.1. A complete deployment tool (script)
1.2. Marketplace
1.3. Favorite job list (star a job)
`etcd` database to store the favorite job list (refer to group list)
1.4. User Expression
store key-value pairs in the cluster
use `<% expression.<key> %>` in the protocol `yaml`, and the REST server will replace it with the stored value during submission
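A minimal sketch of that substitution, assuming the stored expressions arrive as a plain dict (the `expressions` values below are made up):

```python
import re

# Hypothetical key-value pairs previously stored in the cluster by the user.
expressions = {"image": "ubuntu:18.04", "data_dir": "/mnt/teamdata"}

def render(protocol_yaml: str, expressions: dict) -> str:
    # Replace every <% expression.<key> %> placeholder with its stored value.
    pattern = re.compile(r"<%\s*expression\.(\w+)\s*%>")
    return pattern.sub(lambda m: expressions[m.group(1)], protocol_yaml)

print(render("dockerImage: <% expression.image %>", expressions))
# -> dockerImage: ubuntu:18.04
```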
1.5. Setup local environment for OpenPAI
user could copy and execute these commands in the command prompt
The above `pai add-cluster` command will try to connect to the cluster and query the necessary information back (e.g. team-wise storage, virtual clusters)
User could access the storages from a unified interface:
user gets a list of accessible storages from the REST api
will support necessary file-level operations in `pyfilesystem`, such as `listdir`, `makedir(s)`, `copy`, `delete`
may require the user to manually enable the `nfs` client (`mount` command in windows/linux/mac)
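A sketch of what that unified interface could look like on top of pyfilesystem2 (`pip install fs`); the `mem://` backend stands in here for a real nfs/samba/team-wise storage returned by the REST api:

```python
import fs        # pyfilesystem2
import fs.copy

# 'mem://' is a stand-in backend; a real storage would use its own fs url.
team_storage = fs.open_fs("mem://")
local = fs.open_fs("mem://")

team_storage.makedirs("experiments/run1", recreate=True)   # makedir(s)
team_storage.writetext("experiments/run1/config.yaml", "epochs: 10\n")
print(team_storage.listdir("experiments/run1"))            # listdir

fs.copy.copy_file(team_storage, "experiments/run1/config.yaml",
                  local, "config.yaml")                    # copy between storages
team_storage.remove("experiments/run1/config.yaml")        # delete
```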
2. Minor features in near iterations
2.1. Behavior of REST api to access job config
2.2. job-submission accepts `yaml` content
2.3. Job editing experience improvement (during submission)
Customized tips based on each user's jobs
e.g. if a user uses `hdfs` frequently, recommend the storage-plugin
Intelligent syntax and semantic checker (commands parser) in the command forms of job-submission
e.g. spelling mistakes
e.g. add underlines to un-created / un-mounted environment variables, paths and file names, etc.
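A toy sketch of one such semantic check, flagging environment variables a command references but the (assumed) parsed protocol never defines:

```python
import re

# 'defined' would come from the parsed job yaml; these values are illustrative.
defined = {"PAI_JOB_NAME", "DATA_DIR"}
commands = ["cd $DATA_DIR", "python train.py --out $OUTPUT_DIR"]

for cmd in commands:
    for var in re.findall(r"\$(\w+)", cmd):
        if var not in defined:
            print(f"warning: ${var} in {cmd!r} is never defined/mounted")
```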
2.4. Connect to job container by one-click
in job details, add a button named `ssh`, clicking on which opens an xterm page that would ssh login
in command line, `opai job connect` would support ssh (now it only supports connecting to the `jupyter` server)
login-once experience, help users to handle the private key
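A hedged sketch of what the ssh path of `opai job connect` could do under the hood with `paramiko`; the host, user, and key location are placeholders, not the actual implementation:

```python
import os
import paramiko

# Placeholder host/user/key: a guess at the connect flow, not the real code.
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("container-host.example.com", port=22,
               username="root",
               key_filename=os.path.expanduser("~/.openpai/job_rsa"))
_, stdout, _ = client.exec_command("nvidia-smi")
print(stdout.read().decode())
client.close()
```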
3. At some point
3.1. Documentations
Editor-selected recommendations based on job profiling
e.g. (monthly) newsletters
one-page cheat sheet similar to this k8s cheat sheet
3.2. job profiling
job history backup
daily (or weekly) capture jobs from existing clusters and save as `csv` files (see the sketch after this list)
user behavior analysis
e.g. git / wget / curl / pip install ... docker image / hdfs /
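A sketch of the daily capture; the job-list endpoint and the three fields are assumptions to adapt per cluster:

```python
import csv
import datetime
import requests

# Assumed endpoint and fields; adjust to the target cluster's REST api.
resp = requests.get("https://pai.example.com/rest-server/api/v1/jobs")
resp.raise_for_status()
jobs = resp.json()

stamp = datetime.date.today().isoformat()
with open(f"jobs-{stamp}.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "username", "state"])
    writer.writeheader()
    for job in jobs:
        writer.writerow({k: job.get(k) for k in ("name", "username", "state")})
```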
3.3. debug
code for algorithms debugged locally
environment-related code debugged in a cpu-container (remote debug plugin)
4. Depends on others or in a far future
4.1. diagnostics support
stdout/stderr analysis
Runtime would get lots of explicit error info
extract known error patterns and provide friendly messages to users
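A minimal sketch of the pattern-extraction idea; the two patterns and their messages are illustrative only:

```python
import re

# Map known stderr patterns to friendlier diagnostics (illustrative entries).
KNOWN_PATTERNS = [
    (re.compile(r"CUDA out of memory"),
     "GPU memory exhausted; try a smaller batch size."),
    (re.compile(r"No such file or directory: '(?P<path>[^']+)'"),
     "The path {path} does not exist; is the storage mounted?"),
]

def diagnose(stderr: str) -> list[str]:
    hints = []
    for pattern, message in KNOWN_PATTERNS:
        m = pattern.search(stderr)
        if m:
            hints.append(message.format(**m.groupdict()))
    return hints

print(diagnose("FileNotFoundError: No such file or directory: '/mnt/data/train.csv'"))
```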
data pipeline
e.g. wget with making directory first
check the readiness of multiple data sources at one time
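The "wget with making directory first" convenience could look like this small helper (the URL and paths are examples):

```python
import os
import urllib.request

def fetch(url: str, dest: str) -> None:
    # Create the target directory first so the download never fails on a missing path.
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    urllib.request.urlretrieve(url, dest)

fetch("https://example.com/data.csv", "data/raw/data.csv")
```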
4.2. Billing
4.3. Scalable deployment on Azure (AKS?)
5. Archive
5.1. Unified storage api
In the pure K8s version, the storage is supposed to be diverse and taken over by the admin (e.g. a user may not access the authentication info of a team-wise storage; the runtime will mount it in the background in the job container)
Question - how to access the storage from a local machine?
Solution 1 - mount nfs / samba in windows
cons: how to be used by 3rd-party tools like nni
Solution 2 - SDK provides api wrapping for every storage type
cons: fragmentation, maybe need to leverage the file system library pyfilesystem
cons: some storage cannot be done because of lack of authentication
Solution 3 - jump box job
a type of consistent, long-running, low resource usage job
data transferring service based on ssh or REST API
cons: every data access operation requires checking or relaunching the jump box job
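For Solution 3, the data transferring service over ssh could be sketched with `paramiko`'s sftp; the jump box address, user, and key are placeholders:

```python
import paramiko

# Placeholder address, user, and key for the long-running jump box job.
transport = paramiko.Transport(("jumpbox.pai.example.com", 22))
transport.connect(username="pai-user",
                  pkey=paramiko.RSAKey.from_private_key_file("jump_rsa"))
sftp = paramiko.SFTPClient.from_transport(transport)
sftp.put("local/data.csv", "/mnt/teamdata/data.csv")   # push data through the jump box
sftp.close()
transport.close()
```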