Use the `experiment` command to submit and manage experiments. This main command also includes the following subcommands:
- submit Subcommand
- list Subcommand
- cancel Subcommand
- view Subcommand
- logs Subcommand
- interact Subcommand
Use the `submit` subcommand to submit training jobs: single-node and multi-node jobs (pass the `-t` parameter with the name of a multi-node pack), or many jobs at once (pass the `-pr`/`-ps` parameters). If the `-n`/`--name` parameter is not passed, the experiment name is generated automatically by deriving a new, unique name from the training script's name.
nctl experiment submit [options] SCRIPT-LOCATION [-- script-parameters]
Name | Required | Description |
---|---|---|
SCRIPT-LOCATION | Yes | Location and name of a Python script with a description of training. |
script-parameters | No | String with a list of parameters that are passed to the training script. All such parameters should be added at the end of the command, after the "--" string. |
Name | Required | Description |
---|---|---|
-sfl, --script-folder-location [folder_name] PATH | No | Location and name of a folder with additional files used by the script, for example other .py files, data, and so on. If not given, no folder content is copied into the Docker image created by the nctl submit command. nctl copies all content, preserving its structure, including subfolders. |
-t, --template [template_name] TEXT | No | Name of a template that nctl uses to create a description of the job to be submitted. If not given, the default template for single-node TensorFlow training (tf-training) is used. A list of available templates can be obtained by issuing the nctl template list command. |
-n, --name TEXT | No | Name for this experiment. |
-p, --pack-param <TEXT TEXT>... | No | Additional pack parameter in the format: key value or key.subkey.subkey2 value. For lists use: 'key "['val1', 'val2']"'. For maps use: 'key "{'a': 'b'}"'. |
-pr, --parameter-range TEXT... [definition] <TEXT TEXT>... | No | If this parameter is given, nctl starts as many experiments as there are combinations of the parameters passed in -pr options. [param_name] is the name of a parameter that is passed to the training script. [definition] contains the values of this parameter that are passed to the different experiment instances, and can take two forms. Range: {x...y:step} launches one experiment for each value between x and y (both inclusive) with step step. Set of values: {x, y, z} launches one experiment for each value given in the definition. |
-ps, --parameter-set [definition] TEXT | No | If this parameter is given, nctl launches an experiment with a set of parameters that are passed to the experiment's script. The format of the [definition] argument is: {[param1_name]: [param1_value], [param2_name]: [param2_value], ..., [paramn_name]: [paramn_value]}. All parameters given in the [definition] argument are passed to the training script under the names stated in this argument. If the -ps parameter is given more than once, nctl starts one experiment for each occurrence of the parameter in the call. |
-e, --env TEXT | No | Environment variable passed to the training. You can pass as many environment variables as desired. Each variable should be passed as a separate -e parameter. |
-r, --requirements PATH | No | Path to the file with the experiment's pip requirements. Dependencies listed in this file are installed automatically using pip. |
-f, --force | No | Force command execution by ignoring (most) confirmation prompts. |
-v, --verbose | No | Set verbosity level: -v for INFO, -vv for DEBUG. |
-h, --help | No | Displays help information. |
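The two `[definition]` forms accepted by `-pr` can be illustrated with a short Python sketch. This is not nctl's parser: the helper name `expand_definition` is hypothetical, and the use of `Decimal` (to avoid floating-point drift when stepping through a range) is an assumption made for this example only.

```python
from decimal import Decimal


def expand_definition(definition: str) -> list:
    """Expand a -pr style definition into the values it describes.

    Hypothetical helper for illustration; nctl's own parsing may differ.
    """
    body = definition.strip().strip("{}")
    if "..." in body:
        # Range form: {x...y:step}, both endpoints included.
        bounds, step_text = body.split(":")
        start_text, stop_text = bounds.split("...")
        start = Decimal(start_text.strip())
        stop = Decimal(stop_text.strip())
        step = Decimal(step_text.strip())
        if step <= 0:
            raise ValueError("step must be positive")
        values = []
        current = start
        while current <= stop:
            values.append(str(current))
            current += step
        return values
    # Set form: {x, y, z}, one value per comma-separated item.
    return [item.strip() for item in body.split(",") if item.strip()]
```

Under these assumptions, each returned value corresponds to one experiment that `-pr` would launch for that parameter.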
For both `-ps` and `-pr`: if a parameter named in their definitions is also given in the `[script_parameters]` argument of the `nctl` command, then the values taken from `-ps` and `-pr` are passed to the script.
If a combination of both parameters is given, then `nctl` launches a number of experiments equal to the number of combinations of the values passed in those parameters. For example, if the following combination of parameters is passed to the `nctl` command:
-pr param1 "{0.1, 0.2, 0.3}" -ps "{param2: 3, param4: 5}" -ps "{param6: 7}"
Then the following experiments will be launched:
param1 = 0.1, param2 = 3, param4 = 5, param6 = not set
param1 = 0.2, param2 = 3, param4 = 5, param6 = not set
param1 = 0.3, param2 = 3, param4 = 5, param6 = not set
param1 = 0.1, param2 = not set, param4 = not set, param6 = 7
param1 = 0.2, param2 = not set, param4 = not set, param6 = 7
param1 = 0.3, param2 = not set, param4 = not set, param6 = 7
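The expansion listed above can be reproduced with a minimal Python sketch: every combination of `-pr` values is paired with each `-ps` set in turn. The function `combine` and its data layout are hypothetical and only model the behavior described here; they are not taken from nctl itself.

```python
from itertools import product


def combine(parameter_ranges: dict, parameter_sets: list) -> list:
    """Return one parameter dict per experiment.

    parameter_ranges maps each -pr parameter name to its list of values;
    parameter_sets holds one dict per -ps occurrence.
    """
    names = list(parameter_ranges)
    # With no -ps option, a single empty set keeps the -pr combinations.
    sets = parameter_sets or [{}]
    experiments = []
    for param_set in sets:
        for values in product(*(parameter_ranges[name] for name in names)):
            experiment = dict(zip(names, values))
            # Assumption: on a name clash the -ps value wins; the source does not say.
            experiment.update(param_set)
            experiments.append(experiment)
    return experiments


if __name__ == "__main__":
    ranges = {"param1": [0.1, 0.2, 0.3]}
    sets = [{"param2": 3, "param4": 5}, {"param6": 7}]
    for experiment in combine(ranges, sets):
        print(experiment)
```

Parameters missing from a dict correspond to the "not set" entries in the list above.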
This command returns a list of submitted experiments with their names and statuses. If problems occur during submission, the command displays one or more messages describing their causes, together with the experiments' names and statuses. Errors may prevent some experiments from being created, leaving them empty.
If one or more experiments were not submitted successfully, the command returns an exit code greater than 0. The exact value of the code depends on the cause of the error(s) that prevented the submission of the experiment(s).
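A script that wraps the command can rely on this exit code convention. The sketch below uses Python's standard `subprocess` module; the wrapper name `submit_succeeded` is hypothetical, and the meaning of specific non-zero codes is not documented here, so the sketch only distinguishes zero from non-zero.

```python
import subprocess
import sys


def submit_succeeded(command: list) -> bool:
    """Run a command and report whether it exited with code 0."""
    completed = subprocess.run(command)
    if completed.returncode != 0:
        # Any code greater than 0 means at least one experiment was not submitted.
        print(f"submission failed with exit code {completed.returncode}", file=sys.stderr)
    return completed.returncode == 0
```

For example, `submit_succeeded(["nctl", "experiment", "submit", "examples/mnist_single_node.py"])` would return `False` if any experiment failed to submit, assuming `nctl` is available on the PATH.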
nctl experiment submit --name para-range --parameter-range lr "{0.1, 0.2, 0.3}" examples/mnist_single_node.py -- --data_dir=/mnt/input/root/public/MNIST
Starts multiple single-node training jobs using the `mnist_single_node.py` script located in the examples folder. Each training job uses a different learning rate value.
Use the `list` subcommand to display a list of all experiments with some basic information for each, regardless of the owner. Results are sorted by the experiment's creation date, starting with the most recent experiment.
nctl experiment list [options]
Name | Required | Description |
---|---|---|
-a, --all_users | No | List contains experiments submitted by all users. |
-n, --name TEXT | No | A regular expression used to filter the list to experiments whose names match it. |
-s, --status | No | Lists experiments with the indicated status, one of: QUEUED, RUNNING, COMPLETE, CANCELLED, FAILED, CREATING. |
-u, --uninitialized | No | List uninitialized experiments, that is, experiments without resources submitted for creation. |
-c, --count INTEGER RANGE | No | An integer c; the command displays the last c rows. |
-b, --brief | No | Print a short version of the result table. Only the 'name', 'submission date', 'owner' and 'state' columns are printed. |
-f, --force | No | Force command execution by ignoring (most) confirmation prompts. |
-v, --verbose | No | Set verbosity level: -v for INFO, -vv for DEBUG. |
-h, --help | No | Displays help information. |
Displays the list of experiments matching the criteria given in the command's options. Each row contains the experiment name and additional data about the experiment, such as the parameters used for the training, the time and date of submission, the name of the user who submitted it, and the experiment's current status. The example table below shows the results returned by this command (the brief option is shown).
| Experiment | Submission date | Owner | State |
|--------------+----------------------+---------+----------------------------------|
| mnist-single-node-tb | 2019-03-13 04:57:58 PM | user1 | QUEUED |
| mnist-tb | 2019-03-13 05:00:39 PM | user1 | COMPLETE |
| mnist-tb 2-1 | 2019-03-13 05:49:59 PM | user1 | COMPLETE |
| test-experiment | 2019-03-13 06:00:39 PM | user1 | QUEUED |
| single-experiment | 2019-03-13 01:49:59 PM | user1 | QUEUED |
The following command displays all experiments submitted by the current user.
nctl experiment list
The following command displays all experiments submitted by the current user whose names start with the word `train`.
nctl experiment list -n train
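The effect of a name filter like the one above can be approximated in Python. This is a sketch under an assumption: the expression is applied from the first character of the name (which is what `re.match` does), consistent with the "starts with" wording of the example. How nctl actually applies the expression is not stated here, and the function name `filter_by_name` is hypothetical.

```python
import re


def filter_by_name(names: list, pattern: str) -> list:
    """Keep names that match the regular expression from their first character."""
    regex = re.compile(pattern)
    # re.match anchors at the start of the string, mirroring the "starts with" example.
    return [name for name in names if regex.match(name)]
```

With this assumption, the pattern `train` keeps `train-1` and `training` but drops `mnist-train`.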
Use the `cancel` subcommand to cancel any training chosen based on the provided parameters.
nctl experiment cancel [options] NAME
Name | Required | Description |
---|---|---|
NAME | Yes | The name of an experiment, pod, or pod status to be cancelled. If any such object is found, the command asks whether this object should be cancelled. |
Name | Required | Description |
---|---|---|
-m, --match TEXT | No | If given, the command searches for experiments matching the value of this option. This option cannot be used along with the NAME argument. |
-p, --purge | No | If given, all information concerning the identified experiments, both completed and currently running, is removed from the system. |
-i, --pod-ids TEXT | No | Comma-separated pod IDs. If given, the command matches pods by their IDs and deletes them. |
-s, --pod-status TEXT | No | One of: 'PENDING', 'RUNNING', 'SUCCEEDED', 'FAILED', or 'UNKNOWN'. If given, the command searches pods by their status and deletes them. |
-f, --force | No | Force command execution by ignoring (most) confirmation prompts. |
-v, --verbose | No | Set verbosity level: -v for INFO, -vv for DEBUG. |
-h, --help | No | Displays help information. |
Displays a description of the problem, if any problem occurs. Otherwise, displays a message that the training job or jobs were cancelled successfully.
nctl experiment cancel t20180423121021851
Cancels the experiment named `t20180423121021851`.
Use the `view` subcommand to display basic details of an experiment, such as the name of the experiment, its parameters, submission date, and so on.
nctl experiment view [options] EXPERIMENT-NAME
Name | Required | Description |
---|---|---|
EXPERIMENT-NAME | Yes | Name of an experiment to be displayed. |
Name | Required | Description |
---|---|---|
-tb, --tensorboard | No | If given, the command displays a TensorBoard with the experiment's data. |
-u, --username TEXT | No | Name of the user who submitted this experiment. If not given, only experiments of the current user are shown. |
-f, --force | No | Force command execution by ignoring (most) confirmation prompts. |
-v, --verbose | No | Set verbosity level: -v for INFO, -vv for DEBUG. |
-h, --help | No | Displays help information. |
Displays details of an experiment. If the `-tb, --tensorboard` option is given, then the command returns a link to a TensorBoard instance with data from the experiment.
nctl experiment view experiment-name-2 -tb
Displays details of the `experiment-name-2` experiment and exposes a TensorBoard instance with the experiment's data to the user.
Use the `logs` subcommand to display the logs from experiments. The logs to be displayed are chosen based on the parameters given in the command's call.
nctl experiment logs [options] EXPERIMENT-NAME
Name | Required | Description |
---|---|---|
EXPERIMENT-NAME | Yes | Name of the experiment whose logs are displayed. |
Name | Required | Description |
---|---|---|
-s, --min-severity | No | Minimal severity of logs. Available choices are: CRITICAL: displays only CRITICAL logs. ERROR: displays ERROR and CRITICAL logs. WARNING: displays ERROR, CRITICAL and WARNING logs. INFO: displays ERROR, CRITICAL, WARNING and INFO logs. DEBUG: displays ERROR, CRITICAL, WARNING, INFO and DEBUG logs. |
-sd, --start-date | No | Retrieve logs produced from this date (ISO 8601 format: yyyy-mm-ddThh:mm:ss). |
-ed, --end-date | No | Retrieve logs produced until this date (ISO 8601 format: yyyy-mm-ddThh:mm:ss). |
-i, --pod-ids TEXT | No | Comma-separated pod IDs. If given, pods are matched by their IDs and only logs from these pods, within the experiment named EXPERIMENT-NAME, are returned. |
-p, --pod-status TEXT | No | One of: 'PENDING', 'RUNNING', 'SUCCEEDED', 'FAILED', or 'UNKNOWN'. The command returns logs from pods with the matching status within the experiment named EXPERIMENT-NAME. |
-m, --match TEXT | No | If given, the command searches for logs from experiments matching the value of this option. This option cannot be used along with the EXPERIMENT-NAME argument. |
-o, --output | No | If given, logs are stored in a file with a name derived from the name of the experiment. |
-pa, --pager | No | Display logs in an interactive pager. Press q to exit the pager. |
-fl, --follow | No | Specify if logs should be streamed. Only logs from a single experiment can be streamed. |
-f, --force | No | Force command execution by ignoring (most) confirmation prompts. |
-v, --verbose | No | Set verbosity level: -v for INFO, -vv for DEBUG. |
-h, --help | No | Displays help information. |
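Values for `-sd` and `-ed` must follow the `yyyy-mm-ddThh:mm:ss` layout. A small Python sketch can generate a matching pair of timestamps; the helper name `log_window` is hypothetical, and whether nctl accepts other ISO 8601 variants (time zones, fractional seconds) is not stated here, so the sketch emits only the documented layout.

```python
from datetime import datetime, timedelta


def log_window(end: datetime, hours: int) -> tuple:
    """Return (start, end) strings in yyyy-mm-ddThh:mm:ss format."""
    start = end - timedelta(hours=hours)
    layout = "%Y-%m-%dT%H:%M:%S"
    return start.strftime(layout), end.strftime(layout)
```

The two strings can then be passed as the `-sd` and `-ed` values of a `logs` call.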
If issues arise, the command displays one or more messages describing their causes. Otherwise, it displays the logs filtered according to the command's parameters.
nctl experiment logs experiment-name-2 --min-severity DEBUG
Displays logs from the `experiment-name-2` experiment with severity DEBUG and higher (`INFO`, `WARNING`, and so on).
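The minimum-severity rule amounts to an ordered list in which each level also includes every level above it. The following Python sketch models that ordering as described for `--min-severity`; the names `allowed_severities` and `filter_logs`, and the `(severity, message)` tuple layout, are assumptions for illustration rather than nctl internals.

```python
# Severities ordered from lowest to highest, as described for --min-severity.
SEVERITIES = ["DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"]


def allowed_severities(min_severity: str) -> list:
    """Return the severities shown for a given minimum, lowest first."""
    return SEVERITIES[SEVERITIES.index(min_severity):]


def filter_logs(entries: list, min_severity: str) -> list:
    """Keep (severity, message) entries at or above the minimum severity."""
    allowed = set(allowed_severities(min_severity))
    return [(severity, message) for severity, message in entries if severity in allowed]
```

With `DEBUG` as the minimum, every entry is kept; with `CRITICAL`, only critical entries remain.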
Use the `interact` subcommand to launch a local browser with a Jupyter notebook. If a script's name is given as a parameter of the command, then this script is displayed in the notebook.
nctl experiment interact [options]
Name | Required | Description |
---|---|---|
-n, --name TEXT | No | The name of a Jupyter notebook session. If a session with the given name already exists, you are connected to that session. |
-fl, --filename TEXT | No | The file with a notebook that should be opened in Jupyter notebook. |
-p, --pack-param <TEXT TEXT>... | No | Additional pack parameter in the format: 'key value' or 'key.subkey.subkey2 value'. For lists use: 'key "['val1', 'val2']"'. For maps use: 'key "{'a': 'b'}"'. |
--no-launch | No | Run this command without starting a web browser; only a proxy tunnel is created. |
-pn, --port-number INTEGER RANGE | No | Port on which the service will be exposed locally. |
-e, --env TEXT | No | Environment variable passed to the Jupyter instance. You can pass as many environment variables as needed. Each variable should be passed as a separate -e parameter. |
-t, --template [jupyter] | No | Name of a Jupyter notebook template used to create a deployment. Supported templates for the interact command are: jupyter (python3). |
-f, --force | No | Force command execution by ignoring (most) confirmation prompts. |
-v, --verbose | No | Set verbosity level: -v for INFO, -vv for DEBUG. |
-h, --help | No | Displays help information. |
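The `key.subkey.subkey2 value` form of `-p` suggests that a dotted key addresses a nested entry in the pack's configuration. The Python sketch below shows that reading only; it is an assumption about how dotted keys are interpreted, and the function `apply_pack_param` is hypothetical rather than part of nctl.

```python
def apply_pack_param(values: dict, key: str, value) -> dict:
    """Set value at a dotted key path, creating nested dicts as needed."""
    node = values
    *parents, leaf = key.split(".")
    for part in parents:
        # Descend into the nested dict, creating it on first use.
        node = node.setdefault(part, {})
    node[leaf] = value
    return values
```

Under this reading, `key.subkey.subkey2 value` would set `values["key"]["subkey"]["subkey2"]` to `value`, while a plain `key value` sets a top-level entry.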
If issues arise, the command displays one or more messages describing their causes. Otherwise, the command launches the default web browser with a Jupyter notebook and displays the address under which the session is provided.
nctl experiment interact --filename training_script.py
Launches a Jupyter notebook with the `training_script.py` script in the default browser.