# Azureml train pipeline #186
@@ -0,0 +1,13 @@
git+https://github.com/olgaliak/seismic-deeplearning.git@azureml-train-pipeline#egg=cv_lib&subdirectory=cv_lib
git+https://github.com/microsoft/seismic-deeplearning.git#egg=deepseismic-interpretation&subdirectory=interpretation
opencv-python==4.1.2.30
> **Review comment:** Is it possible to use Pillow for your work instead of OpenCV?
>
> **Reply:** I think we are only using OpenCV for the BORDER_CONSTANT feature to pad images. This is in the train.py that we are leveraging and in other areas of the code. Are you recommending to substitute this for similar functionality in Pillow?
numpy>=1.17.0
torch==1.4.0
pytorch-ignite==0.3.0.dev20191105 # pre-release until stable available
fire==0.2.1
albumentations==0.4.3
toolz==0.10.0
segyio==1.8.8
scipy==1.1.0
gitpython==3.0.5
yacs==0.1.6
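Regarding the OpenCV/Pillow question in the review comment above: a minimal sketch of constant-border padding in both libraries, for comparison. The array shape and pad width here are illustrative and not taken from train.py.

```python
import numpy as np
import cv2
from PIL import Image, ImageOps

patch = np.zeros((100, 100), dtype=np.uint8)  # illustrative seismic patch

# OpenCV: pad 10 pixels on every side with a constant value
padded_cv = cv2.copyMakeBorder(patch, 10, 10, 10, 10,
                               borderType=cv2.BORDER_CONSTANT, value=0)

# Roughly equivalent padding via Pillow
padded_pil = np.array(ImageOps.expand(Image.fromarray(patch), border=10, fill=0))

assert padded_cv.shape == padded_pil.shape == (120, 120)
```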
@@ -0,0 +1,175 @@
# Integrating with AzureML
> **Review comment:** Could you please also add some verbiage in the main README and point to this file? AML team would love this!
## Running a Pipeline in AzureML
Set the following environment variables:
```
BLOB_ACCOUNT_NAME
BLOB_CONTAINER_NAME
BLOB_ACCOUNT_KEY
BLOB_SUB_ID
AML_COMPUTE_CLUSTER_NAME
AML_COMPUTE_CLUSTER_MIN_NODES
AML_COMPUTE_CLUSTER_MAX_NODES
AML_COMPUTE_CLUSTER_SKU
```
> **Review comment:** Please use https://pypi.org/project/python-dotenv/ in the code and then ask the user to create a .env file where these will reside. We don't want to set these in the notebook.
>
> **Reply:** This does use python-dotenv to grab the variables. This is just the readme instructing them to set those variables in any way they choose (a .env file for VS Code is mentioned).
>
> **Reply:** @annazietlow The DeepSeismic team has requested the PR to go to a contrib branch, so I've closed this one and we are continuing the conversations on that PR #195.
On Windows you can use:
`set VARIABLE=value`
> **Review comment:** We don't support Windows :-p please feel free to get rid of this.
On Linux:
`export VARIABLE=value`
These can also be kept in a `.env` file, which VS Code picks up automatically; on Linux you can load it with `source .env`, or on Windows turn it into a `.bat` file to run from the command line. You can ask a team member for a `.env` file configured for our development environment to save time.
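As noted in the review thread above, the pipeline code itself reads these variables with python-dotenv. A minimal sketch of that pattern, where the variable names come from the list above and everything else is illustrative rather than the project's actual loading code:

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

# Pick up variables from a .env file in the working directory, if one exists;
# variables already exported in the environment take precedence by default.
load_dotenv()

blob_account_name = os.getenv("BLOB_ACCOUNT_NAME")
blob_container_name = os.getenv("BLOB_CONTAINER_NAME")
blob_account_key = os.getenv("BLOB_ACCOUNT_KEY")
cluster_name = os.getenv("AML_COMPUTE_CLUSTER_NAME")
```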
Create a `.azureml/config.json` file in the project's root directory that looks like so:
```json
{
    "subscription_id": "<subscription id>",
    "resource_group": "<resource group>",
    "workspace_name": "<workspace name>"
}
```
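With that file in place, the AzureML SDK can locate the workspace automatically. A minimal sketch, assuming the `azureml-core` package is installed:

```python
from azureml.core import Workspace

# Searches the current directory and its parents for .azureml/config.json
ws = Workspace.from_config()
print(ws.name, ws.resource_group, ws.location, sep="\t")
```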
## Training Pipeline
Here's an example of a possible config file:
```json
{
    "step1":
    {
        "type": "PythonScriptStep",
        "name": "process all files step",
        "script": "process_all_files.py",
        "input_datareference_path": "",
        "input_datareference_name": "raw_input_data",
        "input_dataset_name": "raw_input_data",
        "source_directory": "src/first_preprocess/",
        "arguments": ["--remote_run",
                      "--input_path", "input/",
                      "--output_path", "normalized_data"],
        "requirements": "src/first_preprocess/preprocess_requirements.txt",
        "node_count": 1,
        "processes_per_node": 1
    },
    "step2":
    {
        "type": "PythonScriptStep",
        "name": "prepare files step",
        "script": "prepare_files.py",
        "input_datareference_path": "normalized_data/",
        "input_datareference_name": "normalized_data_conditioned",
        "input_dataset_name": "normalizeddataconditioned",
        "source_directory": "src/second_preprocess",
        "arguments": ["split_train_val",
                      "patch",
                      "--label_file", "label.npy",
                      "--output_dir", "splits/",
                      "--stride=25",
                      "--patch=100.",
                      "--log_config", "configs/logging.conf"],
        "requirements": "src/second_preprocess/prepare_files_requirements.txt",
        "node_count": 1,
        "processes_per_node": 1,
        "base_image": "pytorch/pytorch"
    },
    "step3":
    {
        "type": "MpiStep",
        "name": "train step",
        "script": "train.py",
        "input_datareference_path": "normalized_data/",
        "input_datareference_name": "normalized_data_conditioned",
        "input_dataset_name": "normalizeddataconditioned",
        "source_directory": "train/",
        "arguments": ["--splits", "splits",
                      "--train_data_paths", "normalized_data/file.npy",
                      "--label_paths", "label.npy"],
        "requirements": "train/requirements.txt",
        "node_count": 1,
        "processes_per_node": 1,
        "base_image": "pytorch/pytorch"
    }
}
```

> **Review comment** (on the step3 `"arguments"` line): should the example mention how the input/output params could be passed in?
>
> **Reply:** It adds more detail to the params around line 111. Do you mean more than that?

If you want to create a train pipeline, make sure that:
1) All of your steps are isolated
    - Your scripts will need to conform to the interface you define in the config file (see the sketch after this list)
    - I.e., if step1 is expected to output X and step2 expects X as an input, your scripts need to reflect that
    - If one of your steps has pip package dependencies, make sure they are specified in a requirements.txt file
    - If your script has local dependencies (i.e., it imports from another script), make sure that all dependencies fall underneath the source_directory
2) You have configured your config file to specify the steps needed (see the "Configuring a Pipeline" section below for guidance)
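As a concrete illustration of the first point, a hypothetical `process_all_files.py` matching the step1 arguments in the example config above might expose its interface like this. The flag names come from that config; the body is purely illustrative.

```python
import argparse
import os

def main():
    parser = argparse.ArgumentParser(description="Hypothetical step1 script")
    # Flags declared in the step1 "arguments" list of the example config
    parser.add_argument("--remote_run", action="store_true", help="running remotely in AzureML")
    parser.add_argument("--input_path", type=str, default="input/", help="folder of raw input files")
    parser.add_argument("--output_path", type=str, default="normalized_data", help="folder for normalized output")
    # AzureML will also pass --input_data (and --output when declared); see the note below.
    args, _ = parser.parse_known_args()

    os.makedirs(args.output_path, exist_ok=True)
    for name in os.listdir(args.input_path):
        # ... normalize each file and write it to args.output_path ...
        pass

if __name__ == "__main__":
    main()
```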
Note: the following arguments are automatically added to any script step by AzureML:
`--input_data` and `--output` (the latter only if an output is specified in the pipeline_config.json).
Make sure to add these arguments in your scripts like so:
```python
parser.add_argument('--input_data', type=str, help='path to preprocessed data')
parser.add_argument('--output', type=str, help='output from training')
```
`input_data` is the absolute path to the input_datareference_path on the blob you specified.
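For example, a step script might consume these automatically added paths roughly as follows. This is a self-contained sketch, not taken from the project's actual train.py:

```python
import argparse
import os

parser = argparse.ArgumentParser()
parser.add_argument('--input_data', type=str, help='path to preprocessed data')
parser.add_argument('--output', type=str, help='output from training')
args, _ = parser.parse_known_args()

# --input_data resolves to the mounted blob path for input_datareference_path
print("Input files:", os.listdir(args.input_data))

# Write this step's artifacts under --output so AzureML (and downstream steps) can pick them up
os.makedirs(args.output, exist_ok=True)
with open(os.path.join(args.output, "run_marker.txt"), "w") as f:
    f.write("illustrative placeholder artifact")
```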
# Configuring a Pipeline

## Train Pipeline
Define parameters for the run in a config file. See an example [here](../pipeline_config.json).
```json
{
    "step1":
    {
        "type": "<type of step. Supported types include PythonScriptStep and MpiStep>",
        "name": "<name in AzureML for this step>",
        "script": "<path to script for this step>",
        "output": "<name of the output in AzureML for this step - optional>",
        "input_datareference_path": "<path on the data reference for the input data - optional>",
        "input_datareference_name": "<name of the data reference in AzureML where the input data lives - optional>",
        "input_dataset_name": "<name of the datastore in AzureML - optional>",
        "source_directory": "<source directory containing the files for this step>",
        "arguments": "<arguments to pass to the script - optional>",
        "requirements": "<path to the requirements.txt file for the step - optional>",
        "node_count": "<number of nodes to run the script on - optional>",
        "processes_per_node": "<number of processes to run on each node - optional>",
        "base_image": "<name of an image registered on dockerhub that you want to use as your base image>"
    },

    "step2":
    {
        .
        .
        .
    }
}
```
## Kicking off a Pipeline
In order to kick off a pipeline, you will need to use the Azure CLI to log in to the subscription where your workspace resides:
```bash
az login
az account set -s <subscription id>
```
Kick off the training pipeline defined in your config via your Python environment of choice. The code will look like this:
```python
from src.azml.train_pipeline.train_pipeline import TrainPipeline

orchestrator = TrainPipeline("<path to your config file>")
orchestrator.construct_pipeline()
run = orchestrator.run_pipeline(experiment_name="DEV-train-pipeline")
```
See an example in [dev/kickoff_train_pipeline.py](dev/kickoff_train_pipeline.py).
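If `run_pipeline` returns the submitted `azureml.pipeline.core.PipelineRun` (an assumption based on the snippet above, not confirmed by this diff), you can block until the pipeline finishes and stream its logs:

```python
from src.azml.train_pipeline.train_pipeline import TrainPipeline

orchestrator = TrainPipeline("<path to your config file>")
orchestrator.construct_pipeline()
run = orchestrator.run_pipeline(experiment_name="DEV-train-pipeline")

# Block until the pipeline finishes, streaming step logs to stdout
run.wait_for_completion(show_output=True)
print("Final status:", run.get_status())
```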
If this fails due to access to the Azure ML subscription, you may be able to connect by using a workaround:
Go to [base_pipeline.py](../base_pipeline.py) and add the following import:
```python
from azureml.core.authentication import AzureCliAuthentication
```
Then find the code where we connect to the workspace, which looks like this:
```python
self.ws = Workspace.from_config(path=ws_config)
```
and replace it with this:
```python
cli_auth = AzureCliAuthentication()
self.ws = Workspace(
    subscription_id="<subscription id>",
    resource_group="<resource group>",
    workspace_name="<workspace name>",
    auth=cli_auth,
)
```
To get this to run, you will also need to `pip install azure-cli-core`.
Then you can go back and follow the instructions above, including `az login` and setting the subscription, and kick off the pipeline.
## Cancelling a Pipeline Run
If you kicked off a pipeline and want to cancel it, run the [cancel_run.py](dev/cancel_run.py) script with the corresponding run_id and step_id.
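The cancel_run.py script itself is not shown in this diff; for reference, cancelling a pipeline run through the SDK generally looks something like the sketch below, where the experiment name and run id are placeholders:

```python
from azureml.core import Experiment, Workspace
from azureml.pipeline.core import PipelineRun

ws = Workspace.from_config()
experiment = Experiment(ws, "DEV-train-pipeline")

# run_id of the pipeline run you want to stop
pipeline_run = PipelineRun(experiment, run_id="<run id>")
pipeline_run.cancel()
```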
> **Review comment:** Please pull from /microsoft. Also, is there a way to specify the master branch here? We're on staging by default, which might break one of these days.
>
> **Reply:** Unfortunately, doing that would require the __init__.py file that I added, so this wouldn't work until that file is added to /microsoft and /master (there is a way to specify the master branch). I have a note about changing it to microsoft/staging before the merge in the PR comment. Just leaving it for now so it can be tested.