This repository has been archived by the owner on Nov 16, 2023. It is now read-only.

Commit

Merge branch '318' of github.com:microsoft/seismic-deeplearning into 318
maxkazmsft committed Jun 3, 2020
2 parents 48d4654 + 90f468e commit 4af6e41
Showing 13 changed files with 86 additions and 57 deletions.
14 changes: 9 additions & 5 deletions README.md
@@ -3,10 +3,11 @@

This repository shows you how to perform seismic imaging and interpretation on Azure. It empowers geophysicists and data scientists to run seismic experiments using state-of-art DSL-based PDE solvers and segmentation algorithms on Azure.


The repository provides sample notebooks, data loaders for seismic data, utilities, and out-of-the-box ML pipelines, organized as follows:
- **sample notebooks**: these can be found in the `examples` folder - they are standard Jupyter notebooks which highlight how to use the codebase by walking the user through a set of pre-made examples
- **experiments**: the goal is to provide runnable Python scripts that train and test (score) our machine learning models in the `experiments` folder. The models themselves are swappable, meaning a single train script can be used to run a different model on the same dataset by simply swapping out the configuration file which defines the model.
- **pip installable utilities**: we provide `cv_lib` and `deepseismic_interpretation` utilities (more info below) which are used by both sample notebooks and experiments mentioned above
- **pip installable utilities**: we provide `cv_lib` and `interpretation` utilities (more info below) which are used by both sample notebooks and experiments mentioned above

DeepSeismic currently focuses on Seismic Interpretation (3D segmentation aka facies classification) with experimental code provided around Seismic Imaging in the contrib folder.

@@ -26,7 +27,7 @@ If you run into any problems, chances are your problem has already been solved i
The notebook is designed to be run in demo mode by default using a pre-trained model in under 5 minutes on any reasonable Deep Learning GPU such as nVidia K80/P40/P100/V100/TitanV.

### Azure Machine Learning
[Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/) enables you to train and deploy your machine learning models and pipelines at scale, ane leverage open-source Python frameworks, such as PyTorch, TensorFlow, and scikit-learn. If you are looking at getting started with using the code in this repository with Azure Machine Learning, refer to [Azure Machine Learning How-to](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml) to get started.
[Azure Machine Learning](https://docs.microsoft.com/en-us/azure/machine-learning/) enables you to train and deploy your machine learning models and pipelines at scale, and leverage open-source Python frameworks, such as PyTorch, TensorFlow, and scikit-learn. If you are looking at getting started with using the code in this repository with Azure Machine Learning, refer to [Azure Machine Learning How-to](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml) to get started.

## Interpretation
For seismic interpretation, the repository consists of extensible machine learning pipelines, that shows how you can leverage state-of-the-art segmentation algorithms (UNet, SEResNET, HRNet) for seismic interpretation.
@@ -120,9 +121,12 @@ To prepare the data for the experiments (e.g. split into train/val/test), please
cd scripts
# For patch-based experiments
python prepare_dutchf3.py split_train_val patch --data_dir=${data_dir} --label_file=train/train_labels.npy --output_dir=splits \
python prepare_dutchf3.py split_train_val patch --data_dir=${data_dir}/data --label_file=train/train_labels.npy --output_dir=splits \
--stride=50 --patch_size=100 --split_direction=both
# For section-based experiments
python prepare_dutchf3.py split_train_val section --data-dir=${data_dir}/data --label_file=train/train_labels.npy --output_dir=splits \ --split_direction=both
# go back to repo root
cd ..
```
@@ -229,8 +233,8 @@ This section contains benchmarks of different algorithms for seismic interpretat


#### Reproduce benchmarks
In order to reproduce the benchmarks, you will need to navigate to the [experiments](experiments) folder. In there, each of the experiments are split into different folders. To run the Netherlands F3 experiment navigate to the [dutchf3_patch/local](experiments/dutchf3_patch/local) folder. In there is a training script [([train.sh](experiments/dutchf3_patch/local/train.sh))
which will run the training for any configuration you pass in. Once you have run the training you will need to run the [test.sh](experiments/dutchf3_patch/local/test.sh) script. Make sure you specify
In order to reproduce the benchmarks, you will need to navigate to the [experiments](experiments) folder. In there, each of the experiments are split into different folders. To run the Netherlands F3 experiment navigate to the [dutchf3_patch/local](experiments/interpretation/dutchf3_patch/local) folder. In there is a training script [([train.sh](experiments/interpretation/dutchf3_patch/local/train.sh))
which will run the training for any configuration you pass in. Once you have run the training you will need to run the [test.sh](experiments/interpretation/dutchf3_patch/local/test.sh) script. Make sure you specify
the path to the best performing model from your training run, either by passing it in as an argument or altering the YACS config file.

## Contributing
17 changes: 13 additions & 4 deletions cv_lib/cv_lib/utils.py
@@ -1,4 +1,3 @@

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.

@@ -8,6 +7,7 @@
import numpy as np
from matplotlib import pyplot as plt


def normalize(array):
"""
Normalizes a segmentation mask array to be in [0,1] range
@@ -16,12 +16,19 @@ def normalize(array):
min = array.min()
return (array - min) / (array.max() - min)

def mask_to_disk(mask, fname, cmap_name="Paired"):

def mask_to_disk(mask, fname, n_classes, cmap_name="rainbow"):
"""
write segmentation mask to disk using a particular colormap
mask (float): this contains the predicted labels in the range [0, n_classes].
fname (str): of the the image to be saved
n_classes (int): total number of classes in the dataset
cmap_name (str): name of the matplotlib colormap to be used. The default "rainbow"
colormap works well for any number of classes.
"""
cmap = plt.get_cmap(cmap_name)
Image.fromarray(cmap(normalize(mask), bytes=True)).save(fname)
Image.fromarray(cmap(mask / n_classes, bytes=True)).save(fname)


def image_to_disk(mask, fname, cmap_name="seismic"):
"""
@@ -30,7 +37,8 @@ def image_to_disk(mask, fname, cmap_name="seismic"):
cmap = plt.get_cmap(cmap_name)
Image.fromarray(cmap(normalize(mask), bytes=True)).save(fname)

def decode_segmap(label_mask, colormap_name="Paired"):

def decode_segmap(label_mask, colormap_name="rainbow"):
"""
Decode segmentation class labels into a colour image
Args:
@@ -48,6 +56,7 @@ def decode_segmap(label_mask, colormap_name="Paired"):

return out


def load_log_configuration(log_config_file):
"""
Loads logging configuration from the given configuration file.
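
The `mask_to_disk` change above replaces per-mask normalization with division by `n_classes`. The following is a minimal sketch of why that matters, using plain Python lists as stand-ins for flattened label masks. The helper names `normalize_per_mask` and `scale_by_classes` and the sample masks are illustrative assumptions, not code from the repository.

```python
def normalize_per_mask(labels):
    """Rescale labels to [0, 1] using this mask's own min and max."""
    lo, hi = min(labels), max(labels)
    return [(v - lo) / (hi - lo) for v in labels]


def scale_by_classes(labels, n_classes):
    """Rescale labels by the dataset-wide class count."""
    return [v / n_classes for v in labels]


n_classes = 6           # assumed class count for the illustration
mask_a = [0, 1, 2]      # a patch where only classes 0..2 occur
mask_b = [2, 3, 5]      # a patch where classes 2, 3 and 5 occur

# Per-mask normalization sends class 2 to opposite ends of the colormap.
class2_in_a = normalize_per_mask(mask_a)[2]
class2_in_b = normalize_per_mask(mask_b)[0]

# Dividing by n_classes gives class 2 the same colormap position in both.
stable_a = scale_by_classes(mask_a, n_classes)[2]
stable_b = scale_by_classes(mask_b, n_classes)[0]
```

With per-mask normalization the colour assigned to a class depends on which other classes happen to be present in that mask, so predictions and ground truth saved as separate images may not be visually comparable. Scaling by a fixed class count avoids that.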
7 changes: 4 additions & 3 deletions docker/Dockerfile
@@ -25,7 +25,8 @@ RUN wget --quiet https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86
wget --quiet https://github.com/microsoft/seismic-deeplearning/archive/master.zip -O master.zip && \
unzip master.zip && rm master.zip

RUN cd seismic-deeplearning-master && \
RUN mv seismic-deeplearning-master seismic-deeplearning && \
cd seismic-deeplearning && \
conda env create -n seismic-interpretation --file environment/anaconda/local/environment.yml && \
source activate seismic-interpretation && \
python -m ipykernel install --user --name seismic-interpretation && \
@@ -34,7 +35,7 @@ RUN cd seismic-deeplearning-master && \

# TODO: add back in later when Penobscot notebook is available
# Download Penobscot dataset:
# RUN cd seismic-deeplearning-master && \
# RUN cd seismic-deeplearning && \
# data_dir="/home/username/data/penobscot" && \
# mkdir -p "$data_dir" && \
# ./scripts/download_penobscot.sh "$data_dir" && \
@@ -44,7 +45,7 @@ RUN cd seismic-deeplearning-master && \
# cd ..

# Download F3 dataset:
RUN cd seismic-deeplearning-master && \
RUN cd seismic-deeplearning && \
data_dir="/home/username/data/dutch" && \
mkdir -p "$data_dir" && \
./scripts/download_dutch_f3.sh "$data_dir" && \
8 changes: 4 additions & 4 deletions docker/README.md
@@ -1,14 +1,14 @@
This Docker image allows the user to run the notebooks in this repository on any operating system without having to setup the environment or install anything other than the Docker engine. For instructions on how to install the Docker engine, click [here](https://www.docker.com/get-started).
This Docker image allows the user to run the notebooks in this repository on any Unix based operating system without having to setup the environment or install anything other than the Docker engine. We recommend using [Azure Data Science Virtual Machine (DSVM) for Linux (Ubuntu)](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro) as outlined [here](../README.md#compute-environment). For instructions on how to install the Docker engine, click [here](https://www.docker.com/get-started).

# Download the HRNet model:

To run the [`HRNet_Penobscot_demo_notebook.ipynb`](https://github.com/microsoft/seismic-deeplearning/blob/master/examples/interpretation/notebooks/HRNet_Penobscot_demo_notebook.ipynb), you will need to manually download the [HRNet-W48-C](https://1drv.ms/u/s!Aus8VCZ_C_33dKvqI6pBZlifgJk) pretrained model. You can follow the instructions [here.](https://github.com/microsoft/seismic-deeplearning#hrnet).
To run the [`Dutch_F3_patch_model_training_and_evaluation.ipynb`](https://github.com/microsoft/seismic-deeplearning/blob/master/examples/interpretation/notebooks/Dutch_F3_patch_model_training_and_evaluation.ipynb), you will need to manually download the [HRNet-W48-C](https://1drv.ms/u/s!Aus8VCZ_C_33dKvqI6pBZlifgJk) pretrained model. You can follow the instructions [here.](../README.md#pretrained-models).

If you are using an Azure Virtual Machine to run this code, you can download the model to your local machine, and then copy it to your Azure VM through the command below. Please make sure you update the `<azureuser>` and `<azurehost>` feilds.
```bash
scp hrnetv2_w48_imagenet_pretrained.pth <azureuser>@<azurehost>:/home/<azureuser>/seismic-deeplearning/docker/hrnetv2_w48_imagenet_pretrained.pth
```
Once you have the model downloaded (ideally under the `docker` directory), you can process to build the Docker image.
Once you have the model downloaded (ideally under the `docker` directory), you can proceed to build the Docker image.

# Build the Docker image:

@@ -22,7 +22,7 @@ This process will take a few minutes to complete.
# Run the Docker image:
Once the Docker image is built, you can run it anytime using the following command:
```bash
sudo docker run --rm -it -p 9000:9000 -p 9001:9001 --gpus=all --shm-size 11G --mount type=bind,source=$PWD/hrnetv2_w48_imagenet_pretrained.pth,target=/home/models/hrnetv2_w48_imagenet_pretrained.pth seismic-deeplearning
sudo docker run --rm -it -p 9000:9000 -p 9001:9001 --gpus=all --shm-size 11G --mount type=bind,source=$PWD/hrnetv2_w48_imagenet_pretrained.pth,target=/home/username/seismic-deeplearning/docker/hrnetv2_w48_imagenet_pretrained.pth seismic-deeplearning
```
If you have saved the pretrained model in a different directory, make sure you replace `$PWD/hrnetv2_w48_imagenet_pretrained.pth` with the **absolute** path to the pretrained HRNet model. The command above will run a Jupyter Lab instance that you can access by clicking on the link in your terminal. You can then navigate to the notebook or script that you would like to run.

4 changes: 3 additions & 1 deletion examples/interpretation/README.md
@@ -1,3 +1,5 @@
The folder contains notebook examples illustrating the use of segmentation algorithms on openly available datasets. Make sure you have followed the [set up instructions](../README.md) before running these examples. We provide the following notebook examples
The folder contains notebook examples illustrating the use of segmentation algorithms on openly available datasets. Make sure you have followed the [set up instructions](../../README.md) before running these examples. We provide the following notebook examples

* [Dutch F3 dataset](notebooks/Dutch_F3_patch_model_training_and_evaluation.ipynb): This notebook illustrates section and patch based segmentation approaches on the [Dutch F3](https://terranubis.com/datainfo/Netherlands-Offshore-F3-Block-Complete) open dataset. This notebook uses denconvolution based segmentation algorithm on 2D patches. The notebook will guide you through visualization of the input volume, setting up model training and evaluation.

To understand the configuration files and the dafault parameters refer to this [section in the top level README](../../README.md#configuration-files)
6 changes: 3 additions & 3 deletions experiments/interpretation/dutchf3_patch/README.md
@@ -4,8 +4,8 @@ You can run five different models on this dataset:
* [HRNet](local/configs/hrnet.yaml)
* [SEResNet](local/configs/seresnet_unet.yaml)
* [UNet](local/configs/unet.yaml)
* [PatchDeconvNet](local/configs/patch_patch_deconvnet.yaml)
* [PatchDeconvNet-Skip](local/configs/patch_deconvnet_skip.yaml.yaml)
* [PatchDeconvNet](local/configs/patch_deconvnet.yaml)
* [PatchDeconvNet-Skip](local/configs/patch_deconvnet_skip.yaml)

All these models take 2D patches of the dataset as input and provide predictions for those patches. The patches need to be stitched together to form a whole inline or crossline.

@@ -18,7 +18,7 @@ Also follow instructions for [downloading and preparing](../../../README.md#f3-N

### Running experiments

Now you're all set to run training and testing experiments on the F3 Netherlands dataset. Please start from the `train.sh` and `test.sh` scripts under the `local/` and `distributed/` directories, which invoke the corresponding python scripts. Take a look at the project configurations in (e.g in `default.py`) for experiment options and modify if necessary.
Now you're all set to run training and testing experiments on the F3 Netherlands dataset. Please start from the `train.sh` and `test.sh` scripts under the `local/` directory, which invoke the corresponding python scripts. Take a look at the project configurations in (e.g in `default.py`) for experiment options and modify if necessary.

### Monitoring progress with TensorBoard
- from the this directory, run `tensorboard --logdir='output'` (all runtime logging information is
Original file line number Diff line number Diff line change
@@ -93,7 +93,7 @@ VALIDATION:
BATCH_SIZE_PER_GPU: 128

TEST:
MODEL_PATH: "/data/home/mat/repos/DeepSeismic/experiments/interpretation/dutchf3_patch/local/output/staging/0d1d2bbf9685995a0515ca1d9de90f9bcec0db90/seg_hrnet/Dec20_233535/models/seg_hrnet_running_model_33.pth"
MODEL_PATH: "/home/username/seismic-deeplearning/docker/hrnetv2_w48_imagenet_pretrained.pth"
TEST_STRIDE: 10
SPLIT: 'Both' # Can be Both, Test1, Test2
INLINE: True
Original file line number Diff line number Diff line change
@@ -47,7 +47,7 @@ VALIDATION:
BATCH_SIZE_PER_GPU: 64

TEST:
MODEL_PATH: "/data/home/mat/repos/DeepSeismic/interpretation/experiments/segmentation/dutchf3/local/output/mat/exp/5cc37bbe5302e1989ef1388d629400a16f82d1a9/patch_deconvnet/Aug27_200339/models/patch_deconvnet_snapshot1model_50.pth"
MODEL_PATH: ""
TEST_STRIDE: 10
SPLIT: 'Both' # Can be Both, Test1, Test2
INLINE: True
Original file line number Diff line number Diff line change
@@ -49,7 +49,7 @@ VALIDATION:
BATCH_SIZE_PER_GPU: 32

TEST:
MODEL_PATH: "/data/home/mat/repos/DeepSeismic/interpretation/experiments/segmentation/dutchf3/local/output/mat/exp/dc2e2d20b7f6d508beb779ffff37c77d0139e588/resnet_unet/Sep01_125513/models/resnet_unet_snapshot1model_52.pth"
MODEL_PATH: ""
TEST_STRIDE: 10
SPLIT: 'Both' # Can be Both, Test1, Test2
INLINE: True
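
With `MODEL_PATH` now empty in these configs, the path to a trained model has to be supplied at test time, either in the YAML file or as an override. The sketch below shows the general idea of applying a flat `KEY value` override list to a nested configuration, using a plain dictionary. It is an illustration only: `apply_overrides` and the example model path are hypothetical, and the repository itself relies on YACS, whose `CfgNode.merge_from_list` accepts a list of this shape.

```python
def apply_overrides(config, overrides):
    """Apply a flat [key, value, key, value, ...] list to a nested dict."""
    if len(overrides) % 2 != 0:
        raise ValueError("overrides must contain key/value pairs")
    for dotted_key, value in zip(overrides[0::2], overrides[1::2]):
        *parents, leaf = dotted_key.split(".")
        node = config
        for name in parents:
            node = node[name]          # walk down to the parent section
        if leaf not in node:
            raise KeyError(dotted_key)  # refuse to create unknown options
        node[leaf] = value
    return config


# Mirrors the TEST section shown in the config hunks above.
config = {
    "TEST": {
        "MODEL_PATH": "",
        "TEST_STRIDE": 10,
        "SPLIT": "Both",
        "INLINE": True,
    }
}

# Hypothetical path to the best model from a training run.
apply_overrides(config, ["TEST.MODEL_PATH", "output/models/best_model.pth"])
```

Rejecting unknown keys, as this sketch does, helps catch a mistyped option name before a long test run starts.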
25 changes: 14 additions & 11 deletions experiments/interpretation/dutchf3_patch/local/test.py
@@ -231,20 +231,23 @@ def _patch_label_2d(
outdir = f"debug/batch_{split}"
generate_path(outdir)
for i in range(batch.shape[0]):
image_to_disk(
np.array(batch[i, 0, :, :]), f"{outdir}/{batch_indexes[i][0]}_{batch_indexes[i][1]}_img.png"
)
# now dump model predictions
path_prefix = f"{outdir}/{batch_indexes[i][0]}_{batch_indexes[i][1]}"
model_output = model_output.detach().cpu()
# save image:
image_to_disk(np.array(batch[i, 0, :, :]), path_prefix + "_img.png")
# dump model prediction:
mask_to_disk(model_output[i, :, :, :].argmax(dim=1).numpy(), path_prefix + "_pred.png", num_classes)
# dump model confidence values
for nclass in range(num_classes):
mask_to_disk(
np.array(model_output[i, nclass, :, :].detach().cpu()),
f"{outdir}/{batch_indexes[i][0]}_{batch_indexes[i][1]}_class_{nclass}_pred.png",
image_to_disk(
model_output[i, nclass, :, :].numpy(), path_prefix + f"_class_{nclass}_conf.png",
)

# crop the output_p in the middle
output = output_p[:, :, ps:-ps, ps:-ps]
return output


def _evaluate_split(
split, section_aug, model, pre_processing, output_processing, device, running_metrics_overall, config, debug=False,
):
@@ -273,12 +276,12 @@ def _evaluate_split(
output_dir = generate_path(
f"debug/{config.OUTPUT_DIR}_test_{split}", git_branch(), git_hash(), config.MODEL.NAME, current_datetime(),
)
except TypeError:
except:
output_dir = generate_path(f"debug/{config.OUTPUT_DIR}_test_{split}", config.MODEL.NAME, current_datetime(),)

running_metrics_split = runningScore(n_classes)

# testing mode:
# evaluation mode:
with torch.no_grad(): # operations inside don't track history
model.eval()
total_iteration = 0
@@ -306,8 +309,8 @@ def _evaluate_split(
running_metrics_overall.update(gt, pred)

# dump images to disk for review
mask_to_disk(pred.squeeze(), os.path.join(output_dir, f"{i}_pred.png"))
mask_to_disk(gt.squeeze(), os.path.join(output_dir, f"{i}_gt.png"))
mask_to_disk(pred.squeeze(), os.path.join(output_dir, f"{i}_pred.png"), n_classes)
mask_to_disk(gt.squeeze(), os.path.join(output_dir, f"{i}_gt.png"), n_classes)

# get scores
score, class_iou = running_metrics_split.get_scores()
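
The debug dump added to `_patch_label_2d` above writes a predicted label map by taking an argmax over the model output. The sketch below shows that reduction on nested Python lists, assuming that the scores for one sample are laid out as (classes, height, width), which is what the indexing `model_output[i, nclass, :, :]` suggests. The function name `argmax_over_classes` and the toy scores are assumptions for illustration.

```python
def argmax_over_classes(scores):
    """Return a height x width label map from (classes, height, width) scores."""
    n_classes = len(scores)
    height = len(scores[0])
    width = len(scores[0][0])
    return [
        [
            # pick the class whose score is largest at pixel (y, x)
            max(range(n_classes), key=lambda c: scores[c][y][x])
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two classes on a 2 x 2 patch.
scores = [
    [[0.9, 0.2], [0.4, 0.1]],   # class 0
    [[0.1, 0.8], [0.6, 0.9]],   # class 1
]
labels = argmax_over_classes(scores)
```

Under that layout the class axis of a single sample is axis 0, so it may be worth checking that `model_output[i, :, :, :].argmax(dim=1)` in the hunk reduces over the intended axis.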
4 changes: 2 additions & 2 deletions experiments/interpretation/dutchf3_patch/local/train.py
@@ -72,7 +72,7 @@ def run(*options, cfg=None, debug=False):
output_dir = generate_path(
config.OUTPUT_DIR, git_branch(), git_hash(), config_file_name, config.TRAIN.MODEL_DIR, current_datetime(),
)
except TypeError:
except:
output_dir = generate_path(config.OUTPUT_DIR, config_file_name, config.TRAIN.MODEL_DIR, current_datetime(),)

# Logging:
@@ -83,7 +83,7 @@ def run(*options, cfg=None, debug=False):
# Set CUDNN benchmark mode:
torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK

# we will write the model under outputs / config_file_name / model_dir
# We will write the model under outputs / config_file_name / model_dir
config_file_name = "default_config" if not cfg else cfg.split("/")[-1].split(".")[0]

# Fix random seeds:
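
Both `train.py` and `test.py` in this commit widen `except TypeError:` to a bare `except:` around the output-directory construction, so that any failure to obtain git metadata falls back to a path without it. A minimal sketch of that fallback pattern follows. The helper `join_parts` is a hypothetical stand-in for `generate_path` (it does not create directories), and `git_info` stands in for the `git_branch()` and `git_hash()` calls. The sketch uses `except Exception` rather than a bare `except`, which is a substitution made here so that `KeyboardInterrupt` and `SystemExit` are not swallowed.

```python
import os


def join_parts(*parts):
    """Hypothetical stand-in for generate_path: join components only."""
    return os.path.join(*parts)


def build_output_dir(base, model_name, stamp, git_info):
    """Prefer a path that includes git metadata; fall back if unavailable."""
    try:
        branch, commit = git_info()
        return join_parts(base, branch, commit, model_name, stamp)
    except Exception:
        # git metadata missing or lookup failed: omit it from the path
        return join_parts(base, model_name, stamp)


def no_repository():
    raise RuntimeError("not inside a git checkout")


with_git = build_output_dir("output", "unet", "Jun03", lambda: ("318", "4af6e41"))
without_git = build_output_dir("output", "unet", "Jun03", no_repository)
none_values = build_output_dir("output", "unet", "Jun03", lambda: (None, None))
```

The last call covers the case the original `except TypeError:` handled: joining a `None` component raises `TypeError`, which the broader handler also catches.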