Merge branch 'staging' into staging
georgeAccnt-GH authored Dec 10, 2019
2 parents e1c25e9 + 9bbcc0b commit c9401f0
Showing 17 changed files with 118 additions and 50 deletions.
35 changes: 19 additions & 16 deletions README.md
@@ -13,7 +13,7 @@ DeepSeismic currently focuses on Seismic Interpretation (3D segmentation aka facies classification) …
### Quick Start

There are two ways to get started with the DeepSeismic codebase, which currently focuses on Interpretation:
- if you'd like to get an idea of how our interpretation (segmentation) models are used, simply review the [HRNet demo notebook](https://github.com/microsoft/DeepSeismic/blob/staging/examples/interpretation/notebooks/HRNet_demo_notebook.ipynb)
- if you'd like to get an idea of how our interpretation (segmentation) models are used, simply review the [HRNet demo notebook](https://github.com/microsoft/DeepSeismic/blob/staging/examples/interpretation/notebooks/HRNet_Penobscot_demo_notebook.ipynb)
- to actually run the code, you'll need to set up a compute environment (including a GPU-enabled Linux VM and the appropriate Anaconda Python packages) and download the datasets you'd like to work with - detailed steps for doing this are provided in the `Interpretation` section below.

If you run into any problems, chances are your problem has already been solved in the [Troubleshooting](#troubleshooting) section.
@@ -30,19 +30,17 @@ To run the examples available in the repo, please follow the instructions below to:

Follow the instructions below to read about compute requirements and install the required libraries.

<details>
<summary><b>Compute environment</b></summary>

#### Compute environment

We recommend using a virtual machine to run the example notebooks and scripts. Specifically, you will need a GPU-powered Linux machine, as this repository is developed and tested on __Linux only__. The easiest way to get started is to use the [Azure Data Science Virtual Machine (DSVM) for Linux (Ubuntu)](https://docs.microsoft.com/en-us/azure/machine-learning/data-science-virtual-machine/dsvm-ubuntu-intro). This VM comes with all the system requirements needed to create the conda environment described below and then run the notebooks in this repository.

For this repo, we recommend selecting a multi-GPU Ubuntu VM of type [Standard_NC12](https://docs.microsoft.com/en-us/azure/virtual-machines/windows/sizes-gpu#ncv3-series). The machine is powered by NVIDIA Tesla K80 GPUs (P100 and V100 GPUs are available in the NCv2 and NCv3 series, respectively) and can be found in most Azure regions.

> NOTE: For users new to Azure, your subscription may not come with a quota for GPUs. You may need to go into the Azure portal to increase your quota for GPU VMs. Learn more about how to do this here: https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits.
</details>
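If you prefer the command line, provisioning the VM might look like the sketch below; the resource group, VM name, admin username and, in particular, the DSVM image URN are assumptions you should adapt (check `az vm image list --publisher microsoft-dsvm --all --output table` for the current URN):

```bash
# Sketch only: provision an NC-series Ubuntu DSVM with the Azure CLI.
# The names and the image URN below are placeholders/assumptions.
az group create --name deepseismic-rg --location eastus
az vm create \
  --resource-group deepseismic-rg \
  --name deepseismic-vm \
  --size Standard_NC12 \
  --image microsoft-dsvm:ubuntu-1804:1804-gen2:latest \
  --admin-username azureuser \
  --generate-ssh-keys
```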

<details>
<summary><b>Package Installation</b></summary>
#### Package Installation

To install the packages contained in this repository, navigate to the directory into which you cloned the DeepSeismic repo and run:
```bash
conda env update --file environment/anaconda/local/environment.yml
```
from the root of the DeepSeismic repo.

</details>
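Putting the pieces together, a fresh setup might look like the sketch below; it assumes the conda environment defined in `environment/anaconda/local/environment.yml` is named `seismic-interpretation` (the name used by the CI jobs later in this diff) - check the file to confirm.

```bash
# Sketch of a fresh environment setup (env name taken from the CI jobs below):
git clone https://github.com/microsoft/DeepSeismic.git
cd DeepSeismic
conda env create -f environment/anaconda/local/environment.yml
conda activate seismic-interpretation
```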

### Dataset download and preparation

This repository provides examples on how to run seismic interpretation on two publicly available annotated seismic datasets: [Penobscot](https://zenodo.org/record/1341774) and [F3 Netherlands](https://github.com/olivesgatech/facies_classification_benchmark).
This repository provides examples on how to run seismic interpretation on two publicly available annotated seismic datasets: [Penobscot](https://zenodo.org/record/1341774) and [F3 Netherlands](https://github.com/olivesgatech/facies_classification_benchmark). Their respective sizes (uncompressed on disk after download and pre-processing) are:
- **Penobscot**: 7.9 GB
- **Dutch F3**: 2.2 GB

Please make sure you have enough disk space to download either dataset.

We have experiments and notebooks which use either one dataset or the other. Depending on which experiment/notebook you want to run, you'll need to download the corresponding dataset. We suggest you start by looking at the [HRNet demo notebook](https://github.com/microsoft/DeepSeismic/blob/staging/examples/interpretation/notebooks/HRNet_Penobscot_demo_notebook.ipynb), which requires the Penobscot dataset.

#### Penobscot
To download the Penobscot dataset run the [download_penobscot.sh](scripts/download_penobscot.sh) script, e.g.
@@ -91,7 +94,7 @@ To make things easier, we suggest you use your home directory where you might …
To prepare the data for the experiments (e.g. split into train/val/test), please run the following script (modifying arguments as desired):

```
python scripts/prepare_penobscot.py split_inline --data-dir=/data/penobscot --val-ratio=.1 --test-ratio=.2
python scripts/prepare_penobscot.py split_inline --data-dir="$HOME/data/penobscot" --val-ratio=.1 --test-ratio=.2
```
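The download invocation elided from the hunk above might look like the sketch below; the destination directory is a placeholder, and the script's actual arguments are documented in `scripts/download_penobscot.sh` itself.

```bash
# Illustrative only - check scripts/download_penobscot.sh for its real arguments:
./scripts/download_penobscot.sh "$HOME/data/penobscot"
```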

#### F3 Netherlands
@@ -173,12 +176,12 @@ We use the [YACS](https://github.com/rbgirshick/yacs) configuration library to manage …
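Any value in these YAML files can also be overridden at launch time as `KEY VALUE` pairs, the same pattern the CI jobs later in this diff use; in the sketch below the dataset path and config file are placeholders.

```bash
# Sketch: YACS-style command-line overrides (paths are placeholders):
cd experiments/interpretation/dutchf3_patch/local
python train.py 'DATASET.ROOT' "$HOME/data/dutchf3" 'TRAIN.END_EPOCH' 1 \
    --cfg=configs/patch_deconvnet_skip.yaml --debug
```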
#### HRNet
To achieve the same results as the benchmarks above you will need to download the HRNet model [pretrained](https://github.com/HRNet/HRNet-Image-Classification) on ImageNet. We are specifically using the [HRNet-W48-C](https://1drv.ms/u/s!Aus8VCZ_C_33dKvqI6pBZlifgJk) pre-trained model - download this model to your local drive and make sure you add its path to the experiment (or notebook) configuration file under the `TEST.MODEL_PATH` setting. Other HRNet variants are also available [here](https://github.com/HRNet/HRNet-Image-Classification) - you can navigate to those from the [main HRNet landing page](https://github.com/HRNet/HRNet-Object-Detection) for object detection.
To achieve the same results as the benchmarks above you will need to download the HRNet model [pretrained](https://github.com/HRNet/HRNet-Image-Classification) on ImageNet. We are specifically using the [HRNet-W48-C](https://1drv.ms/u/s!Aus8VCZ_C_33dKvqI6pBZlifgJk) pre-trained model; other HRNet variants are also available [here](https://github.com/HRNet/HRNet-Image-Classification) - you can navigate to those from the [main HRNet landing page](https://github.com/HRNet/HRNet-Object-Detection) for object detection.
Unfortunately, the OneDrive location used to host the model relies on a temporary authentication token, so there is no way for us to script the model download. There are two ways to upload and use the pre-trained HRNet model on the DS VM:
- download the model to your local drive using a web browser of your choice and then upload it to the DS VM with something like `scp`; navigate to the Portal and copy the DS VM's public IP from the Overview panel of your DS VM (you can search for your DS VM by name in the Portal's search bar), then use `scp local_model_location username@DS_VM_public_IP:./model/save/path` to upload
- alternatively, use the same public IP to open a remote desktop session over SSH to your Linux VM with [X2Go](https://wiki.x2go.org/doku.php/download:start): this lets you open a web browser on the VM and download the model straight to the VM's disk
To facilitate easier download on a Linux machine of your choice (or the [Azure Data Science Virtual Machine](https://azure.microsoft.com/en-us/services/virtual-machines/data-science-virtual-machines/), which we recommend), we created an automated download script for you; just run
```bash
./scripts/download_hrnet.sh 'your_folder_to_store_the_model' 'model_file'
```
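For the manual route, the upload step described above might look like the following; the local filename, username, IP address, and destination path are all placeholders.

```bash
# Illustrative scp upload of the pretrained HRNet weights to the DS VM
# (every value below is a placeholder):
scp ~/Downloads/hrnetv2_w48_imagenet_pretrained.pth \
    username@<DS_VM_public_IP>:~/models/
```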
### Viewers (optional)
@@ -198,7 +201,7 @@ pip install segyviewer
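The `pip install segyviewer` context line above comes from the installation instructions; if you want to keep its older Qt dependencies away from the main environment, an isolated install might look like this sketch - the environment name and Python version are assumptions.

```bash
# Sketch only - env name and Python version are assumptions:
conda create -n segyviewer python=3.6 -y
conda activate segyviewer
pip install segyviewer
```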
To visualize cross-sections of a 3D volume, you can run
[segyviewer](https://github.com/equinor/segyviewer) like so:
```bash
segyviewer /mnt/dutchf3/data.segy
segyviewer "${HOME}/data/dutchf3/data.segy"
```

### Benchmarks
@@ -324,7 +327,7 @@ which will indicate that the anaconda folder is __/anaconda__. We'll refer to this l…

To test whether this setup worked, open `ipython` right afterwards and execute the following code
```python
import torch
torch.cuda.is_available()
```

File renamed without changes.
@@ -3,21 +3,20 @@ CUDNN:
  DETERMINISTIC: false
  ENABLED: true
GPUS: (0,)
OUTPUT_DIR: 'output'
LOG_DIR: 'log'
WORKERS: 4
PRINT_FREQ: 10
LOG_CONFIG: logging.conf
SEED: 2019


DATASET:
  NUM_CLASSES: 6
  ROOT: /mnt/dutchf3
  CLASS_WEIGHTS: [0.7151, 0.8811, 0.5156, 0.9346, 0.9683, 0.9852]

MODEL:
  NAME: patch_deconvnet
  NAME: patch_deconvnet_skip
  IN_CHANNELS: 1


@@ -31,7 +30,7 @@ TRAIN:
  WEIGHT_DECAY: 0.0001
  SNAPSHOTS: 5
  AUGMENTATION: True
  DEPTH: "No" # Options are No, Patch and Section
  DEPTH: "none" #"patch" # Options are None, Patch and Section
  STRIDE: 50
  PATCH_SIZE: 99
  AUGMENTATIONS:
@@ -48,12 +47,13 @@ TRAIN:
VALIDATION:
  BATCH_SIZE_PER_GPU: 512

TEST:
  MODEL_PATH: "/data/home/mat/repos/DeepSeismic/interpretation/experiments/segmentation/dutchf3/local/output/mat/exp/5cc37bbe5302e1989ef1388d629400a16f82d1a9/patch_deconvnet/Aug27_200339/models/patch_deconvnet_snapshot1model_50.pth"
TEST:
  MODEL_PATH: ""
  TEST_STRIDE: 10
  SPLIT: 'Both' # Can be Both, Test1, Test2
  INLINE: True
  CROSSLINE: True
  POST_PROCESSING:
    SIZE: 99
    SIZE: 99 #
    CROP_PIXELS: 0 # Number of pixels to crop top, bottom, left and right
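`TEST.MODEL_PATH` is intentionally left empty in the updated config; when scoring a trained model you would point it at a snapshot, either by editing the file or by overriding it at launch. A sketch, assuming the evaluation script accepts the same `--cfg`/override CLI as the training scripts shown in the CI jobs below (all paths are placeholders):

```bash
# Sketch only - assumes test.py takes the same YACS overrides as train.py:
cd experiments/interpretation/dutchf3_patch/local
python test.py 'TEST.MODEL_PATH' "$HOME/models/patch_deconvnet_skip_snapshot1.pth" \
    --cfg=configs/patch_deconvnet_skip.yaml
```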

@@ -13,7 +13,7 @@ SEED: 2019
DATASET:
  NUM_CLASSES: 6
  ROOT: /mnt/dutchf3
  DEPTH: 'no'
  CLASS_WEIGHTS: [0.7151, 0.8811, 0.5156, 0.9346, 0.9683, 0.9852]

MODEL:
  NAME: patch_deconvnet_skip
@@ -29,6 +29,31 @@ TRAIN:
  MOMENTUM: 0.9
  WEIGHT_DECAY: 0.0001
  SNAPSHOTS: 5
  AUGMENTATION: True
  DEPTH: "none" #"patch" # Options are None, Patch and Section
  STRIDE: 50
  PATCH_SIZE: 99
  AUGMENTATIONS:
    RESIZE:
      HEIGHT: 99
      WIDTH: 99
    PAD:
      HEIGHT: 99
      WIDTH: 99
  MEAN: 0.0009997 # 0.0009996710808862074
  STD: 0.20977 # 0.20976548783479299
  MODEL_DIR: "models"

VALIDATION:
  BATCH_SIZE_PER_GPU: 512

TEST:
  MODEL_PATH: ""
  TEST_STRIDE: 10
  SPLIT: 'Both' # Can be Both, Test1, Test2
  INLINE: True
  CROSSLINE: True
  POST_PROCESSING:
    SIZE: 99 #
    CROP_PIXELS: 0 # Number of pixels to crop top, bottom, left and right

TEST:
  BATCH_SIZE_PER_GPU: 128
6 changes: 4 additions & 2 deletions experiments/interpretation/dutchf3_patch/distributed/train.py
@@ -262,10 +262,12 @@ def _select_pred_and_mask(model_out_dict):
    trainer.add_event_handler(Events.EPOCH_STARTED, logging_handlers.log_lr(optimizer))

    try:
        output_dir = generate_path(config.OUTPUT_DIR, git_branch(), git_hash(), config.MODEL.NAME, current_datetime(),)
        output_dir = generate_path(
            config.OUTPUT_DIR, git_branch(), git_hash(), config.MODEL.NAME, current_datetime(),
        )
    except TypeError:
        output_dir = generate_path(config.OUTPUT_DIR, config.MODEL.NAME, current_datetime(),)

    summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR))
    logger.info(f"Logging Tensorboard to {path.join(output_dir, config.LOG_DIR)}")
    trainer.add_event_handler(
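The snippet above writes TensorBoard summaries under the path produced by `generate_path` (`OUTPUT_DIR/<branch>/<hash>/<model>/<timestamp>/<LOG_DIR>`); viewing them might look like the sketch below, where the concrete run directory is a placeholder generated at run time.

```bash
# Sketch: inspect the training curves written by create_summary_writer
# (the run directory below is a placeholder):
tensorboard --logdir output/staging/abc1234/patch_deconvnet/Dec10_120000/log
```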
10 changes: 9 additions & 1 deletion experiments/interpretation/dutchf3_patch/local/test.py
@@ -376,7 +376,15 @@ def test(*options, cfg=None, debug=False):
        section_file = path.join(config.DATASET.ROOT, "splits", "section_" + split + ".txt")
        _write_section_file(labels, section_file)
        _evaluate_split(
            split, section_aug, model, pre_processing, output_processing, device, running_metrics_overall, config, debug=debug
            split,
            section_aug,
            model,
            pre_processing,
            output_processing,
            device,
            running_metrics_overall,
            config,
            debug=debug,
        )

    # FINAL TEST RESULTS:
2 changes: 0 additions & 2 deletions experiments/interpretation/dutchf3_patch/local/train.py
@@ -94,8 +94,6 @@ def run(*options, cfg=None, debug=False):

    update_config(config, options=options, config_file=cfg)



    # Start logging
    load_log_configuration(config.LOG_CONFIG)
    logger = logging.getLogger(__name__)
4 changes: 1 addition & 3 deletions experiments/interpretation/dutchf3_section/local/test.py
@@ -87,9 +87,7 @@ def reset(self):
        self.confusion_matrix = np.zeros((self.n_classes, self.n_classes))


def _evaluate_split(
    split, section_aug, model, device, running_metrics_overall, config, debug=False
):
def _evaluate_split(split, section_aug, model, device, running_metrics_overall, config, debug=False):
    logger = logging.getLogger(__name__)

    TestSectionLoader = get_test_loader(config)
3 changes: 1 addition & 2 deletions experiments/interpretation/penobscot/local/test.py
@@ -186,12 +186,11 @@ def run(*options, cfg=None, debug=False):
    device = "cuda"
    model = model.to(device)  # Send to GPU


    try:
        output_dir = generate_path(config.OUTPUT_DIR, git_branch(), git_hash(), config.MODEL.NAME, current_datetime(),)
    except TypeError:
        output_dir = generate_path(config.OUTPUT_DIR, config.MODEL.NAME, current_datetime(),)

    summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR))

    # weights are inversely proportional to the frequency of the classes in
2 changes: 1 addition & 1 deletion experiments/interpretation/penobscot/local/train.py
@@ -194,7 +194,7 @@ def run(*options, cfg=None, debug=False):
        output_dir = generate_path(config.OUTPUT_DIR, git_branch(), git_hash(), config.MODEL.NAME, current_datetime(),)
    except TypeError:
        output_dir = generate_path(config.OUTPUT_DIR, config.MODEL.NAME, current_datetime(),)

    summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR))
    snapshot_duration = scheduler_step * len(train_loader)
    scheduler = CosineAnnealingScheduler(optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, snapshot_duration)
@@ -3,4 +3,3 @@

from deepseismic_interpretation.azureml_tools.workspace import workspace_for_user
from deepseismic_interpretation.azureml_tools.experiment import PyTorchExperiment

@@ -281,4 +281,3 @@ def submit(

        self._logger.debug(estimator.conda_dependencies.__dict__)
        return self._experiment.submit(estimator)

@@ -6,8 +6,7 @@
from azure.mgmt.storage.models import StorageAccountCreateParameters
from azure.mgmt.storage.v2019_04_01.models import Kind, Sku, SkuName

from deepseismic_interpretation.azureml_tools.resource_group import \
    create_resource_group
from deepseismic_interpretation.azureml_tools.resource_group import create_resource_group


class StorageAccountCreateFailure(Exception):
@@ -7,10 +7,12 @@
from pathlib import Path

import azureml
from azureml.core.authentication import (AuthenticationException,
                                          AzureCliAuthentication,
                                          InteractiveLoginAuthentication,
                                          ServicePrincipalAuthentication)
from azureml.core.authentication import (
    AuthenticationException,
    AzureCliAuthentication,
    InteractiveLoginAuthentication,
    ServicePrincipalAuthentication,
)

_DEFAULT_AML_PATH = "aml_config/azml_config.json"

6 changes: 6 additions & 0 deletions scripts/autoformat.sh
@@ -0,0 +1,6 @@
#!/bin/bash

# autoformat all Python files in the repo with black

# example of using regex -regex ".*\.\(py\|ipynb\|md\|txt\)"
find . -type f -regex ".*\.py" -exec black {} +
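To report formatting violations without rewriting files (useful in CI), the same `find` invocation can be combined with black's `--check` flag:

```bash
# Sketch: list files that would be reformatted, without modifying them:
find . -type f -regex ".*\.py" -exec black --check {} +
```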
34 changes: 30 additions & 4 deletions tests/cicd/main_build.yml
@@ -194,19 +194,32 @@ jobs:
        cd experiments/interpretation/dutchf3_patch/distributed
        python -m torch.distributed.launch --nproc_per_node=$(nproc) train.py 'DATASET.ROOT' '/home/alfred/data/dutch_f3/data' 'TRAIN.END_EPOCH' 1 'TRAIN.SNAPSHOTS' 1 --cfg=configs/hrnet.yaml --debug
- job: unet_dutchf3_dist
- job: patch_deconvnet_skip_dist
  dependsOn: setup
  timeoutInMinutes: 5
  displayName: unet dutchf3 distributed
  displayName: patch deconvnet skip distributed
  pool:
    name: deepseismicagentpool
  steps:
  - bash: |
      source activate seismic-interpretation
      # run the tests
      cd experiments/interpretation/dutchf3_patch/distributed
      python -m torch.distributed.launch --nproc_per_node=$(nproc) train.py 'DATASET.ROOT' '/home/alfred/data/dutch_f3/data' 'TRAIN.END_EPOCH' 1 'TRAIN.SNAPSHOTS' 1 --cfg=configs/unet.yaml --debug
      python -m torch.distributed.launch --nproc_per_node=$(nproc) train.py 'TRAIN.BATCH_SIZE_PER_GPU' 1 'DATASET.ROOT' '/home/alfred/data/dutch_f3/data' 'TRAIN.END_EPOCH' 1 'TRAIN.SNAPSHOTS' 1 --cfg=configs/patch_deconvnet_skip.yaml --debug
- job: patch_deconvnet_dist
  dependsOn: setup
  timeoutInMinutes: 5
  displayName: patch deconvnet distributed
  pool:
    name: deepseismicagentpool
  steps:
  - bash: |
      source activate seismic-interpretation
      # run the tests
      cd experiments/interpretation/dutchf3_patch/distributed
      python -m torch.distributed.launch --nproc_per_node=$(nproc) train.py 'TRAIN.BATCH_SIZE_PER_GPU' 1 'DATASET.ROOT' '/home/alfred/data/dutch_f3/data' 'TRAIN.END_EPOCH' 1 'TRAIN.SNAPSHOTS' 1 --cfg=configs/patch_deconvnet.yaml --debug
- job: seresnet_unet_dutchf3_dist
  dependsOn: setup
  timeoutInMinutes: 5
@@ -219,6 +232,19 @@
      # run the tests
      cd experiments/interpretation/dutchf3_patch/distributed
      python -m torch.distributed.launch --nproc_per_node=$(nproc) train.py 'DATASET.ROOT' '/home/alfred/data/dutch_f3/data' 'TRAIN.END_EPOCH' 1 'TRAIN.SNAPSHOTS' 1 --cfg=configs/seresnet_unet.yaml --debug
- job: unet_dutchf3_dist
  dependsOn: setup
  timeoutInMinutes: 5
  displayName: unet dutchf3 distributed
  pool:
    name: deepseismicagentpool
  steps:
  - bash: |
      source activate seismic-interpretation
      # run the tests
      cd experiments/interpretation/dutchf3_patch/distributed
      python -m torch.distributed.launch --nproc_per_node=$(nproc) train.py 'DATASET.ROOT' '/home/alfred/data/dutch_f3/data' 'TRAIN.END_EPOCH' 1 'TRAIN.SNAPSHOTS' 1 --cfg=configs/unet.yaml --debug
###################################################################################################
# LOCAL SECTION JOBS
6 changes: 5 additions & 1 deletion tests/cicd/src/conftest.py
@@ -3,18 +3,22 @@

import pytest


def pytest_addoption(parser):
    parser.addoption("--nbname", action="store", type=str, default=None)
    parser.addoption("--dataset_root", action="store", type=str, default=None)


@pytest.fixture
def nbname(request):
    return request.config.getoption("--nbname")


@pytest.fixture
def dataset_root(request):
    return request.config.getoption("--dataset_root")


"""
def pytest_generate_tests(metafunc):
    # This is called for every test. Only get/set command line arguments
@@ -25,4 +29,4 @@ def pytest_generate_tests(metafunc):
    option_value = metafunc.config.option.dataset_root
    if 'dataset_root' in metafunc.fixturenames and option_value is not None:
        metafunc.parametrize("dataset_root", [option_value])
"""
