From e0de54ca237a557c609fc6032c7eb1d076d5e1ea Mon Sep 17 00:00:00 2001
From: yalaudah
Date: Wed, 22 Apr 2020 14:42:43 +0000
Subject: [PATCH] Refactoring train.py, removing OpenCV, adding training results to Tensorboard, bug fixes (#264)

I think moving forward, we'll use smaller PRs, but here are the changes in this one:

Fixes issue #236, which involves rewriting a large portion of train.py such that:

* All the Tensorboard event handlers are organized in tensorboard_handlers.py and are only called in train.py to log training and validation results in Tensorboard.
* The code logs the same results for training and validation, and it now adds the class IoU score as well.
* All single-use functions (e.g. _select_max, _tensor_to_numpy, _select_pred_and_mask) are lambda functions now.
* The code is organized into more meaningful "chunks", e.g. all the optimizer-related code is kept together where possible, and the same goes for logging, configuration, loaders, and Tensorboard.

In addition:

* Fixed a visualization bug where the seismic images were not normalized correctly. This solves Issue #217.
* Fixed a visualization bug where the predictions were not masked where the input image was padded. This improves the ability to visually inspect and evaluate the results, and solves Issue #230 (see the first sketch below).
* Fixes a potential issue where Tensorboard can crash when a large training batch size is used: the number of images visualized in Tensorboard from every batch now has an upper limit (also shown in the first sketch below).
* Completely removes OpenCV as a dependency of the DeepSeismic repo. It was only used in a small part of the code where it wasn't really necessary, and OpenCV is a huge library.
* Fixes Issue #218, where the epoch number for the images in Tensorboard was always logged as 1, therefore not allowing us to see the epoch number of the different results in Tensorboard.
* Removes the HorovodLRScheduler class since it's no longer used.
* Removes toolz.take from debug mode and uses PyTorch's native Subset() dataset class instead (see the second sketch below).
* Changes the default patch size for the HRNet model to 256.
* Several other minor changes.

Co-authored-by: Yazeed Alaudah
Co-authored-by: Ubuntu
Co-authored-by: Max Kaznady
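For readers skimming the patch, here is a minimal, illustrative sketch of the two visualization fixes listed above: capping how many images per batch are written to Tensorboard, and masking predictions wherever the input was padded. It mirrors the idea of the log_results handler added in tensorboard_handlers.py, but the tensors, the select_for_visualization helper, and the PAD_LABEL name are assumptions for the example, not code from this repo.

```python
# Illustrative sketch only -- hypothetical tensors and helper names.
import torch

VISUALIZATION_LIMIT = 8  # upper bound on images sent to Tensorboard per batch
PAD_LABEL = 255          # assumed mask_value used by PadIfNeeded in the configs

def select_for_visualization(image, mask, y_pred):
    # Cap the number of images so a large batch size cannot crash Tensorboard:
    if image.shape[0] > VISUALIZATION_LIMIT:
        image = image[:VISUALIZATION_LIMIT]
        mask = mask[:VISUALIZATION_LIMIT]
        y_pred = y_pred[:VISUALIZATION_LIMIT]
    # Mask out predictions where the input image was padded (Issue #230):
    y_pred = y_pred.clone()
    y_pred[mask == PAD_LABEL] = PAD_LABEL
    return image, mask, y_pred

# Hypothetical batch: 16 images, with the top four rows of each mask padded.
image = torch.rand(16, 3, 64, 64)
mask = torch.randint(0, 6, (16, 1, 64, 64))
mask[:, :, :4, :] = PAD_LABEL
y_pred = torch.randint(0, 6, (16, 1, 64, 64))  # stand-in for argmax of model output
image, mask, y_pred = select_for_visualization(image, mask, y_pred)
print(image.shape[0], (y_pred == PAD_LABEL).any().item())  # 8 True
```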
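And a second sketch, likewise illustrative rather than repo code, of the debug-mode change: torch.utils.data.Subset trims the dataset itself, so the regular DataLoader code path is exercised end to end instead of wrapping the loader in toolz.take. TinyDataset is a hypothetical stand-in for TrainPatchLoader.

```python
# Illustrative sketch only -- debug mode via torch.utils.data.Subset.
import torch
from torch.utils import data

class TinyDataset(data.Dataset):
    """Hypothetical stand-in for TrainPatchLoader."""
    def __len__(self):
        return 1000
    def __getitem__(self, idx):
        return torch.zeros(1, 8, 8), torch.zeros(8, 8, dtype=torch.long)

debug = True
train_set = TinyDataset()
if debug:
    # Same pattern as the patch: keep only the first few samples.
    train_set = data.Subset(train_set, list(range(4)))

train_loader = data.DataLoader(train_set, batch_size=2, shuffle=True)
print(len(train_set), len(train_loader))  # 4 2
```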
--- AUTHORS.md | 3 +- README.md | 10 +- .../distributed/configs/hrnet.yaml | 3 +- .../distributed/configs/patch_deconvnet.yaml | 2 +- .../configs/patch_deconvnet_skip.yaml | 2 +- .../distributed/configs/seresnet_unet.yaml | 2 +- .../distributed/configs/unet.yaml | 2 +- .../dutchf3_patch/distributed/default.py | 3 +- .../dutchf3_patch/distributed/train.py | 118 ++++---- .../dutchf3_section/local/default.py | 3 +- .../dutchf3_section/local/train.py | 6 +- .../dutchf3_voxel/configs/texture_net.yaml | 2 +- .../interpretation/dutchf3_voxel/default.py | 4 +- .../interpretation/dutchf3_voxel/train.py | 2 +- .../penobscot/local/configs/hrnet.yaml | 3 +- .../local/configs/seresnet_unet.yaml | 2 +- .../interpretation/penobscot/local/default.py | 3 +- .../interpretation/penobscot/local/test.py | 41 +-- .../interpretation/penobscot/local/train.py | 63 ++--- .../cv_lib/event_handlers/logging_handlers.py | 43 +-- .../event_handlers/tensorboard_handlers.py | 76 ++++-- environment/anaconda/local/environment.yml | 1 - environment/docker/apex/dockerfile | 2 +- environment/docker/horovod/dockerfile | 2 +- ..._patch_model_training_and_evaluation.ipynb | 9 +- .../dutchf3_patch/local/configs/hrnet.yaml | 11 +- .../local/configs/patch_deconvnet.yaml | 2 +- .../local/configs/patch_deconvnet_skip.yaml | 2 +- .../local/configs/seresnet_unet.yaml | 2 +- .../dutchf3_patch/local/configs/unet.yaml | 2 +- .../dutchf3_patch/local/default.py | 6 +- .../dutchf3_patch/local/test.py | 95 ++----- .../dutchf3_patch/local/train.py | 253 ++++++------------ .../dutchf3/data.py | 14 +- tests/cicd/main_build.yml | 25 +- 35 files changed, 325 insertions(+), 494 deletions(-) diff --git a/AUTHORS.md b/AUTHORS.md index cb2995fa..b903ddb4 100644 --- a/AUTHORS.md +++ b/AUTHORS.md @@ -9,14 +9,15 @@ Contributors (sorted alphabetically) ------------------------------------- To contributors: please add your name to the list when you submit a patch to the project. +* Yazeed Alaudah * Ashish Bhatia +* Sharat Chikkerur * Daniel Ciborowski * George Iordanescu * Ilia Karmanov * Max Kaznady * Vanja Paunic * Mathew Salvaris -* Sharat Chikkerur * Wee Hyong Tok ## How to be a contributor to the repository diff --git a/README.md b/README.md index cf8886a7..44d60b5a 100644 --- a/README.md +++ b/README.md @@ -287,7 +287,7 @@ for the Penobscot dataset follow the same instructions but navigate to the [peno ## Contributing -This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com. +This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit [https://cla.opensource.microsoft.com](https://cla.opensource.microsoft.com). 
### Submitting a Pull Request @@ -321,7 +321,7 @@ A typical output will be: someusername@somevm:/projects/DeepSeismic$ which python /anaconda/envs/py35/bin/python ``` -which will indicate that anaconda folder is __/anaconda__. We'll refer to this location in the instructions below, but you should update the commands according to your local anaconda folder. +which will indicate that the anaconda folder is `/anaconda`. We'll refer to this location in the instructions below, but you should update the commands according to your local anaconda folder.
Data Science Virtual Machine conda package installation errors @@ -339,7 +339,7 @@ which will indicate that anaconda folder is __/anaconda__. We'll refer to this l
Data Science Virtual Machine conda package installation warnings - It could happen that while creating the conda environment defined by environment/anaconda/local/environment.yml on an Ubuntu DSVM, one can get multiple warnings like so: + It could happen that while creating the conda environment defined by `environment/anaconda/local/environment.yml` on an Ubuntu DSVM, one can get multiple warnings like so: ``` WARNING conda.gateways.disk.delete:unlink_or_rename_to_trash(140): Could not remove or rename /anaconda/pkgs/ipywidgets-7.5.1-py_0/site-packages/ipywidgets-7.5.1.dist-info/LICENSE. Please remove this file manually (you may need to reboot to free file handles) ``` @@ -350,7 +350,7 @@ which will indicate that anaconda folder is __/anaconda__. We'll refer to this l sudo chown -R $USER /anaconda ``` - After these command completes, try creating the conda environment in __environment/anaconda/local/environment.yml__ again. + After these commands complete, try creating the conda environment in `environment/anaconda/local/environment.yml` again.
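If you want to confirm the ownership fix took effect before re-running the environment creation, a tiny Python check along these lines can help. This is a hypothetical helper, not part of the repo; /anaconda is the DSVM default path mentioned above.

```python
# Hypothetical helper: verify the current user can write to the anaconda folder.
import os

anaconda_dir = "/anaconda"  # adjust to your local anaconda folder
writable = os.access(anaconda_dir, os.W_OK)
print(f"{anaconda_dir} writable: {writable}")
if not writable:
    print("Re-run: sudo chown -R $USER " + anaconda_dir)
```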
@@ -395,7 +395,7 @@ which will indicate that anaconda folder is __/anaconda__. We'll refer to this l
GPU out of memory errors - You should be able to see how much GPU memory your process is using by running + You should be able to see how much GPU memory your process is using by running: ```bash nvidia-smi ``` diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/hrnet.yaml b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/hrnet.yaml index 04ad6479..fe3995f6 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/hrnet.yaml +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/hrnet.yaml @@ -9,6 +9,7 @@ WORKERS: 4 PRINT_FREQ: 10 LOG_CONFIG: logging.conf SEED: 2019 +OPENCV_BORDER_CONSTANT: 0 DATASET: @@ -73,7 +74,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "section" #"patch" # Options are No, Patch and Section + DEPTH: "section" #"patch" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 100 AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet.yaml b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet.yaml index eb89ff00..fa1d6add 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet.yaml +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet.yaml @@ -30,7 +30,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "none" #"patch" # Options are None, Patch and Section + DEPTH: "none" #"patch" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 99 AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet_skip.yaml b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet_skip.yaml index eb89ff00..fa1d6add 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet_skip.yaml +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/patch_deconvnet_skip.yaml @@ -30,7 +30,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "none" #"patch" # Options are None, Patch and Section + DEPTH: "none" #"patch" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 99 AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/seresnet_unet.yaml b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/seresnet_unet.yaml index d0b8126f..9bc10d34 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/seresnet_unet.yaml +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/seresnet_unet.yaml @@ -30,7 +30,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "section" # Options are No, Patch and Section + DEPTH: "section" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 100 AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/unet.yaml b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/unet.yaml index 2843e62c..3fe5f439 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/unet.yaml +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/configs/unet.yaml @@ -33,7 +33,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "section" # Options are No, Patch and Section + DEPTH: "section" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 100 
AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/default.py b/contrib/experiments/interpretation/dutchf3_patch/distributed/default.py index bf23527b..34d3c4d3 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/default.py +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/default.py @@ -20,6 +20,7 @@ _C.PIN_MEMORY = True _C.LOG_CONFIG = "logging.conf" _C.SEED = 42 +_C.OPENCV_BORDER_CONSTANT = 0 # Cudnn related params _C.CUDNN = CN() @@ -58,7 +59,7 @@ _C.TRAIN.PATCH_SIZE = 99 _C.TRAIN.MEAN = 0.0009997 # 0.0009996710808862074 _C.TRAIN.STD = 0.21 # 0.20976548783479299 -_C.TRAIN.DEPTH = "None" # Options are None, Patch and Section +_C.TRAIN.DEPTH = "none" # Options are: none, patch, and section # None adds no depth information and the num of channels remains at 1 # Patch adds depth per patch so is simply the height of that patch from 0 to 1, channels=3 # Section adds depth per section so contains depth information for the whole section, channels=3 diff --git a/contrib/experiments/interpretation/dutchf3_patch/distributed/train.py b/contrib/experiments/interpretation/dutchf3_patch/distributed/train.py index bc28249a..33bb0045 100644 --- a/contrib/experiments/interpretation/dutchf3_patch/distributed/train.py +++ b/contrib/experiments/interpretation/dutchf3_patch/distributed/train.py @@ -21,59 +21,30 @@ import os from os import path -import cv2 import fire import numpy as np import toolz import torch -from albumentations import Compose, HorizontalFlip, Normalize, Resize, PadIfNeeded -from cv_lib.utils import load_log_configuration -from cv_lib.event_handlers import ( - SnapshotHandler, - logging_handlers, - tensorboard_handlers, -) -from cv_lib.event_handlers.logging_handlers import Evaluator -from cv_lib.event_handlers.tensorboard_handlers import ( - create_image_writer, - create_summary_writer, -) -from cv_lib.segmentation import models -from cv_lib.segmentation import extract_metric_from -from deepseismic_interpretation.dutchf3.data import get_patch_loader, decode_segmap -from cv_lib.segmentation.dutchf3.engine import ( - create_supervised_evaluator, - create_supervised_trainer, -) - -from ignite.metrics import Loss -from cv_lib.segmentation.metrics import ( - pixelwise_accuracy, - class_accuracy, - mean_class_accuracy, - class_iou, - mean_iou, -) - -from cv_lib.segmentation.dutchf3.utils import ( - current_datetime, - generate_path, - git_branch, - git_hash, - np_to_tb, -) -from default import _C as config -from default import update_config -from ignite.contrib.handlers import ( - ConcatScheduler, - CosineAnnealingScheduler, - LinearCyclicalScheduler, -) +from albumentations import Compose, HorizontalFlip, Normalize, PadIfNeeded, Resize +from ignite.contrib.handlers import ConcatScheduler, CosineAnnealingScheduler, LinearCyclicalScheduler from ignite.engine import Events +from ignite.metrics import Loss from ignite.utils import convert_tensor from toolz import compose, curry from torch.utils import data +from cv_lib.event_handlers import SnapshotHandler, logging_handlers, tensorboard_handlers +from cv_lib.event_handlers.logging_handlers import Evaluator +from cv_lib.event_handlers.tensorboard_handlers import create_image_writer, create_summary_writer +from cv_lib.segmentation import extract_metric_from, models +from cv_lib.segmentation.dutchf3.engine import create_supervised_evaluator, create_supervised_trainer +from cv_lib.segmentation.dutchf3.utils import current_datetime, generate_path, git_branch, git_hash, 
np_to_tb +from cv_lib.segmentation.metrics import class_accuracy, class_iou, mean_class_accuracy, mean_iou, pixelwise_accuracy +from cv_lib.utils import load_log_configuration +from deepseismic_interpretation.dutchf3.data import decode_segmap, get_patch_loader +from default import _C as config +from default import update_config + def prepare_batch(batch, device=None, non_blocking=False): x, y = batch @@ -123,7 +94,7 @@ def run(*options, cfg=None, local_rank=0, debug=False): # provide environment variables, and requires that you use init_method=`env://`. torch.distributed.init_process_group(backend="nccl", init_method="env://") - scheduler_step = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS + epochs_per_cycle = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK torch.manual_seed(config.SEED) @@ -137,7 +108,7 @@ def run(*options, cfg=None, local_rank=0, debug=False): PadIfNeeded( min_height=config.TRAIN.PATCH_SIZE, min_width=config.TRAIN.PATCH_SIZE, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=255, ), @@ -147,7 +118,7 @@ def run(*options, cfg=None, local_rank=0, debug=False): PadIfNeeded( min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT, min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=255, ), @@ -185,15 +156,16 @@ def run(*options, cfg=None, local_rank=0, debug=False): logger.info(f"Validation examples {len(val_set)}") n_classes = train_set.n_classes - #if debug: - #val_set = data.Subset(val_set, range(config.VALIDATION.BATCH_SIZE_PER_GPU)) - #train_set = data.Subset(train_set, range(config.TRAIN.BATCH_SIZE_PER_GPU*2)) + if debug: + logger.info("Running in debug mode..") + train_set = data.Subset(train_set, list(range(4))) + val_set = data.Subset(val_set, list(range(4))) logger.info(f"Training examples {len(train_set)}") logger.info(f"Validation examples {len(val_set)}") - train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, num_replicas=world_size, rank=local_rank) + train_sampler = torch.utils.data.distributed.DistributedSampler(train_set, num_replicas=world_size, rank=local_rank) train_loader = data.DataLoader( train_set, batch_size=config.TRAIN.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS, sampler=train_sampler, ) @@ -226,9 +198,7 @@ def run(*options, cfg=None, local_rank=0, debug=False): model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[device], find_unused_parameters=True) - snapshot_duration = scheduler_step * len(train_loader) - if debug: - snapshot_duration = 2 + snapshot_duration = epochs_per_cycle * len(train_loader) if not debug else 2*len(train_loader) warmup_duration = 5 * len(train_loader) warmup_scheduler = LinearCyclicalScheduler( optimizer, @@ -238,7 +208,7 @@ def run(*options, cfg=None, local_rank=0, debug=False): cycle_size=10 * len(train_loader), ) cosine_scheduler = CosineAnnealingScheduler( - optimizer, "lr", config.TRAIN.MAX_LR * world_size, config.TRAIN.MIN_LR * world_size, snapshot_duration, + optimizer, "lr", config.TRAIN.MAX_LR * world_size, config.TRAIN.MIN_LR * world_size, cycle_size=snapshot_duration, ) scheduler = ConcatScheduler(schedulers=[warmup_scheduler, cosine_scheduler], durations=[warmup_duration]) @@ -270,18 +240,27 @@ def _select_pred_and_mask(model_out_dict): device=device, ) - # Set the validation run to start on the epoch completion of the training run + # Set the validation 
run to start on the epoch completion of the training run + trainer.add_event_handler(Events.EPOCH_COMPLETED, Evaluator(evaluator, val_loader)) if local_rank == 0: # Run only on master process trainer.add_event_handler( - Events.ITERATION_COMPLETED, logging_handlers.log_training_output(log_interval=config.TRAIN.BATCH_SIZE_PER_GPU), + Events.ITERATION_COMPLETED, + logging_handlers.log_training_output(log_interval=config.TRAIN.BATCH_SIZE_PER_GPU), ) - trainer.add_event_handler(Events.EPOCH_STARTED, logging_handlers.log_lr(optimizer)) + trainer.add_event_handler(Events.EPOCH_STARTED, logging_handlers.log_lr(optimizer)) try: - output_dir = generate_path(config.OUTPUT_DIR, git_branch(), git_hash(), config_file_name, config.TRAIN.MODEL_DIR, current_datetime(),) + output_dir = generate_path( + config.OUTPUT_DIR, + git_branch(), + git_hash(), + config_file_name, + config.TRAIN.MODEL_DIR, + current_datetime(), + ) except TypeError: output_dir = generate_path(config.OUTPUT_DIR, config_file_name, config.TRAIN.MODEL_DIR, current_datetime(),) @@ -322,9 +301,7 @@ def _tensor_to_numpy(pred_tensor): return pred_tensor.squeeze().cpu().numpy() transform_func = compose(np_to_tb, decode_segmap(n_classes=n_classes), _tensor_to_numpy) - transform_pred = compose(transform_func, _select_max) - evaluator.add_event_handler( Events.EPOCH_COMPLETED, create_image_writer(summary_writer, "Validation/Image", "image"), ) @@ -341,19 +318,22 @@ def snapshot_function(): return (trainer.state.iteration % snapshot_duration) == 0 checkpoint_handler = SnapshotHandler( - output_dir, - config.MODEL.NAME, - extract_metric_from("mIoU"), - snapshot_function, + output_dir, config.MODEL.NAME, extract_metric_from("mIoU"), snapshot_function, ) evaluator.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler, {"model": model}) - logger.info("Starting training") - + if debug: - trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length = config.TRAIN.BATCH_SIZE_PER_GPU*2, seed = config.SEED) + trainer.run( + train_loader, + max_epochs=config.TRAIN.END_EPOCH, + epoch_length=config.TRAIN.BATCH_SIZE_PER_GPU * 2, + seed=config.SEED, + ) else: - trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length = len(train_loader), seed = config.SEED) + trainer.run( + train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length=len(train_loader), seed=config.SEED + ) if __name__ == "__main__": diff --git a/contrib/experiments/interpretation/dutchf3_section/local/default.py b/contrib/experiments/interpretation/dutchf3_section/local/default.py index 5e296295..2b4888d2 100644 --- a/contrib/experiments/interpretation/dutchf3_section/local/default.py +++ b/contrib/experiments/interpretation/dutchf3_section/local/default.py @@ -21,6 +21,7 @@ _C.PIN_MEMORY = True _C.LOG_CONFIG = "./logging.conf" # Logging config file relative to the experiment _C.SEED = 42 +_C.OPENCV_BORDER_CONSTANT = 0 # Cudnn related params _C.CUDNN = CN() @@ -55,7 +56,7 @@ _C.TRAIN.AUGMENTATION = True _C.TRAIN.MEAN = 0.0009997 # 0.0009996710808862074 _C.TRAIN.STD = 0.20977 # 0.20976548783479299 -_C.TRAIN.DEPTH = "none" # Options are 'none', 'patch' and 'section' +_C.TRAIN.DEPTH = "none" # Options are: none, patch, and section # None adds no depth information and the num of channels remains at 1 # Patch adds depth per patch so is simply the height of that patch from 0 to 1, channels=3 # Section adds depth per section so contains depth information for the whole section, channels=3 diff --git 
a/contrib/experiments/interpretation/dutchf3_section/local/train.py b/contrib/experiments/interpretation/dutchf3_section/local/train.py index b216268e..5a9b4900 100644 --- a/contrib/experiments/interpretation/dutchf3_section/local/train.py +++ b/contrib/experiments/interpretation/dutchf3_section/local/train.py @@ -84,7 +84,7 @@ def run(*options, cfg=None, debug=False): load_log_configuration(config.LOG_CONFIG) logger = logging.getLogger(__name__) logger.debug(config.WORKERS) - scheduler_step = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS + epochs_per_cycle = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK torch.manual_seed(config.SEED) @@ -164,8 +164,8 @@ def __len__(self): summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR)) - snapshot_duration = scheduler_step * len(train_loader) - scheduler = CosineAnnealingScheduler(optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, snapshot_duration) + snapshot_duration = epochs_per_cycle * len(train_loader) if not debug else 2*len(train_loader) + scheduler = CosineAnnealingScheduler(optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, cycle_size=snapshot_duration) # weights are inversely proportional to the frequency of the classes in # the training set diff --git a/contrib/experiments/interpretation/dutchf3_voxel/configs/texture_net.yaml b/contrib/experiments/interpretation/dutchf3_voxel/configs/texture_net.yaml index aeeffb86..3ff72dca 100644 --- a/contrib/experiments/interpretation/dutchf3_voxel/configs/texture_net.yaml +++ b/contrib/experiments/interpretation/dutchf3_voxel/configs/texture_net.yaml @@ -29,7 +29,7 @@ TRAIN: LR: 0.02 MOMENTUM: 0.9 WEIGHT_DECAY: 0.0001 - DEPTH: "voxel" # Options are No, Patch, Section and Voxel + DEPTH: "voxel" # Options are none, patch, section and voxel MODEL_DIR: "models" VALIDATION: diff --git a/contrib/experiments/interpretation/dutchf3_voxel/default.py b/contrib/experiments/interpretation/dutchf3_voxel/default.py index 100da598..bcf84731 100644 --- a/contrib/experiments/interpretation/dutchf3_voxel/default.py +++ b/contrib/experiments/interpretation/dutchf3_voxel/default.py @@ -24,6 +24,8 @@ _C.PRINT_FREQ = 20 _C.LOG_CONFIG = "logging.conf" _C.SEED = 42 +_C.OPENCV_BORDER_CONSTANT = 0 + # size of voxel cube: WINDOW_SIZE x WINDOW_SIZE x WINDOW_SIZE; used for 3D models only _C.WINDOW_SIZE = 65 @@ -50,7 +52,7 @@ _C.TRAIN.LR = 0.01 _C.TRAIN.MOMENTUM = 0.9 _C.TRAIN.WEIGHT_DECAY = 0.0001 -_C.TRAIN.DEPTH = "voxel" # Options are None, Patch and Section +_C.TRAIN.DEPTH = "voxel" # Options are none, patch and section _C.TRAIN.MODEL_DIR = "models" # This will be a subdirectory inside OUTPUT_DIR # validation diff --git a/contrib/experiments/interpretation/dutchf3_voxel/train.py b/contrib/experiments/interpretation/dutchf3_voxel/train.py index bd8cdf4b..3864e38f 100644 --- a/contrib/experiments/interpretation/dutchf3_voxel/train.py +++ b/contrib/experiments/interpretation/dutchf3_voxel/train.py @@ -208,7 +208,7 @@ def _select_pred_and_mask(model_out): summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR)) - snapshot_duration = 1 + snapshot_duration = 2 def snapshot_function(): return (trainer.state.iteration % snapshot_duration) == 0 diff --git a/contrib/experiments/interpretation/penobscot/local/configs/hrnet.yaml b/contrib/experiments/interpretation/penobscot/local/configs/hrnet.yaml index ba4b3967..7c711177 100644 --- 
a/contrib/experiments/interpretation/penobscot/local/configs/hrnet.yaml +++ b/contrib/experiments/interpretation/penobscot/local/configs/hrnet.yaml @@ -9,6 +9,7 @@ WORKERS: 4 PRINT_FREQ: 10 LOG_CONFIG: logging.conf SEED: 2019 +OPENCV_BORDER_CONSTANT: 0 DATASET: @@ -75,7 +76,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "patch" # Options are none, patch and section + DEPTH: "patch" # Options are none, patch, and section STRIDE: 64 PATCH_SIZE: 128 AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/penobscot/local/configs/seresnet_unet.yaml b/contrib/experiments/interpretation/penobscot/local/configs/seresnet_unet.yaml index 3ba4d807..800cf4ce 100644 --- a/contrib/experiments/interpretation/penobscot/local/configs/seresnet_unet.yaml +++ b/contrib/experiments/interpretation/penobscot/local/configs/seresnet_unet.yaml @@ -32,7 +32,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "patch" # Options are none, patch and section + DEPTH: "patch" # Options are none, patch, and section STRIDE: 64 PATCH_SIZE: 128 AUGMENTATIONS: diff --git a/contrib/experiments/interpretation/penobscot/local/default.py b/contrib/experiments/interpretation/penobscot/local/default.py index fa8e540e..d72946ce 100644 --- a/contrib/experiments/interpretation/penobscot/local/default.py +++ b/contrib/experiments/interpretation/penobscot/local/default.py @@ -21,6 +21,7 @@ _C.PIN_MEMORY = True _C.LOG_CONFIG = "logging.conf" _C.SEED = 42 +_C.OPENCV_BORDER_CONSTANT = 0 # size of voxel cube: WINDOW_SIZE x WINDOW_SIZE x WINDOW_SIZE; used for 3D models only _C.WINDOW_SIZE = 65 @@ -72,7 +73,7 @@ _C.TRAIN.MEAN = [-0.0001777, 0.49, -0.0000688] # 0.0009996710808862074 _C.TRAIN.STD = [0.14076, 0.2717, 0.06286] # 0.20976548783479299 _C.TRAIN.MAX = 1 -_C.TRAIN.DEPTH = "patch" # Options are none, patch and section +_C.TRAIN.DEPTH = "patch" # Options are none, patch, and section # None adds no depth information and the num of channels remains at 1 # Patch adds depth per patch so is simply the height of that patch from 0 to 1, channels=3 # Section adds depth per section so contains depth information for the whole section, channels=3 diff --git a/contrib/experiments/interpretation/penobscot/local/test.py b/contrib/experiments/interpretation/penobscot/local/test.py index 8687085a..073c72f1 100644 --- a/contrib/experiments/interpretation/penobscot/local/test.py +++ b/contrib/experiments/interpretation/penobscot/local/test.py @@ -18,45 +18,30 @@ from itertools import chain from os import path -import cv2 import fire import numpy as np import torch import torchvision from albumentations import Compose, Normalize, PadIfNeeded, Resize -from cv_lib.utils import load_log_configuration +from ignite.engine import Events +from ignite.metrics import Loss +from ignite.utils import convert_tensor +from toolz import compose, tail, take +from toolz.sandbox.core import unzip +from torch.utils import data + from cv_lib.event_handlers import logging_handlers, tensorboard_handlers -from cv_lib.event_handlers.tensorboard_handlers import ( - create_image_writer, - create_summary_writer, -) +from cv_lib.event_handlers.tensorboard_handlers import create_image_writer, create_summary_writer from cv_lib.segmentation import models -from cv_lib.segmentation.metrics import ( - pixelwise_accuracy, - class_accuracy, - mean_class_accuracy, - class_iou, - mean_iou, -) -from cv_lib.segmentation.dutchf3.utils import ( - current_datetime, - generate_path, - git_branch, - git_hash, - np_to_tb, -) +from 
cv_lib.segmentation.dutchf3.utils import current_datetime, generate_path, git_branch, git_hash, np_to_tb +from cv_lib.segmentation.metrics import class_accuracy, class_iou, mean_class_accuracy, mean_iou, pixelwise_accuracy from cv_lib.segmentation.penobscot.engine import create_supervised_evaluator +from cv_lib.utils import load_log_configuration from deepseismic_interpretation.dutchf3.data import decode_segmap from deepseismic_interpretation.penobscot.data import get_patch_dataset from deepseismic_interpretation.penobscot.metrics import InlineMeanIoU from default import _C as config from default import update_config -from ignite.engine import Events -from ignite.metrics import Loss -from ignite.utils import convert_tensor -from toolz import compose, tail, take -from toolz.sandbox.core import unzip -from torch.utils import data def _prepare_batch(batch, device=None, non_blocking=False): @@ -139,7 +124,7 @@ def run(*options, cfg=None, debug=False): PadIfNeeded( min_height=config.TRAIN.PATCH_SIZE, min_width=config.TRAIN.PATCH_SIZE, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=mask_value, value=0, @@ -150,7 +135,7 @@ def run(*options, cfg=None, debug=False): PadIfNeeded( min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT, min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=mask_value, value=0, diff --git a/contrib/experiments/interpretation/penobscot/local/train.py b/contrib/experiments/interpretation/penobscot/local/train.py index 22394b53..04f563b7 100644 --- a/contrib/experiments/interpretation/penobscot/local/train.py +++ b/contrib/experiments/interpretation/penobscot/local/train.py @@ -17,7 +17,6 @@ import logging.config from os import path -import cv2 import fire import numpy as np import torch @@ -29,43 +28,19 @@ from toolz import compose from torch.utils import data +from cv_lib.event_handlers import SnapshotHandler, logging_handlers, tensorboard_handlers +from cv_lib.event_handlers.logging_handlers import Evaluator +from cv_lib.event_handlers.tensorboard_handlers import create_image_writer, create_summary_writer +from cv_lib.segmentation import extract_metric_from, models +from cv_lib.segmentation.dutchf3.utils import current_datetime, generate_path, git_branch, git_hash, np_to_tb +from cv_lib.segmentation.metrics import class_accuracy, class_iou, mean_class_accuracy, mean_iou, pixelwise_accuracy +from cv_lib.segmentation.penobscot.engine import create_supervised_evaluator, create_supervised_trainer +from cv_lib.utils import load_log_configuration from deepseismic_interpretation.dutchf3.data import decode_segmap from deepseismic_interpretation.penobscot.data import get_patch_dataset -from cv_lib.utils import load_log_configuration -from cv_lib.event_handlers import ( - SnapshotHandler, - logging_handlers, - tensorboard_handlers, -) -from cv_lib.event_handlers.logging_handlers import Evaluator -from cv_lib.event_handlers.tensorboard_handlers import ( - create_image_writer, - create_summary_writer, -) -from cv_lib.segmentation import models, extract_metric_from -from cv_lib.segmentation.penobscot.engine import ( - create_supervised_evaluator, - create_supervised_trainer, -) -from cv_lib.segmentation.metrics import ( - pixelwise_accuracy, - class_accuracy, - mean_class_accuracy, - class_iou, - mean_iou, -) -from cv_lib.segmentation.dutchf3.utils import ( - current_datetime, - generate_path, - git_branch, - git_hash, - 
np_to_tb, -) - from default import _C as config from default import update_config - mask_value = 255 _SEG_COLOURS = np.asarray( [[241, 238, 246], [208, 209, 230], [166, 189, 219], [116, 169, 207], [54, 144, 192], [5, 112, 176], [3, 78, 123],] @@ -107,7 +82,7 @@ def run(*options, cfg=None, debug=False): load_log_configuration(config.LOG_CONFIG) logger = logging.getLogger(__name__) logger.debug(config.WORKERS) - scheduler_step = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS + epochs_per_cycle = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK torch.manual_seed(config.SEED) @@ -126,7 +101,7 @@ def run(*options, cfg=None, debug=False): PadIfNeeded( min_height=config.TRAIN.PATCH_SIZE, min_width=config.TRAIN.PATCH_SIZE, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=mask_value, value=0, @@ -137,7 +112,7 @@ def run(*options, cfg=None, debug=False): PadIfNeeded( min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT, min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=mask_value, value=0, @@ -182,7 +157,7 @@ def run(*options, cfg=None, debug=False): if debug: val_set = data.Subset(val_set, range(3)) - val_loader = data.DataLoader(val_set, batch_size=config.VALIDATION.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS) + val_loader = data.DataLoader(val_set, batch_size=config.VALIDATION.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS) model = getattr(models, config.MODEL.NAME).get_seg_model(config) @@ -203,8 +178,8 @@ def run(*options, cfg=None, debug=False): output_dir = generate_path(config.OUTPUT_DIR, config_file_name, config.TRAIN.MODEL_DIR, current_datetime(),) summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR)) - snapshot_duration = scheduler_step * len(train_loader) - scheduler = CosineAnnealingScheduler(optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, snapshot_duration) + snapshot_duration = epochs_per_cycle * len(train_loader) if not debug else 2*len(train_loader) + scheduler = CosineAnnealingScheduler(optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, cycle_size=snapshot_duration) # weights are inversely proportional to the frequency of the classes in # the training set @@ -306,9 +281,15 @@ def snapshot_function(): logger.info("Starting training") if debug: - trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length = config.TRAIN.BATCH_SIZE_PER_GPU, seed = config.SEED) + trainer.run( + train_loader, + max_epochs=config.TRAIN.END_EPOCH, + epoch_length=config.TRAIN.BATCH_SIZE_PER_GPU, + seed=config.SEED, + ) else: - trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length = len(train_loader), seed = config.SEED) + trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length=len(train_loader), seed=config.SEED) + if __name__ == "__main__": fire.Fire(run) diff --git a/cv_lib/cv_lib/event_handlers/logging_handlers.py b/cv_lib/cv_lib/event_handlers/logging_handlers.py index b7c41651..ea883a36 100644 --- a/cv_lib/cv_lib/event_handlers/logging_handlers.py +++ b/cv_lib/cv_lib/event_handlers/logging_handlers.py @@ -25,11 +25,8 @@ def log_lr(optimizer, engine): logger.info(f"lr - {lr}") -_DEFAULT_METRICS = {"pixacc": "Avg accuracy :", "nll": "Avg loss :"} - - @curry -def log_metrics(log_msg, engine, metrics_dict=_DEFAULT_METRICS): +def log_metrics(log_msg, engine, metrics_dict={"pixacc": 
"Avg accuracy :", "nll": "Avg loss :"}): logger = logging.getLogger(__name__) metrics = engine.state.metrics metrics_msg = " ".join([f"{metrics_dict[k]} {metrics[k]:.2f}" for k in metrics_dict]) @@ -44,6 +41,7 @@ def log_class_metrics(log_msg, engine, metrics_dict): logger.info(f"{log_msg} - Epoch {engine.state.epoch} [{engine.state.max_epochs}]\n" + metrics_msg) +# TODO: remove Evaluator once other train.py scripts are updated class Evaluator: def __init__(self, evaluation_engine, data_loader): self._evaluation_engine = evaluation_engine @@ -51,40 +49,3 @@ def __init__(self, evaluation_engine, data_loader): def __call__(self, engine): self._evaluation_engine.run(self._data_loader) - - -class HorovodLRScheduler: - """ - Horovod: using `lr = base_lr * hvd.size()` from the very beginning leads to worse final - accuracy. Scale the learning rate `lr = base_lr` ---> `lr = base_lr * hvd.size()` during - the first five epochs. See https://arxiv.org/abs/1706.02677 for details. - After the warmup reduce learning rate by 10 on the 30th, 60th and 80th epochs. - """ - - def __init__( - self, base_lr, warmup_epochs, cluster_size, data_loader, optimizer, batches_per_allreduce, - ): - self._warmup_epochs = warmup_epochs - self._cluster_size = cluster_size - self._data_loader = data_loader - self._optimizer = optimizer - self._base_lr = base_lr - self._batches_per_allreduce = batches_per_allreduce - self._logger = logging.getLogger(__name__) - - def __call__(self, engine): - epoch = engine.state.epoch - if epoch < self._warmup_epochs: - epoch += float(engine.state.iteration + 1) / len(self._data_loader) - lr_adj = 1.0 / self._cluster_size * (epoch * (self._cluster_size - 1) / self._warmup_epochs + 1) - elif epoch < 30: - lr_adj = 1.0 - elif epoch < 60: - lr_adj = 1e-1 - elif epoch < 80: - lr_adj = 1e-2 - else: - lr_adj = 1e-3 - for param_group in self._optimizer.param_groups: - param_group["lr"] = self._base_lr * self._cluster_size * self._batches_per_allreduce * lr_adj - self._logger.debug(f"Adjust learning rate {param_group['lr']}") diff --git a/cv_lib/cv_lib/event_handlers/tensorboard_handlers.py b/cv_lib/cv_lib/event_handlers/tensorboard_handlers.py index a9ba5f4c..30cb5fc4 100644 --- a/cv_lib/cv_lib/event_handlers/tensorboard_handlers.py +++ b/cv_lib/cv_lib/event_handlers/tensorboard_handlers.py @@ -1,12 +1,14 @@ # Copyright (c) Microsoft Corporation. # Licensed under the MIT License. 
-from toolz import curry import torchvision +from tensorboardX import SummaryWriter import logging import logging.config +from toolz import curry -from tensorboardX import SummaryWriter +from cv_lib.segmentation.dutchf3.utils import np_to_tb +from deepseismic_interpretation.dutchf3.data import decode_segmap def create_summary_writer(log_dir): @@ -14,18 +16,29 @@ def create_summary_writer(log_dir): return writer +def _transform_image(output_tensor): + output_tensor = output_tensor.cpu() + return torchvision.utils.make_grid(output_tensor, normalize=True, scale_each=True) + + +def _transform_pred(output_tensor, n_classes): + output_tensor = output_tensor.squeeze().cpu().numpy() + decoded = decode_segmap(output_tensor, n_classes) + return torchvision.utils.make_grid(np_to_tb(decoded), normalize=False, scale_each=False) + + def _log_model_output(log_label, summary_writer, engine): summary_writer.add_scalar(log_label, engine.state.output["loss"], engine.state.iteration) @curry def log_training_output(summary_writer, engine): - _log_model_output("training/loss", summary_writer, engine) + _log_model_output("Training/loss", summary_writer, engine) @curry def log_validation_output(summary_writer, engine): - _log_model_output("validation/loss", summary_writer, engine) + _log_model_output("Validation/loss", summary_writer, engine) @curry @@ -42,31 +55,60 @@ def log_lr(summary_writer, optimizer, log_interval, engine): summary_writer.add_scalar("lr", lr[0], getattr(engine.state, log_interval)) -_DEFAULT_METRICS = {"accuracy": "Avg accuracy :", "nll": "Avg loss :"} - - +# TODO: This is deprecated, and will be removed in the future. @curry -def log_metrics(summary_writer, train_engine, log_interval, engine, metrics_dict=_DEFAULT_METRICS): +def log_metrics(summary_writer, train_engine, log_interval, engine, metrics_dict={"pixacc": "Avg accuracy :", "nll": "Avg loss :"}): metrics = engine.state.metrics for m in metrics_dict: - summary_writer.add_scalar( - metrics_dict[m], metrics[m], getattr(train_engine.state, log_interval) - ) + summary_writer.add_scalar(metrics_dict[m], metrics[m], getattr(train_engine.state, log_interval)) -def create_image_writer( - summary_writer, label, output_variable, normalize=False, transform_func=lambda x: x -): +# TODO: This is deprecated, and will be removed in the future. +def create_image_writer(summary_writer, label, output_variable, normalize=False, transform_func=lambda x: x): logger = logging.getLogger(__name__) + logger.warning( + "create_image_writer() in tensorboard_handlers.py is deprecated, and will be removed in a future update." 
+ ) def write_to(engine): try: data_tensor = transform_func(engine.state.output[output_variable]) - image_grid = torchvision.utils.make_grid( - data_tensor, normalize=normalize, scale_each=True - ) + image_grid = torchvision.utils.make_grid(data_tensor, normalize=normalize, scale_each=True) summary_writer.add_image(label, image_grid, engine.state.epoch) except KeyError: logger.warning("Predictions and or ground truth labels not available to report") return write_to + + +def log_results(engine, evaluator, summary_writer, n_classes, stage): + epoch = engine.state.epoch + metrics = evaluator.state.metrics + outputs = evaluator.state.output + + # Log Metrics: + summary_writer.add_scalar(f"{stage}/mIoU", metrics["mIoU"], epoch) + summary_writer.add_scalar(f"{stage}/nll", metrics["nll"], epoch) + summary_writer.add_scalar(f"{stage}/mca", metrics["mca"], epoch) + summary_writer.add_scalar(f"{stage}/pixacc", metrics["pixacc"], epoch) + + for i in range(n_classes): + summary_writer.add_scalar(f"{stage}/IoU_class_" + str(i), metrics["ciou"][i], epoch) + + # Log Images: + image = outputs["image"] + mask = outputs["mask"] + y_pred = outputs["y_pred"].max(1, keepdim=True)[1] + VISUALIZATION_LIMIT = 8 + + if evaluator.state.batch[0].shape[0] > VISUALIZATION_LIMIT: + image = image[:VISUALIZATION_LIMIT] + mask = mask[:VISUALIZATION_LIMIT] + y_pred = y_pred[:VISUALIZATION_LIMIT] + + # Mask out the region in y_pred where padding exists in the mask: + y_pred[mask == 255] = 255 + + summary_writer.add_image(f"{stage}/Image", _transform_image(image), epoch) + summary_writer.add_image(f"{stage}/Mask", _transform_pred(mask, n_classes), epoch) + summary_writer.add_image(f"{stage}/Pred", _transform_pred(y_pred, n_classes), epoch) diff --git a/environment/anaconda/local/environment.yml b/environment/anaconda/local/environment.yml index de0d65af..40b28a34 100644 --- a/environment/anaconda/local/environment.yml +++ b/environment/anaconda/local/environment.yml @@ -11,7 +11,6 @@ dependencies: - ipykernel - torchvision>=0.5.0 - pandas==0.25.3 - - opencv==4.1.2 - scikit-learn==0.21.3 - tensorflow==2.0 - opt-einsum>=2.3.2 diff --git a/environment/docker/apex/dockerfile b/environment/docker/apex/dockerfile index 3becd3c4..9dcf5615 100644 --- a/environment/docker/apex/dockerfile +++ b/environment/docker/apex/dockerfile @@ -10,7 +10,7 @@ RUN git clone https://github.com/NVIDIA/apex && \ cd apex && \ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./ -RUN pip install toolz pytorch-ignite torchvision pandas opencv-python fire tensorboardx scikit-learn yacs +RUN pip install toolz pytorch-ignite torchvision pandas fire tensorboardx scikit-learn yacs WORKDIR /workspace CMD /bin/bash \ No newline at end of file diff --git a/environment/docker/horovod/dockerfile b/environment/docker/horovod/dockerfile index 0e12f455..04ed2f67 100644 --- a/environment/docker/horovod/dockerfile +++ b/environment/docker/horovod/dockerfile @@ -60,7 +60,7 @@ RUN pip install future typing RUN pip install numpy RUN pip install https://download.pytorch.org/whl/cu100/torch-${PYTORCH_VERSION}-$(python -c "import wheel.pep425tags as w; print('-'.join(w.get_supported()[0]))").whl \ https://download.pytorch.org/whl/cu100/torchvision-${TORCHVISION_VERSION}-$(python -c "import wheel.pep425tags as w; print('-'.join(w.get_supported()[0]))").whl -RUN pip install --no-cache-dir torchvision h5py toolz pytorch-ignite pandas opencv-python fire tensorboardx scikit-learn tqdm yacs albumentations gitpython +RUN pip install 
--no-cache-dir torchvision h5py toolz pytorch-ignite pandas fire tensorboardx scikit-learn tqdm yacs albumentations gitpython COPY ComputerVision_fork/contrib /contrib RUN pip install -e /contrib COPY DeepSeismic /DeepSeismic diff --git a/examples/interpretation/notebooks/Dutch_F3_patch_model_training_and_evaluation.ipynb b/examples/interpretation/notebooks/Dutch_F3_patch_model_training_and_evaluation.ipynb index de4e8542..6f830701 100644 --- a/examples/interpretation/notebooks/Dutch_F3_patch_model_training_and_evaluation.ipynb +++ b/examples/interpretation/notebooks/Dutch_F3_patch_model_training_and_evaluation.ipynb @@ -473,7 +473,7 @@ " PadIfNeeded(\n", " min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT,\n", " min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH,\n", - " border_mode=cv2.BORDER_CONSTANT,\n", + " border_mode=config.OPENCV_BORDER_CONSTANT,\n", " always_apply=True,\n", " mask_value=255,\n", " ),\n", @@ -534,7 +534,6 @@ " augmentations=val_aug,\n", ")\n", "\n", - "# TODO: workaround for Ignite 0.3.0 bug as epoch_lengh in trainer.run method below doesn't apply to validation set\n", "if papermill:\n", " val_set = data.Subset(val_set, range(3))\n", "elif DEMO:\n", @@ -578,7 +577,7 @@ "else:\n", " train_len = len(train_loader)\n", "\n", - "snapshot_duration = scheduler_step * train_len" + "snapshot_duration = scheduler_step * train_len if not papermill else 2*len(train_loader)" ] }, { @@ -643,7 +642,7 @@ "\n", "# learning rate scheduler\n", "scheduler = CosineAnnealingScheduler(\n", - " optimizer, \"lr\", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, snapshot_duration\n", + " optimizer, \"lr\", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, cycle_size=snapshot_duration\n", ")\n", "\n", "# weights are inversely proportional to the frequency of the classes in the training set\n", @@ -984,7 +983,7 @@ " PadIfNeeded(\n", " min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT,\n", " min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH,\n", - " border_mode=cv2.BORDER_CONSTANT,\n", + " border_mode=config.OPENCV_BORDER_CONSTANT,\n", " always_apply=True,\n", " mask_value=255,\n", " ),\n", diff --git a/experiments/interpretation/dutchf3_patch/local/configs/hrnet.yaml b/experiments/interpretation/dutchf3_patch/local/configs/hrnet.yaml index 9d705df2..52263bbf 100644 --- a/experiments/interpretation/dutchf3_patch/local/configs/hrnet.yaml +++ b/experiments/interpretation/dutchf3_patch/local/configs/hrnet.yaml @@ -9,6 +9,7 @@ WORKERS: 4 PRINT_FREQ: 10 LOG_CONFIG: logging.conf SEED: 2019 +OPENCV_BORDER_CONSTANT: 0 DATASET: @@ -73,7 +74,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "section" #"patch" # Options are No, Patch and Section + DEPTH: "section" # Options are: none, patch, and section STRIDE: 50 PATCH_SIZE: 100 AUGMENTATIONS: @@ -82,7 +83,7 @@ TRAIN: WIDTH: 200 PAD: HEIGHT: 256 - WIDTH: 256 + WIDTH: 256 MEAN: 0.0009997 # 0.0009996710808862074 STD: 0.20977 # 0.20976548783479299 MODEL_DIR: "models" @@ -91,12 +92,12 @@ TRAIN: VALIDATION: BATCH_SIZE_PER_GPU: 128 -TEST: +TEST: MODEL_PATH: "/data/home/mat/repos/DeepSeismic/experiments/interpretation/dutchf3_patch/local/output/staging/0d1d2bbf9685995a0515ca1d9de90f9bcec0db90/seg_hrnet/Dec20_233535/models/seg_hrnet_running_model_33.pth" TEST_STRIDE: 10 SPLIT: 'Both' # Can be Both, Test1, Test2 INLINE: True CROSSLINE: True - POST_PROCESSING: - SIZE: 128 # + POST_PROCESSING: + SIZE: 128 # CROP_PIXELS: 14 # Number of pixels to crop top, bottom, left and right diff --git 
a/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet.yaml b/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet.yaml index 7c695f96..35787b95 100644 --- a/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet.yaml +++ b/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet.yaml @@ -29,7 +29,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "none" # Options are None, Patch and Section + DEPTH: "none" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 99 AUGMENTATIONS: diff --git a/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet_skip.yaml b/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet_skip.yaml index d14ea134..46fab9f6 100644 --- a/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet_skip.yaml +++ b/experiments/interpretation/dutchf3_patch/local/configs/patch_deconvnet_skip.yaml @@ -29,7 +29,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "none" #"patch" # Options are None, Patch and Section + DEPTH: "none" #"patch" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 99 AUGMENTATIONS: diff --git a/experiments/interpretation/dutchf3_patch/local/configs/seresnet_unet.yaml b/experiments/interpretation/dutchf3_patch/local/configs/seresnet_unet.yaml index d0b8126f..9bc10d34 100644 --- a/experiments/interpretation/dutchf3_patch/local/configs/seresnet_unet.yaml +++ b/experiments/interpretation/dutchf3_patch/local/configs/seresnet_unet.yaml @@ -30,7 +30,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "section" # Options are No, Patch and Section + DEPTH: "section" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 100 AUGMENTATIONS: diff --git a/experiments/interpretation/dutchf3_patch/local/configs/unet.yaml b/experiments/interpretation/dutchf3_patch/local/configs/unet.yaml index c31157bf..3a8ee71a 100644 --- a/experiments/interpretation/dutchf3_patch/local/configs/unet.yaml +++ b/experiments/interpretation/dutchf3_patch/local/configs/unet.yaml @@ -33,7 +33,7 @@ TRAIN: WEIGHT_DECAY: 0.0001 SNAPSHOTS: 5 AUGMENTATION: True - DEPTH: "section" # Options are No, Patch and Section + DEPTH: "section" # Options are none, patch, and section STRIDE: 50 PATCH_SIZE: 100 AUGMENTATIONS: diff --git a/experiments/interpretation/dutchf3_patch/local/default.py b/experiments/interpretation/dutchf3_patch/local/default.py index aac539ea..f2cadfc1 100644 --- a/experiments/interpretation/dutchf3_patch/local/default.py +++ b/experiments/interpretation/dutchf3_patch/local/default.py @@ -20,7 +20,7 @@ _C.PIN_MEMORY = True _C.LOG_CONFIG = "logging.conf" _C.SEED = 42 - +_C.OPENCV_BORDER_CONSTANT = 0 # Cudnn related params _C.CUDNN = CN() @@ -58,8 +58,8 @@ _C.TRAIN.PATCH_SIZE = 99 _C.TRAIN.MEAN = 0.0009997 # 0.0009996710808862074 _C.TRAIN.STD = 0.20977 # 0.20976548783479299 # TODO: Should we apply std scaling? 
-# issue: https://github.com/microsoft/seismic-deeplearning/issues/269 -_C.TRAIN.DEPTH = "no" # Options are None, Patch and Section +_C.TRAIN.DEPTH = "none" # Options are: none, patch, and section + # None adds no depth information and the num of channels remains at 1 # Patch adds depth per patch so is simply the height of that patch from 0 to 1, channels=3 # Section adds depth per section so contains depth information for the whole section, channels=3 diff --git a/experiments/interpretation/dutchf3_patch/local/test.py b/experiments/interpretation/dutchf3_patch/local/test.py index ee4c0b09..631fd4f4 100644 --- a/experiments/interpretation/dutchf3_patch/local/test.py +++ b/experiments/interpretation/dutchf3_patch/local/test.py @@ -18,34 +18,22 @@ import os from os import path -import cv2 import fire import numpy as np import torch import torch.nn.functional as F -from PIL import Image from albumentations import Compose, Normalize, PadIfNeeded, Resize -from cv_lib.utils import load_log_configuration +from matplotlib import cm +from PIL import Image +from toolz import compose, curry, itertoolz, pipe, take +from torch.utils import data + from cv_lib.segmentation import models -from cv_lib.segmentation.dutchf3.utils import ( - current_datetime, - generate_path, - git_branch, - git_hash, -) -from deepseismic_interpretation.dutchf3.data import ( - add_patch_depth_channels, - get_seismic_labels, - get_test_loader, -) +from cv_lib.segmentation.dutchf3.utils import current_datetime, generate_path, git_branch, git_hash +from cv_lib.utils import load_log_configuration +from deepseismic_interpretation.dutchf3.data import add_patch_depth_channels, get_seismic_labels, get_test_loader from default import _C as config from default import update_config -from toolz import compose, curry, itertoolz, pipe -from torch.utils import data -from toolz import take - -from matplotlib import cm - _CLASS_NAMES = [ "upper_ns", @@ -63,9 +51,9 @@ def __init__(self, n_classes): def _fast_hist(self, label_true, label_pred, n_class): mask = (label_true >= 0) & (label_true < n_class) - hist = np.bincount( - n_class * label_true[mask].astype(int) + label_pred[mask], minlength=n_class ** 2, - ).reshape(n_class, n_class) + hist = np.bincount(n_class * label_true[mask].astype(int) + label_pred[mask], minlength=n_class ** 2,).reshape( + n_class, n_class + ) return hist def update(self, label_trues, label_preds): @@ -201,9 +189,7 @@ def _compose_processing_pipeline(depth, aug=None): def _generate_batches(h, w, ps, patch_size, stride, batch_size=64): - hdc_wdx_generator = itertools.product( - range(0, h - patch_size + ps, stride), range(0, w - patch_size + ps, stride), - ) + hdc_wdx_generator = itertools.product(range(0, h - patch_size + ps, stride), range(0, w - patch_size + ps, stride),) for batch_indexes in itertoolz.partition_all(batch_size, hdc_wdx_generator): yield batch_indexes @@ -214,9 +200,7 @@ def _output_processing_pipeline(config, output): _, _, h, w = output.shape if config.TEST.POST_PROCESSING.SIZE != h or config.TEST.POST_PROCESSING.SIZE != w: output = F.interpolate( - output, - size=(config.TEST.POST_PROCESSING.SIZE, config.TEST.POST_PROCESSING.SIZE,), - mode="bilinear", + output, size=(config.TEST.POST_PROCESSING.SIZE, config.TEST.POST_PROCESSING.SIZE,), mode="bilinear", ) if config.TEST.POST_PROCESSING.CROP_PIXELS > 0: @@ -231,15 +215,7 @@ def _output_processing_pipeline(config, output): def _patch_label_2d( - model, - img, - pre_processing, - output_processing, - patch_size, - stride, - batch_size, - device, - 
num_classes, + model, img, pre_processing, output_processing, patch_size, stride, batch_size, device, num_classes, ): """Processes a whole section """ @@ -254,19 +230,14 @@ def _patch_label_2d( # generate output: for batch_indexes in _generate_batches(h, w, ps, patch_size, stride, batch_size=batch_size): batch = torch.stack( - [ - pipe(img_p, _extract_patch(hdx, wdx, ps, patch_size), pre_processing,) - for hdx, wdx in batch_indexes - ], + [pipe(img_p, _extract_patch(hdx, wdx, ps, patch_size), pre_processing,) for hdx, wdx in batch_indexes], dim=0, ) model_output = model(batch.to(device)) for (hdx, wdx), output in zip(batch_indexes, model_output.detach().cpu()): output = output_processing(output) - output_p[ - :, :, hdx + ps : hdx + ps + patch_size, wdx + ps : wdx + ps + patch_size, - ] += output + output_p[:, :, hdx + ps : hdx + ps + patch_size, wdx + ps : wdx + ps + patch_size,] += output # crop the output_p in the middle output = output_p[:, :, ps:-ps, ps:-ps] @@ -291,22 +262,12 @@ def to_image(label_mask, n_classes=6): def _evaluate_split( - split, - section_aug, - model, - pre_processing, - output_processing, - device, - running_metrics_overall, - config, - debug=False, + split, section_aug, model, pre_processing, output_processing, device, running_metrics_overall, config, debug=False, ): logger = logging.getLogger(__name__) TestSectionLoader = get_test_loader(config) - test_set = TestSectionLoader( - config.DATASET.ROOT, split=split, is_transform=True, augmentations=section_aug, - ) + test_set = TestSectionLoader(config.DATASET.ROOT, split=split, is_transform=True, augmentations=section_aug,) n_classes = test_set.n_classes @@ -318,16 +279,10 @@ def _evaluate_split( try: output_dir = generate_path( - config.OUTPUT_DIR + "_test", - git_branch(), - git_hash(), - config.MODEL.NAME, - current_datetime(), + config.OUTPUT_DIR + "_test", git_branch(), git_hash(), config.MODEL.NAME, current_datetime(), ) except TypeError: - output_dir = generate_path( - config.OUTPUT_DIR + "_test", config.MODEL.NAME, current_datetime(), - ) + output_dir = generate_path(config.OUTPUT_DIR + "_test", config.MODEL.NAME, current_datetime(),) running_metrics_split = runningScore(n_classes) @@ -415,23 +370,19 @@ def test(*options, cfg=None, debug=False): running_metrics_overall = runningScore(n_classes) # Augmentation - section_aug = Compose( - [Normalize(mean=(config.TRAIN.MEAN,), std=(config.TRAIN.STD,), max_pixel_value=1,)] - ) + section_aug = Compose([Normalize(mean=(config.TRAIN.MEAN,), std=(config.TRAIN.STD,), max_pixel_value=1,)]) # TODO: make sure that this is consistent with how normalization and agumentation for train.py # issue: https://github.com/microsoft/seismic-deeplearning/issues/270 patch_aug = Compose( [ Resize( - config.TRAIN.AUGMENTATIONS.RESIZE.HEIGHT, - config.TRAIN.AUGMENTATIONS.RESIZE.WIDTH, - always_apply=True, + config.TRAIN.AUGMENTATIONS.RESIZE.HEIGHT, config.TRAIN.AUGMENTATIONS.RESIZE.WIDTH, always_apply=True, ), PadIfNeeded( min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT, min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=255, ), diff --git a/experiments/interpretation/dutchf3_patch/local/train.py b/experiments/interpretation/dutchf3_patch/local/train.py index 9c77d713..14dcf726 100644 --- a/experiments/interpretation/dutchf3_patch/local/train.py +++ b/experiments/interpretation/dutchf3_patch/local/train.py @@ -17,52 +17,24 @@ import logging.config from os import path -import cv2 
import fire import numpy as np import torch +from torch.utils import data from albumentations import Compose, HorizontalFlip, Normalize, PadIfNeeded, Resize from ignite.contrib.handlers import CosineAnnealingScheduler from ignite.engine import Events from ignite.metrics import Loss from ignite.utils import convert_tensor -from toolz import compose -from torch.utils import data -from deepseismic_interpretation.dutchf3.data import get_patch_loader, decode_segmap +from cv_lib.event_handlers import SnapshotHandler, logging_handlers, tensorboard_handlers +from cv_lib.event_handlers.tensorboard_handlers import create_summary_writer, log_results +from cv_lib.segmentation import extract_metric_from, models +from cv_lib.segmentation.dutchf3.engine import create_supervised_evaluator, create_supervised_trainer +from cv_lib.segmentation.dutchf3.utils import current_datetime, generate_path, git_branch, git_hash +from cv_lib.segmentation.metrics import class_accuracy, class_iou, mean_class_accuracy, mean_iou, pixelwise_accuracy from cv_lib.utils import load_log_configuration -from cv_lib.event_handlers import ( - SnapshotHandler, - logging_handlers, - tensorboard_handlers, -) -from cv_lib.event_handlers.logging_handlers import Evaluator -from cv_lib.event_handlers.tensorboard_handlers import ( - create_image_writer, - create_summary_writer, -) -from cv_lib.segmentation import models, extract_metric_from -from cv_lib.segmentation.dutchf3.engine import ( - create_supervised_evaluator, - create_supervised_trainer, -) - -from cv_lib.segmentation.metrics import ( - pixelwise_accuracy, - class_accuracy, - mean_class_accuracy, - class_iou, - mean_iou, -) - -from cv_lib.segmentation.dutchf3.utils import ( - current_datetime, - generate_path, - git_branch, - git_hash, - np_to_tb, -) - +from deepseismic_interpretation.dutchf3.data import get_patch_loader from default import _C as config from default import update_config @@ -90,44 +62,50 @@ def run(*options, cfg=None, debug=False): cfg (str, optional): Location of config file to load. Defaults to None. 
debug (bool): Places scripts in debug/test mode and only executes a few iterations """ - + # Configuration: update_config(config, options=options, config_file=cfg) - - # we will write the model under outputs / config_file_name / model_dir + # The model will be saved under: outputs// config_file_name = "default_config" if not cfg else cfg.split("/")[-1].split(".")[0] + try: + output_dir = generate_path( + config.OUTPUT_DIR, git_branch(), git_hash(), config_file_name, config.TRAIN.MODEL_DIR, current_datetime(), + ) + except TypeError: + output_dir = generate_path(config.OUTPUT_DIR, config_file_name, config.TRAIN.MODEL_DIR, current_datetime(),) - # Start logging + # Logging: load_log_configuration(config.LOG_CONFIG) logger = logging.getLogger(__name__) logger.debug(config.WORKERS) + + # Set CUDNN benchmark mode: torch.backends.cudnn.benchmark = config.CUDNN.BENCHMARK + # Fix random seeds: torch.manual_seed(config.SEED) if torch.cuda.is_available(): torch.cuda.manual_seed_all(config.SEED) np.random.seed(seed=config.SEED) - # Setup Augmentations + # Augmentation: basic_aug = Compose( [ Normalize(mean=(config.TRAIN.MEAN,), std=(config.TRAIN.STD,), max_pixel_value=1), PadIfNeeded( min_height=config.TRAIN.PATCH_SIZE, min_width=config.TRAIN.PATCH_SIZE, - border_mode=0, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=255, value=0, ), Resize( - config.TRAIN.AUGMENTATIONS.RESIZE.HEIGHT, - config.TRAIN.AUGMENTATIONS.RESIZE.WIDTH, - always_apply=True, + config.TRAIN.AUGMENTATIONS.RESIZE.HEIGHT, config.TRAIN.AUGMENTATIONS.RESIZE.WIDTH, always_apply=True, ), PadIfNeeded( min_height=config.TRAIN.AUGMENTATIONS.PAD.HEIGHT, min_width=config.TRAIN.AUGMENTATIONS.PAD.WIDTH, - border_mode=cv2.BORDER_CONSTANT, + border_mode=config.OPENCV_BORDER_CONSTANT, always_apply=True, mask_value=255, ), @@ -139,8 +117,8 @@ def run(*options, cfg=None, debug=False): else: train_aug = val_aug = basic_aug + # Training and Validation Loaders: TrainPatchLoader = get_patch_loader(config) - train_set = TrainPatchLoader( config.DATASET.ROOT, split="train", @@ -150,6 +128,7 @@ def run(*options, cfg=None, debug=False): augmentations=train_aug, ) logger.info(train_set) + n_classes = train_set.n_classes val_set = TrainPatchLoader( config.DATASET.ROOT, split="val", @@ -160,27 +139,22 @@ def run(*options, cfg=None, debug=False): ) logger.info(val_set) - train_loader = data.DataLoader( - train_set, - batch_size=config.TRAIN.BATCH_SIZE_PER_GPU, - num_workers=config.WORKERS, - shuffle=True, - ) - if debug: - val_set = data.Subset(val_set, range(3)) + logger.info("Running in debug mode..") + train_set = data.Subset(train_set, list(range(4))) + val_set = data.Subset(val_set, list(range(4))) - val_loader = data.DataLoader( - val_set, batch_size=config.VALIDATION.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS, + train_loader = data.DataLoader( + train_set, batch_size=config.TRAIN.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS, shuffle=True ) + val_loader = data.DataLoader(val_set, batch_size=config.VALIDATION.BATCH_SIZE_PER_GPU, num_workers=config.WORKERS) + # Model: model = getattr(models, config.MODEL.NAME).get_seg_model(config) + device = "cuda" if torch.cuda.is_available() else "cpu" + model = model.to(device) - device = "cpu" - if torch.cuda.is_available(): - device = "cuda" - model = model.to(device) # Send to GPU - + # Optimizer and LR Scheduler: optimizer = torch.optim.SGD( model.parameters(), lr=config.TRAIN.MAX_LR, @@ -188,92 +162,44 @@ def run(*options, cfg=None, debug=False): 
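# --- Editor's aside, not part of the patch: the debug branch above swaps toolz.take
# for PyTorch's native Subset, which simply re-indexes a dataset so downstream
# DataLoaders work unchanged. A tiny runnable sketch; ToyDataset stands in for
# TrainPatchLoader and is not a repo class.
from torch.utils import data

class ToyDataset(data.Dataset):
    def __len__(self):
        return 1000

    def __getitem__(self, idx):
        return idx

small = data.Subset(ToyDataset(), list(range(4)))
print(len(small), [small[i] for i in range(len(small))])  # 4 [0, 1, 2, 3]
# --- end aside ---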
weight_decay=config.TRAIN.WEIGHT_DECAY, ) - # learning rate scheduler - scheduler_step = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS - snapshot_duration = scheduler_step * len(train_loader) + epochs_per_cycle = config.TRAIN.END_EPOCH // config.TRAIN.SNAPSHOTS + snapshot_duration = epochs_per_cycle * len(train_loader) if not debug else 2*len(train_loader) scheduler = CosineAnnealingScheduler( - optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, snapshot_duration + optimizer, "lr", config.TRAIN.MAX_LR, config.TRAIN.MIN_LR, cycle_size=snapshot_duration ) - # weights are inversely proportional to the frequency of the classes in the - # training set + # Tensorboard writer: + summary_writer = create_summary_writer(log_dir=path.join(output_dir, "logs")) + + # class weights are inversely proportional to the frequency of the classes in the training set class_weights = torch.tensor(config.DATASET.CLASS_WEIGHTS, device=device, requires_grad=False) + # Loss: criterion = torch.nn.CrossEntropyLoss(weight=class_weights, ignore_index=255, reduction="mean") + # Ignite trainer and evaluator: trainer = create_supervised_trainer(model, optimizer, criterion, prepare_batch, device=device) - - trainer.add_event_handler(Events.ITERATION_STARTED, scheduler) - - ######################### - # Logging setup below - - try: - output_dir = generate_path( - config.OUTPUT_DIR, - git_branch(), - git_hash(), - config_file_name, - config.TRAIN.MODEL_DIR, - current_datetime(), - ) - except TypeError: - output_dir = generate_path( - config.OUTPUT_DIR, config_file_name, config.TRAIN.MODEL_DIR, current_datetime(), - ) - - summary_writer = create_summary_writer(log_dir=path.join(output_dir, config.LOG_DIR)) - - # log all training output - trainer.add_event_handler( - Events.ITERATION_COMPLETED, - logging_handlers.log_training_output(log_interval=config.TRAIN.BATCH_SIZE_PER_GPU), - ) - - # add logging of learning rate - trainer.add_event_handler(Events.EPOCH_STARTED, logging_handlers.log_lr(optimizer)) - - # log LR to tensorboard - trainer.add_event_handler( - Events.EPOCH_STARTED, tensorboard_handlers.log_lr(summary_writer, optimizer, "epoch"), - ) - - # log training summary to tensorboard as well - trainer.add_event_handler( - Events.ITERATION_COMPLETED, tensorboard_handlers.log_training_output(summary_writer), - ) - - def _select_pred_and_mask(model_out_dict): - return (model_out_dict["y_pred"].squeeze(), model_out_dict["mask"].squeeze()) - - def _select_max(pred_tensor): - return pred_tensor.max(1)[1] - - def _tensor_to_numpy(pred_tensor): - return pred_tensor.squeeze().cpu().numpy() - - def snapshot_function(): - return (trainer.state.iteration % snapshot_duration) == 0 - - n_classes = train_set.n_classes - + transform_fn = lambda output_dict: (output_dict["y_pred"].squeeze(), output_dict["mask"].squeeze()) evaluator = create_supervised_evaluator( model, prepare_batch, metrics={ - "nll": Loss(criterion, output_transform=_select_pred_and_mask), - "pixacc": pixelwise_accuracy( - n_classes, output_transform=_select_pred_and_mask, device=device - ), - "cacc": class_accuracy(n_classes, output_transform=_select_pred_and_mask), - "mca": mean_class_accuracy(n_classes, output_transform=_select_pred_and_mask), - "ciou": class_iou(n_classes, output_transform=_select_pred_and_mask), - "mIoU": mean_iou(n_classes, output_transform=_select_pred_and_mask), + "nll": Loss(criterion, output_transform=transform_fn), + "pixacc": pixelwise_accuracy(n_classes, output_transform=transform_fn, device=device), + "cacc": 
class_accuracy(n_classes, output_transform=transform_fn), + "mca": mean_class_accuracy(n_classes, output_transform=transform_fn), + "ciou": class_iou(n_classes, output_transform=transform_fn), + "mIoU": mean_iou(n_classes, output_transform=transform_fn), }, device=device, ) - - trainer.add_event_handler(Events.EPOCH_COMPLETED, Evaluator(evaluator, val_loader)) + trainer.add_event_handler(Events.ITERATION_STARTED, scheduler) + + # Logging: + trainer.add_event_handler( + Events.ITERATION_COMPLETED, logging_handlers.log_training_output(log_interval=config.TRAIN.BATCH_SIZE_PER_GPU), + ) + trainer.add_event_handler(Events.EPOCH_COMPLETED, logging_handlers.log_lr(optimizer)) evaluator.add_event_handler( Events.EPOCH_COMPLETED, @@ -288,51 +214,40 @@ def snapshot_function(): ), ) - evaluator.add_event_handler( - Events.EPOCH_COMPLETED, - tensorboard_handlers.log_metrics( - summary_writer, - trainer, - "epoch", - metrics_dict={ - "mIoU": "Validation/mIoU", - "nll": "Validation/Loss", - "mca": "Validation/MCA", - "pixacc": "Validation/Pixel_Acc", - }, - ), - ) - - transform_func = compose(np_to_tb, decode_segmap(n_classes=n_classes), _tensor_to_numpy) + # Tensorboard and Logging: + trainer.add_event_handler(Events.ITERATION_COMPLETED, tensorboard_handlers.log_training_output(summary_writer)) + trainer.add_event_handler(Events.ITERATION_COMPLETED, tensorboard_handlers.log_validation_output(summary_writer)) - transform_pred = compose(transform_func, _select_max) + @trainer.on(Events.EPOCH_COMPLETED) + def log_training_results(engine): + evaluator.run(train_loader) + log_results(engine, evaluator, summary_writer, n_classes, stage="Training") - evaluator.add_event_handler( - Events.EPOCH_COMPLETED, create_image_writer(summary_writer, "Validation/Image", "image"), - ) - evaluator.add_event_handler( - Events.EPOCH_COMPLETED, - create_image_writer( - summary_writer, "Validation/Mask", "mask", transform_func=transform_func - ), - ) - evaluator.add_event_handler( - Events.EPOCH_COMPLETED, - create_image_writer( - summary_writer, "Validation/Pred", "y_pred", transform_func=transform_pred - ), - ) + @trainer.on(Events.EPOCH_COMPLETED) + def log_validation_results(engine): + evaluator.run(val_loader) + log_results(engine, evaluator, summary_writer, n_classes, stage="Validation") + # Checkpointing: checkpoint_handler = SnapshotHandler( - output_dir, config.MODEL.NAME, extract_metric_from("mIoU"), snapshot_function, + output_dir, + config.MODEL.NAME, + extract_metric_from("mIoU"), + lambda: (trainer.state.iteration % snapshot_duration) == 0, ) evaluator.add_event_handler(Events.EPOCH_COMPLETED, checkpoint_handler, {"model": model}) - logger.info("Starting training") + logger.info("Starting training") if debug: - trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length = config.TRAIN.BATCH_SIZE_PER_GPU, seed = config.SEED) + trainer.run( + train_loader, + max_epochs=config.TRAIN.END_EPOCH, + epoch_length=config.TRAIN.BATCH_SIZE_PER_GPU, + seed=config.SEED, + ) else: - trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length = len(train_loader), seed = config.SEED) + trainer.run(train_loader, max_epochs=config.TRAIN.END_EPOCH, epoch_length=len(train_loader), seed=config.SEED) + summary_writer.close() if __name__ == "__main__": diff --git a/interpretation/deepseismic_interpretation/dutchf3/data.py b/interpretation/deepseismic_interpretation/dutchf3/data.py index e11dd059..4517f510 100644 --- a/interpretation/deepseismic_interpretation/dutchf3/data.py +++ 
b/interpretation/deepseismic_interpretation/dutchf3/data.py @@ -579,12 +579,14 @@ def __getitem__(self, index): patch_name = self.patches[index] direction, idx, xdx, ddx = patch_name.split(sep="_") + # Shift offsets the padding that is added in training # shift = self.patch_size if "test" not in self.split else 0 # TODO: Remember we are cancelling the shift since we no longer pad # issue: https://github.com/microsoft/seismic-deeplearning/issues/273 shift = 0 idx, xdx, ddx = int(idx) + shift, int(xdx) + shift, int(ddx) + shift + if direction == "i": im = self.seismic[idx, :, xdx : xdx + self.patch_size, ddx : ddx + self.patch_size] lbl = self.labels[idx, xdx : xdx + self.patch_size, ddx : ddx + self.patch_size] @@ -604,11 +606,11 @@ def __getitem__(self, index): if self.is_transform: im, lbl = self.transform(im, lbl) return im, lbl - + def __repr__(self): unique, counts = np.unique(self.labels, return_counts=True) - ratio = counts/np.sum(counts) - return "\n".join(f"{lbl}: {cnt} [{rat}]"for lbl, cnt, rat in zip(unique, counts, ratio)) + ratio = counts / np.sum(counts) + return "\n".join(f"{lbl}: {cnt} [{rat}]" for lbl, cnt, rat in zip(unique, counts, ratio)) _TRAIN_PATCH_LOADERS = { @@ -619,7 +621,7 @@ def __repr__(self): _TRAIN_SECTION_LOADERS = {"section": TrainSectionLoaderWithDepth} def get_patch_loader(cfg): - assert cfg.TRAIN.DEPTH in [ + assert str(cfg.TRAIN.DEPTH).lower() in [ "section", "patch", "none", @@ -629,7 +631,7 @@ def get_patch_loader(cfg): def get_section_loader(cfg): - assert cfg.TRAIN.DEPTH in [ + assert str(cfg.TRAIN.DEPTH).lower() in [ "section", "none", ], f"Depth {cfg.TRAIN.DEPTH} not supported for section data. \ @@ -693,7 +695,7 @@ def get_seismic_labels(): @curry -def decode_segmap(label_mask, n_classes=6, label_colours=get_seismic_labels()): +def decode_segmap(label_mask, n_classes, label_colours=get_seismic_labels()): """Decode segmentation class labels into a colour image Args: label_mask (np.ndarray): an (N,H,W) array of integer values denoting diff --git a/tests/cicd/main_build.yml b/tests/cicd/main_build.yml index 3a343233..bfc9b026 100644 --- a/tests/cicd/main_build.yml +++ b/tests/cicd/main_build.yml @@ -159,35 +159,42 @@ jobs: pids= export CUDA_VISIBLE_DEVICES=0 # find the latest model which we just trained - model=$(ls -td output/patch_deconvnet/no_depth/* | head -1) + model_dir=$(ls -td output/patch_deconvnet/no_depth/* | head -1) + model=$(ls -t ${model_dir}/*.pth | head -1) # try running the test script { python test.py 'DATASET.ROOT' '/home/alfred/data_dynamic/dutch_f3/data' \ - 'TEST.MODEL_PATH' ${model}/patch_deconvnet_running_model_0.*.pth \ + 'TEST.MODEL_PATH' ${model} \ --cfg=configs/patch_deconvnet.yaml --debug ; echo "$?" > "$dir/$BASHPID"; } & pids+=" $!" export CUDA_VISIBLE_DEVICES=1 # find the latest model which we just trained - model=$(ls -td output/unet/section_depth/* | head -1) + model_dir=$(ls -td output/unet/section_depth/* | head -1) + model=$(ls -t ${model_dir}/*.pth | head -1) + # try running the test script { python test.py 'DATASET.ROOT' '/home/alfred/data_dynamic/dutch_f3/data' \ - 'TEST.MODEL_PATH' ${model}/resnet_unet_running_model_0.*.pth \ + 'TEST.MODEL_PATH' ${model} \ --cfg=configs/unet.yaml --debug ; echo "$?" > "$dir/$BASHPID"; } & pids+=" $!" 
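# --- Editor's aside, not part of the patch: each block above now resolves the newest
# run directory first and then the newest .pth inside it, instead of globbing a
# hard-coded checkpoint name. A Python equivalent of that two-step `ls -t | head -1`
# pattern, kept fully commented so it stays inert inside this script; the output_root
# path is illustrative only.
#
#     from pathlib import Path
#
#     def latest_checkpoint(output_root="output/hrnet/section_depth"):
#         run_dir = max(Path(output_root).iterdir(), key=lambda p: p.stat().st_mtime)
#         return max(run_dir.glob("*.pth"), key=lambda p: p.stat().st_mtime)
#
#     print(latest_checkpoint())
# --- end aside ---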
export CUDA_VISIBLE_DEVICES=2 # find the latest model which we just trained - model=$(ls -td output/seresnet_unet/section_depth/* | head -1) + model_dir=$(ls -td output/seresnet_unet/section_depth/* | head -1) + model=$(ls -t ${model_dir}/*.pth | head -1) + # try running the test script { python test.py 'DATASET.ROOT' '/home/alfred/data_dynamic/dutch_f3/data' \ - 'TEST.MODEL_PATH' ${model}/resnet_unet_running_model_0.*.pth \ + 'TEST.MODEL_PATH' ${model} \ --cfg=configs/seresnet_unet.yaml --debug ; echo "$?" > "$dir/$BASHPID"; } & pids+=" $!" export CUDA_VISIBLE_DEVICES=3 # find the latest model which we just trained - model=$(ls -td output/hrnet/section_depth/* | head -1) + model_dir=$(ls -td output/hrnet/section_depth/* | head -1) + model=$(ls -t ${model_dir}/*.pth | head -1) + # try running the test script { python test.py 'DATASET.ROOT' '/home/alfred/data_dynamic/dutch_f3/data' \ 'MODEL.PRETRAINED' '/home/alfred/models/hrnetv2_w48_imagenet_pretrained.pth' \ - 'TEST.MODEL_PATH' ${model}/seg_hrnet_running_model_0.*.pth \ + 'TEST.MODEL_PATH' ${model} \ --cfg=configs/hrnet.yaml --debug ; echo "$?" > "$dir/$BASHPID"; } & pids+=" $!" @@ -204,4 +211,4 @@ jobs: # Remove the temporary directory rm -r "$dir" - echo "PASSED" + echo "PASSED" \ No newline at end of file
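Editor's closing aside, not part of the patch: train.py above wires CosineAnnealingScheduler so the learning rate sweeps from TRAIN.MAX_LR down to TRAIN.MIN_LR once per snapshot cycle, and SnapshotHandler checkpoints exactly when trainer.state.iteration % snapshot_duration == 0, i.e. at the bottom of each cycle (snapshot ensembling). A minimal sketch of that interplay with illustrative constants, not the repo defaults; the standard cosine curve below approximates what ignite computes per event.

import math

max_lr, min_lr = 0.02, 0.001
iters_per_epoch, end_epoch, snapshots = 100, 10, 2
epochs_per_cycle = end_epoch // snapshots
snapshot_duration = epochs_per_cycle * iters_per_epoch  # iterations per cosine cycle

for iteration in range(1, end_epoch * iters_per_epoch + 1):
    # position within the current cycle, in (0, 1]
    t = ((iteration - 1) % snapshot_duration + 1) / snapshot_duration
    lr = min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t))
    if iteration % snapshot_duration == 0:
        # the checkpoint lands at the low point of the curve, so lr == min_lr here
        print(f"snapshot at iteration {iteration}, lr={lr:.4f}")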