Commit

* v0.3.4
* Added MetricConfusionMatrixBase for adding custom confusion matrix based metrics
* Added ConfusionMatrixBasedMetric Enum to get specific metrics such as tp, fp, fn, tn, precision, sensitivity, specificity, recall, ppv, npv, accuracy, f1score
* Added confusion matrix common metrics (TruePositives, TrueNegatives, FalsePositives, FalseNegatives)
* Added MetricMethod enum to pass to MetricBase; you can now define whether your metric is based on the MEAN, SUM or LAST of all batches
* StatsPrint callback now supports "print_confusion_matrix" and "print_confusion_matrix_normalized" arguments in case a MetricConfusionMatrixBase metric is found
* Added confusion matrix tests and example
* Renamed some custom layers (breaking changes in this part)
RoyToluna committed Nov 11, 2020
1 parent 37250dd commit 385d3f1
Showing 26 changed files with 993 additions and 170 deletions.
11 changes: 11 additions & 0 deletions CHANGELOG.txt
@@ -2,6 +2,17 @@ Change Log
==========


0.3.4 (11/11/2020)
-----------------
* Added MetricConfusionMatrixBase for adding custom confusion matrix based metrics
* Added ConfusionMatrixBasedMetric Enum to get specific metrics such as tp, fp, fn, tn, precision, sensitivity, specificity, recall, ppv, npv, accuracy, f1score
* Added confusion matrix common metrics (TruePositives, TrueNegatives, FalsePositives, FalseNegatives)
* Added MetricMethod enum to pass to MetricBase; you can now define whether your metric is based on the MEAN, SUM or LAST of all batches
* StatsPrint callback now supports "print_confusion_matrix" and "print_confusion_matrix_normalized" arguments in case a MetricConfusionMatrixBase metric is found
* Added confusion matrix tests and example
* Renamed some custom layers (breaking changes in this part)


0.3.3 (01/11/2020)
-----------------
* Added StatsResult class
78 changes: 56 additions & 22 deletions README.md
@@ -19,9 +19,14 @@ A Fast, Flexible Trainer with Callbacks and Extensions for PyTorch
pip install lpd
```

<b>[v0.3.3-beta](https://github.com/RoySadaka/lpd/releases) Release - contains the following:</b>
* Added StatsResult class
* Trainer.evaluate(...) now returns StatsResult instance with loss and metrics details
<b>[v0.3.4-beta](https://github.com/RoySadaka/lpd/releases) Release - contains the following:</b>
* Added ``MetricConfusionMatrixBase`` for adding custom confusion matrix based metrics
* Added ``ConfusionMatrixBasedMetric`` Enum to get specific metrics such as tp, fp, fn, tn, precision, sensitivity, specificity, recall, ppv, npv, accuracy, f1score
* Added confusion matrix common metrics (``TruePositives``, ``TrueNegatives``, ``FalsePositives``, ``FalseNegatives``)
* Added ``MetricMethod`` enum to pass to ``MetricBase``; you can now define whether your metric is based on the ``MEAN``, ``SUM`` or ``LAST`` of all batches (see the short sketch below)
* StatsPrint callback now supports ``print_confusion_matrix`` and ``print_confusion_matrix_normalized`` arguments in case a ``MetricConfusionMatrixBase`` metric is found
* Added confusion matrix tests and example
* Renamed some custom layers (breaking changes in this part)
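
As a quick illustration of the new ``MetricMethod`` enum, here is a minimal sketch of a custom metric that accumulates its value with ``SUM`` rather than ``MEAN``. The metric itself (``PositivePredictionCount``) is hypothetical and only for illustration; the ``MetricBase`` API used is the one shown in the Metrics section below.
```python
from lpd.enums import MetricMethod
from lpd.metrics import MetricBase

# hypothetical metric, for illustration only: counts positive predictions, summed over all batches
class PositivePredictionCount(MetricBase):
    def __init__(self):
        super(PositivePredictionCount, self).__init__(MetricMethod.SUM)  # SUM over the batches

    def __call__(self, y_pred, y_true):
        return (y_pred > 0).sum().float()  # logits above 0 are counted as positive predictions
```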



@@ -36,7 +41,7 @@ A Fast, Flexible Trainer with Callbacks and Extensions for PyTorch
from lpd.enums import Phase, State, MonitorType, MonitorMode, StatsType
from lpd.callbacks import LossOptimizerHandler, StatsPrint, ModelCheckPoint, Tensorboard, EarlyStopping, SchedulerStep, CallbackMonitor
from lpd.extensions.custom_schedulers import KerasDecay
from lpd.metrics import BinaryAccuracyWithLogits
from lpd.metrics import BinaryAccuracyWithLogits, FalsePositives
from lpd.utils.torch_utils import get_gpu_device_if_available
from lpd.utils.general_utils import seed_all

@@ -47,7 +52,8 @@ A Fast, Flexible Trainer with Callbacks and Extensions for PyTorch
optimizer = torch.optim.SGD(params=model.parameters())
scheduler = KerasDecay(optimizer, decay=0.01, last_step=-1) # decay scheduler using keras formula
loss_func = torch.nn.BCEWithLogitsLoss().to(device) # this is your loss class, already sent to the relevant device
metric_name_to_func = {'acc':BinaryAccuracyWithLogits()} # define your metrics in a dictionary
metric_name_to_func = {'Accuracy':BinaryAccuracyWithLogits(), 'FP':FalsePositives(num_class=2, threshold=0)} # define your metrics in a dictionary


# you can use some of the defined callbacks, or you can create your own
callbacks = [
@@ -65,12 +71,17 @@ A Fast, Flexible Trainer with Callbacks and Extensions for PyTorch
monitor_type=MonitorType.METRIC,
stats_type=StatsType.VAL,
monitor_mode=MonitorMode.MAX,
metric_name='acc')),
StatsPrint(train_metrics_monitors=CallbackMonitor(patience=-1,
monitor_type=MonitorType.METRIC,
stats_type=StatsType.TRAIN,
monitor_mode=MonitorMode.MAX,
metric_name='acc'))
metric_name='Accuracy')),
StatsPrint(train_metrics_monitors=[CallbackMonitor(patience=-1,
monitor_type=MonitorType.METRIC,
stats_type=StatsType.TRAIN,
monitor_mode=MonitorMode.MAX, # <-- notice MAX
metric_name='Accuracy'),
CallbackMonitor(patience=-1,
monitor_type=MonitorType.METRIC,
stats_type=StatsType.TRAIN,
monitor_mode=MonitorMode.MIN, # <-- notice MIN
metric_name='FP')])
]

trainer = Trainer(model,
@@ -142,9 +153,9 @@ Here are some examples
val_loss = trainer.val_stats.get_loss() # the mean of the last epoch's validation losses
test_loss = trainer.test_stats.get_loss() # the mean of the test losses (available only after calling evaluate)

train_metrics = trainer.train_stats.get_metrics() # dict(metric_name, mean(values)) of the current epoch in train state
val_metrics = trainer.val_stats.get_metrics() # dict(metric_name, mean(values)) of the current epoch in validation state
test_metrics = trainer.test_stats.get_metrics() # dict(metric_name, mean(values)) of the test (available only after calling evaluate)
train_metrics = trainer.train_stats.get_metrics() # dict(metric_name, MetricMethod(values)) of the current epoch in train state
val_metrics = trainer.val_stats.get_metrics() # dict(metric_name, MetricMethod(values)) of the current epoch in validation state
test_metrics = trainer.test_stats.get_metrics() # dict(metric_name, MetricMethod(values)) of the test (available only after calling evaluate)
```


@@ -250,28 +261,28 @@ And now, use it in ``LossOptimizerHandler`` callback:
```python
LossOptimizerHandler(apply_on_phase=Phase.BATCH_END,
apply_on_states=State.TRAIN,
loss_handler=None, # default loss handler will be used (calls loss.backward() every invocation)
loss_handler=None, # default loss handler will be used (calls loss.backward() every batch)
optimizer_step_handler=my_optimizer_handler_closure(action='step'),
optimizer_zero_grad_handler=my_optimizer_handler_closure(action='zero_grad')),
optimizer_zero_grad_handler=my_optimizer_handler_closure(action='zero_grad'))
```
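
For completeness, ``my_optimizer_handler_closure`` is defined earlier in the README (outside this hunk). A minimal hypothetical sketch of such a closure could look like the following; the ``callback_context`` argument name and the way the optimizer is captured are assumptions made for illustration, not the library's exact signature.
```python
import torch

model = torch.nn.Linear(10, 1)                             # stand-in model for this sketch only
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def my_optimizer_handler_closure(action='step'):
    # the optimizer is captured from the enclosing scope where it was created
    def handler(callback_context):  # hypothetical argument, for illustration; see the full README for the exact signature
        if action == 'step':
            optimizer.step()
        elif action == 'zero_grad':
            optimizer.zero_grad()
    return handler
```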


### StatsPrint Callback
``StatsPrint`` callback prints an informative summary of the trainer stats, including loss and metrics.

Loss will be monitored as ``MonitorMode.MIN``.
* Loss (for all states) will be monitored as ``MonitorMode.MIN``
* For train metrics, provide your own monitors via ``train_metrics_monitors``
* Validation metrics monitors will be added automatically according to ``train_metrics_monitors``

For train metrics, provide your own monitors via ``train_metrics_monitors``.

Validation loss & metrics monitors will be added automatically.
```python
StatsPrint(apply_on_phase=Phase.EPOCH_END,
apply_on_states=State.EXTERNAL,
train_metrics_monitors=CallbackMonitor(patience=None,
monitor_type=MonitorType.METRIC,
stats_type=StatsType.TRAIN,
monitor_mode=MonitorMode.MAX,
metric_name='Accuracy'))
metric_name='TruePositives'),
print_confusion_matrix=True) # in case you use one of the ConfusionMatrix metrics (e.g. TruePositives), you may print the confusion matrix
```
Output example:

@@ -407,13 +418,17 @@ Let's expand ``MyAwesomeCallback`` with ``CallbackMonitor`` to track if our valid
```

## Metrics
``lpd.metrics`` provides metrics to check the accuracy of your model, let's create a custom metric using ``MetricBase`` and also show the use of ``BinaryAccuracyWithLogits`` in this example
``lpd.metrics`` provides metrics to check the accuracy of your model.

Let's create a custom metric using ``MetricBase`` and also show the use of ``BinaryAccuracyWithLogits`` in this example
```python
from lpd.metrics import BinaryAccuracyWithLogits, MetricBase
from lpd.enums import MetricMethod

# our custom metric
class InaccuracyWithLogits(MetricBase):
def __init__(self):
super(InaccuracyWithLogits, self).__init__(MetricMethod.MEAN) # use mean over the batches
self.bawl = BinaryAccuracyWithLogits() # we exploit BinaryAccuracyWithLogits for the computation

def __call__(self, y_pred, y_true): # <=== implement this method!
@@ -425,6 +440,25 @@
metric_name_to_func = {'accuracy':BinaryAccuracyWithLogits(), 'inaccuracy':InaccuracyWithLogits()}
```

Let's do another example: a custom metric ``Positivity``, based on the confusion matrix, using ``MetricConfusionMatrixBase``
```python
from lpd.metrics import MetricConfusionMatrixBase, MetricBase, TruePositives, TrueNegatives
from lpd.enums import ConfusionMatrixBasedMetric, MetricMethod

# our custom metric
class Positivity(MetricConfusionMatrixBase):
def __init__(self, num_classes, labels=None, predictions_to_classes_convertor=None, threshold=0.5):
super(Positivity, self).__init__(num_classes, labels, predictions_to_classes_convertor, threshold)

def __call__(self, y_pred, y_true): # <=== implement this method!
tp_per_class = self.get_stats(ConfusionMatrixBasedMetric.TP)
tn_per_class = self.get_stats(ConfusionMatrixBasedMetric.TN)
if self.is_binary(y_pred, y_true):
return tp_per_class[1] + tn_per_class[1]
return tp_per_class + tn_per_class
```
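
As with any other metric, the custom ``Positivity`` can then be registered in the metrics dictionary, and since it derives from ``MetricConfusionMatrixBase``, ``StatsPrint`` can also print the confusion matrix. A short sketch (the class labels here are only for illustration):
```python
from lpd.callbacks import StatsPrint

labels = ['Cat', 'Dog', 'Bird']  # illustrative class labels
metric_name_to_func = {
    'Positivity': Positivity(3, labels=labels),
    'TP': TruePositives(3, labels=labels, threshold=0.5)
}

# other callbacks (e.g. LossOptimizerHandler) omitted for brevity
callbacks = [
    StatsPrint(print_confusion_matrix=True)
]
```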


## Save and Load full Trainer
Sometimes you just want to save everything so you can continue training where you left off.

77 changes: 77 additions & 0 deletions examples/confusion_matrix/train.py
@@ -0,0 +1,77 @@
import os
import torch.optim as optim
import torch.nn as nn

from lpd.trainer import Trainer
from lpd.callbacks import SchedulerStep, StatsPrint, ModelCheckPoint, LossOptimizerHandler, CallbackMonitor
from lpd.extensions.custom_schedulers import DoNothingToLR
from lpd.enums import Phase, State, MonitorType, StatsType, MonitorMode
from lpd.metrics import TruePositives, FalsePositives, TrueNegatives, FalseNegatives
import lpd.utils.torch_utils as tu
import lpd.utils.general_utils as gu
import examples.utils as eu

gu.seed_all(42) # BECAUSE IT'S THE ANSWER TO LIFE AND THE UNIVERSE

def get_parameters():
# N is batch size; D_in is input dimension;
# H is hidden dimension; D_out is output dimension.
N, D_in, H, D_out, num_classes = 128, 100, 100, 3, 3
num_epochs = 5
data_loader = eu.examples_data_generator(N, D_in, D_out, category_out=True)
data_loader_steps = 100
return N, D_in, H, D_out, num_epochs, num_classes, data_loader, data_loader_steps


def get_trainer_base(D_in, H, D_out, num_classes):
device = tu.get_gpu_device_if_available()

model = eu.get_basic_model(D_in, H, D_out).to(device)

loss_func = nn.CrossEntropyLoss().to(device)

optimizer = optim.Adam(model.parameters(), lr=1e-4)

scheduler = DoNothingToLR() #CAN ALSO USE scheduler=None, BUT DoNothingToLR IS MORE EXPLICIT

labels = ['Cat', 'Dog', 'Bird']
metric_name_to_func = {
"TP":TruePositives(num_classes, labels=labels, threshold = 0),
"FP":FalsePositives(num_classes, labels=labels, threshold = 0),
"TN":TrueNegatives(num_classes, labels=labels, threshold = 0),
"FN":FalseNegatives(num_classes, labels=labels, threshold = 0)
}

return device, model, loss_func, optimizer, scheduler, metric_name_to_func


def get_trainer(N, D_in, H, D_out, num_epochs, num_classes, data_loader, data_loader_steps):
device, model, loss_func, optimizer, scheduler, metric_name_to_func = get_trainer_base(D_in, H, D_out, num_classes)

callbacks = [
LossOptimizerHandler(),
StatsPrint(print_confusion_matrix=True)
]

trainer = Trainer(model=model,
device=device,
loss_func=loss_func,
optimizer=optimizer,
scheduler=scheduler,
metric_name_to_func=metric_name_to_func,
train_data_loader=data_loader,
val_data_loader=data_loader,
train_steps=data_loader_steps,
val_steps=data_loader_steps,
callbacks=callbacks,
name='Confusion-Matrix-Example')
return trainer


def run():
N, D_in, H, D_out, num_epochs, num_classes, data_loader, data_loader_steps = get_parameters()

current_trainer = get_trainer(N, D_in, H, D_out, num_epochs, num_classes, data_loader, data_loader_steps)

current_trainer.train(num_epochs)

80 changes: 39 additions & 41 deletions examples/data_loader/train.py
@@ -1,7 +1,4 @@
# THE FOLLOWING EXAMPLE WAS CONSTRUCTED FROM
# https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel
# AND MIGRATED TO USE lpd FOR TRAINING
#IN THIS EXAMPLE WE WILL USE THE PYTORCH DATALOADER (AS OPPOSE TO PYTHON GENERATORS)
# IN THIS EXAMPLE WE WILL USE THE PYTORCH DATALOADER (AS OPPOSED TO PYTHON GENERATORS)

import torch as T
import torch.nn as nn
@@ -17,93 +14,94 @@
import lpd.utils.torch_utils as tu


class MyDataset(Dataset):
'Characterizes a dataset for PyTorch'
def __init__(self, list_IDs, labels):
'Initialization'
self.labels = labels
self.list_IDs = list_IDs
num_embeddings = 10
num_samples_per_file = 100
def generate_file_data():
# [(X,y), (X,y), (X,y), (X,y)] like [ (1,0), (5,1), (6,1) ...]
return [(T.randint(1,num_embeddings,(1,1)).squeeze(), T.randint(0,2,(1,1)).squeeze().float()) for _ in range(num_samples_per_file)]


def _generate_fake_sample(self, ID):
# YOU CAN ALSO USE T.load('data/' + ID + '.pt') IF YOU HAVE FILES PER EACH ID
id_as_number = int(ID[-1])
return T.LongTensor([id_as_number]) #FOR 'id-1' RETURN [1,1,1,1], FOR 'id-2' RETURN [2,2,2,2] AND SO ON
class MyDataset(Dataset):
def __init__(self, file_ids):
self.file_ids = file_ids
self.idx_to_file_id = {idx:file_id for idx, file_id in enumerate(self.file_ids)}
self.file_id_to_sample_idx = {i:0 for i in self.file_ids}

def __len__(self):
'Denotes the total number of samples'
return len(self.list_IDs)
return len(self.file_ids)

def __getitem__(self, index):
'Generates one sample of data'
# Select sample
ID = self.list_IDs[index]
file_id = self.idx_to_file_id[index]

data = raw_data[file_id]
sample_idx = self.file_id_to_sample_idx[file_id]
self.file_id_to_sample_idx[file_id] = (self.file_id_to_sample_idx[file_id] + 1) % num_samples_per_file

data_at_sample_idx = data[sample_idx]

# Load data and get label
X = self._generate_fake_sample(ID)
y = self.labels[ID]
X = data_at_sample_idx[0]
y = data_at_sample_idx[1]

return X, y

# Parameters
params = { 'data_loader_params': {
'batch_size': 8,
'shuffle': True,
'num_workers': 6
'num_workers': 1
},
'D_in': 4,
'H': 128,
'D_out': 1,
'embedding_dim': 64,
'num_epochs': 50}
'num_epochs': 80}

# Datasets
file_ids = [f'id-{i}' for i in range(16)]
partition = {
#FOR SIMPLICITY, TRAIN/VAL/TEST WILL BE THE SAME DATA
'train': ['id-1', 'id-2', 'id-3', 'id-4', 'id-5', 'id-6', 'id-7', 'id-8'],
'val': ['id-1', 'id-2', 'id-3', 'id-4', 'id-5', 'id-6', 'id-7', 'id-8'],
'test': ['id-1', 'id-2', 'id-3', 'id-4', 'id-5', 'id-6', 'id-7', 'id-8']
'train': file_ids,
'val': file_ids,
'test': file_ids
}

labels = {'id-1': 0., 'id-2': 1., 'id-3': 0., 'id-4': 1., 'id-5': 0., 'id-6': 1., 'id-7': 0., 'id-8': 1.}
raw_data = {file_id:generate_file_data() for file_id in partition['train']}

# Generators
train_dataset = MyDataset(partition['train'], labels)
train_dataset = MyDataset(partition['train'])
train_data_loader = DataLoader(train_dataset, **params['data_loader_params'])

val_dataset = MyDataset(partition['val'], labels)
val_dataset = MyDataset(partition['val'])
val_data_loader = DataLoader(val_dataset, **params['data_loader_params'])

test_dataset = MyDataset(partition['test'], labels)
test_dataset = MyDataset(partition['test'])
test_data_loader = DataLoader(test_dataset, **params['data_loader_params'])


class Model(nn.Module):
def __init__(self, D_in, H, D_out, num_embeddings, embedding_dim):
def __init__(self, H, D_out, num_embeddings, embedding_dim):
super(Model, self).__init__()

#LAYERS

self.embedding_layer = nn.Embedding(num_embeddings=num_embeddings + 1,
embedding_dim=embedding_dim)
# nn.init.uniform_(self.embedding_layer.weight, a=-0.05, b=0.05) # I PREFER THE INIT THAT TensorFlow DO FOR Embedding

self.dense = Dense(embedding_dim, H, use_bias=True, activation=nn.ReLU())
self.dense_out = Dense(H, D_out, use_bias=True, activation=None)

def forward(self, x): # (batch, D_in)
x = self.embedding_layer(x)
def forward(self, x):
x = self.embedding_layer(x) # (batch, embedding_dim)
x = self.dense(x) # (batch, H)
x = self.dense_out(x) # (batch, 1, 1)
x = x.squeeze(2).squeeze(1) # (batch, 1)
return x #NOTICE! LOGITS OUT, NOT SIGMOID, THE SIGMOID WILL BE APPLIED IN THE LOSS HANDLER FOR THIS EXAMPLE
x = self.dense_out(x) # (batch, 1)
x = x.squeeze() # (batch)
return x #NOTICE! LOGITS OUT, NOT SIGMOID, THE SIGMOID WILL BE APPLIED IN BCEWithLogitsLoss

def get_trainer(params):

device = tu.get_gpu_device_if_available()

# Use the nn package to define our model and loss function.
num_embeddings = len(train_dataset)
model = Model(params['D_in'], params['H'], params['D_out'], num_embeddings, params['embedding_dim']).to(device)
model = Model(params['H'], params['D_out'], num_embeddings, params['embedding_dim']).to(device)

loss_func = nn.BCEWithLogitsLoss().to(device)
