Experiments

This page contains documentation on all experiments that have been run during the development of the baseline.

Experiment 1

CubeRun with dogs and cats

A CubeRun model was trained using Keras' built-in flow_from_directory function, which infers classes from the directory structure. The model was trained on 1000 pictures of cats and 1000 pictures of dogs, and validated on 400 pictures of cats and 400 pictures of dogs. It reached an accuracy of 96% on the training set and 83% on the validation set.
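
For reference, a minimal sketch of this kind of data loading, assuming the usual cats/dogs directory layout (the paths, image size and batch size are placeholders, not values taken from the experiment):

```python
from keras.preprocessing.image import ImageDataGenerator

# Hypothetical directory layout: data/train/cats, data/train/dogs,
# data/validation/cats, data/validation/dogs. flow_from_directory
# infers one class per sub-directory.
datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = datagen.flow_from_directory(
    'data/train',             # 1000 cats + 1000 dogs
    target_size=(150, 150),   # assumed input size, not taken from the experiment
    batch_size=32,
    class_mode='binary')      # two classes -> a single binary label

validation_generator = datagen.flow_from_directory(
    'data/validation',        # 400 cats + 400 dogs
    target_size=(150, 150),
    batch_size=32,
    class_mode='binary')

# The generators are then passed to model.fit_generator(...).
```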

Results in commit: d03ab0923c171185fc110c1868b2dc3a6b39829e

Experiment 2

CubeRun with MLSP 2013

The model uses categorical cross entropy, adadelta optimization, and accuracy as the metric. Categorical cross entropy is NOT recommended for multi-label data, and therefore the results are not that interesting. However, it is possible to observe that the network at least learns something.

The validation data is 100 random data points loaded from the same data set as the training data, which is not a good solution either.

The output layer uses 'softmax' activation, which is supposedly not good for multi-label data either.

The recommended setup for multi-label data is (see the sketch after this list):

  • loss function : binary cross entropy
  • output layer : sigmoid activation function
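
A minimal Keras sketch of that setup, assuming a placeholder feature extractor in front of the output layer (the class count and layer sizes are made up, not the actual CubeRun configuration):

```python
from keras.models import Sequential
from keras.layers import Dense

num_classes = 19    # hypothetical number of species in the multi-label task
feature_dim = 256   # stand-in for the flattened CubeRun feature vector

model = Sequential()
model.add(Dense(feature_dim, activation='relu', input_dim=feature_dim))  # placeholder for the CubeRun body
model.add(Dense(num_classes, activation='sigmoid'))  # independent per-class probabilities

model.compile(loss='binary_crossentropy',  # one yes/no decision per class
              optimizer='adadelta',
              metrics=['accuracy'])
```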

The accuracy of the model continued to improve for ~25 epochs, then it plateaued at ~40% for both training and validation accuracy.

Results in commit: 4322e86631da107f00fef65e85320d3bb84a4712

Experiment 3

CubeRun with MLSP 2013

The model uses binary cross entropy, adadelta optimization, and accuracy as the metric.

The output layer uses a sigmoid activation function.

The accuracy of the model improved to nearly 100% on the training data when run for 60 epochs. The model achieves 95% accuracy on the test set. However, this is misleading, as the model would achieve 100% accuracy simply by assigning every class to each data point.

The model predictions were checked manually against the ground truth on both the training data and the test data. The predictions on the training data looked very good; the predictions on the test data did not. The model has probably overfitted to the training data. The results after each epoch were lost by mistake, but since this is just a step towards a working model / pipeline, that is not a big deal.

Using binary cross entropy and sigmoid activation did improve the results.

Needed improvements:

  • The accuracy measure should incorporate precision and recall, i.e., the model should be penalized for predicting wrong classes (see the sketch after this list).
  • The validation set MUST be separate from the training set, otherwise it will not be possible to observe overfitting.
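
A rough sketch of the first point, assuming 0/1 prediction and ground-truth matrices of shape (samples, classes); a micro-averaged precision/recall makes the trivial predict-everything strategy look bad where plain accuracy does not:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    """Micro-averaged precision and recall for binary multi-label matrices."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    true_positives = float(np.logical_and(y_true, y_pred).sum())
    precision = true_positives / max(y_pred.sum(), 1)  # penalizes predicting wrong classes
    recall = true_positives / max(y_true.sum(), 1)     # penalizes missing true classes
    return precision, recall

# Assigning every class to every data point gives perfect recall
# but poor precision, so it no longer looks like a perfect model.
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.ones_like(y_true)
print(precision_recall(y_true, y_pred))  # (0.33..., 1.0)
```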

Predictions in commit: 8ec1e24d4f2a21003624b12968fcc2e12e1efe32

Experiment 4

CubeRun with MLSP 2013

In this experiment the validation set is separate from the training set, and the model was trained for only 20 epochs. A sketch of such a held-out split is shown below.
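
A minimal sketch of such a held-out split (the array shapes and the 10% split size are assumptions, not values from the experiment):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# X: spectrogram segments, Y: binary label matrix; shapes are placeholders.
X = np.random.rand(100, 64, 128, 1)
Y = np.random.randint(0, 2, size=(100, 19))

# The split is made once, up front, so no validation sample is ever
# used for training and overfitting becomes visible.
X_train, X_val, Y_train, Y_val = train_test_split(
    X, Y, test_size=0.1, random_state=42)
```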

  • The training accuracy rose from ~90% to ~99% during the first 20 epochs.
  • The validation accuracy stayed around ~94% for all epochs.
  • The training loss decreased from ~0.23 to ~0.03 during the first 20 epochs.
  • The validation loss decreased from ~0.6 to ~0.25 during the first 10 epochs, then it started increasing to around ~0.4 during the last 10 epochs.

This indicates that the model starts to overfit already after about 10 epochs, which may explain why the previous model (trained for 60 epochs) performed so poorly on the test data set.

Results in commit: 27f64986d6d00490837b03d8e10ff83dea789eeb