mlp_hpp_analysis

This repository is the code basis for the paper entitled "Exploring the Intricacies of Neural Network Optimization".

Before using

Install the dependencies listed in requirements.txt with the command pip install -r requirements.txt

To use this module

  1. Write one or more .json files describing the experiments you want to perform.

  2. Run the experiments using the command python code/run.py --hyper path_to_the_folder
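The exact schema read by code/run.py is not described here; as a purely hypothetical sketch, an experiment file could enumerate the hyperparameter values studied in the paper (the field names below mirror the names used in the results tables and are assumptions, not the actual schema):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical experiment description; keys mirror the hyperparameter
# names in the results tables, not the real schema of code/run.py.
experiment = {
    "activation_functions": ["relu", "selu", "softsign"],
    "batch_size": [128, 256, 512, 1024],
    "loss": ["mean_squared_error"],
    "optimizer": ["adam", "sgd"],
    "learning_rate": [0.001, 0.01],
    "hidden_layer_dim": [[224, 192, 608], [512, 512, 512, 512]],
}

folder = Path(tempfile.mkdtemp())          # stand-in for the experiments folder
path = folder / "abalone_grid.json"
path.write_text(json.dumps(experiment, indent=2))
```

A folder containing such files would then be the path passed via --hyper.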

Execute the experiments performed in the paper

In the hyperparameters folder there is one folder for each of the tested datasets.

To run every experiment at once, use the all_runs folder. Alternatively, the experiments can be run folder by folder, yielding the same results as those presented in the paper.

Keep in mind that the experiments with binary_crossentropy and sparse_categorical_crossentropy are kept in a separate folder, as they require the Y array to be created differently. You can run them separately and then join the CSV results.
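Joining the two sets of CSV results amounts to concatenating the rows while keeping a single header. A minimal standard-library sketch (the file contents and column names below are hypothetical):

```python
import csv
import io

# Two hypothetical result files: same header, different rows.
results_a = "dataset,loss,mcc\nhiggs,binary_crossentropy,0.41\n"
results_b = "dataset,loss,mcc\nhiggs,sparse_categorical_crossentropy,0.39\n"

def join_results(*csv_texts):
    """Concatenate CSV contents, keeping a single header row."""
    rows, header = [], None
    for text in csv_texts:
        reader = csv.reader(io.StringIO(text))
        head = next(reader)          # skip each file's header
        header = header or head
        rows.extend(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

merged = join_results(results_a, results_b)
```

The same approach works on files read from disk before running the preprocessing step.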

Once the experiments have run, the results are written to the results/raw folder.

To preprocess them, run python code/results_preprocess.py, which creates the results/final folder with the preprocessed results.

After that, to obtain the importance of the hyperparameters, run python code/results_analysis.py, which prints the importance per dataset and the average over the six datasets.

Results

Here we present the results available in the paper, along with an additional analysis of the obtained results.

If any analysis the reader desires is missing, the complete data obtained from the runs is available in the results folder, or the reader can rerun the experiments themselves.

Hyperparameter importance

These are the results of the fANOVA analysis.
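fANOVA attributes a share of the variance of an objective (here: performance, training time, inference time) to each hyperparameter and to their interactions. As a toy illustration of the main-effect idea only, with made-up runs that are not taken from the paper, the marginal importance of a hyperparameter can be sketched as the variance of its per-value mean performance divided by the total variance:

```python
import statistics

# Made-up runs; values are illustrative, not results from the paper.
runs = [
    {"optimizer": "adam", "batch_size": 128, "mcc": 0.80},
    {"optimizer": "adam", "batch_size": 512, "mcc": 0.78},
    {"optimizer": "sgd", "batch_size": 128, "mcc": 0.60},
    {"optimizer": "sgd", "batch_size": 512, "mcc": 0.58},
]

def main_effect_share(runs, hp, metric="mcc"):
    """Fraction of total variance explained by one hyperparameter's marginal."""
    total = statistics.pvariance([r[metric] for r in runs])
    means = [
        statistics.mean(r[metric] for r in runs if r[hp] == v)
        for v in sorted({r[hp] for r in runs})
    ]
    return statistics.pvariance(means) / total

print(main_effect_share(runs, "optimizer"))   # dominates in this toy data
print(main_effect_share(runs, "batch_size"))  # near zero here
```

The actual fANOVA used in the paper additionally models interaction effects and works over the full configuration space, so the tables below should not be reproduced with this toy formula.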

General Importance

All Datasets

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 18.42 | 3.2 | 6.99 |
| batch_size | 0.95 | 55.94 | 37.67 |
| loss | 12.23 | 0.33 | 2.1 |
| optimizer | 14.88 | 5.17 | 2.16 |
| learning_rate | 17.65 | 3.38 | 1.34 |
| hidden_layer_dim | 3.94 | 3.85 | 16.62 |
| hidden_layer_size | 3.94 | 3.61 | 6.29 |

Importance by dataset type

Classification

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 17.59 | 2.91 | 2.28 |
| batch_size | 1.31 | 57.3 | 37.43 |
| loss | 9.16 | 0.01 | 3.76 |
| optimizer | 17.11 | 3.78 | 4.53 |
| learning_rate | 21.4 | 4.69 | 0.01 |
| hidden_layer_dim | 6.13 | 0.67 | 19.37 |
| hidden_layer_size | 3.04 | 5.2 | 8.54 |
Regression

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 23.66 | 2.87 | 15.98 |
| batch_size | 4.49 | 64.87 | 37.22 |
| loss | 19.51 | 0.12 | 0.01 |
| optimizer | 7.4 | 8.33 | 0.12 |
| learning_rate | 18.09 | 1.38 | 3.26 |
| hidden_layer_dim | 2.1 | 2.2 | 12.22 |
| hidden_layer_size | 3.32 | 1.48 | 4.18 |

Importance per dataset

Abalone

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 14.77 | 1.39 | 4.39 |
| batch_size | 0.55 | 56.72 | 21.61 |
| loss | 0.0 | 1.62 | 0.0 |
| optimizer | 2.96 | 7.99 | 3.5 |
| learning_rate | 30.02 | 6.9 | 0.07 |
| hidden_layer_dim | 7.16 | 0.12 | 15.69 |
| hidden_layer_size | 11.55 | 4.35 | 11.04 |
Bike Sharing

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 51.26 | 0.59 | 24.54 |
| batch_size | 0.74 | 72.21 | 29.71 |
| loss | 0.06 | 0.0 | 0.0 |
| optimizer | 17.86 | 6.28 | 0.02 |
| learning_rate | 11.6 | 5.17 | 7.14 |
| hidden_layer_dim | 0.0 | 1.98 | 14.41 |
| hidden_layer_size | 2.62 | 1.16 | 0.82 |
Compas

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 3.4 | 0.4 | 0.08 |
| batch_size | 1.16 | 43.0 | 6.23 |
| loss | 33.98 | 0.19 | 0.0 |
| optimizer | 21.68 | 4.02 | 4.16 |
| learning_rate | 9.59 | 6.06 | 0.02 |
| hidden_layer_dim | 0.76 | 2.92 | 49.31 |
| hidden_layer_size | 3.61 | 7.49 | 20.06 |
Covertype

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 29.22 | 12.77 | 4.01 |
| batch_size | 0.77 | 56.92 | 41.6 |
| loss | 0.06 | 0.0 | 10.34 |
| optimizer | 8.29 | 1.65 | 4.67 |
| learning_rate | 23.64 | 0.32 | 0.17 |
| hidden_layer_dim | 13.27 | 0.2 | 3.32 |
| hidden_layer_size | 1.84 | 4.79 | 0.62 |
Delays Zurich

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 0.37 | 3.57 | 5.2 |
| batch_size | 0.0 | 58.2 | 57.82 |
| loss | 39.27 | 0.0 | 0.01 |
| optimizer | 14.39 | 2.42 | 0.0 |
| learning_rate | 0.18 | 0.58 | 0.5 |
| hidden_layer_dim | 2.37 | 10.18 | 12.22 |
| hidden_layer_size | 3.81 | 0.48 | 3.92 |
Higgs

| Hyperparameter | Performance | Training Time | Inference Time |
|---|---|---|---|
| activation_functions | 11.51 | 0.49 | 3.73 |
| batch_size | 2.46 | 48.6 | 69.07 |
| loss | 0.01 | 0.14 | 2.25 |
| optimizer | 24.08 | 8.67 | 0.63 |
| learning_rate | 30.84 | 1.22 | 0.15 |
| hidden_layer_dim | 0.09 | 7.68 | 4.75 |
| hidden_layer_size | 0.18 | 3.39 | 1.25 |

Performance metrics

Best performing hyperparameter combination per dataset

Regression

| Dataset | Activation function | Batch size | Hidden layer dimension | Loss function | Optimizer | Learning Rate | MSE | Training time | Prediction time |
|---|---|---|---|---|---|---|---|---|---|
| Abalone | relu | 256 | [224, 192, 608, 768, 800] | mean_squared_error | adam | 0.001 | 2.158 | 1.928 | 0.107 |
| Bike Sharing | selu | 1024 | [352, 32, 288, 32, 544, 704, 96] | mean_squared_error | adam | 0.001 | 59.748 | 3.621 | 0.128 |
| Delays Zurich | relu | 128 | [640, 416, 576, 192, 288, 32, 32] | mean_squared_error | adam | 0.001 | 3.101 | 73.694 | 0.286 |

Classification

| Dataset | Activation function | Batch size | Hidden layer dimension | Loss function | Optimizer | Learning Rate | MCC | Training time | Prediction time |
|---|---|---|---|---|---|---|---|---|---|
| Compas | relu | 512 | [512, 512, 512, 512] | categorical_crossentropy | adam | 0.001 | 0.041 | 1.567 | 0.118 |
| Covertype | relu | 512 | [1024, 1024, 1024, 1024, 1024, 1024, 1024, 1024] | categorical_crossentropy | adam | 0.001 | 0.828 | 74.544 | 0.199 |
| Higgs | softsign | 512 | [224, 480, 64, 96, 768, 32, 928] | categorical_crossentropy | adam | 0.001 | 0.415 | 50.935 | 0.239 |

Baseline vs Best vs Worst comparison

The best and worst models were selected based on the performance metric (MSE for regression, MCC for classification).
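As a hypothetical sketch of that selection, the best and worst configurations can be picked from the results with a min/max over the metric column (the structure and column names below are assumptions, not the actual output of code/results_preprocess.py):

```python
# Made-up regression results; for MSE, lower is better.
results = [
    {"config": "relu/256/adam", "mse": 2.158},
    {"config": "tanh/1024/sgd", "mse": 9.295},
    {"config": "selu/128/adam", "mse": 3.101},
]

best = min(results, key=lambda r: r["mse"])    # for MCC, higher is better: use max()
worst = max(results, key=lambda r: r["mse"])
print(best["config"], worst["config"])
```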

Performance (MCC/MSE)

| Dataset | Baseline | Best model | Worst model |
|---|---|---|---|
| **Regression** | | | |
| Abalone | 2.289 | 2.158 | 9.295 |
| Bike Sharing | 84.045 | 59.748 | 100.139 |
| Delays Zurich | 3.107 | 3.101 | 154.627 |
| **Classification** | | | |
| Compas | 0.022 | 0.041 | 0 |
| Covertype | 0.812 | 0.828 | -0.001 |
| Higgs | 0.256 | 0.415 | 0 |

Training Time

| Dataset | Baseline | Best model | Worst model |
|---|---|---|---|
| Abalone | 1.465 | 1.928 | 2.554 |
| Bike Sharing | 4.67 | 3.621 | 3.014 |
| Delays Zurich | 12.74 | 73.694 | 7.25 |
| Compas | 1.088 | 2.342 | 1.121 |
| Covertype | 37.381 | 74.544 | 4.987 |
| Higgs | 21.161 | 50.935 | 4.329 |

Inference Time

| Dataset | Baseline | Best model | Worst model |
|---|---|---|---|
| Abalone | 0.11 | 0.107 | 0.101 |
| Bike Sharing | 0.132 | 0.128 | 0.122 |
| Delays Zurich | 0.136 | 0.286 | 0.149 |
| Compas | 1.088 | 0.11 | 1.121 |
| Covertype | 0.173 | 0.199 | 0.172 |
| Higgs | 0.173 | 0.239 | 0.182 |

Authors

License

This project is licensed under the MIT License - see the LICENSE file for details

Citation

If you use this code, please cite our work: Teixeira, Rafael; Antunes, Mário; Sobral, Rúben; Martins, João; Gomes, Diogo; Aguiar, Rui (2023). Exploring the Intricacies of Neural Network Optimization. doi:10.1007/978-3-031-45275-8_2.
