
This document explains the model file format used by each deep learning technique. Let's get started!

1. Restricted Boltzmann Machines (RBM). Suppose we have the following model file:

    ```
    100 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    ```

    The first line contains four parameters: number of hidden units, learning rate, weight decay and momentum. Notice that everything after the character # is considered a comment and is therefore ignored by the parser.

    The next line configures the RBM's minimum and maximum learning rates.

    The same model file format is used for Gaussian RBMs.
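    As an illustration, here is a minimal sketch of how such a file could be read (a hypothetical helper, not the library's own parser): drop everything after #, then split each remaining line into whitespace-separated values.

    ```python
    def parse_rbm_model(path):
        """Parse a two-line RBM model file into a dict of hyperparameters."""
        with open(path) as f:
            # Everything after '#' is a comment; blank lines are skipped.
            rows = [line.split('#', 1)[0].split() for line in f]
            rows = [r for r in rows if r]

        n_hidden, lr, weight_decay, momentum = rows[0]
        lr_min, lr_max = rows[1]
        return {'n_hidden': int(n_hidden),
                'learning_rate': float(lr),
                'weight_decay': float(weight_decay),
                'momentum': float(momentum),
                'lr_min': float(lr_min),
                'lr_max': float(lr_max)}
    ```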

2. Dropout Restricted Boltzmann Machines (Dropout RBM). Suppose we have the following model file:

    ```
    100 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    ```

    Compared with the RBM model file above, the only difference is the last line, which controls dropout. A value of 1 stands for a standard RBM, without dropout. For instance, a rate of 0.2 stands for a network with 80% dropout, since a rate of 0 deactivates all the units in the referred layer.

    The same model file format is used for Dropout Gaussian RBMs, Dropout DRBMs and Dropout Gaussian DRBMs.
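    In other words, the value acts as a keep probability for the hidden units. A rough sketch of the corresponding mask (illustrative only, assuming NumPy):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    def hidden_dropout_mask(n_units, keep_rate):
        """Bernoulli mask: each hidden unit survives with probability keep_rate."""
        return (rng.random(n_units) < keep_rate).astype(float)

    hidden = rng.random(100)                          # stand-in hidden activations
    dropped = hidden * hidden_dropout_mask(100, 0.2)  # ~80% of units zeroed out
    ```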

3. Dropconnect Restricted Boltzmann Machines (Dropconnect RBM). Suppose we have the following model file:

    ```
    100 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    0.5 #<dropconnect mask rate>
    ```

    Dropconnect is another type of regularization, and here the last line controls its usage. For instance, a mask rate of 0.2 stands for a network with 80% dropconnect, since a rate of 0 deactivates all the weights in the referred layer.
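    The difference from dropout is that the mask applies to individual weights rather than whole hidden units. A rough sketch (illustrative only, assuming NumPy and a hypothetical 784-unit visible layer):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    W = rng.standard_normal((784, 100))               # visible x hidden weights
    mask = (rng.random(W.shape) < 0.5).astype(float)  # Bernoulli(0.5) keep-mask
    W_masked = W * mask                               # weights used for this update
    ```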

4. Discriminative Restricted Boltzmann Machines (DRBM). Suppose we have the following model file:

    ```
    100 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    ```

    As with the RBM, the first line contains four parameters: number of hidden units, learning rate, weight decay and momentum. Everything after the character # is considered a comment and is ignored by the parser.

    The next line configures the DRBM's minimum and maximum learning rates.

5. Gaussian Discriminative Restricted Boltzmann Machines (Gaussian DRBM). Suppose we have the following model file:

    ```
    100 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    ```

    The same model file format is used for Gaussian DRBMs and Bernoulli DRBMs.

6. Deep Boltzmann Machines (DBM). Suppose we have the following model file:

    ```
    500 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    500 0.1 0.1 0.00001 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0.1 0.9 #<minimum learning rate> <maximum learning rate>
    ```

    This follows the RBM model file, except that a DBM is composed of n layers, so the same pair of lines is repeated once per layer. The model above therefore describes a DBM with 2 layers.

    If you need to add more layers, just copy and paste the first two lines of the model once per extra layer. For example, a DBM with 5 layers will have a model file with 10 lines. Note that this model file can also be used for TDBMs (Temperature-based Deep Boltzmann Machines).
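    A sketch of how the per-layer blocks could be parsed (again a hypothetical helper, in the spirit of parse_rbm_model above):

    ```python
    def parse_dbm_model(path):
        """Split a DBM model file into one two-line block per layer."""
        with open(path) as f:
            rows = [line.split('#', 1)[0].split() for line in f]
            rows = [r for r in rows if r]

        layers = []
        for i in range(0, len(rows), 2):
            n_hidden, lr, wd, momentum = rows[i]
            lr_min, lr_max = rows[i + 1]
            layers.append({'n_hidden': int(n_hidden),
                           'learning_rate': float(lr),
                           'weight_decay': float(wd),
                           'momentum': float(momentum),
                           'lr_min': float(lr_min),
                           'lr_max': float(lr_max)})
        return layers  # a 5-layer DBM yields 5 dicts from a 10-line file
    ```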

7. Dropout Deep Boltzmann Machines (Dropout DBM). Suppose we have the following model file:

    ```
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    500 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    ```

    A DBM with dropout regularization needs an extra line per layer containing that layer's hidden units dropout rate.
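    Each layer block now has three lines instead of two, so a sketch along the lines of the hypothetical parse_dbm_model above only needs to step through the rows three at a time and record the extra value:

    ```python
    def parse_dropout_dbm_model(path):
        """Like parse_dbm_model, but each block ends with a dropout rate."""
        with open(path) as f:
            rows = [r for r in (ln.split('#', 1)[0].split() for ln in f) if r]

        return [{'n_hidden': int(rows[i][0]),
                 'learning_rate': float(rows[i][1]),
                 'weight_decay': float(rows[i][2]),
                 'momentum': float(rows[i][3]),
                 'lr_min': float(rows[i + 1][0]),
                 'lr_max': float(rows[i + 1][1]),
                 'dropout_rate': float(rows[i + 2][0])}
                for i in range(0, len(rows), 3)]
    ```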

8. Dropconnect Deep Boltzmann Machines (Dropconnect DBM). Suppose we have the following model file:

    ```
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    0.5 #<dropconnect mask rate>
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    0.5 #<dropconnect mask rate>
    500 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    0.8 #<dropconnect mask rate>
    ```

    Likewise, a Dropconnect DBM needs an extra line per layer containing that layer's dropconnect mask rate.

9. Deep Belief Networks (DBN). Suppose we have the following model file:

    ```
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    500 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    ```

    The very same approach applies to DBNs. For instance, the model above stands for a DBN with three layers. Note that this model file can also be used for TDBNs (Temperature-based Deep Belief Networks).

10. Dropout Deep Belief Networks (Dropout DBN). Suppose we have the following model file:

    ```
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    500 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    1 #<hidden units dropout rate>
    ```

    As with the Dropout DBM, a DBN with dropout regularization needs an extra line per layer containing that layer's hidden units dropout rate.

11. Dropconnect Deep Belief Networks (Dropconnect DBN). Suppose we have the following model file:

    ```
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    0.5 #<dropconnect mask rate>
    100 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    0.5 #<dropconnect mask rate>
    500 0.1 0.0002 0.5 #<number of hidden units> <learning rate> <weight decay> <momentum>
    0 0.1 #<minimum learning rate> <maximum learning rate>
    0.8 #<dropconnect mask rate>
    ```

    Finally, a Dropconnect DBN needs an extra line per layer containing that layer's dropconnect mask rate.

12. Enhanced Probabilistic Neural Network (EPNN). Suppose we have the following model file:

    ```
    10 1 0.5 0 # <kmax | 0 for number of labels in training set> <sigma> <radius | greater than 0 for Hyper-Sphere use in EPNN> <learning best parameters | 0 for no-optimization | 1 for grid-search | 2 for grid-search and train/eval sets merging>
    ```

    Here, kmax stands for the maximum degree of the k-NN graph, sigma is the spread of the Gaussian function, and radius must be set to r > 0 to use the Hyper-Sphere in the Enhanced Probabilistic Neural Network, or r = 0 for a plain Probabilistic Neural Network.

    Finally, the learning best parameters field selects the desired type of optimization: 0 for no optimization, 1 for grid-search optimization and 2 for grid-search with training/evaluation sets merging.
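    A sketch of how this single line could be parsed (a hypothetical helper, mirroring the ones above):

    ```python
    def parse_epnn_model(path):
        """Parse the single-line EPNN model file."""
        with open(path) as f:
            kmax, sigma, radius, learning = f.read().split('#', 1)[0].split()

        return {'kmax': int(kmax),          # 0 -> number of labels in training set
                'sigma': float(sigma),      # spread of the Gaussian function
                'radius': float(radius),    # > 0 -> Hyper-Sphere (EPNN); 0 -> plain PNN
                'learning': int(learning)}  # 0: none, 1: grid-search, 2: + set merging
    ```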
