- You can create a network in C++ directly. As an example, to create an `8C5-MP2-16C5-MP3-150N-10N` network, for MNIST, you could do:
EasyCL *cl = new EasyCL();
NeuralNet *net = new NeuralNet(cl);
net->addLayer( InputLayerMaker::instance()->numPlanes(1)->imageSize(28) );
net->addLayer( NormalizationLayerMaker::instance()->translate( -mean )->scale( 1.0f / standardDeviation ) );
net->addLayer( ConvolutionalMaker::instance()->numFilters(8)->filterSize(5)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( PoolingMaker::instance()->poolingSize(2) );
net->addLayer( ConvolutionalMaker::instance()->numFilters(16)->filterSize(5)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( PoolingMaker::instance()->poolingSize(3) );
net->addLayer( FullyConnectedMaker::instance()->numPlanes(150)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->relu() );
net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->linear() );
net->addLayer( SoftMaxMaker::instance() );
net->print();
- The following sections will detail the various layers available, and the options available for each layer type
- Data must be provided in contiguous, 1d format, see below
#include "DeepCL.h"
OpenCLHelper *cl = OpenCLHelper::createForFirstGpuOtherwiseCpu();
NeuralNet *net = new NeuralNet( cl );
- You need exactly one input layer:
net->addLayer( InputMaker::instance()->numPlanes(10)->imageSize(19) );
- You need to set the number of input planes, and the image size.
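- For example, with `numPlanes(10)` and `imageSize(19)`, each training example is expected to occupy 10 * 19 * 19 = 3610 contiguous floats (see the notes on data format below)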
- You can add a normalization layer, to translate and scale input data. Put it just after the input layer, like this:
NeuralNet *net = new NeuralNet();
net->addLayer( InputMaker::instance()->numPlanes(10)->imageSize(19) );
net->addLayer( NormalizationMaker::instance()->translate( - mean )->scale( 1.0f / standardDeviation ) );
// other layers here...
To add a drop out layer:
net->addLayer( DropoutMaker::instance()->dropRatio(0.5f) );
This should probably go between a fully-connected layer and its associated activation layer, like:
net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->linear()->biased() );
net->addLayer( DropoutMaker::instance()->dropRatio(0.5f) );
net->addLayer( ActivationMaker::instance()->tanh() );
- You can add a random patch layer, to cut a patch from each image, in a random location, and train against that
- You need to specify the patch size, eg on mnist, which is 28x28 images, you might use a patch size of 24
- During training the patch location is chosen randomly, per image, per epoch
- Size of output image from this layer is the size of the patch
- During testing, the patch is cut from the centre of the image
net->addLayer( RandomPatchMaker::instance()->patchSize(24) );
- You can add a random translations layer, to randomly translate each input image by a random amount, during training
- During testing, no translation is done
- If you put eg `translateSize(2)`, then the translation amount will be chosen uniformly from the set {-2,-1,0,1,2}, for each axis
- Output image from this layer is same size as input image
net->addLayer( RandomTranslationsMaker::instance()->translateSize(2) );
Eg:
net->addLayer( ConvolutionalMaker::instance()->numFilters(32)->filterSize(5)->relu()->biased() );
- You can change the number of filters, and their size. If you want, you can use any of the following options:
  - `->padZeros()`: pad the input image with zeros, so the output image is same size as the input
  - `->biased()`: turn on bias
  - `->biased(1)`: same as `->biased()`
  - `->biased(0)`: turn off bias (default)
- convolutional layers forward-prop and backward-prop both run on GPU, via OpenCL
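For example, a minimal sketch combining the options above, ie a zero-padded, biased convolutional layer:
// zero-padded so output image size matches input image size, with bias enabled
net->addLayer( ConvolutionalMaker::instance()->numFilters(32)->filterSize(5)->padZeros()->biased() );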
Eg:
net->addLayer( ActivationMaker::instance()->relu() );
- You can choose one of the following activations, which is applied to the output of the previous layer:
  - `->linear()`: choose linear activation
  - `->relu()`: choose RELU activation
  - `->elu()`: choose ELU activation
  - `->sigmoid()`: choose sigmoid activation
  - `->tanh()`: choose tanh activation (current default, but defaults can change...)
  - `->scaledtanh()`: 1.7159 * tanh( 0.66667 * x )
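For example, a sketch using the scaled tanh option listed above:
// applies 1.7159 * tanh( 0.66667 * x ) to the previous layer's output
net->addLayer( ActivationMaker::instance()->scaledtanh() );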
eg:
net->addLayer( FullyConnectedMaker::instance()->numPlanes(2)->imageSize(28) );
Available options:
  - `->biased()`: turn on bias
  - `->biased(1)`: same as `->biased()`
  - `->biased(0)`: turn off bias (default)
  - `->linear()`: choose linear activation
  - `->relu()`: choose relu activation
  - `->sigmoid()`: choose sigmoid activation
  - `->tanh()`: choose tanh activation (current default, but defaults can change...)
  - `->scaledtanh()`: 1.7159 * tanh( 0.66667 * x )
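For example, a minimal sketch combining several of these options, ie a biased, tanh-activated fully connected layer of 150 neurons:
net->addLayer( FullyConnectedMaker::instance()->numPlanes(150)->imageSize(1)->biased()->tanh() );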
net->addLayer( PoolingMaker::instance()->poolingSize(2) );
- By default, if the input image size is not an exact multiple of the `poolingSize`, the extra margin will be ignored
- You can specify `padZeros` to include this margin:
net->addLayer( PoolingMaker::instance()->poolingSize(2)->padZeros() );
You need to add exactly one loss layer, as the last layer of the net. The following loss layers are available:
net->addLayer( SquareLossMaker::instance() );
net->addLayer( CrossEntropyMaker::instance() );
net->addLayer( SoftMaxMaker::instance() );
- if your outputs are categorical, 1-of-N, then the softmax layer is probably what you want
- otherwise, you can choose square loss, or cross-entropy loss:
  - squared loss works well with a `tanh` last layer
  - cross-entropy loss works well with a `sigmoid` last layer
  - if you're not sure, then a `tanh` last layer, with squared loss, works well
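For example, a minimal sketch of the last few layers for real-valued targets, pairing a `tanh` last layer with squared loss, as suggested above:
net->addLayer( FullyConnectedMaker::instance()->numPlanes(10)->imageSize(1)->biased() );
net->addLayer( ActivationMaker::instance()->tanh() );
net->addLayer( SquareLossMaker::instance() );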
- the softmax layer:
  - creates a probability distribution, ie a set of outputs that sum to 1, each lying in the range `0 <= x <= 1`
  - can create this probability distribution either across all output planes, with an imageSize of 1
    - this is the default
  - or else a per-plane probability distribution
    - add option `->perPlane()`
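For example, a sketch using the per-plane option described above:
net->addLayer( SoftMaxMaker::instance()->perPlane() );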
Input data should be provided in a contiguous array of `float`s. "group by" order should be:
- training example id
- input plane
- image row
- image column
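For example, a sketch of the flat indexing implied by this ordering, where `numPlanes` and `imageSize` are whatever you configured on the input layer (the helper name here is just for illustration):
// returns the value for training example n, input plane p, row r, column c
float getInputValue( float *data, int numPlanes, int imageSize, int n, int p, int r, int c ) {
    return data[ ( ( n * numPlanes + p ) * imageSize + r ) * imageSize + c ];
}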
Providing labels, as an integer array, is the most efficient way of training, if you are training against categorical data. The labels should be provided as one integer per example, zero-based.
- in this case, the last layer of the net should have the same number of nodes as categories, eg a `netdef` ending in `-5n`, if there are 5 categories
- if using the C++ API, you would probably want to use a `softmax` loss layer
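For example, a minimal sketch of a labels array for 4 training examples and 5 categories (so a `netdef` ending in `-5n`), one zero-based integer per example:
// one zero-based label per training example, each in the range 0..4
int trainLabels[] = { 0, 3, 4, 1 };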
For non-categorical data, you can provide expected output values as a contiguous array of floats. "group by" order for the floats should be:
- training example id
- output plane (eg, corresponds to filter id, for convolutional network)
- output row
- output column
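For example, a minimal sketch for a net whose last layer has 2 output planes of imageSize 1 (the target values here are made up, purely for illustration):
// 3 training examples x 2 output planes x 1 x 1, grouped by example, then plane
float expectedOutput[] = {
    0.5f, -0.5f,   // example 0: plane 0, plane 1
    1.0f,  0.0f,   // example 1
   -1.0f,  1.0f    // example 2
};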
// create a Trainer object, currently SGD,
// passing in learning rate, and momentum:
Trainer *trainer = SGD::instance( cl, 0.02f, 0.0f );
You can set the weight decay, momentum, and learning rate:
SGD *sgd = SGD::instance( cl );
sgd->setLearningRate( 0.002f );
sgd->setMomentum( 0.1f );
sgd->setWeightDecay( 0.001f );
Other trainers:
Adagrad *adagrad = new Adagrad( cl );
adagrad->setLearningRate( 0.002f );
Trainer *trainer = adagrad;
Rmsprop *rmsprop = new Rmsprop( cl );
rmsprop->setLearningRate( 0.002f );
Trainer *trainer = rmsprop;
Nesterov *nesterov = new Nesterov( cl );
nesterov->setLearningRate( 0.002f );
nesterov->setMomentum( 0.1f );
Trainer *trainer = nesterov;
Annealer *annealer = new Annealer( cl );
annealer->setLearningRate( 0.002f );
annealer->setAnneal( 0.97f );
Trainer *trainer = annealer;
eg:
NetLearner netLearner(
trainer, net,
Ntrain, trainData, trainLabels,
Ntest, testData, testLabels );
netLearner.setSchedule( numEpochs );
netLearner.setBatchSize( batchSize );
netLearner.learn();
// learning is now done :-)
eg
// (create a net, as above)
// (train it, as above)
// test, eg:
BatchLearner batchLearner( net );
int testNumRight = batchLearner.test( batchSize, Ntest, testData, testLabels );
- By default an `OriginalInitializer` object is used to initialize weights (a bit hacky, but changing this would need a major version bump)
- You can create an instance of `UniformInitializer`, and assign this to the ConvolutionalMaker, by doing for example `->setWeightInitializer( new UniformInitializer(1.0f) )`, to use a uniform initializer
- uniform initializer assigns weights sampled uniformly from the range +/- ( initialWeights divided by fanin )
- possible to create other WeightsInitializers if we want
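For example, a sketch chaining the option shown above onto a convolutional layer (if the chained form does not compile in your version, call it on the maker separately before `addLayer`):
// sample initial weights uniformly from +/- ( 1.0f / fanin )
net->addLayer( ConvolutionalMaker::instance()->numFilters(32)->filterSize(5)->biased()
    ->setWeightInitializer( new UniformInitializer(1.0f) ) );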
You can find more details in the Doxygen-generated docs at doxy docs for 4.x.x