This branch hosts the code for the technical report "Towards Good Practices for Very Deep Two-stream ConvNets", and more.
- Sep. 30, 2015
- Support for cuDNN v3.
- Sep. 7, 2015
- A new mechanism for parallel communication reduces parallel overhead.
- Batch normalization, courtesy of @Cysu.
- `VideoDataLayer` for inputting video data.
- Training on optical flow data.
- Data augmentation with fixed corner cropping and multi-scale cropping.
- Parallel training with multiple GPUs.
- cuDNN v3 integration.
Generally this fork works the same as the original Caffe; please see the original README below. The following instructions cover the features listed above. More detailed documentation is on the way.
- Video/optical flow data
  - First, use the optical flow extraction tool to convert videos to RGB images and optical flow images.
  - A new data layer called `VideoDataLayer` has been added to support multi-frame input; see the UCF101 sample for how to use it, and the hypothetical sketch below.
  - Note: the `VideoDataLayer` can only read the optical flow images generated by the tool listed above.
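As an illustration only (the field names here are assumptions, not taken from the repository), such a layer definition in the network prototxt might look roughly like the following; the UCF101 sample remains the authoritative reference:

```
layer {
  name: "data"
  type: "VideoData"            # assumed type string for the VideoDataLayer
  top: "data"
  top: "label"
  video_data_param {           # hypothetical parameter block name
    source: "train_list.txt"   # hypothetical list of frame directories and labels
    batch_size: 32
    new_length: 10             # hypothetical number of stacked flow frames
    modality: FLOW             # hypothetical switch between FLOW and RGB input
  }
}
```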
- Fixed corner cropping augmentation
  - Set `fix_crop` to `true` in the `transform_param` of the network's protocol buffer definition, as in the sketch below.
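For example, a `transform_param` enabling it could look like the following; `crop_size` and `mirror` are standard Caffe transformation fields, while `fix_crop` is the flag added by this fork:

```
transform_param {
  crop_size: 224   # side length of the sampled crop
  mirror: true     # standard random horizontal flipping
  fix_crop: true   # sample crops at fixed positions instead of random offsets
}
```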
- "Multi-scale" cropping augmentation
- Set
multi_scale
totrue
intransform_param
- In
transform_param
, specifyscale_ratios
as a list of floats smaller than one, default is[1, .875, .75, .65]
- In
transform_param
, specifymax_distort
to an integer, which will limit the aspect ratio distortion, default to1
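Combining the three options with their stated defaults, the block might read:

```
transform_param {
  crop_size: 224
  multi_scale: true                   # enable multi-scale cropping
  scale_ratios: [1, .875, .75, .65]   # candidate scale ratios (the default)
  max_distort: 1                      # cap on aspect-ratio distortion (the default)
}
```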
- cuDNN v3
  - The current default config for cuDNN v3 yields a reasonable speedup over cuDNN v2. You can get this by simply replacing the library files.
  - If you have plenty of GPU memory, there is a parameter `richness` in the solver protobuf. Setting it to a number higher than `1`, e.g. `10` or `20`, will potentially further accelerate the computation, but this will cost a significant amount of GPU memory. A minimal sketch follows below.
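For instance, in the solver prototxt (the path and the other fields are ordinary Caffe solver settings, shown only to make the sketch concrete):

```
net: "models/your_model/train_val.prototxt"   # hypothetical path
base_lr: 0.001
richness: 10   # fork-specific: trades extra GPU memory for speed; default is 1
```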
- Training with multiple GPUs
  - Requires OpenMPI > 1.7.4 (Why?). Remember to compile your OpenMPI with the option `--with-cuda`.
  - Specify the list of GPU IDs to be used for training in the solver protocol buffer definition, like `device_id: [0,1,2,3]`.
  - Compile using cmake and use `mpirun` to launch the caffe executable, like:
```
mkdir build && cd build
cmake .. -DUSE_MPI=ON
make && make install
mpirun -np 4 ./install/bin/caffe train --solver=<Your Solver File> [--weights=<Pretrained caffemodel>]
```
Note: the actual batch size will be `num_device` times the `batch_size` specified in the network's prototxt; e.g., with 4 GPUs and `batch_size: 32`, the effective batch size is 128.
Working examples:
- Action recognition on UCF101
- Scene recognition on Places205
Currently, all existing data layers sub-classed from `BasePrefetchingDataLayer` support parallel training. If you have a newly added layer which is also sub-classed from `BasePrefetchingDataLayer`, simply implement the virtual method `inline virtual void advance_cursor();`. It should advance the "data cursor" in your data layer by one step; your new layer will then support parallel training. A hypothetical sketch follows.
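As a loose sketch only (not code from this repository; the members `lines_` and `lines_id_` are invented here in the style of Caffe's `ImageDataLayer`), a custom layer might implement it like this:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical layer sketch; assumes the usual Caffe headers are included
// and that the layer iterates over a list of (path, label) records.
template <typename Dtype>
class MyVideoDataLayer : public BasePrefetchingDataLayer<Dtype> {
 protected:
  // Advance the "data cursor" by one step, wrapping around at the end,
  // so the parallel training machinery can interleave records across workers.
  inline virtual void advance_cursor() {
    lines_id_ = (lines_id_ + 1) % lines_.size();
  }

  std::vector<std::pair<std::string, int> > lines_;  // (path, label) records
  size_t lines_id_;  // current cursor position, initialized during layer setup
};
```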
Contact
Following is the original README of Caffe.
Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and community contributors.
Check out the project site for all the details like
- DIY Deep Learning for Vision with Caffe
- Tutorial Documentation
- BVLC reference models and the community model zoo
- Installation instructions
and step-by-step examples.
Please join the caffe-users group or gitter chat to ask questions and talk about methods and models. Framework development discussions and thorough bug reports are collected on Issues.
Happy brewing!
Caffe is released under the BSD 2-Clause license. The BVLC reference models are released for unrestricted use.
Please cite Caffe in your publications if it helps your research:
```
@article{jia2014caffe,
  Author = {Jia, Yangqing and Shelhamer, Evan and Donahue, Jeff and Karayev, Sergey and Long, Jonathan and Girshick, Ross and Guadarrama, Sergio and Darrell, Trevor},
  Journal = {arXiv preprint arXiv:1408.5093},
  Title = {Caffe: Convolutional Architecture for Fast Feature Embedding},
  Year = {2014}
}
```