diff --git a/.gitignore b/.gitignore index 2ac7a0096eb..cf92e5fd4db 100644 --- a/.gitignore +++ b/.gitignore @@ -51,8 +51,7 @@ Makefile.config # 1. reference, and not casually committed # 2. custom, and live on their own unless they're deliberated contributed data/* -*model -*_iter_* +*.caffemodel *.solverstate *.binaryproto *leveldb diff --git a/docs/getting_pretrained_models.md b/docs/getting_pretrained_models.md deleted file mode 100644 index 5df2bd4dc2d..00000000000 --- a/docs/getting_pretrained_models.md +++ /dev/null @@ -1,34 +0,0 @@ ---- -layout: default ---- - -# Pre-trained models - -[BVLC](http://bvlc.eecs.berkeley.edu) aims to provide a variety of high quality pre-trained models. -Note that unlike Caffe itself, these models are licensed for **academic research / non-commercial use only**. -If you have any questions, please get in touch with us. - -*UPDATE* July 2014: we are actively working on a service for hosting user-uploaded model definition and trained weight files. -Soon, the community will be able to easily contribute different architectures! - -### ImageNet - -**Caffe Reference ImageNet Model**: Our reference implementation of an ImageNet model trained on ILSVRC-2012 can be downloaded (232.6MB) by running `examples/imagenet/get_caffe_reference_imagenet_model.sh` from the Caffe root directory. - -- The bundled model is the iteration 310,000 snapshot. -- The best validation performance during training was iteration 313,000 with - validation accuracy 57.412% and loss 1.82328. -- This model obtains a top-1 accuracy 57.4% and a top-5 accuracy 80.4% on the validation set, using just the center crop. (Using the average of 10 crops, (4 + 1 center) * 2 mirror, should obtain a bit higher accuracy) - -**AlexNet**: Our training of the Krizhevsky architecture, which differs from the paper's methodology by (1) not training with the relighting data-augmentation and (2) initializing non-zero biases to 0.1 instead of 1. (2) was found necessary for training, as initialization to 1 gave flat loss. Download the model (243.9MB) by running `examples/imagenet/get_caffe_alexnet_model.sh` from the Caffe root directory. - -- The bundled model is the iteration 360,000 snapshot. -- The best validation performance during training was iteration 358,000 with - validation accuracy 57.258% and loss 1.83948. -- This model obtains a top-1 accuracy 57.1% and a top-5 accuracy 80.2% on the validation set, using just the center crop. (Using the average of 10 crops, (4 + 1 center) * 2 mirror, should obtain a bit higher accuracy) - -**R-CNN (ILSVRC13)**: The pure Caffe instantiation of the [R-CNN](https://github.com/rbgirshick/rcnn) model for ILSVRC13 detection. Download the model (230.8MB) by running `examples/imagenet/get_caffe_rcnn_imagenet_model.sh` from the Caffe root directory. This model was made by transplanting the R-CNN SVM classifiers into a `fc-rcnn` classification layer, provided here as an off-the-shelf Caffe detector. Try the [detection example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb) to see it in action. For the full details, refer to the R-CNN site. *N.B. For research purposes, make use of the official R-CNN package and not this example.* - -### Auxiliary Data - -Additionally, you will probably eventually need some auxiliary data (mean image, synset list, etc.): run `data/ilsvrc12/get_ilsvrc_aux.sh` from the root directory to obtain it. 
diff --git a/docs/index.md b/docs/index.md index 47191ba8646..af4b50c323b 100644 --- a/docs/index.md +++ b/docs/index.md @@ -38,8 +38,8 @@ Slides about the Caffe architecture, *updated 03/14*. A 4-page report for the ACM Multimedia Open Source competition. - [Installation instructions](/installation.html)
Tested on Ubuntu, Red Hat, OS X. -* [Pre-trained models](/getting_pretrained_models.html)
-BVLC provides ready-to-use models for non-commercial use. +* [Model Zoo](/model_zoo.html)
+BVLC suggests a standard distribution format for Caffe models and provides trained models. * [Developing & Contributing](/development.html)
Guidelines for development and contributing to Caffe. * [API Documentation](/doxygen/)
diff --git a/docs/model_zoo.md b/docs/model_zoo.md new file mode 100644 index 00000000000..0b7daa0bf17 --- /dev/null +++ b/docs/model_zoo.md @@ -0,0 +1,53 @@ +--- +--- +# Caffe Model Zoo + +Lots of people have used Caffe to train models of different architectures and applied them to different problems, ranging from simple regression to AlexNet-alikes to Siamese networks for image similarity to speech applications. +To lower the friction of sharing these models, we introduce the model zoo framework: + +- A standard format for packaging Caffe model info. +- Tools to upload/download model info to/from GitHub Gists, and to download trained `.caffemodel` binaries. +- A central wiki page for sharing model info Gists. + +## Where to get trained models + +First of all, we provide some trained models out of the box. +Each one of these can be downloaded by running `scripts/download_model_binary.py <dirname>` where `<dirname>` is specified below: + +- **BVLC Reference CaffeNet** in `models/bvlc_reference_caffenet`: AlexNet trained on ILSVRC 2012, with a minor variation from the version described in the NIPS 2012 paper. +- **BVLC AlexNet** in `models/bvlc_alexnet`: AlexNet trained on ILSVRC 2012, almost exactly as described in NIPS 2012. +- **BVLC Reference R-CNN ILSVRC-2013** in `models/bvlc_reference_rcnn_ilsvrc13`: pure Caffe implementation of [R-CNN](https://github.com/rbgirshick/rcnn). + +User-provided models are posted to a publicly editable [wiki page](https://github.com/BVLC/caffe/wiki/Model-Zoo). + +## Model info format + +A Caffe model is distributed as a directory containing: + +- Solver/model prototxt(s) +- `readme.md` containing + - YAML frontmatter + - Caffe version used to train this model (tagged release or commit hash). + - [optional] file URL and SHA1 of the trained `.caffemodel`. + - [optional] GitHub Gist ID. + - Information about what data the model was trained on, modeling choices, etc. + - License information. +- [optional] Other helpful scripts. + +## Hosting model info + +GitHub Gist is a good format for model info distribution because it can contain multiple files, is versionable, and has in-browser syntax highlighting and markdown rendering. + +- `scripts/upload_model_to_gist.sh <dirname>`: uploads non-binary files in the model directory as a GitHub Gist and prints the Gist ID. If `gist_id` is already part of the `<dirname>/readme.md` frontmatter, then it updates the existing Gist. + +Try doing `scripts/upload_model_to_gist.sh models/bvlc_alexnet` to test the uploading (don't forget to delete the uploaded Gist afterward). + +Downloading model info is not yet supported as a script (there is no good command-line tool for this right now), so simply go to the Gist URL and click "Download Gist" for now. + +### Hosting trained models + +It is up to the user where to host the `.caffemodel` file. +We host our BVLC-provided models on our own server. +Dropbox also works fine (tip: make sure that `?dl=1` is appended to the end of the URL). + +- `scripts/download_model_binary.py <dirname>`: downloads the `.caffemodel` from the URL specified in the `<dirname>/readme.md` frontmatter and confirms SHA1.
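For concreteness, here is a minimal sketch of the `readme.md` frontmatter contract that `scripts/download_model_binary.py` (added later in this diff) relies on. The field values below are hypothetical; the field names and the required-key set come from the BVLC model readmes and the script in this change. Assuming PyYAML is installed:

    import yaml

    # Hypothetical frontmatter for a model directory; this text sits between
    # the two '---' markers at the top of <dirname>/readme.md.
    frontmatter_text = """\
    name: My Custom Model
    caffemodel: my_model.caffemodel
    caffemodel_url: http://example.com/my_model.caffemodel
    license: non-commercial
    sha1: 0123456789abcdef0123456789abcdef01234567
    caffe_commit: 709dc15af4a06bebda027c1eb2b3f3e3375d5077
    """

    frontmatter = yaml.safe_load(frontmatter_text)
    # The download script refuses to fetch anything without these keys.
    required_keys = ['caffemodel', 'caffemodel_url', 'sha1']
    assert all(key in frontmatter for key in required_keys)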
diff --git a/examples/classification.ipynb b/examples/classification.ipynb index 4d4a738a9f3..9c9d247a0cf 100644 --- a/examples/classification.ipynb +++ b/examples/classification.ipynb @@ -2,8 +2,7 @@ "metadata": { "description": "Use the pre-trained ImageNet model to classify images with the Python interface.", "example_name": "ImageNet classification", - "include_in_docs": true, - "signature": "sha256:4f8d4c079c30d20ef4b6818e9672b1741fd1377354e5b83e291710736cecd24f" + "include_in_docs": true }, "nbformat": 3, "nbformat_minor": 0, @@ -19,7 +18,7 @@ "\n", "Caffe provides a general Python interface for models with `caffe.Net` in `python/caffe/pycaffe.py`, but to make off-the-shelf classification easy we provide a `caffe.Classifier` class and `classify.py` script. Both Python and MATLAB wrappers are provided. However, the Python wrapper has more features so we will describe it here. For MATLAB, refer to `matlab/caffe/matcaffe_demo.m`.\n", "\n", - "Before we begin, you must compile Caffe and install the python wrapper by setting your `PYTHONPATH`. If you haven't yet done so, please refer to the [installation instructions](installation.html). This example uses our pre-trained ImageNet model, an ILSVRC12 image classifier. You can download it (232.57MB) by running `examples/imagenet/get_caffe_reference_imagenet_model.sh`. Note that this pre-trained model is licensed for academic research / non-commercial use only.\n", + "Before we begin, you must compile Caffe and install the python wrapper by setting your `PYTHONPATH`. If you haven't yet done so, please refer to the [installation instructions](installation.html). This example uses our pre-trained CaffeNet model, an ILSVRC12 image classifier. You can download it by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`. Note that this pre-trained model is licensed for academic research / non-commercial use only.\n", "\n", "Ready? Let's start." ] @@ -41,8 +40,8 @@ "\n", "# Set the right path to your model definition file, pretrained model weights,\n", "# and the image you would like to classify.\n", - "MODEL_FILE = 'imagenet/imagenet_deploy.prototxt'\n", - "PRETRAINED = 'imagenet/caffe_reference_imagenet_model'\n", + "MODEL_FILE = '../models/bvlc_reference_caffenet/deploy.prototxt'\n", + "PRETRAINED = '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'\n", "IMAGE_FILE = 'images/cat.jpg'" ], "language": "python", @@ -404,4 +403,4 @@ "metadata": {} } ] -} \ No newline at end of file +} diff --git a/examples/detection.ipynb b/examples/detection.ipynb index beae000a11b..62263b66809 100644 --- a/examples/detection.ipynb +++ b/examples/detection.ipynb @@ -2,8 +2,7 @@ "metadata": { "description": "Run a pretrained model as a detector in Python.", "example_name": "R-CNN detection", - "include_in_docs": true, - "signature": "sha256:8a744fbbb9ed80acab471247eaf50c27dcbd652105404df9feca599939f0c0ee" + "include_in_docs": true }, "nbformat": 3, "nbformat_minor": 0, @@ -26,7 +25,7 @@ "\n", "- [Selective Search](http://koen.me/research/selectivesearch/) is the region proposer used by R-CNN. The [selective_search_ijcv_with_python](https://github.com/sergeyk/selective_search_ijcv_with_python) Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory `selective_search_ijcv_with_python`, run the demo in MATLAB to compile the necessary functions, then add it to your `PYTHONPATH` for importing. 
(If you have your own region proposals prepared, or would rather not bother with this step, [detect.py](https://github.com/BVLC/caffe/blob/master/python/detect.py) accepts a list of images and bounding boxes as CSV.)\n", "\n", - "- Follow the [model instructions](http://caffe.berkeleyvision.org/getting_pretrained_models.html) to get the Caffe R-CNN ImageNet model.\n", + "- Run `./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13` to get the Caffe R-CNN ImageNet model.\n", "\n", "With that done, we'll call the bundled `detect.py` to generate the region proposals and run the network. For an explanation of the arguments, do `./detect.py --help`." ] }, { "cell_type": "code", "collapsed": false, "input": [ "!mkdir -p _temp\n", "!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt\n", - "!../python/detect.py --crop_mode=selective_search --pretrained_model=imagenet/caffe_rcnn_imagenet_model --model_def=imagenet/rcnn_imagenet_deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5" + "!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5" ], "language": "python", "metadata": {}, diff --git a/examples/feature_extraction/imagenet_val.prototxt b/examples/feature_extraction/imagenet_val.prototxt index 32310904a66..83fe8c1a08d 100644 --- a/examples/feature_extraction/imagenet_val.prototxt +++ b/examples/feature_extraction/imagenet_val.prototxt @@ -5,14 +5,14 @@ layers { top: "data" top: "label" image_data_param { - source: "$CAFFE_DIR/examples/_temp/file_list.txt" + source: "examples/_temp/file_list.txt" batch_size: 50 new_height: 256 new_width: 256 } transform_param { crop_size: 227 - mean_file: "$CAFFE_DIR/data/ilsvrc12/imagenet_mean.binaryproto" + mean_file: "data/ilsvrc12/imagenet_mean.binaryproto" mirror: false } } diff --git a/examples/feature_extraction/readme.md b/examples/feature_extraction/readme.md index 083908ebe54..c325ed482e5 100644 --- a/examples/feature_extraction/readme.md +++ b/examples/feature_extraction/readme.md @@ -10,7 +10,9 @@ Extracting Features =================== In this tutorial, we will extract features using a pre-trained model with the included C++ utility. -Follow instructions for [installing Caffe](../../installation.html) and for [downloading the reference model](../../getting_pretrained_models.html) for ImageNet. +Note that we recommend using the Python interface for this task, as for example in the [filter visualization example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb). + +Follow instructions for [installing Caffe](../../installation.html) and run `scripts/download_model_binary.py models/bvlc_reference_caffenet` from the Caffe root directory. If you need detailed information about the tools below, please consult their source code, in which additional documentation is usually provided. Select data to run on @@ -35,7 +37,7 @@ Define the Feature Extraction Network Architecture In practice, subtracting the mean image from a dataset significantly improves classification accuracies. Download the mean image of the ILSVRC dataset. -    data/ilsvrc12/get_ilsvrc_aux.sh +    ./data/ilsvrc12/get_ilsvrc_aux.sh We will use `data/ilsvrc12/imagenet_mean.binaryproto` in the network definition prototxt. @@ -44,14 +46,12 @@ We'll be using the `ImageDataLayer`, which will load and resize images for us.
cp examples/feature_extraction/imagenet_val.prototxt examples/_temp -Edit `examples/_temp/imagenet_val.prototxt` to use correct path for your setup (replace `$CAFFE_DIR`) - Extract Features ---------------- Now everything necessary is in place. -    build/tools/extract_features.bin examples/imagenet/caffe_reference_imagenet_model examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10 +    ./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10 The name of feature blob that you extract is `fc7`, which represents the highest level feature of the reference model. We can use any other layer, as well, such as `conv5` or `pool3`. diff --git a/examples/filter_visualization.ipynb b/examples/filter_visualization.ipynb index 5fdcbe25fb4..abf4c0dd796 100644 --- a/examples/filter_visualization.ipynb +++ b/examples/filter_visualization.ipynb @@ -2,8 +2,7 @@ "metadata": { "description": "Extracting features and visualizing trained filters with an example image, viewed layer-by-layer.", "example_name": "Filter visualization", - "include_in_docs": true, - "signature": "sha256:b1b0457e2b10110aca847a718a3fe631ebcfce63a61cbc33653244f52b1ff4af" + "include_in_docs": true }, "nbformat": 3, "nbformat_minor": 0, @@ -54,15 +53,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "Follow the [instructions](http://caffe.berkeleyvision.org/getting_pretrained_models.html) for getting the pretrained models, load the net, specify test phase and CPU mode, and configure input preprocessing." + "Run `./scripts/download_model_binary.py models/bvlc_reference_caffenet` to get the pretrained CaffeNet model, load the net, specify test phase and CPU mode, and configure input preprocessing." ] }, { "cell_type": "code", "collapsed": false, "input": [ - "net = caffe.Classifier(caffe_root + 'examples/imagenet/imagenet_deploy.prototxt',\n", - " caffe_root + 'examples/imagenet/caffe_reference_imagenet_model')\n", + "net = caffe.Classifier(caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',\n", + " caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')\n", "net.set_phase_test()\n", "net.set_mode_cpu()\n", "# input preprocessing: 'data' is the name of the input blob == net.inputs[0]\n", @@ -598,4 +597,4 @@ "metadata": {} } ] -} \ No newline at end of file +} diff --git a/examples/finetune_flickr_style/readme.md b/examples/finetune_flickr_style/readme.md index da584f0088a..dad45aeb560 100644 --- a/examples/finetune_flickr_style/readme.md +++ b/examples/finetune_flickr_style/readme.md @@ -34,7 +34,7 @@ All steps are to be done from the caffe root directory. The dataset is distributed as a list of URLs with corresponding labels. Using a script, we will download a small subset of the data and split it into train and val sets.     caffe % ./examples/finetune_flickr_style/assemble_data.py -h usage: assemble_data.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS] Download a subset of Flickr Style to a directory @@ -48,7 +48,7 @@ Using a script, we will download a small subset of the data and split it into tr num workers used to download images. -x uses (all - x) cores.     caffe % python examples/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486 Downloading 2000 images with 7 workers...
Writing train/val for 1939 successfully downloaded images. @@ -56,17 +56,17 @@ This script downloads images and writes train/val file lists into `data/flickr_s With this random seed there are 1,557 train images and 382 test images. The prototxts in this example assume this, and also assume the presence of the ImageNet mean file (run `get_ilsvrc_aux.sh` from `data/ilsvrc12` to obtain this if you haven't yet). -We'll also need the ImageNet-trained model, which you can obtain by running `get_caffe_reference_imagenet_model.sh` from `examples/imagenet`. +We'll also need the ImageNet-trained model, which you can obtain by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`. Now we can train! (You can fine-tune in CPU mode by leaving out the `-gpu` flag.) -    caffe % ./build/tools/caffe train -solver examples/finetune_flickr_style/flickr_style_solver.prototxt -weights examples/imagenet/caffe_reference_imagenet_model -gpu 0 +    caffe % ./build/tools/caffe train -solver models/finetune_flickr_style/solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0 [...] I0828 22:10:04.025378 9718 solver.cpp:46] Solver scaffolding done. I0828 22:10:04.025388 9718 caffe.cpp:95] Use GPU with device ID 0 -    I0828 22:10:04.192004 9718 caffe.cpp:107] Finetuning from examples/imagenet/caffe_reference_imagenet_model +    I0828 22:10:04.192004 9718 caffe.cpp:107] Finetuning from models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel [...] @@ -149,10 +149,16 @@ This model is only beginning to learn. Fine-tuning can be feasible when training from scratch would not be for lack of time or data. Even in CPU mode each pass through the training set takes ~100 s. GPU fine-tuning is of course faster still and can learn a useful model in minutes or hours instead of days or weeks. Furthermore, note that the model has only trained on < 2,000 instances. Transfer learning a new task like style recognition from the ImageNet pretraining can require much less data than training from scratch. + Now try fine-tuning to your own tasks and data! +## Trained model + +We provide a model trained on all 80K images, with final accuracy of 91.64% (the test-set accuracy reported in `models/finetune_flickr_style/readme.md`). +Simply do `./scripts/download_model_binary.py models/finetune_flickr_style` to obtain it. + ## License The Flickr Style dataset as distributed here contains only URLs to images. Some of the images may have copyright. -Training a category-recognition model for research/non-commercial use may constitute fair use of this data. +Training a category-recognition model for research/non-commercial use may constitute fair use of this data, but the result should not be used for commercial purposes.
Need to download again." - fi -fi - -echo "Downloading..." - -wget http://dl.caffe.berkeleyvision.org/$MODEL examples/imagenet/$MODEL - -echo "Done. Please run this command again to verify that checksum = $CHECKSUM." diff --git a/examples/imagenet/get_caffe_rcnn_imagenet_model.sh b/examples/imagenet/get_caffe_rcnn_imagenet_model.sh deleted file mode 100755 index 9a8d0a155a0..00000000000 --- a/examples/imagenet/get_caffe_rcnn_imagenet_model.sh +++ /dev/null @@ -1,28 +0,0 @@ -#!/usr/bin/env sh -# This scripts downloads the Caffe R-CNN ImageNet -# for ILSVRC13 detection. - -MODEL=caffe_rcnn_imagenet_model -CHECKSUM=42c1556d2d47a9128c4a90e0a9c5341c - -if [ -f $MODEL ]; then - echo "Model already exists. Checking md5..." - os=`uname -s` - if [ "$os" = "Linux" ]; then - checksum=`md5sum $MODEL | awk '{ print $1 }'` - elif [ "$os" = "Darwin" ]; then - checksum=`cat $MODEL | md5` - fi - if [ "$checksum" = "$CHECKSUM" ]; then - echo "Model checksum is correct. No need to download." - exit 0 - else - echo "Model checksum is incorrect. Need to download again." - fi -fi - -echo "Downloading..." - -wget http://dl.caffe.berkeleyvision.org/$MODEL examples/imagenet/$MODEL - -echo "Done. Please run this command again to verify that checksum = $CHECKSUM." diff --git a/examples/imagenet/get_caffe_reference_imagenet_model.sh b/examples/imagenet/get_caffe_reference_imagenet_model.sh deleted file mode 100755 index f687ebfa79e..00000000000 --- a/examples/imagenet/get_caffe_reference_imagenet_model.sh +++ /dev/null @@ -1,28 +0,0 @@ -#!/usr/bin/env sh -# This scripts downloads the caffe reference imagenet model -# for ilsvrc image classification and deep feature extraction - -MODEL=caffe_reference_imagenet_model -CHECKSUM=af678f0bd3cdd2437e35679d88665170 - -if [ -f $MODEL ]; then - echo "Model already exists. Checking md5..." - os=`uname -s` - if [ "$os" = "Linux" ]; then - checksum=`md5sum $MODEL | awk '{ print $1 }'` - elif [ "$os" = "Darwin" ]; then - checksum=`cat $MODEL | md5` - fi - if [ "$checksum" = "$CHECKSUM" ]; then - echo "Model checksum is correct. No need to download." - exit 0 - else - echo "Model checksum is incorrect. Need to download again." - fi -fi - -echo "Downloading..." - -wget http://dl.caffe.berkeleyvision.org/$MODEL examples/imagenet/$MODEL - -echo "Done. Please run this command again to verify that checksum = $CHECKSUM." diff --git a/examples/imagenet/imagenet_full_conv.prototxt b/examples/imagenet/imagenet_full_conv.prototxt index 570efae5901..395b0f0162f 100644 --- a/examples/imagenet/imagenet_full_conv.prototxt +++ b/examples/imagenet/imagenet_full_conv.prototxt @@ -1,3 +1,4 @@ +# This file is for the net_surgery.ipynb example notebook. name: "CaffeNetConv" input: "data" input_dim: 1 diff --git a/examples/imagenet/readme.md b/examples/imagenet/readme.md index b4a3110ecb6..41384f9475b 100644 --- a/examples/imagenet/readme.md +++ b/examples/imagenet/readme.md @@ -1,26 +1,24 @@ --- title: ImageNet tutorial -description: Train and test "CaffeNet" on ImageNet challenge data. +description: Train and test "CaffeNet" on ImageNet data. category: example include_in_docs: true priority: 1 --- -Yangqing's Recipe on Brewing ImageNet -===================================== +Brewing ImageNet +================ - "All your braincells are belong to us." - - Caffeine - -We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their [NIPS 2012 paper](http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf). 
Since training the whole model takes some time and energy, we provide a model, trained in the same way as we describe here, to help fight global warming. If you would like to simply use the pretrained model, check out the [Pretrained ImageNet](../../getting_pretrained_models.html) page. *Note that the pretrained model is for academic research / non-commercial use only*. - -To clarify, by ImageNet we actually mean the ILSVRC12 challenge, but you can easily train on the whole of ImageNet as well, just with more disk space, and a little longer training time. - -(If you don't get the quote, visit [Yann LeCun's fun page](http://yann.lecun.com/ex/fun/). +This guide is meant to get you ready to train your own model on your own data. +If you just want an ImageNet-trained network, then note that since training takes a lot of energy and we hate global warming, we provide the CaffeNet model trained as described below in the [model zoo](/model_zoo.html). Data Preparation ---------------- +*This guide specifies all paths and assumes all commands are executed from the Caffe root directory.* + +*By "ImageNet" we mean the ILSVRC12 challenge here, but you can easily train on the whole of ImageNet as well, just with more disk space and a little longer training time.* + We assume that you already have downloaded the ImageNet training data and validation data, and they are stored on your disk like: /path/to/imagenet/train/n01440764/n01440764_10026.JPEG @@ -28,44 +26,39 @@ We assume that you already have downloaded the ImageNet training data and valida You will first need to prepare some auxiliary data for training. This data can be downloaded by: -    cd $CAFFE_ROOT/data/ilsvrc12/ -    ./get_ilsvrc_aux.sh +    ./data/ilsvrc12/get_ilsvrc_aux.sh The training and validation input are described in `train.txt` and `val.txt` as text listing all the files and their labels. Note that we use a different indexing for labels than the ILSVRC devkit: we sort the synset names in their ASCII order, and then label them from 0 to 999. See `synset_words.txt` for the synset/name mapping. -You may want to resize the images to 256x256 in advance. By default, we do not explicitly do this because in a cluster environment, one may benefit from resizing images in a parallel fashion, using mapreduce. For example, Yangqing used his lightedweighted [mincepie](https://github.com/Yangqing/mincepie) package to do mapreduce on the Berkeley cluster. If you would things to be rather simple and straightforward, you can also use shell commands, something like: +You may want to resize the images to 256x256 in advance. By default, we do not explicitly do this because in a cluster environment, one may benefit from resizing images in a parallel fashion, using mapreduce. For example, Yangqing used his lightweight [mincepie](https://github.com/Yangqing/mincepie) package. If you prefer things to be simpler, you can also use shell commands, something like: for name in /path/to/imagenet/val/*.JPEG; do convert -resize 256x256\! $name $name done -Go to `$CAFFE_ROOT/examples/imagenet/` for the rest of this guide. - -Take a look at `create_imagenet.sh`. Set the paths to the train and val dirs as needed, and set "RESIZE=true" to resize all images to 256x256 if you haven't resized the images in advance. -Now simply create the leveldbs with `./create_imagenet.sh`. Note that `ilsvrc12_train_leveldb` and `ilsvrc12_val_leveldb` should not exist before this execution. It will be created by the script.
`GLOG_logtostderr=1` simply dumps more information for you to inspect, and you can safely ignore it. +Take a look at `examples/imagenet/create_imagenet.sh`. Set the paths to the train and val dirs as needed, and set "RESIZE=true" to resize all images to 256x256 if you haven't resized the images in advance. +Now simply create the leveldbs with `examples/imagenet/create_imagenet.sh`. Note that `examples/imagenet/ilsvrc12_train_leveldb` and `examples/imagenet/ilsvrc12_val_leveldb` should not exist before this execution. They will be created by the script. `GLOG_logtostderr=1` simply dumps more information for you to inspect, and you can safely ignore it. Compute Image Mean ------------------ The model requires us to subtract the image mean from each image, so we have to compute the mean. `tools/compute_image_mean.cpp` implements that - it is also a good example to familiarize yourself on how to manipulate the multiple components, such as protocol buffers, leveldbs, and logging, if you are not familiar with them. Anyway, the mean computation can be carried out as: -    ./make_imagenet_mean.sh +    ./examples/imagenet/make_imagenet_mean.sh which will make `data/ilsvrc12/imagenet_mean.binaryproto`. -Network Definition ------------------- -The network definition follows strictly the one in Krizhevsky et al. You can find the detailed definition at `examples/imagenet/imagenet_train_val.prototxt`. Note the paths in the data layer --- if you have not followed the exact paths in this guide you will need to change the following lines: +Model Definition ---------------- -    source: "ilvsrc12_train_leveldb" -    mean_file: "../../data/ilsvrc12/imagenet_mean.binaryproto" +We are going to describe a reference implementation for the approach first proposed by Krizhevsky, Sutskever, and Hinton in their [NIPS 2012 paper](http://books.nips.cc/papers/files/nips25/NIPS2012_0534.pdf). -to point to your own leveldb and image mean. +The network definition (`models/bvlc_reference_caffenet/train_val.prototxt`) follows the one in Krizhevsky et al. +Note that if you deviated from the file paths suggested in this guide, you'll need to adjust the relevant paths in the `.prototxt` files. -If you look carefully at `imagenet_train_val.prototxt`, you will notice several `include` sections specifying either `phase: TRAIN` or `phase: TEST`. These sections allow us to define two closely related networks in one file: the network used for training and the network used for testing. These two networks are almost identical, sharing all layers except for those marked with `include { phase: TRAIN }` or `include { phase: TEST }`. In this case, only the input layers and one output layer are different. +If you look carefully at `models/bvlc_reference_caffenet/train_val.prototxt`, you will notice several `include` sections specifying either `phase: TRAIN` or `phase: TEST`. These sections allow us to define two closely related networks in one file: the network used for training and the network used for testing. These two networks are almost identical, sharing all layers except for those marked with `include { phase: TRAIN }` or `include { phase: TEST }`. In this case, only the input layers and one output layer are different. -**Input layer differences:** The training network's `data` input layer draws its data from `ilsvrc12_train_leveldb` and randomly mirrors the input image. The testing network's `data` layer takes data from `ilsvrc12_val_leveldb` and does not perform random mirroring.
+**Input layer differences:** The training network's `data` input layer draws its data from `examples/imagenet/ilsvrc12_train_leveldb` and randomly mirrors the input image. The testing network's `data` layer takes data from `examples/imagenet/ilsvrc12_val_leveldb` and does not perform random mirroring. **Output layer differences:** Both networks output the `softmax_loss` layer, which in training is used to compute the loss function and to initialize the backpropagation, while in validation this loss is simply reported. The testing network also has a second output layer, `accuracy`, which is used to report the accuracy on the test set. In the process of training, the test network will occasionally be instantiated and tested on the test set, producing lines like `Test score #0: xxx` and `Test score #1: xxx`. In this case score 0 is the accuracy (which will start around 1/1000 = 0.001 for an untrained network) and score 1 is the loss (which will start around 7 for an untrained network). @@ -78,33 +71,35 @@ We will also lay out a protocol buffer for running the solver. Let's make a few plans: * The network will be trained with momentum 0.9 and a weight decay of 0.0005. * For every 10,000 iterations, we will take a snapshot of the current status. -Sound good? This is implemented in `examples/imagenet/imagenet_solver.prototxt`. Again, you will need to change the first line: - -    net: "imagenet_train_val.prototxt" - -to point to the actual path if you have changed it. +Sound good? This is implemented in `models/bvlc_reference_caffenet/solver.prototxt`. Training ImageNet ----------------- Ready? Let's train. -    ./train_imagenet.sh +    ./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt + +Sit back and enjoy! + +On a K40 machine, every 20 iterations take about 26.5 seconds to run (while on a K20 this takes 36 seconds), so effectively about 5.2 ms per image for the full forward-backward pass. About 2 ms of this is on forward, and the rest is backward. If you are interested in dissecting the computation time, you can run -Sit back and enjoy! On my K20 machine, every 20 iterations take about 36 seconds to run, so effectively about 7 ms per image for the full forward-backward pass. About 2.5 ms of this is on forward, and the rest is backward. If you are interested in dissecting the computation time, you can look at `examples/net_speed_benchmark.cpp`, but it was written purely for debugging purpose, so you may need to figure a few things out yourself. +    ./build/tools/caffe time --model=models/bvlc_reference_caffenet/train_val.prototxt Resume Training? ---------------- -We all experience times when the power goes out, or we feel like rewarding ourself a little by playing Battlefield (does someone still remember Quake?). Since we are snapshotting intermediate results during training, we will be able to resume from snapshots. +We all experience times when the power goes out, or we feel like rewarding ourselves a little by playing Battlefield (does anyone still remember Quake?). Since we are snapshotting intermediate results during training, we will be able to resume from snapshots.
This can be done as easily as: -    ./resume_training.sh +    ./build/tools/caffe train --solver=models/bvlc_reference_caffenet/solver.prototxt --snapshot=models/bvlc_reference_caffenet/caffenet_train_iter_10000.solverstate -where in the script `caffe_imagenet_train_1000.solverstate` is the solver state snapshot that stores all necessary information to recover the exact solver state (including the parameters, momentum history, etc). +where `caffenet_train_iter_10000.solverstate` is the solver state snapshot that stores all necessary information to recover the exact solver state (including the parameters, momentum history, etc.). Parting Words ------------- -Hope you liked this recipe! Many researchers have gone further since the ILSVRC 2012 challenge, changing the network architecture and/or finetuning the various parameters in the network. The recent ILSVRC 2013 challenge suggests that there are quite some room for improvement. **Caffe allows one to explore different network choices more easily, by simply writing different prototxt files** - isn't that exciting? +Hope you liked this recipe! +Many researchers have gone further since the ILSVRC 2012 challenge, changing the network architecture and/or fine-tuning the various parameters in the network to address new data and tasks. +**Caffe lets you explore different network choices more easily by simply writing different prototxt files** - isn't that exciting? -And since now you have a trained network, check out how to use it: [Running Pretrained ImageNet](../../getting_pretrained_models.html). This time we will use Python, but if you have wrappers for other languages, please kindly send a pull request! +And since now you have a trained network, check out how to use it with the Python interface for [classifying ImageNet](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/classification.ipynb). diff --git a/examples/imagenet/resume_training.sh b/examples/imagenet/resume_training.sh deleted file mode 100755 index 3c964b56ffc..00000000000 --- a/examples/imagenet/resume_training.sh +++ /dev/null @@ -1,7 +0,0 @@ -#!/usr/bin/env sh - -./build/tools/caffe train \ -    --solver=examples/imagenet/imagenet_solver.prototxt \ -    --snapshot=examples/imagenet/caffe_imagenet_10000.solverstate - -echo "Done." diff --git a/examples/imagenet/time_imagenet.sh b/examples/imagenet/time_imagenet.sh deleted file mode 100755 index 3f46e0e0f97..00000000000 --- a/examples/imagenet/time_imagenet.sh +++ /dev/null @@ -1,17 +0,0 @@ -#!/usr/bin/env sh - -if [ -z "$1" ]; then -    echo "Using CPU! To time GPU mode, use:" -    echo "    ./time_imagenet.sh " -    echo "(Try ./time_imagenet.sh 0 if you have just one GPU.)" -    sleep 3 # Let the user read -    GPU="" -else -    GPU="--gpu=$1" -fi - -./build/tools/caffe time \ -    --model=examples/imagenet/imagenet_train_val.prototxt \ -    ${GPU} - -echo "Done." diff --git a/examples/imagenet/train_alexnet.sh b/examples/imagenet/train_alexnet.sh deleted file mode 100755 index 1ddcbeee4b0..00000000000 --- a/examples/imagenet/train_alexnet.sh +++ /dev/null @@ -1,5 +0,0 @@ -#!/usr/bin/env sh - -./build/tools/caffe train --solver=examples/imagenet/alexnet_solver.prototxt - -echo "Done." diff --git a/examples/imagenet/train_imagenet.sh b/examples/imagenet/train_imagenet.sh deleted file mode 100755 index cba2ad59581..00000000000 --- a/examples/imagenet/train_imagenet.sh +++ /dev/null @@ -1,5 +0,0 @@ -#!/usr/bin/env sh - -./build/tools/caffe train --solver=examples/imagenet/imagenet_solver.prototxt - -echo "Done."
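A side note on the ImageNet tutorial above: it states that labels are assigned by sorting the synset names in ASCII order and numbering them 0 to 999, with `train.txt`/`val.txt` listing one `path label` pair per line for the data layer. A short Python sketch of that convention (the directory layout is the hypothetical `/path/to/imagenet` from the guide):

    import os

    train_dir = '/path/to/imagenet/train'
    # ASCII sort of the synset directory names determines the label indices.
    synsets = sorted(os.listdir(train_dir))
    labels = {synset: i for i, synset in enumerate(synsets)}

    # Each line pairs a relative image path with its integer label,
    # e.g. "n01440764/n01440764_10026.JPEG 0".
    with open('train.txt', 'w') as f:
        for synset in synsets:
            for img in sorted(os.listdir(os.path.join(train_dir, synset))):
                f.write('%s/%s %d\n' % (synset, img, labels[synset]))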
diff --git a/examples/net_surgery.ipynb b/examples/net_surgery.ipynb index 2d8bbb106be..0854018af4a 100644 --- a/examples/net_surgery.ipynb +++ b/examples/net_surgery.ipynb @@ -2,8 +2,7 @@ "metadata": { "description": "How to do net surgery and manually change model parameters, making a fully-convolutional classifier for dense feature extraction.", "example_name": "Editing model parameters", - "include_in_docs": true, - "signature": "sha256:10c551b31a64c2210f6094dbb603f26c206a7b72cd99032f475cb5023edcdc43" + "include_in_docs": true }, "nbformat": 3, "nbformat_minor": 0, @@ -27,7 +26,7 @@ "cell_type": "code", "collapsed": false, "input": [ - "!diff imagenet/imagenet_full_conv.prototxt imagenet/imagenet_deploy.prototxt" + "!diff imagenet/imagenet_full_conv.prototxt ../models/bvlc_reference_caffenet/deploy.prototxt" ], "language": "python", "metadata": {}, @@ -144,7 +143,7 @@ "import caffe\n", "\n", "# Load the original network and extract the fully-connected layers' parameters.\n", - "net = caffe.Net('imagenet/imagenet_deploy.prototxt', 'imagenet/caffe_reference_imagenet_model')\n", + "net = caffe.Net('../models/bvlc_reference_caffenet/deploy.prototxt', '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')\n", "params = ['fc6', 'fc7', 'fc8']\n", "# fc_params = {name: (weights, biases)}\n", "fc_params = {pr: (net.params[pr][0].data, net.params[pr][1].data) for pr in params}\n", @@ -179,7 +178,7 @@ "collapsed": false, "input": [ "# Load the fully-convolutional network to transplant the parameters.\n", - "net_full_conv = caffe.Net('imagenet/imagenet_full_conv.prototxt', 'imagenet/caffe_reference_imagenet_model')\n", + "net_full_conv = caffe.Net('imagenet/imagenet_full_conv.prototxt', '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')\n", "params_full_conv = ['fc6-conv', 'fc7-conv', 'fc8-conv']\n", "# conv_params = {name: (weights, biases)}\n", "conv_params = {pr: (net_full_conv.params[pr][0].data, net_full_conv.params[pr][1].data) for pr in params_full_conv}\n", @@ -350,4 +349,4 @@ "metadata": {} } ] -} \ No newline at end of file +} diff --git a/examples/web_demo/app.py b/examples/web_demo/app.py index f7f46ce6c02..d33fc92f078 100644 --- a/examples/web_demo/app.py +++ b/examples/web_demo/app.py @@ -98,9 +98,9 @@ def allowed_file(filename): class ImagenetClassifier(object): default_args = { 'model_def_file': ( - '{}/examples/imagenet/imagenet_deploy.prototxt'.format(REPO_DIRNAME)), + '{}/models/bvlc_reference_caffenet/deploy.prototxt'.format(REPO_DIRNAME)), 'pretrained_model_file': ( - '{}/examples/imagenet/caffe_reference_imagenet_model'.format(REPO_DIRNAME)), + '{}/models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'.format(REPO_DIRNAME)), 'mean_file': ( '{}/python/caffe/imagenet/ilsvrc_2012_mean.npy'.format(REPO_DIRNAME)), 'class_labels_file': ( diff --git a/examples/web_demo/readme.md b/examples/web_demo/readme.md index 3c8fdc068e7..fe74b9ef7d3 100644 --- a/examples/web_demo/readme.md +++ b/examples/web_demo/readme.md @@ -13,7 +13,11 @@ priority: 10 The demo server requires Python with some dependencies. To make sure you have the dependencies, please run `pip install -r examples/web_demo/requirements.txt`, and also make sure that you've compiled the Python Caffe interface and that it is on your `PYTHONPATH` (see [installation instructions](/installation.html)). -Make sure that you have obtained the Caffe Reference ImageNet Model and the ImageNet Auxiliary Data ([instructions](/getting_pretrained_models.html)).
+Make sure that you have obtained the Reference CaffeNet Model and the ImageNet Auxiliary Data: + +    ./scripts/download_model_binary.py models/bvlc_reference_caffenet +    ./data/ilsvrc12/get_ilsvrc_aux.sh + NOTE: if you run into trouble, try re-downloading the auxiliary files. ## Run diff --git a/matlab/caffe/matcaffe_batch.m b/matlab/caffe/matcaffe_batch.m index 3cb7f1445fb..f6d1aa83b84 100644 --- a/matlab/caffe/matcaffe_batch.m +++ b/matlab/caffe/matcaffe_batch.m @@ -27,9 +27,8 @@ filename = list_im; list_im = read_cell(filename); end -% Adjust the batch size to match with imagenet_deploy.prototxt +% Adjust the batch size and dim to match models/bvlc_reference_caffenet/deploy.prototxt batch_size = 10; -% Adjust dim to the output size of imagenet_deploy.prototxt dim = 1000; disp(list_im) if mod(length(list_im),batch_size) diff --git a/matlab/caffe/matcaffe_init.m b/matlab/caffe/matcaffe_init.m index 4e4ef8bff4a..7cc6935758e 100644 --- a/matlab/caffe/matcaffe_init.m +++ b/matlab/caffe/matcaffe_init.m @@ -8,11 +8,11 @@ function matcaffe_init(use_gpu, model_def_file, model_file) end if nargin < 2 || isempty(model_def_file) % By default use imagenet_deploy -    model_def_file = '../../examples/imagenet/imagenet_deploy.prototxt'; +    model_def_file = '../../models/bvlc_reference_caffenet/deploy.prototxt'; end if nargin < 3 || isempty(model_file) % By default use caffe reference model -    model_file = '../../examples/imagenet/caffe_reference_imagenet_model'; +    model_file = '../../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'; end diff --git a/examples/imagenet/alexnet_deploy.prototxt b/models/bvlc_alexnet/deploy.prototxt similarity index 100% rename from examples/imagenet/alexnet_deploy.prototxt rename to models/bvlc_alexnet/deploy.prototxt diff --git a/models/bvlc_alexnet/readme.md b/models/bvlc_alexnet/readme.md new file mode 100644 index 00000000000..20c393ff26b --- /dev/null +++ b/models/bvlc_alexnet/readme.md @@ -0,0 +1,25 @@ +--- +name: BVLC AlexNet Model +caffemodel: bvlc_alexnet.caffemodel +caffemodel_url: http://dl.caffe.berkeleyvision.org/bvlc_alexnet.caffemodel +license: non-commercial +sha1: 9116a64c0fbe4459d18f4bb6b56d647b63920377 +caffe_commit: 709dc15af4a06bebda027c1eb2b3f3e3375d5077 +--- + +This model is a replication of the model described in the [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) publication. + +Differences: +- not training with the relighting data-augmentation; +- initializing non-zero biases to 0.1 instead of 1 (found necessary for training, as initialization to 1 gave flat loss). + +The bundled model is the iteration 360,000 snapshot. +The best validation performance during training was iteration 358,000 with validation accuracy 57.258% and loss 1.83948. +This model obtains a top-1 accuracy 57.1% and a top-5 accuracy 80.2% on the validation set, using just the center crop. +(Using the average of 10 crops, (4 corners + 1 center) * 2 mirror, should obtain a bit higher accuracy.) + +## License + +The data used to train this model comes from the ImageNet project, which distributes its database to researchers who agree to the following term of access: +"Researcher shall use the Database only for non-commercial research and educational purposes." +Accordingly, this model is distributed under a non-commercial license.
diff --git a/examples/imagenet/imagenet_solver.prototxt b/models/bvlc_alexnet/solver.prototxt similarity index 64% rename from examples/imagenet/imagenet_solver.prototxt rename to models/bvlc_alexnet/solver.prototxt index 5b5be4bb8a9..129265e679b 100644 --- a/examples/imagenet/imagenet_solver.prototxt +++ b/models/bvlc_alexnet/solver.prototxt @@ -1,4 +1,4 @@ -net: "examples/imagenet/imagenet_train_val.prototxt" +net: "models/bvlc_alexnet/train_val.prototxt" test_iter: 1000 test_interval: 1000 base_lr: 0.01 @@ -10,5 +10,5 @@ max_iter: 450000 momentum: 0.9 weight_decay: 0.0005 snapshot: 10000 -snapshot_prefix: "examples/imagenet/caffe_imagenet" +snapshot_prefix: "models/bvlc_alexnet/caffe_alexnet_train" solver_mode: GPU diff --git a/examples/imagenet/alexnet_train_val.prototxt b/models/bvlc_alexnet/train_val.prototxt similarity index 100% rename from examples/imagenet/alexnet_train_val.prototxt rename to models/bvlc_alexnet/train_val.prototxt index 3fa46773403..69b8916d769 100644 --- a/examples/imagenet/alexnet_train_val.prototxt +++ b/models/bvlc_alexnet/train_val.prototxt @@ -34,6 +34,8 @@ layers { layers { name: "conv1" type: CONVOLUTION + bottom: "data" + top: "conv1" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -51,8 +53,6 @@ layers { value: 0 } } - bottom: "data" - top: "conv1" } layers { name: "relu1" @@ -63,28 +63,30 @@ layers { layers { name: "norm1" type: LRN + bottom: "conv1" + top: "norm1" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } - bottom: "conv1" - top: "norm1" } layers { name: "pool1" type: POOLING + bottom: "norm1" + top: "pool1" pooling_param { pool: MAX kernel_size: 3 stride: 2 } - bottom: "norm1" - top: "pool1" } layers { name: "conv2" type: CONVOLUTION + bottom: "pool1" + top: "conv2" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -103,8 +105,6 @@ layers { value: 0.1 } } - bottom: "pool1" - top: "conv2" } layers { name: "relu2" @@ -115,28 +115,30 @@ layers { layers { name: "norm2" type: LRN + bottom: "conv2" + top: "norm2" lrn_param { local_size: 5 alpha: 0.0001 beta: 0.75 } - bottom: "conv2" - top: "norm2" } layers { name: "pool2" type: POOLING + bottom: "norm2" + top: "pool2" pooling_param { pool: MAX kernel_size: 3 stride: 2 } - bottom: "norm2" - top: "pool2" } layers { name: "conv3" type: CONVOLUTION + bottom: "pool2" + top: "conv3" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -154,8 +156,6 @@ layers { value: 0 } } - bottom: "pool2" - top: "conv3" } layers { name: "relu3" @@ -166,6 +166,8 @@ layers { layers { name: "conv4" type: CONVOLUTION + bottom: "conv3" + top: "conv4" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -184,8 +186,6 @@ layers { value: 0.1 } } - bottom: "conv3" - top: "conv4" } layers { name: "relu4" @@ -196,6 +196,8 @@ layers { layers { name: "conv5" type: CONVOLUTION + bottom: "conv4" + top: "conv5" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -214,8 +216,6 @@ layers { value: 0.1 } } - bottom: "conv4" - top: "conv5" } layers { name: "relu5" @@ -226,17 +226,19 @@ layers { layers { name: "pool5" type: POOLING + bottom: "conv5" + top: "pool5" pooling_param { pool: MAX kernel_size: 3 stride: 2 } - bottom: "conv5" - top: "pool5" } layers { name: "fc6" type: INNER_PRODUCT + bottom: "pool5" + top: "fc6" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -252,8 +254,6 @@ layers { value: 0.1 } } - bottom: "pool5" - top: "fc6" } layers { name: "relu6" @@ -264,15 +264,17 @@ layers { layers { name: "drop6" type: DROPOUT + bottom: "fc6" + top: "fc6" dropout_param { dropout_ratio: 0.5 } - bottom: "fc6" - top: "fc6" } layers { name: "fc7" type: INNER_PRODUCT + bottom: "fc6" + 
top: "fc7" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -288,8 +290,6 @@ layers { value: 0.1 } } - bottom: "fc6" - top: "fc7" } layers { name: "relu7" @@ -300,15 +300,17 @@ layers { layers { name: "drop7" type: DROPOUT + bottom: "fc7" + top: "fc7" dropout_param { dropout_ratio: 0.5 } - bottom: "fc7" - top: "fc7" } layers { name: "fc8" type: INNER_PRODUCT + bottom: "fc7" + top: "fc8" blobs_lr: 1 blobs_lr: 2 weight_decay: 1 @@ -324,8 +326,6 @@ layers { value: 0 } } - bottom: "fc7" - top: "fc8" } layers { name: "accuracy" diff --git a/examples/imagenet/imagenet_deploy.prototxt b/models/bvlc_reference_caffenet/deploy.prototxt similarity index 100% rename from examples/imagenet/imagenet_deploy.prototxt rename to models/bvlc_reference_caffenet/deploy.prototxt diff --git a/models/bvlc_reference_caffenet/readme.md b/models/bvlc_reference_caffenet/readme.md new file mode 100644 index 00000000000..d1c6269ae73 --- /dev/null +++ b/models/bvlc_reference_caffenet/readme.md @@ -0,0 +1,25 @@ +--- +name: BVLC CaffeNet Model +caffemodel: bvlc_reference_caffenet.caffemodel +caffemodel_url: http://dl.caffe.berkeleyvision.org/bvlc_reference_caffenet.caffemodel +license: non-commercial +sha1: 4c8d77deb20ea792f84eb5e6d0a11ca0a8660a46 +caffe_commit: 709dc15af4a06bebda027c1eb2b3f3e3375d5077 +--- + +This model is the result of following the Caffe [ImageNet model training instructions](http://caffe.berkeleyvision.org/gathered/examples/imagenet.html). +It is a replication of the model described in the [AlexNet](http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks) publication with some differences: + +- not training with the relighting data-augmentation; +- the order of pooling and normalization layers is switched (in CaffeNet, pooling is done before normalization). + +This model is snapshot of iteration 310,000. +The best validation performance during training was iteration 313,000 with validation accuracy 57.412% and loss 1.82328. +This model obtains a top-1 accuracy 57.4% and a top-5 accuracy 80.4% on the validation set, using just the center crop. +(Using the average of 10 crops, (4 + 1 center) * 2 mirror, should obtain a bit higher accuracy still.) + +## License + +The data used to train this model comes from the ImageNet project, which distributes its database to researchers who agree to a following term of access: +"Researcher shall use the Database only for non-commercial research and educational purposes." +Accordingly, this model is distributed under a non-commercial license. 
diff --git a/examples/imagenet/alexnet_solver.prototxt b/models/bvlc_reference_caffenet/solver.prototxt similarity index 61% rename from examples/imagenet/alexnet_solver.prototxt rename to models/bvlc_reference_caffenet/solver.prototxt index 94bda7f36a5..af1315ba2ac 100644 --- a/examples/imagenet/alexnet_solver.prototxt +++ b/models/bvlc_reference_caffenet/solver.prototxt @@ -1,4 +1,4 @@ -net: "examples/imagenet/alexnet_train_val.prototxt" +net: "models/bvlc_reference_caffenet/train_val.prototxt" test_iter: 1000 test_interval: 1000 base_lr: 0.01 @@ -10,5 +10,5 @@ max_iter: 450000 momentum: 0.9 weight_decay: 0.0005 snapshot: 10000 -snapshot_prefix: "examples/imagenet/caffe_alexnet" +snapshot_prefix: "models/bvlc_reference_caffenet/caffenet_train" solver_mode: GPU diff --git a/examples/imagenet/imagenet_train_val.prototxt b/models/bvlc_reference_caffenet/train_val.prototxt similarity index 100% rename from examples/imagenet/imagenet_train_val.prototxt rename to models/bvlc_reference_caffenet/train_val.prototxt diff --git a/examples/imagenet/rcnn_imagenet_deploy.prototxt b/models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt similarity index 100% rename from examples/imagenet/rcnn_imagenet_deploy.prototxt rename to models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt diff --git a/models/bvlc_reference_rcnn_ilsvrc13/readme.md b/models/bvlc_reference_rcnn_ilsvrc13/readme.md new file mode 100644 index 00000000000..fb8f26d15df --- /dev/null +++ b/models/bvlc_reference_rcnn_ilsvrc13/readme.md @@ -0,0 +1,20 @@ +--- +name: BVLC Reference RCNN ILSVRC13 Model +caffemodel: bvlc_reference_rcnn_ilsvrc13.caffemodel +caffemodel_url: http://dl.caffe.berkeleyvision.org/bvlc_reference_rcnn_ilsvrc13.caffemodel +license: non-commercial +sha1: bdd8abb885819cba5e2fe1eb36235f2319477e64 +caffe_commit: a7e397abbda52c0b90323c23ab95bdeabee90a98 +--- + +The pure Caffe instantiation of the [R-CNN](https://github.com/rbgirshick/rcnn) model for ILSVRC13 detection. +This model was made by transplanting the R-CNN SVM classifiers into a `fc-rcnn` classification layer, provided here as an off-the-shelf Caffe detector. +Try the [detection example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/detection.ipynb) to see it in action. + +*N.B. For research purposes, make use of the official R-CNN package and not this example.* + +## License + +The data used to train this model comes from the ImageNet project, which distributes its database to researchers who agree to the following term of access: +"Researcher shall use the Database only for non-commercial research and educational purposes." +Accordingly, this model is distributed under a non-commercial license. diff --git a/models/finetune_flickr_style/readme.md b/models/finetune_flickr_style/readme.md new file mode 100644 index 00000000000..c08485f7fd3 --- /dev/null +++ b/models/finetune_flickr_style/readme.md @@ -0,0 +1,21 @@ +--- +name: Finetuning CaffeNet on Flickr Style +caffemodel: finetune_flickr_style.caffemodel +caffemodel_url: http://dl.caffe.berkeleyvision.org/finetune_flickr_style.caffemodel +license: non-commercial +sha1: 443ad95a61fb0b5cd3cee55951bcc1f299186b5e +caffe_commit: 41751046f18499b84dbaf529f64c0e664e2a09fe +gist_id: 034c6ac3865563b69e60 +--- + +This model is trained exactly as described in `docs/finetune_flickr_style/readme.md`, using all 80000 images.
+The final performance on the test set: + +    I0903 18:40:59.211707 11585 caffe.cpp:167] Loss: 0.407405 +    I0903 18:40:59.211717 11585 caffe.cpp:179] accuracy = 0.9164 + +## License + +The Flickr Style dataset contains only URLs to images. +Some of the images may have copyright. +Training a category-recognition model for research/non-commercial use may constitute fair use of this data, but the result should not be used for commercial purposes. diff --git a/examples/finetune_flickr_style/flickr_style_solver.prototxt b/models/finetune_flickr_style/solver.prototxt similarity index 74% rename from examples/finetune_flickr_style/flickr_style_solver.prototxt rename to models/finetune_flickr_style/solver.prototxt index 756e162ba1e..5e189bc93c0 100644 --- a/examples/finetune_flickr_style/flickr_style_solver.prototxt +++ b/models/finetune_flickr_style/solver.prototxt @@ -1,4 +1,4 @@ -net: "examples/finetune_flickr_style/flickr_style_train_val.prototxt" +net: "models/finetune_flickr_style/train_val.prototxt" test_iter: 100 test_interval: 1000 # lr for fine-tuning should be lower than when starting from scratch @@ -12,6 +12,6 @@ max_iter: 100000 momentum: 0.9 weight_decay: 0.0005 snapshot: 10000 -snapshot_prefix: "examples/finetune_flickr_style/flickr_style" +snapshot_prefix: "models/finetune_flickr_style/finetune_flickr_style" # uncomment the following to default to CPU mode solving # solver_mode: CPU diff --git a/examples/finetune_flickr_style/flickr_style_train_val.prototxt b/models/finetune_flickr_style/train_val.prototxt similarity index 100% rename from examples/finetune_flickr_style/flickr_style_train_val.prototxt rename to models/finetune_flickr_style/train_val.prototxt diff --git a/python/classify.py b/python/classify.py index ddc5429feac..873b5e38f19 100755 --- a/python/classify.py +++ b/python/classify.py @@ -31,13 +31,13 @@ def main(argv): parser.add_argument( "--model_def", default=os.path.join(pycaffe_dir, -            "../examples/imagenet/imagenet_deploy.prototxt"), +            "../models/bvlc_reference_caffenet/deploy.prototxt"), help="Model definition file." ) parser.add_argument( "--pretrained_model", default=os.path.join(pycaffe_dir, -            "../examples/imagenet/caffe_reference_imagenet_model"), +            "../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel"), help="Trained model weights file." ) parser.add_argument( diff --git a/python/detect.py b/python/detect.py index 4598fc7a327..bc8c0703646 100755 --- a/python/detect.py +++ b/python/detect.py @@ -46,13 +46,13 @@ def main(argv): parser.add_argument( "--model_def", default=os.path.join(pycaffe_dir, -            "../examples/imagenet/imagenet_deploy.prototxt"), +            "../models/bvlc_reference_caffenet/deploy.prototxt"), help="Model definition file." ) parser.add_argument( "--pretrained_model", default=os.path.join(pycaffe_dir, -            "../examples/imagenet/caffe_reference_imagenet_model"), +            "../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel"), help="Trained model weights file."
) parser.add_argument( diff --git a/scripts/download_model_binary.py b/scripts/download_model_binary.py new file mode 100755 index 00000000000..48e9015fd26 --- /dev/null +++ b/scripts/download_model_binary.py @@ -0,0 +1,76 @@ +#!/usr/bin/env python +import os +import sys +import time +import yaml +import urllib +import hashlib +import argparse + +required_keys = ['caffemodel', 'caffemodel_url', 'sha1'] + + +def reporthook(count, block_size, total_size): + """ + From http://blog.moleculea.com/2012/10/04/urlretrieve-progres-indicator/ + """ + global start_time + if count == 0: + start_time = time.time() + return + duration = time.time() - start_time + progress_size = int(count * block_size) + speed = int(progress_size / (1024 * duration)) + percent = int(count * block_size * 100 / total_size) + sys.stdout.write("\r...%d%%, %d MB, %d KB/s, %d seconds passed" % + (percent, progress_size / (1024 * 1024), speed, duration)) + sys.stdout.flush() + + +def parse_readme_frontmatter(dirname): + readme_filename = os.path.join(dirname, 'readme.md') + with open(readme_filename) as f: + lines = [line.strip() for line in f.readlines()] + top = lines.index('---') + bottom = lines.index('---', top + 1) + frontmatter = yaml.load('\n'.join(lines[top + 1:bottom])) + assert all(key in frontmatter for key in required_keys) + return dirname, frontmatter + + +def valid_dirname(dirname): + try: + return parse_readme_frontmatter(dirname) + except Exception as e: + print('ERROR: {}'.format(e)) + raise argparse.ArgumentTypeError( + 'Must be valid Caffe model directory with a correct readme.md') + + +if __name__ == '__main__': + parser = argparse.ArgumentParser( + description='Download trained model binary.') + parser.add_argument('dirname', type=valid_dirname) + args = parser.parse_args() + + # A tiny hack: the dirname validator also returns readme YAML frontmatter. + dirname = args.dirname[0] + frontmatter = args.dirname[1] + model_filename = os.path.join(dirname, frontmatter['caffemodel']) + + # Closure-d function for checking SHA1. + def model_checks_out(filename=model_filename, sha1=frontmatter['sha1']): + with open(filename, 'rb') as f: + return hashlib.sha1(f.read()).hexdigest() == sha1 + + # Check if model exists. + if os.path.exists(model_filename) and model_checks_out(): + print("Model already exists.") + sys.exit(0) + + # Download and verify model. + urllib.urlretrieve( + frontmatter['caffemodel_url'], model_filename, reporthook) + if not model_checks_out(): + print('ERROR: model did not download correctly! Run this again.') + sys.exit(1) diff --git a/scripts/upload_model_to_gist.sh b/scripts/upload_model_to_gist.sh new file mode 100755 index 00000000000..2dfbabd72a3 --- /dev/null +++ b/scripts/upload_model_to_gist.sh @@ -0,0 +1,40 @@ +#!/bin/bash + +# Check for valid directory +DIRNAME=$1 +if [ ! -f $DIRNAME/readme.md ]; then + echo "usage: upload_model_to_gist.sh <dirname>" + echo " <dirname>/readme.md must exist" + exit 1 +fi +cd $DIRNAME +FILES=`find . -maxdepth 1 -type f ! -name "*.caffemodel*" | xargs echo` + +# Check for gist tool. +gist -v >/dev/null 2>&1 || { echo >&2 "I require 'gist' but it's not installed. Do 'gem install gist'."; exit 1; } + +NAME=`sed -n 's/^name:[[:space:]]*//p' readme.md` +if [ -z "$NAME" ]; then + echo "<dirname>/readme.md must contain name field in the front-matter." + exit 1 +fi + +GIST=`sed -n 's/^gist_id:[[:space:]]*//p' readme.md` +if [ -z "$GIST" ]; then + echo "Uploading new Gist" + gist -p -d "$NAME" $FILES +else + echo "Updating existing Gist, id $GIST" + gist -u $GIST -d "$NAME" $FILES +fi + +RESULT=$?
+if [ $RESULT -eq 0 ]; then + echo "You've uploaded your model!" + echo "Don't forget to add the gist_id field to your <dirname>/readme.md now!" + echo "Run the command again after you do that, to make sure the Gist id propagates." + echo "" + echo "And do share your model over at https://github.com/BVLC/caffe/wiki/Model-Zoo" +else + echo "Something went wrong!" +fi diff --git a/src/caffe/solver.cpp b/src/caffe/solver.cpp index dcac4c1537c..d8517c64f85 100644 --- a/src/caffe/solver.cpp +++ b/src/caffe/solver.cpp @@ -316,6 +316,7 @@ void Solver<Dtype>::Snapshot() { char iter_str_buffer[kBufferSize]; snprintf(iter_str_buffer, kBufferSize, "_iter_%d", iter_); filename += iter_str_buffer; + filename += ".caffemodel"; LOG(INFO) << "Snapshotting to " << filename; WriteProtoToBinaryFile(net_param, filename.c_str()); SolverState state;
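One consequence of the `solver.cpp` change above: snapshotted weights now carry a `.caffemodel` extension, which is exactly what the updated `.gitignore` rule at the top of this diff matches. A small sketch of the resulting filenames, using the `snapshot_prefix` set in `models/bvlc_reference_caffenet/solver.prototxt` and assuming the solver state keeps its usual `.solverstate` suffix:

    # Snapshot names are snapshot_prefix + '_iter_' + iteration + extension.
    prefix = 'models/bvlc_reference_caffenet/caffenet_train'
    iteration = 10000
    print('%s_iter_%d.caffemodel' % (prefix, iteration))   # network weights
    print('%s_iter_%d.solverstate' % (prefix, iteration))  # resumable solver state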