Merge pull request #917 from sergeyk/model_zoo

Define standard format for Caffe models to open the "model zoo"
BVLC · Sep 4, 2014 · adbea64
2 parents f2324fe + d46f3cd
Showing 42 changed files with 385 additions and 278 deletions.
diff --git a/.gitignore b/.gitignore
@@ -51,8 +51,7 @@ Makefile.config
 # 1. reference, and not casually committed
 # 2. custom, and live on their own unless they're deliberately contributed
 data/*
-*model
-*_iter_*
+*.caffemodel
 *.solverstate
 *.binaryproto
 *leveldb
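
Note on the `.gitignore` hunk above (illustrative, not part of the commit): the sketch below approximates how the old and new patterns treat a few file names. It assumes that Python's `fnmatchcase` is a close enough stand-in for gitignore matching on simple, slash-free patterns, and the snapshot file name is hypothetical.

```python
from fnmatch import fnmatchcase

# Patterns removed and added by the hunk above.
OLD_PATTERNS = ["*model", "*_iter_*"]
NEW_PATTERNS = ["*.caffemodel"]


def ignored(filename, patterns):
    """Return True if the bare file name matches any of the glob patterns."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


# The first two names appear in this diff; the third is a hypothetical snapshot name.
for name in [
    "caffe_reference_imagenet_model",
    "bvlc_reference_caffenet.caffemodel",
    "caffenet_train_iter_10000",
]:
    print(name, ignored(name, OLD_PATTERNS), ignored(name, NEW_PATTERNS))
```

Under this approximation, weights saved without the `.caffemodel` extension are no longer ignored after the change, which fits the move to a standard file name for trained models.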

diff --git a/docs/getting_pretrained_models.md b/docs/getting_pretrained_models.md
diff --git a/docs/index.md b/docs/index.md
@@ -38,8 +38,8 @@ Slides about the Caffe architecture, *updated 03/14*.
 A 4-page report for the ACM Multimedia Open Source competition.
 - [Installation instructions](/installation.html)<br />
 Tested on Ubuntu, Red Hat, OS X.
-* [Pre-trained models](/getting_pretrained_models.html)<br />
-BVLC provides ready-to-use models for non-commercial use.
+* [Model Zoo](/model_zoo.html)<br />
+BVLC suggests a standard distribution format for Caffe models, and provides trained models.
 * [Developing & Contributing](/development.html)<br />
 Guidelines for development and contributing to Caffe.
 * [API Documentation](/doxygen/)<br />

diff --git a/docs/model_zoo.md b/docs/model_zoo.md
@@ -0,0 +1,53 @@
+---
+---
+# Caffe Model Zoo
+
+Many people have used Caffe to train models of different architectures and applied them to different problems, ranging from simple regression to AlexNet-alikes to Siamese networks for image similarity to speech applications.
+To lower the friction of sharing these models, we introduce the model zoo framework:
+
+- A standard format for packaging Caffe model info.
+- Tools to upload/download model info to/from Github Gists, and to download trained `.caffemodel` binaries.
+- A central wiki page for sharing model info Gists.
+
+## Where to get trained models
+
+First of all, we provide some trained models out of the box.
+Each one of these can be downloaded by running `scripts/download_model_binary.py <dirname>` where `<dirname>` is specified below:
+
+- **BVLC Reference CaffeNet** in `models/bvlc_reference_caffenet`: AlexNet trained on ILSVRC 2012, with a minor variation from the version as described in the NIPS 2012 paper.
+- **BVLC AlexNet** in `models/bvlc_alexnet`: AlexNet trained on ILSVRC 2012, almost exactly as described in NIPS 2012.
+- **BVLC Reference R-CNN ILSVRC-2013** in `models/bvlc_reference_rcnn_ilsvrc13`: pure Caffe implementation of [R-CNN](https://github.com/rbgirshick/rcnn).
+
+User-provided models are posted to a public-editable [wiki page](https://github.com/BVLC/caffe/wiki/Model-Zoo).
+
+## Model info format
+
+A Caffe model is distributed as a directory containing:
+
+- Solver/model prototxt(s)
+- `readme.md` containing
+    - YAML frontmatter
+        - Caffe version used to train this model (tagged release or commit hash).
+        - [optional] file URL and SHA1 of the trained `.caffemodel`.
+        - [optional] github gist id.
+    - Information about what data the model was trained on, modeling choices, etc.
+    - License information.
+- [optional] Other helpful scripts.
+
+## Hosting model info
+
+Github Gist is a good format for model info distribution because it can contain multiple files, is versionable, and has in-browser syntax highlighting and markdown rendering.
+
+- `scripts/upload_model_to_gist.sh <dirname>`: uploads non-binary files in the model directory as a Github Gist and prints the Gist ID. If `gist_id` is already part of the `<dirname>/readme.md` frontmatter, then updates existing Gist.
+
+Try doing `scripts/upload_model_to_gist.sh models/bvlc_alexnet` to test the uploading (don't forget to delete the uploaded gist afterward).
+
+Downloading models is not yet supported as a script (there is no good commandline tool for this right now), so simply go to the Gist URL and click "Download Gist" for now.
+
+### Hosting trained models
+
+It is up to the user where to host the `.caffemodel` file.
+We host our BVLC-provided models on our own server.
+Dropbox also works fine (tip: make sure that `?dl=1` is appended to the end of the URL).
+
+- `scripts/download_model_binary.py <dirname>`: downloads the `.caffemodel` from the URL specified in the `<dirname>/readme.md` frontmatter and confirms SHA1.
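
Note on the frontmatter and SHA1 check described in `docs/model_zoo.md` above (illustrative, not the code of `scripts/download_model_binary.py`): a minimal sketch of reading `key: value` pairs from a readme's frontmatter and comparing a file's SHA1 against the recorded one. The key name `sha1` and the helper names are assumptions; the doc only says that the frontmatter carries the file URL and SHA1.

```python
import hashlib


def parse_frontmatter(text):
    """Return the simple `key: value` pairs found between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so URLs keep their own colons.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large model binaries need not fit in memory."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


def model_matches_readme(readme_text, model_path):
    """True only if the readme records a SHA1 (assumed key: `sha1`) and the file matches it."""
    expected = parse_frontmatter(readme_text).get("sha1")
    return expected is not None and sha1_of_file(model_path) == expected
```

A caller would read `<dirname>/readme.md`, download the binary to a local path, and reject the file when `model_matches_readme` returns `False`.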
diff --git a/examples/classification.ipynb b/examples/classification.ipynb
@@ -2,8 +2,7 @@
  "metadata": {
   "description": "Use the pre-trained ImageNet model to classify images with the Python interface.",
   "example_name": "ImageNet classification",
-  "include_in_docs": true,
-  "signature": "sha256:4f8d4c079c30d20ef4b6818e9672b1741fd1377354e5b83e291710736cecd24f"
+  "include_in_docs": true
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -19,7 +18,7 @@
       "\n",
       "Caffe provides a general Python interface for models with `caffe.Net` in `python/caffe/pycaffe.py`, but to make off-the-shelf classification easy we provide a `caffe.Classifier` class and `classify.py` script. Both Python and MATLAB wrappers are provided. However, the Python wrapper has more features so we will describe it here. For MATLAB, refer to `matlab/caffe/matcaffe_demo.m`.\n",
       "\n",
-      "Before we begin, you must compile Caffe and install the python wrapper by setting your `PYTHONPATH`. If you haven't yet done so, please refer to the [installation instructions](installation.html). This example uses our pre-trained ImageNet model, an ILSVRC12 image classifier. You can download it (232.57MB) by running `examples/imagenet/get_caffe_reference_imagenet_model.sh`. Note that this pre-trained model is licensed for academic research / non-commercial use only.\n",
+      "Before we begin, you must compile Caffe and install the python wrapper by setting your `PYTHONPATH`. If you haven't yet done so, please refer to the [installation instructions](installation.html). This example uses our pre-trained CaffeNet model, an ILSVRC12 image classifier. You can download it by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`. Note that this pre-trained model is licensed for academic research / non-commercial use only.\n",
       "\n",
       "Ready? Let's start."
      ]
@@ -41,8 +40,8 @@
       "\n",
       "# Set the right path to your model definition file, pretrained model weights,\n",
       "# and the image you would like to classify.\n",
-      "MODEL_FILE = 'imagenet/imagenet_deploy.prototxt'\n",
-      "PRETRAINED = 'imagenet/caffe_reference_imagenet_model'\n",
+      "MODEL_FILE = '../models/bvlc_reference_caffenet/deploy.prototxt'\n",
+      "PRETRAINED = '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'\n",
       "IMAGE_FILE = 'images/cat.jpg'"
      ],
      "language": "python",
@@ -404,4 +403,4 @@
    "metadata": {}
   }
  ]
-}
+}
diff --git a/examples/detection.ipynb b/examples/detection.ipynb
@@ -2,8 +2,7 @@
  "metadata": {
   "description": "Run a pretrained model as a detector in Python.",
   "example_name": "R-CNN detection",
-  "include_in_docs": true,
-  "signature": "sha256:8a744fbbb9ed80acab471247eaf50c27dcbd652105404df9feca599939f0c0ee"
+  "include_in_docs": true
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -26,7 +25,7 @@
       "\n",
       "- [Selective Search](http://koen.me/research/selectivesearch/) is the region proposer used by R-CNN. The [selective_search_ijcv_with_python](https://github.com/sergeyk/selective_search_ijcv_with_python) Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory `selective_search_ijcv_with_python`, run the demo in MATLAB to compile the necessary functions, then add it to your `PYTHONPATH` for importing. (If you have your own region proposals prepared, or would rather not bother with this step, [detect.py](https://github.com/BVLC/caffe/blob/master/python/detect.py) accepts a list of images and bounding boxes as CSV.)\n",
       "\n",
-      "- Follow the [model instructions](http://caffe.berkeleyvision.org/getting_pretrained_models.html) to get the Caffe R-CNN ImageNet model.\n",
+      "- Run `./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13` to get the Caffe R-CNN ImageNet model.\n",
       "\n",
       "With that done, we'll call the bundled `detect.py` to generate the region proposals and run the network. For an explanation of the arguments, do `./detect.py --help`."
      ]
@@ -37,7 +36,7 @@
      "input": [
       "!mkdir -p _temp\n",
       "!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt\n",
-      "!../python/detect.py --crop_mode=selective_search --pretrained_model=imagenet/caffe_rcnn_imagenet_model --model_def=imagenet/rcnn_imagenet_deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5"
+      "!../python/detect.py --crop_mode=selective_search --pretrained_model=models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5"
      ],
      "language": "python",
      "metadata": {},

diff --git a/examples/feature_extraction/imagenet_val.prototxt b/examples/feature_extraction/imagenet_val.prototxt
@@ -5,14 +5,14 @@ layers {
   top: "data"
   top: "label"
   image_data_param {
-    source: "$CAFFE_DIR/examples/_temp/file_list.txt"
+    source: "examples/_temp/file_list.txt"
     batch_size: 50
     new_height: 256
     new_width: 256
   }
   transform_param {
     crop_size: 227
-    mean_file: "$CAFFE_DIR/data/ilsvrc12/imagenet_mean.binaryproto"
+    mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
     mirror: false
   }
 }

diff --git a/examples/feature_extraction/readme.md b/examples/feature_extraction/readme.md
@@ -10,7 +10,9 @@ Extracting Features
 ===================
 
 In this tutorial, we will extract features using a pre-trained model with the included C++ utility.
-Follow instructions for [installing Caffe](../../installation.html) and for [downloading the reference model](../../getting_pretrained_models.html) for ImageNet.
+Note that we recommend using the Python interface for this task, as for example in the [filter visualization example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb).
+
+Follow the instructions for [installing Caffe](../../installation.html) and run `scripts/download_model_binary.py models/bvlc_reference_caffenet` from the Caffe root directory.
 If you need detailed information about the tools below, please consult their source code, in which additional documentation is usually provided.
 
 Select data to run on
@@ -35,7 +37,7 @@ Define the Feature Extraction Network Architecture
 In practice, subtracting the mean image from a dataset significantly improves classification accuracies.
 Download the mean image of the ILSVRC dataset.
 
-    data/ilsvrc12/get_ilsvrc_aux.sh
+    ./data/ilsvrc12/get_ilsvrc_aux.sh
 
 We will use `data/ilsvrc12/imagenet_mean.binaryproto` in the network definition prototxt.
 
@@ -44,14 +46,12 @@ We'll be using the `ImageDataLayer`, which will load and resize images for us.
 
     cp examples/feature_extraction/imagenet_val.prototxt examples/_temp
 
-Edit `examples/_temp/imagenet_val.prototxt` to use correct path for your setup (replace `$CAFFE_DIR`)
-
 Extract Features
 ----------------
 
 Now everything necessary is in place.
 
-    build/tools/extract_features.bin examples/imagenet/caffe_reference_imagenet_model examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10
+    ./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10
 
 The name of feature blob that you extract is `fc7`, which represents the highest level feature of the reference model.
 We can use any other layer, as well, such as `conv5` or `pool3`.

diff --git a/examples/filter_visualization.ipynb b/examples/filter_visualization.ipynb
@@ -2,8 +2,7 @@
  "metadata": {
   "description": "Extracting features and visualizing trained filters with an example image, viewed layer-by-layer.",
   "example_name": "Filter visualization",
-  "include_in_docs": true,
-  "signature": "sha256:b1b0457e2b10110aca847a718a3fe631ebcfce63a61cbc33653244f52b1ff4af"
+  "include_in_docs": true
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -54,15 +53,15 @@
      "cell_type": "markdown",
      "metadata": {},
      "source": [
-      "Follow the [instructions](http://caffe.berkeleyvision.org/getting_pretrained_models.html) for getting the pretrained models, load the net, specify test phase and CPU mode, and configure input preprocessing."
+      "Run `./scripts/download_model_binary.py models/bvlc_reference_caffenet` to get the pretrained CaffeNet model, load the net, specify test phase and CPU mode, and configure input preprocessing."
      ]
     },
     {
      "cell_type": "code",
      "collapsed": false,
      "input": [
-      "net = caffe.Classifier(caffe_root + 'examples/imagenet/imagenet_deploy.prototxt',\n",
-      "                       caffe_root + 'examples/imagenet/caffe_reference_imagenet_model')\n",
+      "net = caffe.Classifier(caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',\n",
+      "                       caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')\n",
       "net.set_phase_test()\n",
       "net.set_mode_cpu()\n",
       "# input preprocessing: 'data' is the name of the input blob == net.inputs[0]\n",
@@ -598,4 +597,4 @@
    "metadata": {}
   }
  ]
-}
+}
diff --git a/examples/finetune_flickr_style/readme.md b/examples/finetune_flickr_style/readme.md
@@ -34,7 +34,7 @@ All steps are to be done from the caffe root directory.
 The dataset is distributed as a list of URLs with corresponding labels.
 Using a script, we will download a small subset of the data and split it into train and val sets.
 
-    caffe % ./examples/finetune_flickr_style/assemble_data.py -h
+    caffe % ./models/finetune_flickr_style/assemble_data.py -h
     usage: assemble_data.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS]
 
     Download a subset of Flickr Style to a directory
@@ -48,25 +48,25 @@ Using a script, we will download a small subset of the data and split it into tr
                             num workers used to download images. -x uses (all - x)
                             cores.
 
-    caffe % python examples/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486
+    caffe % python models/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486
     Downloading 2000 images with 7 workers...
     Writing train/val for 1939 successfully downloaded images.
 
 This script downloads images and writes train/val file lists into `data/flickr_style`.
 With this random seed there are 1,557 train images and 382 test images.
 The prototxts in this example assume this, and also assume the presence of the ImageNet mean file (run `get_ilsvrc_aux.sh` from `data/ilsvrc12` to obtain this if you haven't yet).
 
-We'll also need the ImageNet-trained model, which you can obtain by running `get_caffe_reference_imagenet_model.sh` from `examples/imagenet`.
+We'll also need the ImageNet-trained model, which you can obtain by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`.
 
 Now we can train! (You can fine-tune in CPU mode by leaving out the `-gpu` flag.)
 
-    caffe % ./build/tools/caffe train -solver examples/finetune_flickr_style/flickr_style_solver.prototxt -weights examples/imagenet/caffe_reference_imagenet_model -gpu 0
+    caffe % ./build/tools/caffe train -solver models/finetune_flickr_style/flickr_style_solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0
 
     [...]
 
     I0828 22:10:04.025378  9718 solver.cpp:46] Solver scaffolding done.
     I0828 22:10:04.025388  9718 caffe.cpp:95] Use GPU with device ID 0
-    I0828 22:10:04.192004  9718 caffe.cpp:107] Finetuning from examples/imagenet/caffe_reference_imagenet_model
+    I0828 22:10:04.192004  9718 caffe.cpp:107] Finetuning from models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel
 
     [...]
 
@@ -149,10 +149,16 @@ This model is only beginning to learn.
 Fine-tuning can be feasible when training from scratch would not be for lack of time or data.
 Even in CPU mode each pass through the training set takes ~100 s. GPU fine-tuning is of course faster still and can learn a useful model in minutes or hours instead of days or weeks.
 Furthermore, note that the model has only trained on < 2,000 instances. Transfer learning a new task like style recognition from the ImageNet pretraining can require much less data than training from scratch.
+
 Now try fine-tuning to your own tasks and data!
 
+## Trained model
+
+We provide a model trained on all 80K images, with final accuracy of 98%.
+Simply do `./scripts/download_model_binary.py models/finetune_flickr_style` to obtain it.
+
 ## License
 
 The Flickr Style dataset as distributed here contains only URLs to images.
 Some of the images may have copyright.
-Training a category-recognition model for research/non-commercial use may constitute fair use of this data.
+Training a category-recognition model for research/non-commercial use may constitute fair use of this data, but the result should not be used for commercial purposes.
diff --git a/examples/imagenet/get_caffe_alexnet_model.sh b/examples/imagenet/get_caffe_alexnet_model.sh