Merge pull request BVLC#917 from sergeyk/model_zoo
Define standard format for Caffe models to open the "model zoo"
shelhamer committed Sep 4, 2014
2 parents 116655e + 8c39996 commit a2bcc7c
Showing 42 changed files with 385 additions and 278 deletions.
3 changes: 1 addition & 2 deletions .gitignore
@@ -51,8 +51,7 @@ Makefile.config
# 1. reference, and not casually committed
# 2. custom, and live on their own unless they're deliberately contributed
data/*
*model
*_iter_*
*.caffemodel
*.solverstate
*.binaryproto
*leveldb
34 changes: 0 additions & 34 deletions docs/getting_pretrained_models.md

This file was deleted.

4 changes: 2 additions & 2 deletions docs/index.md
@@ -38,8 +38,8 @@ Slides about the Caffe architecture, *updated 03/14*.
A 4-page report for the ACM Multimedia Open Source competition.
- [Installation instructions](/installation.html)<br />
Tested on Ubuntu, Red Hat, OS X.
* [Pre-trained models](/getting_pretrained_models.html)<br />
BVLC provides ready-to-use models for non-commercial use.
* [Model Zoo](/model_zoo.html)<br />
BVLC suggests a standard distribution format for Caffe models, and provides trained models.
* [Developing & Contributing](/development.html)<br />
Guidelines for development and contributing to Caffe.
* [API Documentation](/doxygen/)<br />
53 changes: 53 additions & 0 deletions docs/model_zoo.md
@@ -0,0 +1,53 @@
---
---
# Caffe Model Zoo

Lots of people have used Caffe to train models of different architectures and applied them to different problems, ranging from simple regression to AlexNet-alikes to Siamese networks for image similarity to speech applications.
To lower the friction of sharing these models, we introduce the model zoo framework:

- A standard format for packaging Caffe model info.
- Tools to upload/download model info to/from Github Gists, and to download trained `.caffemodel` binaries.
- A central wiki page for sharing model info Gists.

## Where to get trained models

First of all, we provide some trained models out of the box.
Each one of these can be downloaded by running `scripts/download_model_binary.py <dirname>` where `<dirname>` is specified below:

- **BVLC Reference CaffeNet** in `models/bvlc_reference_caffenet`: AlexNet trained on ILSVRC 2012, with a minor variation from the version as described in the NIPS 2012 paper.
- **BVLC AlexNet** in `models/bvlc_alexnet`: AlexNet trained on ILSVRC 2012, almost exactly as described in NIPS 2012.
- **BVLC Reference R-CNN ILSVRC-2013** in `models/bvlc_reference_rcnn_ilsvrc13`: pure Caffe implementation of [R-CNN](https://github.com/rbgirshick/rcnn).

User-provided models are posted to a publicly editable [wiki page](https://github.com/BVLC/caffe/wiki/Model-Zoo).

## Model info format

A caffe model is distributed as a directory containing:

- Solver/model prototxt(s)
- `readme.md` containing
    - YAML frontmatter
        - Caffe version used to train this model (tagged release or commit hash).
        - [optional] file URL and SHA1 of the trained `.caffemodel`.
        - [optional] github gist id.
    - Information about what data the model was trained on, modeling choices, etc.
    - License information.
- [optional] Other helpful scripts.
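
As an illustration of the format above, here is a minimal Python 3 sketch of reading the YAML frontmatter out of a `readme.md`. It is not the parser used by the Caffe scripts: `parse_frontmatter` is a hypothetical helper that only handles flat `key: value` lines, and apart from `gist_id` (which this document mentions) every key name and value in the example readme is an assumed placeholder.

```python
def parse_frontmatter(text):
    """Return the flat key/value pairs between the leading '---' lines of a readme.

    Hypothetical helper for illustration; it does not handle nested YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Hypothetical readme.md contents. Only `gist_id` is named in this document;
# the other keys and all values are placeholders.
EXAMPLE_README = """---
name: Example Model
caffemodel_url: http://example.com/example.caffemodel
sha1: 0000000000000000000000000000000000000000
caffe_commit: abc1234
gist_id: 0123456789abcdef
license: non-commercial
---

Describe the training data and modeling choices here.
"""

frontmatter = parse_frontmatter(EXAMPLE_README)
print(frontmatter["caffemodel_url"])
```

Keeping the machine-readable fields in the frontmatter and the free-form description in the body lets scripts read the URL and checksum without parsing the prose.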

## Hosting model info

Github Gist is a good format for model info distribution because it can contain multiple files, is versionable, and has in-browser syntax highlighting and markdown rendering.

- `scripts/upload_model_to_gist.sh <dirname>`: uploads non-binary files in the model directory as a Github Gist and prints the Gist ID. If `gist_id` is already part of the `<dirname>/readme.md` frontmatter, the script updates the existing Gist instead of creating a new one.

Try doing `scripts/upload_model_to_gist.sh models/bvlc_alexnet` to test the uploading (don't forget to delete the uploaded gist afterward).

Downloading models is not yet supported as a script (there is no good commandline tool for this right now), so simply go to the Gist URL and click "Download Gist" for now.

### Hosting trained models

It is up to the user where to host the `.caffemodel` file.
We host our BVLC-provided models on our own server.
Dropbox also works fine (tip: make sure that `?dl=1` is appended to the end of the URL).
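
The Dropbox tip can be automated with the standard library. The sketch below is an assumption-laden illustration in Python 3, not part of the Caffe scripts: `force_direct_download` is a hypothetical helper, and the share link is a made-up placeholder.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def force_direct_download(url):
    """Return url with dl=1 in its query string, replacing any existing dl value."""
    parts = urlsplit(url)
    # Keep every query parameter except an existing `dl`, then append dl=1.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "dl"]
    query.append(("dl", "1"))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Placeholder share link, not a real file.
print(force_direct_download("https://www.dropbox.com/s/abc123/example.caffemodel?dl=0"))
```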

- `scripts/download_model_binary.py <dirname>`: downloads the `.caffemodel` from the URL specified in the `<dirname>/readme.md` frontmatter and confirms SHA1.
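
The SHA1 confirmation step can be sketched with `hashlib`. This is a minimal Python 3 illustration of the idea, not the code in `scripts/download_model_binary.py`: `sha1_of_file` and `matches_sha1` are hypothetical helpers, and the demo hashes a throwaway temporary file rather than a real `.caffemodel`.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA1 digest of a file, read in chunks to bound memory use."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
    return sha1.hexdigest()


def matches_sha1(path, expected_sha1):
    """Compare a file's SHA1 with the expected hex digest (case-insensitive)."""
    return sha1_of_file(path) == expected_sha1.strip().lower()


# Demo on a small temporary file standing in for a downloaded model.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
demo_ok = matches_sha1(tmp.name, hashlib.sha1(b"abc").hexdigest())
demo_bad = matches_sha1(tmp.name, "0" * 40)
os.remove(tmp.name)
print(demo_ok, demo_bad)
```

Reading in chunks matters here because trained `.caffemodel` binaries can be hundreds of megabytes.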
11 changes: 5 additions & 6 deletions examples/classification.ipynb
@@ -2,8 +2,7 @@
"metadata": {
"description": "Use the pre-trained ImageNet model to classify images with the Python interface.",
"example_name": "ImageNet classification",
"include_in_docs": true,
"signature": "sha256:4f8d4c079c30d20ef4b6818e9672b1741fd1377354e5b83e291710736cecd24f"
"include_in_docs": true
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -19,7 +18,7 @@
"\n",
"Caffe provides a general Python interface for models with `caffe.Net` in `python/caffe/pycaffe.py`, but to make off-the-shelf classification easy we provide a `caffe.Classifier` class and `classify.py` script. Both Python and MATLAB wrappers are provided. However, the Python wrapper has more features so we will describe it here. For MATLAB, refer to `matlab/caffe/matcaffe_demo.m`.\n",
"\n",
"Before we begin, you must compile Caffe and install the python wrapper by setting your `PYTHONPATH`. If you haven't yet done so, please refer to the [installation instructions](installation.html). This example uses our pre-trained ImageNet model, an ILSVRC12 image classifier. You can download it (232.57MB) by running `examples/imagenet/get_caffe_reference_imagenet_model.sh`. Note that this pre-trained model is licensed for academic research / non-commercial use only.\n",
"Before we begin, you must compile Caffe and install the python wrapper by setting your `PYTHONPATH`. If you haven't yet done so, please refer to the [installation instructions](installation.html). This example uses our pre-trained CaffeNet model, an ILSVRC12 image classifier. You can download it by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`. Note that this pre-trained model is licensed for academic research / non-commercial use only.\n",
"\n",
"Ready? Let's start."
]
@@ -41,8 +40,8 @@
"\n",
"# Set the right path to your model definition file, pretrained model weights,\n",
"# and the image you would like to classify.\n",
"MODEL_FILE = 'imagenet/imagenet_deploy.prototxt'\n",
"PRETRAINED = 'imagenet/caffe_reference_imagenet_model'\n",
"MODEL_FILE = '../models/bvlc_reference_caffenet/deploy.prototxt'\n",
"PRETRAINED = '../models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel'\n",
"IMAGE_FILE = 'images/cat.jpg'"
],
"language": "python",
@@ -404,4 +403,4 @@
"metadata": {}
}
]
}
}
7 changes: 3 additions & 4 deletions examples/detection.ipynb
@@ -2,8 +2,7 @@
"metadata": {
"description": "Run a pretrained model as a detector in Python.",
"example_name": "R-CNN detection",
"include_in_docs": true,
"signature": "sha256:8a744fbbb9ed80acab471247eaf50c27dcbd652105404df9feca599939f0c0ee"
"include_in_docs": true
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -26,7 +25,7 @@
"\n",
"- [Selective Search](http://koen.me/research/selectivesearch/) is the region proposer used by R-CNN. The [selective_search_ijcv_with_python](https://github.com/sergeyk/selective_search_ijcv_with_python) Python module takes care of extracting proposals through the selective search MATLAB implementation. To install it, download the module and name its directory `selective_search_ijcv_with_python`, run the demo in MATLAB to compile the necessary functions, then add it to your `PYTHONPATH` for importing. (If you have your own region proposals prepared, or would rather not bother with this step, [detect.py](https://github.com/BVLC/caffe/blob/master/python/detect.py) accepts a list of images and bounding boxes as CSV.)\n",
"\n",
"- Follow the [model instructions](http://caffe.berkeleyvision.org/getting_pretrained_models.html) to get the Caffe R-CNN ImageNet model.\n",
"- Run `./scripts/download_model_binary.py models/bvlc_reference_rcnn_ilsvrc13` to get the Caffe R-CNN ImageNet model.\n",
"\n",
"With that done, we'll call the bundled `detect.py` to generate the region proposals and run the network. For an explanation of the arguments, do `./detect.py --help`."
]
@@ -37,7 +36,7 @@
"input": [
"!mkdir -p _temp\n",
"!echo `pwd`/images/fish-bike.jpg > _temp/det_input.txt\n",
"!../python/detect.py --crop_mode=selective_search --pretrained_model=imagenet/caffe_rcnn_imagenet_model --model_def=imagenet/rcnn_imagenet_deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5"
"!../python/detect.py --crop_mode=selective_search --pretrained_model=../models/bvlc_reference_rcnn_ilsvrc13/bvlc_reference_rcnn_ilsvrc13.caffemodel --model_def=../models/bvlc_reference_rcnn_ilsvrc13/deploy.prototxt --gpu --raw_scale=255 _temp/det_input.txt _temp/det_output.h5"
],
"language": "python",
"metadata": {},
4 changes: 2 additions & 2 deletions examples/feature_extraction/imagenet_val.prototxt
@@ -5,14 +5,14 @@ layers {
top: "data"
top: "label"
image_data_param {
source: "$CAFFE_DIR/examples/_temp/file_list.txt"
source: "examples/_temp/file_list.txt"
batch_size: 50
new_height: 256
new_width: 256
}
transform_param {
crop_size: 227
mean_file: "$CAFFE_DIR/data/ilsvrc12/imagenet_mean.binaryproto"
mean_file: "data/ilsvrc12/imagenet_mean.binaryproto"
mirror: false
}
}
10 changes: 5 additions & 5 deletions examples/feature_extraction/readme.md
@@ -10,7 +10,9 @@ Extracting Features
===================

In this tutorial, we will extract features using a pre-trained model with the included C++ utility.
Follow instructions for [installing Caffe](../../installation.html) and for [downloading the reference model](../../getting_pretrained_models.html) for ImageNet.
Note that we recommend using the Python interface for this task, as for example in the [filter visualization example](http://nbviewer.ipython.org/github/BVLC/caffe/blob/master/examples/filter_visualization.ipynb).

Follow the instructions for [installing Caffe](../../installation.html) and run `scripts/download_model_binary.py models/bvlc_reference_caffenet` from the Caffe root directory.
If you need detailed information about the tools below, please consult their source code, in which additional documentation is usually provided.

Select data to run on
@@ -35,7 +37,7 @@ Define the Feature Extraction Network Architecture
In practice, subtracting the mean image from a dataset significantly improves classification accuracies.
Download the mean image of the ILSVRC dataset.

data/ilsvrc12/get_ilsvrc_aux.sh
./data/ilsvrc12/get_ilsvrc_aux.sh

We will use `data/ilsvrc12/imagenet_mean.binaryproto` in the network definition prototxt.

@@ -44,14 +46,12 @@ We'll be using the `ImageDataLayer`, which will load and resize images for us.

cp examples/feature_extraction/imagenet_val.prototxt examples/_temp

Edit `examples/_temp/imagenet_val.prototxt` to use correct path for your setup (replace `$CAFFE_DIR`)

Extract Features
----------------

Now everything necessary is in place.

build/tools/extract_features.bin examples/imagenet/caffe_reference_imagenet_model examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10
./build/tools/extract_features.bin models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel examples/_temp/imagenet_val.prototxt fc7 examples/_temp/features 10

The name of the feature blob that you extract is `fc7`, which represents the highest-level feature of the reference model.
We can use any other layer, as well, such as `conv5` or `pool3`.
11 changes: 5 additions & 6 deletions examples/filter_visualization.ipynb
@@ -2,8 +2,7 @@
"metadata": {
"description": "Extracting features and visualizing trained filters with an example image, viewed layer-by-layer.",
"example_name": "Filter visualization",
"include_in_docs": true,
"signature": "sha256:b1b0457e2b10110aca847a718a3fe631ebcfce63a61cbc33653244f52b1ff4af"
"include_in_docs": true
},
"nbformat": 3,
"nbformat_minor": 0,
@@ -54,15 +53,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Follow the [instructions](http://caffe.berkeleyvision.org/getting_pretrained_models.html) for getting the pretrained models, load the net, specify test phase and CPU mode, and configure input preprocessing."
"Run `./scripts/download_model_binary.py models/bvlc_reference_caffenet` to get the pretrained CaffeNet model, load the net, specify test phase and CPU mode, and configure input preprocessing."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"net = caffe.Classifier(caffe_root + 'examples/imagenet/imagenet_deploy.prototxt',\n",
" caffe_root + 'examples/imagenet/caffe_reference_imagenet_model')\n",
"net = caffe.Classifier(caffe_root + 'models/bvlc_reference_caffenet/deploy.prototxt',\n",
" caffe_root + 'models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel')\n",
"net.set_phase_test()\n",
"net.set_mode_cpu()\n",
"# input preprocessing: 'data' is the name of the input blob == net.inputs[0]\n",
@@ -598,4 +597,4 @@
"metadata": {}
}
]
}
}
18 changes: 12 additions & 6 deletions examples/finetune_flickr_style/readme.md
@@ -34,7 +34,7 @@ All steps are to be done from the caffe root directory.
The dataset is distributed as a list of URLs with corresponding labels.
Using a script, we will download a small subset of the data and split it into train and val sets.

caffe % ./examples/finetune_flickr_style/assemble_data.py -h
caffe % ./models/finetune_flickr_style/assemble_data.py -h
usage: assemble_data.py [-h] [-s SEED] [-i IMAGES] [-w WORKERS]

Download a subset of Flickr Style to a directory
@@ -48,25 +48,25 @@ Using a script, we will download a small subset of the data and split it into tr
num workers used to download images. -x uses (all - x)
cores.

caffe % python examples/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486
caffe % python models/finetune_flickr_style/assemble_data.py --workers=-1 --images=2000 --seed 831486
Downloading 2000 images with 7 workers...
Writing train/val for 1939 successfully downloaded images.

This script downloads images and writes train/val file lists into `data/flickr_style`.
With this random seed there are 1,557 train images and 382 test images.
The prototxts in this example assume this, and also assume the presence of the ImageNet mean file (run `get_ilsvrc_aux.sh` from `data/ilsvrc12` to obtain this if you haven't yet).

We'll also need the ImageNet-trained model, which you can obtain by running `get_caffe_reference_imagenet_model.sh` from `examples/imagenet`.
We'll also need the ImageNet-trained model, which you can obtain by running `./scripts/download_model_binary.py models/bvlc_reference_caffenet`.

Now we can train! (You can fine-tune in CPU mode by leaving out the `-gpu` flag.)

caffe % ./build/tools/caffe train -solver examples/finetune_flickr_style/flickr_style_solver.prototxt -weights examples/imagenet/caffe_reference_imagenet_model -gpu 0
caffe % ./build/tools/caffe train -solver models/finetune_flickr_style/flickr_style_solver.prototxt -weights models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel -gpu 0

[...]

I0828 22:10:04.025378 9718 solver.cpp:46] Solver scaffolding done.
I0828 22:10:04.025388 9718 caffe.cpp:95] Use GPU with device ID 0
I0828 22:10:04.192004 9718 caffe.cpp:107] Finetuning from examples/imagenet/caffe_reference_imagenet_model
I0828 22:10:04.192004 9718 caffe.cpp:107] Finetuning from models/bvlc_reference_caffenet/bvlc_reference_caffenet.caffemodel

[...]

@@ -149,10 +149,16 @@ This model is only beginning to learn.
Fine-tuning can be feasible when training from scratch is not, for lack of time or data.
Even in CPU mode each pass through the training set takes ~100 s. GPU fine-tuning is of course faster still and can learn a useful model in minutes or hours instead of days or weeks.
Furthermore, note that the model has only trained on < 2,000 instances. Transfer learning a new task like style recognition from the ImageNet pretraining can require much less data than training from scratch.

Now try fine-tuning to your own tasks and data!

## Trained model

We provide a model trained on all 80K images, with final accuracy of 98%.
Simply do `./scripts/download_model_binary.py models/finetune_flickr_style` to obtain it.

## License

The Flickr Style dataset as distributed here contains only URLs to images.
Some of the images may be copyrighted.
Training a category-recognition model for research/non-commercial use may constitute fair use of this data.
Training a category-recognition model for research/non-commercial use may constitute fair use of this data, but the result should not be used for commercial purposes.
28 changes: 0 additions & 28 deletions examples/imagenet/get_caffe_alexnet_model.sh

This file was deleted.
