Update standard networks for Caffe #733
Conversation
- Still use Power layer for deploy phase
- Give train/val data layers different names (better visualizations)
- Use stage instead of phase to differentiate (consistent with other layers)
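A minimal sketch of what these commits amount to in the network prototxt (the layer names, database sources, and batch sizes here are illustrative, not copied from the PR):

```
# Separately named Data layers, selected with stage-based include rules
# rather than phase, consistent with the other layers.
layer {
  name: "train-data"            # distinct name -> clearer visualizations
  type: "Data"
  top: "data"
  top: "label"
  include { stage: "train" }    # stage instead of phase
  data_param { source: "train_db" batch_size: 64 }
}
layer {
  name: "val-data"              # separately named validation layer
  type: "Data"
  top: "data"
  top: "label"
  include { stage: "val" }
  data_param { source: "val_db" batch_size: 32 }
}
```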
I ran some experiments on the new networks to verify that the batch size changes didn't break anything. OS: Ubuntu 16.04

- Runtime (most are slightly improved 👍)
- Memory utilization (nothing runs out of memory 👍)
That looks good to me. I suppose I'll need to update batch sizes for Torch too.
Hi Luke. You mention a slight performance improvement for change #2. Is that due to power scaling in the data layer being faster than a Scale layer? In any case, new users may find it a little confusing to see three power scaling operations rather than just one.
Also, what does the comment "# 1/(standard deviation)" actually mean?
Yes, that's the reason. It's handled in the multi-threaded data loader.
The standard deviation of the MNIST dataset is ~80 per pixel (over the per-pixel range of [0, 255]).
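For concreteness, the arithmetic behind that comment is 1/80 = 0.0125. A sketch of a deploy-time Power layer carrying that factor (layer and blob names are illustrative):

```
# Power computes y = (shift + scale * x) ^ power,
# so with these parameters each pixel is multiplied by 1/80.
layer {
  name: "scale"
  type: "Power"
  bottom: "data"
  top: "scaled"
  power_param {
    power: 1        # identity exponent
    scale: 0.0125   # 1/(standard deviation), i.e. 1/80
    shift: 0
  }
}
```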
Thanks for the feedback, Luke. Seeing that training on the MNIST dataset is not very computationally expensive, I would still suggest the simpler network definition with a single scale layer; it is much clearer for new users. Would it be helpful to change the comment?
I'm comfortable with those changes, yes. Would you like to make a PR for it?
That's great, thank you. Just submitted #976.
Update standard networks for Caffe
This pull request makes 3 updates to the Caffe standard networks:

1. Give different names to the train/val Data layers in each network, and use stage for include rules instead of phase
2. Use the Data layer for input scaling during train/val, but still use a Power layer during deploy
3. Change the batch sizes to powers of two

The first change is purely cosmetic. The second may give a slight but negligible performance improvement. I made the third change because cuDNN typically prefers batch sizes that are even powers of two.
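As a rough sketch of how change #2 fits together (assuming the 0.0125 scale factor and the illustrative layer names used earlier in the thread): during train/val the scaling can ride along in the Data layer's transform_param, where it is applied by the multi-threaded data loader, while the deploy network keeps an explicit Power layer because no Data layer exists there.

```
# train/val: input scaling folded into the Data layer, applied by the
# multi-threaded data loader during prefetch.
layer {
  name: "train-data"
  type: "Data"
  top: "data"
  top: "label"
  include { stage: "train" }
  transform_param { scale: 0.0125 }   # 1/(standard deviation)
  data_param { source: "train_db" batch_size: 64 }
}
# deploy: no Data layer, so an explicit Power layer (as sketched above)
# performs the same multiplication on the input blob.
```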