
Commit 3fc226b: small fixes

Committed Nov 27, 2016
1 parent: 800faee

File tree: 3 files changed (+40, -43 lines)


.gitignore

+8-8
@@ -1,14 +1,14 @@
+*~
+base.css
+charsdb.mat
+data/chars-*experiment*
+data/matconvnet*
 data/practical-cnn-*
+data/vlfeat*
+doc/prism.css
+doc/prism.js
 extra/fonts*
-data/chars-*experiment*
-charsdb.mat
 googlefontdirectory
 imagenet-vgg-verydeep-16.mat
 sentence-lato.png
-data/vlfeat*
 vlfeat/
-data/matconvnet*
-base.css
-doc/prism.css
-doc/prism.js
-*~

doc/instructions.html

+18-19
@@ -113,18 +113,18 @@ <h3 id="part1.1">Part 1.1: convolution</h3>
 <blockquote>
 <p><strong>Question.</strong> The third dimension of <code>x</code> is 3. Why?</p>
 </blockquote>
-<p>Now we will create a bank 10 of $5 \times 5 \times 3$ filters.</p>
+<p>Next, we create a bank of 10 filters of dimension $5 \times 5 \times 3$, initialising their coefficients randomly:</p>
 <pre><code class="language-matlab">% Create a bank of linear filters
 w = randn(5,5,3,10,'single') ;
 </code></pre>

-<p>The filters are in single precision as well. Note that <code>w</code> has four dimensions, packing 10 filters. Note also that each filter is not flat, but rather a volume with three layers. The next step is applying the filter to the image. This uses the <code>vl_nnconv</code> function from MatConvNet:</p>
+<p>The filters are in single precision as well. Note that <code>w</code> has four dimensions, packing 10 filters. Note also that each filter is not flat, but rather a volume containing three slices. The next step is applying the filter to the image. This uses the <code>vl_nnconv</code> function from MatConvNet:</p>
 <pre><code class="language-matlab">% Apply the convolution operator
 y = vl_nnconv(x, w, []) ;
 </code></pre>

 <p><strong>Remark:</strong> You might have noticed that the third argument to the <code>vl_nnconv</code> function is the empty matrix <code>[]</code>. It can be otherwise used to pass a vector of bias terms to add to the output of each filter.</p>
-<p>The variable <code>y</code> contains the output of the convolution. Note that the filters are three-dimensional, in the sense that it operates on a map $\bx$ with $K$ channels. Furthermore, there are $K'$ such filters, generating a $K'$ dimensional map $\by$ as follows
+<p>The variable <code>y</code> contains the output of the convolution. Note that the filters are three-dimensional. This is because they operate on a tensor $\bx$ with $K$ channels. Furthermore, there are $K'$ such filters, generating a $K'$ dimensional map $\by$ as follows:
 <script type="math/tex; mode=display">
 y_{i'j'k'} = \sum_{ijk} w_{ijkk'} x_{i+i',j+j',k}
 </script>
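
For reference, a minimal end-to-end sketch of this step (assuming MatConvNet is on the MATLAB path and using one of MATLAB's bundled test images rather than the practical's image):

```matlab
% Load an RGB image, build a random bank of 10 filters of size 5x5x3,
% and apply the convolution. Assumes MatConvNet has been set up.
x = im2single(imread('peppers.png')) ;   % any RGB image, in single precision
w = randn(5,5,3,10,'single') ;           % 10 filters, each a 5x5x3 volume
y = vl_nnconv(x, w, []) ;                % [] means no bias terms
size(y)                                  % (H-4) x (W-4) x 10: stride 1, no padding
```

With 5x5 filters, stride 1 and no padding, the output keeps one channel per filter and loses four pixels in each spatial dimension.
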
@@ -287,7 +287,7 @@ <h3 id="part-21-the-theory-of-back-propagation">Part 2.1: the theory of back-propagation</h3>
 \bx_L
 </script>
 During learning, the last layer of the network is the <em>loss function</em> that should be minimized. Hence, the output $\bx_L = x_L$ of the network is a <strong>scalar</strong> quantity (a single number).</p>
-<p>The gradient is easily computed using the <strong>chain rule</strong>. If <em>all</em> network variables and parameters are scalar, this is given by[^derivative]:
+<p>The gradient is easily computed using the <strong>chain rule</strong>. If <em>all</em> network variables and parameters are scalar, this is given by:
 <script type="math/tex; mode=display">
 \frac{\partial f}{\partial w_l}(x_0;w_1,\dots,w_L)
 =
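
The chain rule above can be checked numerically on a toy scalar composition (the functions below are made up for illustration and are not part of the practical's code):

```matlab
% Toy scalar network f(w) = f3(f2(f1(x0, w))) to illustrate the chain rule.
x0 = 0.5 ; w = 2 ;
f1 = @(x,w) w * x ;   x1 = f1(x0, w) ;
f2 = @(x) tanh(x) ;   x2 = f2(x1) ;
f3 = @(x) x.^2 ;      f  = f3(x2) ;
% Chain rule: df/dw = df3/dx2 * df2/dx1 * df1/dw
dfdw = (2 * x2) * (1 - tanh(x1)^2) * x0 ;
% Finite-difference check of the same derivative
delta = 1e-6 ;
dfdw_fd = (f3(f2(f1(x0, w + delta))) - f) / delta ;
disp([dfdw dfdw_fd])  % the two values should agree to several digits
```
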
@@ -302,7 +302,7 @@ <h3 id="part-21-the-theory-of-back-propagation">Part 2.1: the theory of back-propagation</h3>
 <blockquote>
 <p><strong>Question:</strong> The output derivatives have the same size as the parameters in the network. Why?</p>
 </blockquote>
-<p><strong>Back-propagation</strong> allows computing the output derivatives in a memory-efficient manner. To see how, the first step is to generalize the equation above to tensors using a matrix notation. This is done by converting tensors into vectors by using the $\vv$ (stacking)[^stacking] operator:
+<p><strong>Back-propagation</strong> allows computing the output derivatives in a memory-efficient manner. To see how, the first step is to generalize the equation above to tensors using a matrix notation. This is done by converting tensors into vectors by using the $\vv$ (stacking)<sup id="fnref:stacking"><a class="footnote-ref" href="#fn:stacking" rel="footnote">2</a></sup> operator:
 <script type="math/tex; mode=display">
 \frac{\partial \vv f}{\partial \vv^\top \bw_l}
 =
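
In MATLAB the $\vv$ (stacking) operator is simply column-major vectorisation, i.e. the `(:)` shorthand; a small sketch:

```matlab
% Stacking a tensor: MATLAB's column-major (:) ordering matches the
% x_{111}, x_{211}, ..., x_{H11}, x_{121}, ... order of the vv operator.
x = reshape(1:24, [2 3 4]) ;  % a 2x3x4 tensor with distinct entries
v = x(:) ;                    % a 24x1 vector
isequal(v(1), x(1,1,1))       % true
isequal(v(2), x(2,1,1))       % true: the first index varies fastest
```
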
@@ -532,7 +532,7 @@ <h3 id="part-33-learning-with-gradient-descent">Part 3.3: learning with gradient descent</h3>
 <li>Note that the objective enforces a <em>margin</em> between the scores of the positive and negative pixels. How much is this margin?</li>
 </ul>
 </blockquote>
-<p>We can now train the CNN by minimising the objective function with respect to $\bw$ and $b$. We do so by using an algorithm called <em>gradient descent with momentum</em>. Given the current solution $(\bw_t,b_t)$ and update it , this is updated to $(\bw_{t+1},b_t)$ by following the direction of fastest descent as given by the negative gradient $-\nabla E(\bw_t,b_t)$ of the objective. However, gradient updates are smoothed by considering a <em>momentum</em> term $(\bar\bw_{t}, \bar\mu_t)$, yielding the update equations
+<p>We can now train the CNN by minimising the objective function with respect to $\bw$ and $b$. We do so by using an algorithm called <em>gradient descent with momentum</em>. Given the current solution $(\bw_t,b_t)$, this is updated to $(\bw_{t+1},b_{t+1})$ by following the direction of fastest descent of the objective $E(\bw_t,b_t)$ as given by the negative gradient $-\nabla E$. However, gradient updates are smoothed by considering a <em>momentum</em> term $(\bar\bw_{t}, \bar\mu_t)$, yielding the update equations
 <script type="math/tex; mode=display">
 \bar\bw_{t+1} \leftarrow \mu \bar\bw_t + \eta \frac{\partial E}{\partial \bw_t},
 \qquad
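
A runnable sketch of one such update (variable names and values are placeholders rather than the ones used in `exercise3.m`, and it is assumed here that the parameters are then moved against the smoothed gradients):

```matlab
% One step of gradient descent with momentum on toy data.
mu = 0.9 ; eta = 0.01 ;                           % momentum and learning rate
w = randn(3,3,'single') ; b = 0 ;                 % toy parameters
w_bar = zeros(3,3,'single') ; b_bar = 0 ;         % momentum terms
dE_dw = randn(3,3,'single') ; dE_db = randn(1) ;  % toy gradients of the objective
w_bar = mu * w_bar + eta * dE_dw ;                % smooth the gradient of w
b_bar = mu * b_bar + eta * dE_db ;                % smooth the gradient of b
w = w - w_bar ;                                   % assumed parameter update
b = b - b_bar ;
```
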
@@ -558,14 +558,18 @@ <h3 id="part-33-learning-with-gradient-descent">Part 3.3: learning with gradient descent</h3>
 <p><strong>Tasks:</strong></p>
 <ul>
 <li>Inspect the code in the file <code>exercise3.m</code>. Convince yourself that the code is implementing the algorithm described above. Pay particular attention to the forward and backward passes as well as to how the objective function and its derivatives are computed.</li>
-<li>Run the algorithm and observe the results. Then answer the following questions:</li>
+<li>Run the algorithm and observe the results. Then answer the following questions:<ul>
 <li>The learned filter should resemble the discretisation of a well-known differential operator. Which one?</li>
 <li>What is the average of the filter values compared to the average of the absolute values?</li>
-<li>Run the algorithm again and observe the evolution of the histograms of the score of the positive and negative pixels in relation to the values 0 and 1. Answer the following:</li>
+</ul>
+</li>
+<li>Run the algorithm again and observe the evolution of the histograms of the score of the positive and negative pixels in relation to the values 0 and 1. Answer the following:<ul>
 <li>Is the objective function minimised monotonically?</li>
 <li>As the histograms evolve, can you identify at least two "phases" in the optimisation?</li>
 <li>Once converged, do the scores distribute in the manner that you would expect?</li>
 </ul>
+</li>
+</ul>
 <p><strong>Hint:</strong> the <code>plotPeriod</code> option can be changed to plot the diagnostic figure with a higher or lower frequency; this can significantly affect the speed of the algorithm.</p>
 </blockquote>
 <h3 id="part-34-experimenting-with-the-tiny-cnn">Part 3.4: experimenting with the tiny CNN</h3>
@@ -766,17 +770,8 @@ <h3 id="part-47-training-using-the-gpu">Part 4.7: Training using the GPU</h3>
 <p>In MatConvNet this is almost trivial as it builds on the easy-to-use GPU support in MATLAB. You can follow this list of steps to try it out:</p>
 <ol>
 <li>Clear the models generated and cached in the previous steps. To do this, rename or delete the directories <code>data/characters-experiment</code> and <code>data/characters-jit-experiment</code>.</li>
-<li>
-<p>Make sure that MatConvNet is compiled with GPU support. To do this, use</p>
-<p>```matlab</p>
-<blockquote>
-<p>setup('useGpu', true) ;
-```</p>
-</blockquote>
-</li>
-<li>
-<p>Try again training the model of <code>exercise4.m</code> switching to <code>true</code> the <code>useGpu</code> flag.</p>
-</li>
+<li>Make sure that MatConvNet is compiled with GPU support. To do this, use <code>setup('useGpu', true)</code>.</li>
+<li>Try again training the model of <code>exercise4.m</code> switching the <code>useGpu</code> flag to <code>true</code>.</li>
 </ol>
 <blockquote>
 <p><strong>Task:</strong> Follow the steps above and note the speed of training. How many images per second can you process now?</p>
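
In practice GPU execution boils down to MATLAB's `gpuArray`/`gather` and MatConvNet layers accepting GPU arrays once compiled with GPU support; a hedged sketch (only the `setup('useGpu', true)` call comes from the practical, the rest is illustrative):

```matlab
% Illustrative only: requires a CUDA-capable GPU and MatConvNet built with GPU support.
setup('useGpu', true) ;                   % compile/load MatConvNet with GPU support
x = gpuArray(randn(64,64,3,'single')) ;   % toy input moved to the GPU
w = gpuArray(randn(5,5,3,10,'single')) ;  % filters on the GPU
y = vl_nnconv(x, w, []) ;                 % runs on the GPU for gpuArray inputs
y = gather(y) ;                           % copy the result back to CPU memory
```
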
@@ -837,6 +832,7 @@ <h2 id="acknowledgements">Acknowledgements</h2>
 </ul>
 <h2 id="history">History</h2>
 <ul>
+<li>Used in the Oxford AIMS CDT, 2016-17.</li>
 <li>Used in the Oxford AIMS CDT, 2015-16.</li>
 <li>Used in the Oxford AIMS CDT, 2014-15.</li>
 </ul>
@@ -846,6 +842,9 @@ <h2 id="history">History</h2>
 <li id="fn:lattice">
 <p>A two-dimensional <em>lattice</em> is a discrete grid embedded in $R^2$, similar for example to a checkerboard.&#160;<a class="footnote-backref" href="#fnref:lattice" rev="footnote" title="Jump back to footnote 1 in the text">&#8617;</a></p>
 </li>
+<li id="fn:stacking">
+<p>The stacking of a tensor $\bx \in\mathbb{R}^{H\times W\times C}$ is the vector <script type="math/tex; mode=display"> \vv \bx= \begin{bmatrix} x_{111}\\ x_{211} \\ \vdots \\ x_{H11} \\ x_{121} \\\vdots \\ x_{HWC} \end{bmatrix}.</script>&#160;<a class="footnote-backref" href="#fnref:stacking" rev="footnote" title="Jump back to footnote 2 in the text">&#8617;</a></p>
+</li>
 </ol>
 </div><script type="text/x-mathjax-config">
 MathJax.Hub.Config({

doc/instructions.md

+14-16
@@ -77,14 +77,14 @@ Use MATLAB `size` command to obtain the size of the array `x`. Note that the arr
 
 > **Question.** The third dimension of `x` is 3. Why?
 
-Now we will create a bank 10 of $5 \times 5 \times 3$ filters.
+Next, we create a bank of 10 filters of dimension $5 \times 5 \times 3$, initialising their coefficients randomly:
 
 ```matlab
 % Create a bank of linear filters
 w = randn(5,5,3,10,'single') ;
 ```
 
-The filters are in single precision as well. Note that `w` has four dimensions, packing 10 filters. Note also that each filter is not flat, but rather a volume with three layers. The next step is applying the filter to the image. This uses the `vl_nnconv` function from MatConvNet:
+The filters are in single precision as well. Note that `w` has four dimensions, packing 10 filters. Note also that each filter is not flat, but rather a volume containing three slices. The next step is applying the filter to the image. This uses the `vl_nnconv` function from MatConvNet:
 
 ```matlab
 % Apply the convolution operator
@@ -93,7 +93,7 @@ y = vl_nnconv(x, w, []) ;
 
 **Remark:** You might have noticed that the third argument to the `vl_nnconv` function is the empty matrix `[]`. It can be otherwise used to pass a vector of bias terms to add to the output of each filter.
 
-The variable `y` contains the output of the convolution. Note that the filters are three-dimensional, in the sense that it operates on a map $\bx$ with $K$ channels. Furthermore, there are $K'$ such filters, generating a $K'$ dimensional map $\by$ as follows
+The variable `y` contains the output of the convolution. Note that the filters are three-dimensional. This is because they operate on a tensor $\bx$ with $K$ channels. Furthermore, there are $K'$ such filters, generating a $K'$ dimensional map $\by$ as follows:
 $$
 y_{i'j'k'} = \sum_{ijk} w_{ijkk'} x_{i+i',j+j',k}
 $$
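
The remark about the empty matrix can be made concrete: the third argument of `vl_nnconv` takes one bias per filter. A small sketch, continuing from the `x` and `w` above (the bias values are illustrative):

```matlab
% Pass a vector of biases, one per filter (10 filters here).
b = randn(10, 1, 'single') ;   % illustrative bias values
y = vl_nnconv(x, w, b) ;       % b(k) is added to every element of output channel k
```
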
@@ -270,7 +270,7 @@ $$
 $$
 During learning, the last layer of the network is the *loss function* that should be minimized. Hence, the output $\bx_L = x_L$ of the network is a **scalar** quantity (a single number).
 
-The gradient is easily computed using the **chain rule**. If *all* network variables and parameters are scalar, this is given by[^derivative]:
+The gradient is easily computed using the **chain rule**. If *all* network variables and parameters are scalar, this is given by:
 $$
 \frac{\partial f}{\partial w_l}(x_0;w_1,\dots,w_L)
 =
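
For context, MatConvNet exposes these derivatives through the backward mode of each layer function; a minimal sketch with `vl_nnconv` (shapes are illustrative):

```matlab
% Backward mode: given dz/dy, the derivative of the (scalar) network output z
% with respect to this layer's output y, obtain the projected derivatives.
x = randn(32,32,3,'single') ;
w = randn(5,5,3,10,'single') ;
b = zeros(10,1,'single') ;
y = vl_nnconv(x, w, b) ;                         % forward pass
dzdy = randn(size(y), 'single') ;                % stand-in derivative from the layers above
[dzdx, dzdw, dzdb] = vl_nnconv(x, w, b, dzdy) ;  % backward pass
```
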
@@ -533,7 +533,7 @@ $$
 > - What can you say about the score of each pixel if $\lambda=0$ and $E(\bw,b) =0$?
 > - Note that the objective enforces a *margin* between the scores of the positive and negative pixels. How much is this margin?
 
-We can now train the CNN by minimising the objective function with respect to $\bw$ and $b$. We do so by using an algorithm called *gradient descent with momentum*. Given the current solution $(\bw_t,b_t)$ and update it , this is updated to $(\bw_{t+1},b_t)$ by following the direction of fastest descent as given by the negative gradient $-\nabla E(\bw_t,b_t)$ of the objective. However, gradient updates are smoothed by considering a *momentum* term $(\bar\bw_{t}, \bar\mu_t)$, yielding the update equations
+We can now train the CNN by minimising the objective function with respect to $\bw$ and $b$. We do so by using an algorithm called *gradient descent with momentum*. Given the current solution $(\bw_t,b_t)$, this is updated to $(\bw_{t+1},b_{t+1})$ by following the direction of fastest descent of the objective $E(\bw_t,b_t)$ as given by the negative gradient $-\nabla E$. However, gradient updates are smoothed by considering a *momentum* term $(\bar\bw_{t}, \bar\mu_t)$, yielding the update equations
 $$
 \bar\bw_{t+1} \leftarrow \mu \bar\bw_t + \eta \frac{\partial E}{\partial \bw_t},
 \qquad
@@ -560,12 +560,12 @@ plotPeriod = 10 ;
 >
 > - Inspect the code in the file `exercise3.m`. Convince yourself that the code is implementing the algorithm described above. Pay particular attention to the forward and backward passes as well as to how the objective function and its derivatives are computed.
 > - Run the algorithm and observe the results. Then answer the following questions:
-> * The learned filter should resemble the discretisation of a well-known differential operator. Which one?
-> * What is the average of the filter values compared to the average of the absolute values?
+>     * The learned filter should resemble the discretisation of a well-known differential operator. Which one?
+>     * What is the average of the filter values compared to the average of the absolute values?
 > - Run the algorithm again and observe the evolution of the histograms of the score of the positive and negative pixels in relation to the values 0 and 1. Answer the following:
-> * Is the objective function minimised monotonically?
-> * As the histograms evolve, can you identify at least two "phases" in the optimisation?
-> * Once converged, do the scores distribute in the manner that you would expect?
+>     * Is the objective function minimised monotonically?
+>     * As the histograms evolve, can you identify at least two "phases" in the optimisation?
+>     * Once converged, do the scores distribute in the manner that you would expect?
 >
 > **Hint:** the `plotPeriod` option can be changed to plot the diagnostic figure with a higher or lower frequency; this can significantly affect the speed of the algorithm.
@@ -794,12 +794,7 @@ A key challenge in deep learning is the sheer amount of computation required to
 In MatConvNet this is almost trivial as it builds on the easy-to-use GPU support in MATLAB. You can follow this list of steps to try it out:
 
 1. Clear the models generated and cached in the previous steps. To do this, rename or delete the directories `data/characters-experiment` and `data/characters-jit-experiment`.
-2. Make sure that MatConvNet is compiled with GPU support. To do this, use
-
-```matlab
-> setup('useGpu', true) ;
-```
-
+2. Make sure that MatConvNet is compiled with GPU support. To do this, use `setup('useGpu', true)`.
 3. Try again training the model of `exercise4.m` switching the `useGpu` flag to `true`.
 
 > **Task:** Follow the steps above and note the speed of training. How many images per second can you process now?
@@ -875,7 +870,10 @@ That completes this practical.
 
 ## History
 
+* Used in the Oxford AIMS CDT, 2016-17.
 * Used in the Oxford AIMS CDT, 2015-16.
 * Used in the Oxford AIMS CDT, 2014-15.
 
 [^lattice]: A two-dimensional *lattice* is a discrete grid embedded in $R^2$, similar for example to a checkerboard.
+
+[^stacking]: The stacking of a tensor $\bx \in\mathbb{R}^{H\times W\times C}$ is the vector $$ \vv \bx= \begin{bmatrix} x_{111}\\ x_{211} \\ \vdots \\ x_{H11} \\ x_{121} \\\vdots \\ x_{HWC} \end{bmatrix}.$$
