
Commit

add nanotorch logo
JamorMoussa committed Jul 8, 2024
1 parent d3c50b9 commit 624026f
Showing 3 changed files with 335 additions and 8 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -1,3 +1,11 @@
<div>
<center>
<img src="./docs/images/logo.png" width="300px">
</center>

</div>


# NanoTorch

**NanoTorch** is a deep learning library (micro-framework) inspired by the PyTorch framework, which
335 changes: 327 additions & 8 deletions docs/docs/nanotorch-linear-layer.ipynb
@@ -301,26 +301,26 @@
"metadata": {},
"source": [
"$$\n",
"\\begin{pmatrix}\n",
"\\begin{bmatrix}\n",
"y_1 \\\\\n",
"y_2 \\\\\n",
"\\vdots \\\\\n",
"y_m\n",
"\\end{pmatrix}\n",
"\\end{bmatrix}\n",
"=\n",
"\\begin{pmatrix}\n",
"\\begin{bmatrix}\n",
"w_{10} & w_{11} & w_{12} & \\dots & w_{1n} \\\\\n",
"w_{20} & w_{21} & w_{22} & \\dots & w_{2n} \\\\\n",
"\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"w_{m0} & w_{m1} & w_{m2} & \\dots & w_{mn}\n",
"\\end{pmatrix}\n",
"\\begin{pmatrix}\n",
"\\end{bmatrix}\n",
"\\begin{bmatrix}\n",
"1 \\\\\n",
"x_1 \\\\\n",
"x_2 \\\\\n",
"\\vdots \\\\\n",
"x_n\n",
"\\end{pmatrix}\n",
"\\end{bmatrix}\n",
"$$"
]
},
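The augmented matrix form above can be sketched in NumPy. This is an illustrative sketch only (the dimensions `m`, `n` and all variable names are assumptions, not NanoTorch code): the bias is folded into the first column of $W$, so a constant 1 is prepended to the input.

```python
import numpy as np

# Illustrative sketch of the forward pass y = W x_aug, with the bias
# folded into the first column of W (hence the prepended 1 in x_aug).
m, n = 3, 4                          # output and input dimensions (arbitrary)
rng = np.random.default_rng(0)
W = rng.normal(size=(m, n + 1))      # W in R^{m x (n+1)}
x = rng.normal(size=(n,))            # input vector in R^n

x_aug = np.concatenate(([1.0], x))   # augmented input (1, x_1, ..., x_n)
y = W @ x_aug                        # output vector in R^m

# Equivalent split form: weight block times x, plus the bias column.
assert np.allclose(y, W[:, 1:] @ x + W[:, 0])
```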
@@ -340,12 +340,12 @@
"Where\n",
"\n",
"$$\n",
"W = \\begin{pmatrix}\n",
"W = \\begin{bmatrix}\n",
"w_{10} & w_{11} & w_{12} & \\dots & w_{1n} \\\\\n",
"w_{20} & w_{21} & w_{22} & \\dots & w_{2n} \\\\\n",
"\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"w_{m0} & w_{m1} & w_{m2} & \\dots & w_{mn}\n",
"\\end{pmatrix} \\in \\mathbb{R}^{m \\times (n+1)}\n",
"\\end{bmatrix} \\in \\mathbb{R}^{m \\times (n+1)}\n",
"$$\n",
"\n",
"Here, $\\mathbf{x} \\in \\mathbb{R}^{(n+1)}$ and $\\mathbf{y} \\in \\mathbb{R}^{m}$ denote the input and output vectors of the fully connected layer, respectively."
@@ -418,6 +418,325 @@
"\n",
"The following figure shows that the fully connected layer receives the gradient flows from the subsequent layer, denoted as $\\frac{\\partial L}{\\partial \\mathbf{y}}$. This quantity is used to compute the gradient of the loss with respect to the current layer's parameters $\\frac{\\partial L}{\\partial W}$. Then, it passes the gradient with respect to the input to the previous layers $\\frac{\\partial L}{\\partial \\mathbf{x}}$, following the chain rule in backpropagation."
]
},
{
"cell_type": "markdown",
"id": "f72511b7-8639-40e8-892a-6f7128e9a0ac",
"metadata": {},
"source": [
"<figure markdown=\"span\">\n",
" <center>\n",
" <img src=\"https://raw.githubusercontent.com/JamorMoussa/NanoTorch/dev/docs/images/docs/linear/linear-back-propagation.png\" width=\"300\" />\n",
" </center>\n",
"</figure>"
]
},
{
"cell_type": "markdown",
"id": "9c0f345b-75a9-401d-a44d-afeff9063409",
"metadata": {},
"source": [
    "Now, let's break down each derivative.\n",
    "\n",
    "The loss function is a scalar value, i.e., $L \\in \\mathbb{R}$. Let $\\mathbf{v}$ be an $n$-dimensional vector, i.e., $\\mathbf{v} \\in \\mathbb{R}^n$."
]
},
{
"cell_type": "markdown",
"id": "0a060bab-59e8-440c-afd1-fccd06f03f33",
"metadata": {},
"source": [
"So, the derivative of $L$ with respect to $\\mathbf{v}$ is defined as the derivative of $L$ for each component of $\\mathbf{v}$. Formally:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial \\mathbf{v}} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial v_1} \\\\\n",
"\\frac{\\partial L}{\\partial v_2} \\\\\n",
"\\vdots \\\\\n",
"\\frac{\\partial L}{\\partial v_n}\n",
"\\end{bmatrix}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"id": "9d679d2a-2169-456f-bf82-c27a0535915f",
"metadata": {},
"source": [
"With the same logic, given a matrix $M \\in \\mathbb{R}^{m \\times n}$, the derivative of $L$ with respect to $M$ is defined as the derivative of $L$ for each component of $M$. Formally:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial M} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial M_{11}} & \\frac{\\partial L}{\\partial M_{12}} & \\cdots & \\frac{\\partial L}{\\partial M_{1n}} \\\\\n",
"\\frac{\\partial L}{\\partial M_{21}} & \\frac{\\partial L}{\\partial M_{22}} & \\cdots & \\frac{\\partial L}{\\partial M_{2n}} \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial L}{\\partial M_{m1}} & \\frac{\\partial L}{\\partial M_{m2}} & \\cdots & \\frac{\\partial L}{\\partial M_{mn}}\n",
"\\end{bmatrix}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"id": "929aa553-91eb-4bef-9bf0-611225ae0d54",
"metadata": {},
"source": [
"#### 1.5.1 Compute $\\frac{\\partial L}{\\partial W}$"
]
},
{
"cell_type": "markdown",
"id": "c1b80af5-150d-4db8-b745-28e40281e944",
"metadata": {},
"source": [
"Since our layer receives the quantity $\\frac{\\partial L}{\\partial \\mathbf{y}}$ during back-propagation, our task is to use it to compute the derivative of $L$ with respect to $W$."
]
},
{
"cell_type": "markdown",
"id": "f34173b9-ab3c-4742-8aad-8d15485e56b0",
"metadata": {},
"source": [
    "Given row index $i \\in \\{1, ..., m\\}$ and column index $j \\in \\{1, ..., n\\}$:\n",
    "\n",
    "$$\n",
    "\\frac{\\partial L}{\\partial W_{ij}} = \\frac{\\partial L}{\\partial y_1} \\underbrace{\\frac{\\partial y_1}{\\partial W_{ij}}}_{=0} + \\frac{\\partial L}{\\partial y_2} \\underbrace{\\frac{\\partial y_2}{\\partial W_{ij}}}_{=0} + \\dots + \\frac{\\partial L}{\\partial y_i} \\frac{\\partial y_i}{\\partial W_{ij}} + \\dots + \\frac{\\partial L}{\\partial y_m} \\underbrace{\\frac{\\partial y_m}{\\partial W_{ij}}}_{=0}\n",
    "$$"
]
},
{
"cell_type": "markdown",
"id": "1debe03f-2623-49ed-b2b4-b0dea0bbb46f",
"metadata": {},
"source": [
"Thus,\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial W_{ij}} = \\frac{\\partial L}{\\partial y_i} \\frac{\\partial y_i}{\\partial W_{ij}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "c99f801e-b8dc-4966-b0c3-6fc48be804f0",
"metadata": {},
"source": [
"We have:\n",
"\n",
"$$\n",
    "y_i = W_{i1}x_1 + \\dots + W_{ij}x_j + \\dots + W_{in}x_n\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "4e0edccc-9691-4669-a4f6-6f5d123270c8",
"metadata": {},
"source": [
"Then, the derivative of $y_i$ with respect to $W_{ij}$ is:\n",
"\n",
"$$\n",
"\\frac{\\partial y_i}{\\partial W_{ij}} = x_j\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "f8fb5624-f7ee-4400-abaa-40ea64531aba",
"metadata": {},
"source": [
"Finally, \n",
"\n",
"$$\n",
    "    \\forall i \\in \\{1, ..., m \\}, j \\in \\{1, ..., n\\} \\mid \\frac{\\partial L}{\\partial W_{ij}} = \\frac{\\partial L}{\\partial y_i} x_j\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "a8df1afc-ce5a-4113-bb5c-022b8b38d0a1",
"metadata": {},
"source": [
"Using this formula to fill the matrix $\\frac{\\partial L}{\\partial W}$:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial W} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial W_{11}} & \\frac{\\partial L}{\\partial W_{12}} & \\cdots & \\frac{\\partial L}{\\partial W_{1n}} \\\\\n",
"\\frac{\\partial L}{\\partial W_{21}} & \\frac{\\partial L}{\\partial W_{22}} & \\cdots & \\frac{\\partial L}{\\partial W_{2n}} \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial L}{\\partial W_{m1}} & \\frac{\\partial L}{\\partial W_{m2}} & \\cdots & \\frac{\\partial L}{\\partial W_{mn}}\n",
"\\end{bmatrix} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial y_1} x_1 & \\frac{\\partial L}{\\partial y_1} x_2 & \\cdots &\\frac{\\partial L}{\\partial y_1} x_n \\\\\n",
"\\frac{\\partial L}{\\partial y_2} x_1 & \\frac{\\partial L}{\\partial y_2} x_2 & \\cdots &\\frac{\\partial L}{\\partial y_2} x_n \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial L}{\\partial y_m} x_1 & \\frac{\\partial L}{\\partial y_m} x_2 & \\cdots &\\frac{\\partial L}{\\partial y_m} x_n \\\\\n",
"\\end{bmatrix} \n",
"= \n",
"\\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial y_1} \\\\\n",
" \\frac{\\partial L}{\\partial y_2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial y_m}\n",
"\\end{bmatrix}\n",
"\\begin{bmatrix}\n",
" x_1 & x_2 & \\dots & x_n \\\\\n",
"\\end{bmatrix} = \\frac{\\partial L}{\\partial \\mathbf{y}} \\mathbf{x}^T\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "f4cfd831-3df1-4302-8a0a-84bb83beb60f",
"metadata": {},
"source": [
"Finally,\n",
"\n",
"$$\n",
" \\frac{\\partial L}{\\partial W} = \\frac{\\partial L}{\\partial \\mathbf{y}} \\mathbf{x}^T\n",
"$$"
]
},
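This outer-product rule can be sketched in NumPy (illustrative shapes and names, with the input treated as a column vector):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
x = rng.normal(size=(n, 1))        # input treated as a column vector
dL_dy = rng.normal(size=(m, 1))    # incoming gradient dL/dy from the next layer

# dL/dW = (dL/dy) x^T : an (m, 1) column times a (1, n) row -> (m, n),
# matching the shape of W itself.
dL_dW = dL_dy @ x.T
assert dL_dW.shape == (m, n)
```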
{
"cell_type": "markdown",
"id": "33ec7f99-3f41-4b66-ac43-e2cdebab0d51",
"metadata": {},
"source": [
"#### 1.5.2 Compute $\\frac{\\partial L}{\\partial \\mathbf{x}}$"
]
},
{
"cell_type": "markdown",
"id": "edf39607-1ad2-42f7-a0c9-0a42aad038e9",
"metadata": {},
"source": [
    "With the same logic as before, let's compute the derivative of $L$ with respect to the input vector $\\mathbf{x}$, i.e., $\\frac{\\partial L}{\\partial \\mathbf{x}}$."
]
},
{
"cell_type": "markdown",
"id": "43a44656-993b-4fd5-a046-1bde6f823e56",
"metadata": {},
"source": [
    "For a given $i \\in \\{1, ..., n\\}$:\n",
"\n",
"$$\n",
    "\\frac{\\partial L}{\\partial x_i} = \\frac{\\partial L}{\\partial y_1} \\underbrace{\\frac{\\partial y_1}{\\partial x_i}}_{W_{1i}} + \\dots + \\frac{\\partial L}{\\partial y_j} \\underbrace{\\frac{\\partial y_j}{\\partial x_i}}_{W_{ji}} + \\dots + \\frac{\\partial L}{\\partial y_m} \\underbrace{\\frac{\\partial y_m}{\\partial x_i}}_{W_{mi}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "01fa24d2-4045-4a01-b15a-e5838c45e2d0",
"metadata": {},
"source": [
"Because we have:\n",
"\n",
"$$\n",
    "y_j = W_{j1}x_1 + \\dots + W_{ji}x_i + \\dots + W_{jn}x_n\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "c858a466-0088-4912-b509-b6aa89fa7952",
"metadata": {},
"source": [
"Thus, \n",
"\n",
"$$\n",
    "\\frac{\\partial L}{\\partial x_i} = \\frac{\\partial L}{\\partial y_1}W_{1i} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{ji} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{mi}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "fe997753-95af-4976-9646-c020a9532e3e",
"metadata": {},
"source": [
"Using this formula to fill the vector $\\frac{\\partial L}{\\partial \\mathbf{x}}$:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial \\mathbf{x}} = \\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial x_1} \\\\\n",
" \\frac{\\partial L}{\\partial x_2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial x_n}\n",
"\\end{bmatrix} = \\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial y_1}W_{11} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{j1} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{m1} \\\\\n",
" \\frac{\\partial L}{\\partial y_1}W_{12} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{j2} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{m2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial y_1}W_{1n} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{jn} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{mn}\n",
"\\end{bmatrix} = \\begin{bmatrix}\n",
" W_{11} & W_{21} & \\dots & W_{m1} \\\\\n",
" W_{12} & W_{22} & \\dots & W_{m2} \\\\\n",
" \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
" W_{1n} & W_{2n} & \\dots & W_{mn}\n",
"\\end{bmatrix} \\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial y_1} \\\\\n",
" \\frac{\\partial L}{\\partial y_2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial y_m}\n",
"\\end{bmatrix} = W^T \\frac{\\partial L}{\\partial \\mathbf{y}}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"id": "964c42f8-80dc-4a16-9efb-4a5c201f7c38",
"metadata": {},
"source": [
"Finally,\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial \\mathbf{x}} = W^T \\frac{\\partial L}{\\partial \\mathbf{y}}\n",
"$$"
]
},
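As with the weight gradient, this rule is a single matrix product in NumPy (illustrative sketch; names and dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.normal(size=(m, n))        # weight matrix of the layer
dL_dy = rng.normal(size=(m, 1))    # incoming gradient dL/dy

# dL/dx = W^T (dL/dy): shapes (n, m) @ (m, 1) -> (n, 1),
# matching the shape of the input x.
dL_dx = W.T @ dL_dy
assert dL_dx.shape == (n, 1)
```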
{
"cell_type": "markdown",
"id": "77be53d3-860a-4f70-8ec2-899d5c0210c1",
"metadata": {},
"source": [
"<div class=\"admonition tip\" markdown=\"\">\n",
"<p class=\"admonition-title\">Rules to Compute Gradients</p>\n",
"<p>The layer receives the gradient flow $\\frac{\\partial L}{\\partial \\mathbf{y}}$. Therefore, the gradients can be computed as follows:</p>\n",
"<p>\n",
"$$\n",
" \\frac{\\partial L}{\\partial W} = \\frac{\\partial L}{\\partial \\mathbf{y}} \\mathbf{x}^T\n",
"$$\n",
"</p>\n",
"<p>\n",
"$$\n",
" \\frac{\\partial L}{\\partial \\mathbf{x}} = W^T \\frac{\\partial L}{\\partial \\mathbf{y}}\n",
"$$\n",
"</p>\n",
"</div>\n"
]
},
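Both rules can be sanity-checked numerically with finite differences. The loss below, $L = \|\mathbf{y}\|^2/2$ (so that $\partial L/\partial \mathbf{y} = \mathbf{y}$), is an arbitrary choice made for this check, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.normal(size=(m, n))
x = rng.normal(size=(n, 1))

def loss(W, x):
    y = W @ x
    return 0.5 * float(np.sum(y ** 2))  # L = ||y||^2 / 2, so dL/dy = y

y = W @ x
dL_dy = y                  # gradient of this particular loss w.r.t. y
dL_dW = dL_dy @ x.T        # rule 1: dL/dW = (dL/dy) x^T
dL_dx = W.T @ dL_dy        # rule 2: dL/dx = W^T (dL/dy)

# Forward finite differences, one entry at a time.
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(m):
    for j in range(n):
        Wp = W.copy()
        Wp[i, j] += eps
        num_dW[i, j] = (loss(Wp, x) - loss(W, x)) / eps

num_dx = np.zeros_like(x)
for i in range(n):
    xp = x.copy()
    xp[i, 0] += eps
    num_dx[i, 0] = (loss(W, xp) - loss(W, x)) / eps

assert np.allclose(num_dW, dL_dW, atol=1e-4)
assert np.allclose(num_dx, dL_dx, atol=1e-4)
```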
{
"cell_type": "markdown",
"id": "db454d75-1697-49f4-9e45-ad84545a149b",
"metadata": {},
"source": [
    "## 02. Implementation - Build a Fully Connected Layer from Scratch"
]
},
{
"cell_type": "markdown",
"id": "30936b63-9f00-42ce-a597-7e0cba4fc0d5",
"metadata": {},
"source": [
    "At this stage, we've covered everything we need to implement the fully connected layer using only **NumPy**.\n",
    "\n",
    "The layer has two pass modes. First, the forward pass computes the layer's output. Second, the backward pass computes the gradients, which are used to update the model's parameters so it makes more accurate predictions."
]
},
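A minimal sketch of such a layer is shown below. The class and method names are illustrative, not necessarily NanoTorch's actual API, and the bias is kept as a separate vector here rather than folded into $W$ as in the earlier matrix form:

```python
import numpy as np

class Linear:
    """Fully connected layer sketch: y = W x + b, NumPy only."""

    def __init__(self, in_features: int, out_features: int):
        # Small random weights and a zero bias column.
        self.W = 0.01 * np.random.randn(out_features, in_features)
        self.b = np.zeros((out_features, 1))

    def forward(self, x: np.ndarray) -> np.ndarray:
        self.x = x                   # cache the input for the backward pass
        return self.W @ x + self.b

    def backward(self, dL_dy: np.ndarray, lr: float = 0.1) -> np.ndarray:
        dL_dW = dL_dy @ self.x.T     # dL/dW = (dL/dy) x^T
        dL_dx = self.W.T @ dL_dy     # dL/dx = W^T (dL/dy), before the update
        # Plain gradient-descent update; dL/db = dL/dy since y = W x + b.
        self.W -= lr * dL_dW
        self.b -= lr * dL_dy
        return dL_dx                 # gradient passed to the previous layer


layer = Linear(4, 3)
y = layer.forward(np.ones((4, 1)))    # forward pass: output of shape (3, 1)
dx = layer.backward(np.ones((3, 1)))  # backward pass: gradient of shape (4, 1)
```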
{
"cell_type": "code",
"execution_count": null,
"id": "3ed00c8d-9f14-4b7a-99aa-fd06dd3e5464",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
Binary file added docs/images/logo.png
