
Commit

add nanotorch logo
JamorMoussa committed Jul 8, 2024
1 parent d3c50b9 commit 624026f
Showing 3 changed files with 335 additions and 8 deletions.
8 changes: 8 additions & 0 deletions README.md
@@ -1,3 +1,11 @@
<div>
<center>
<img src="./docs/images/logo.png" width="300px">
</center>

</div>


# NanoTorch

**NanoTorch** is a deep learning library (micro-framework) inspired by the PyTorch framework, which
335 changes: 327 additions & 8 deletions docs/docs/nanotorch-linear-layer.ipynb
@@ -301,26 +301,26 @@
"metadata": {},
"source": [
"$$\n",
"\\begin{pmatrix}\n",
"\\begin{bmatrix}\n",
"y_1 \\\\\n",
"y_2 \\\\\n",
"\\vdots \\\\\n",
"y_m\n",
"\\end{pmatrix}\n",
"\\end{bmatrix}\n",
"=\n",
"\\begin{pmatrix}\n",
"\\begin{bmatrix}\n",
"w_{10} & w_{11} & w_{12} & \\dots & w_{1n} \\\\\n",
"w_{20} & w_{21} & w_{22} & \\dots & w_{2n} \\\\\n",
"\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"w_{m0} & w_{m1} & w_{m2} & \\dots & w_{mn}\n",
"\\end{pmatrix}\n",
"\\begin{pmatrix}\n",
"\\end{bmatrix}\n",
"\\begin{bmatrix}\n",
"1 \\\\\n",
"x_1 \\\\\n",
"x_2 \\\\\n",
"\\vdots \\\\\n",
"x_n\n",
"\\end{pmatrix}\n",
"\\end{bmatrix}\n",
"$$"
]
},
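The augmented matrix form above can be sketched in NumPy. This is an illustrative sketch only (the dimensions `m`, `n` and all variable names are assumptions, not NanoTorch code): the bias is folded into the first column of $W$, so a constant 1 is prepended to the input.

```python
import numpy as np

# Illustrative sketch of the forward pass y = W x_aug, with the bias
# folded into the first column of W (hence the prepended 1 in x_aug).
m, n = 3, 4                          # output and input dimensions (arbitrary)
rng = np.random.default_rng(0)
W = rng.normal(size=(m, n + 1))      # W in R^{m x (n+1)}
x = rng.normal(size=(n,))            # input vector in R^n

x_aug = np.concatenate(([1.0], x))   # augmented input (1, x_1, ..., x_n)
y = W @ x_aug                        # output vector in R^m

# Equivalent split form: weight block times x, plus the bias column.
assert np.allclose(y, W[:, 1:] @ x + W[:, 0])
```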
@@ -340,12 +340,12 @@
"Where\n",
"\n",
"$$\n",
"W = \\begin{pmatrix}\n",
"W = \\begin{bmatrix}\n",
"w_{10} & w_{11} & w_{12} & \\dots & w_{1n} \\\\\n",
"w_{20} & w_{21} & w_{22} & \\dots & w_{2n} \\\\\n",
"\\vdots & \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"w_{m0} & w_{m1} & w_{m2} & \\dots & w_{mn}\n",
"\\end{pmatrix} \\in \\mathbb{R}^{m \\times (n+1)}\n",
"\\end{bmatrix} \\in \\mathbb{R}^{m \\times (n+1)}\n",
"$$\n",
"\n",
"Here, $\\mathbf{x} \\in \\mathbb{R}^{(n+1)}$ and $\\mathbf{y} \\in \\mathbb{R}^{m}$ denote the input and output vectors of the fully connected layer, respectively."
@@ -418,6 +418,325 @@
"\n",
"The following figure shows that the fully connected layer receives the gradient flows from the subsequent layer, denoted as $\\frac{\\partial L}{\\partial \\mathbf{y}}$. This quantity is used to compute the gradient of the loss with respect to the current layer's parameters $\\frac{\\partial L}{\\partial W}$. Then, it passes the gradient with respect to the input to the previous layers $\\frac{\\partial L}{\\partial \\mathbf{x}}$, following the chain rule in backpropagation."
]
},
{
"cell_type": "markdown",
"id": "f72511b7-8639-40e8-892a-6f7128e9a0ac",
"metadata": {},
"source": [
"<figure markdown=\"span\">\n",
" <center>\n",
" <img src=\"https://raw.githubusercontent.com/JamorMoussa/NanoTorch/dev/docs/images/docs/linear/linear-back-propagation.png\" width=\"300\" />\n",
" </center>\n",
"</figure>"
]
},
{
"cell_type": "markdown",
"id": "9c0f345b-75a9-401d-a44d-afeff9063409",
"metadata": {},
"source": [
    "Now, let's break down each derivative.\n",
    "\n",
    "The loss function is a scalar value, i.e., $L \\in \\mathbb{R}$. Let $\\mathbf{v}$ be an $n$-dimensional vector, i.e., $\\mathbf{v} \\in \\mathbb{R}^n$."
]
},
{
"cell_type": "markdown",
"id": "0a060bab-59e8-440c-afd1-fccd06f03f33",
"metadata": {},
"source": [
"So, the derivative of $L$ with respect to $\\mathbf{v}$ is defined as the derivative of $L$ for each component of $\\mathbf{v}$. Formally:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial \\mathbf{v}} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial v_1} \\\\\n",
"\\frac{\\partial L}{\\partial v_2} \\\\\n",
"\\vdots \\\\\n",
"\\frac{\\partial L}{\\partial v_n}\n",
"\\end{bmatrix}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"id": "9d679d2a-2169-456f-bf82-c27a0535915f",
"metadata": {},
"source": [
"With the same logic, given a matrix $M \\in \\mathbb{R}^{m \\times n}$, the derivative of $L$ with respect to $M$ is defined as the derivative of $L$ for each component of $M$. Formally:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial M} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial M_{11}} & \\frac{\\partial L}{\\partial M_{12}} & \\cdots & \\frac{\\partial L}{\\partial M_{1n}} \\\\\n",
"\\frac{\\partial L}{\\partial M_{21}} & \\frac{\\partial L}{\\partial M_{22}} & \\cdots & \\frac{\\partial L}{\\partial M_{2n}} \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial L}{\\partial M_{m1}} & \\frac{\\partial L}{\\partial M_{m2}} & \\cdots & \\frac{\\partial L}{\\partial M_{mn}}\n",
"\\end{bmatrix}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"id": "929aa553-91eb-4bef-9bf0-611225ae0d54",
"metadata": {},
"source": [
"#### 1.5.1 Compute $\\frac{\\partial L}{\\partial W}$"
]
},
{
"cell_type": "markdown",
"id": "c1b80af5-150d-4db8-b745-28e40281e944",
"metadata": {},
"source": [
"Since our layer receives the quantity $\\frac{\\partial L}{\\partial \\mathbf{y}}$ during back-propagation, our task is to use it to compute the derivative of $L$ with respect to $W$."
]
},
{
"cell_type": "markdown",
"id": "f34173b9-ab3c-4742-8aad-8d15485e56b0",
"metadata": {},
"source": [
    "Given row index $i \\in \\{1, ..., m\\}$ and column index $j \\in \\{1, ..., n\\}$:\n",
    "\n",
    "$$\n",
    "\\frac{\\partial L}{\\partial W_{ij}} = \\frac{\\partial L}{\\partial y_1} \\underbrace{\\frac{\\partial y_1}{\\partial W_{ij}}}_{=0} + \\frac{\\partial L}{\\partial y_2} \\underbrace{\\frac{\\partial y_2}{\\partial W_{ij}}}_{=0} + \\dots + \\frac{\\partial L}{\\partial y_i} \\frac{\\partial y_i}{\\partial W_{ij}} + \\dots + \\frac{\\partial L}{\\partial y_m} \\underbrace{\\frac{\\partial y_m}{\\partial W_{ij}}}_{=0}\n",
    "$$"
]
},
{
"cell_type": "markdown",
"id": "1debe03f-2623-49ed-b2b4-b0dea0bbb46f",
"metadata": {},
"source": [
"Thus,\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial W_{ij}} = \\frac{\\partial L}{\\partial y_i} \\frac{\\partial y_i}{\\partial W_{ij}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "c99f801e-b8dc-4966-b0c3-6fc48be804f0",
"metadata": {},
"source": [
"We have:\n",
"\n",
"$$\n",
    "y_i = W_{i1}x_1 + \\dots + W_{ij}x_j + \\dots + W_{in}x_n\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "4e0edccc-9691-4669-a4f6-6f5d123270c8",
"metadata": {},
"source": [
"Then, the derivative of $y_i$ with respect to $W_{ij}$ is:\n",
"\n",
"$$\n",
"\\frac{\\partial y_i}{\\partial W_{ij}} = x_j\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "f8fb5624-f7ee-4400-abaa-40ea64531aba",
"metadata": {},
"source": [
"Finally, \n",
"\n",
"$$\n",
    "    \\forall i \\in \\{1, ..., m \\}, j \\in \\{1, ..., n\\} \\mid \\frac{\\partial L}{\\partial W_{ij}} = \\frac{\\partial L}{\\partial y_i} x_j\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "a8df1afc-ce5a-4113-bb5c-022b8b38d0a1",
"metadata": {},
"source": [
"Using this formula to fill the matrix $\\frac{\\partial L}{\\partial W}$:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial W} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial W_{11}} & \\frac{\\partial L}{\\partial W_{12}} & \\cdots & \\frac{\\partial L}{\\partial W_{1n}} \\\\\n",
"\\frac{\\partial L}{\\partial W_{21}} & \\frac{\\partial L}{\\partial W_{22}} & \\cdots & \\frac{\\partial L}{\\partial W_{2n}} \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial L}{\\partial W_{m1}} & \\frac{\\partial L}{\\partial W_{m2}} & \\cdots & \\frac{\\partial L}{\\partial W_{mn}}\n",
"\\end{bmatrix} = \\begin{bmatrix}\n",
"\\frac{\\partial L}{\\partial y_1} x_1 & \\frac{\\partial L}{\\partial y_1} x_2 & \\cdots &\\frac{\\partial L}{\\partial y_1} x_n \\\\\n",
"\\frac{\\partial L}{\\partial y_2} x_1 & \\frac{\\partial L}{\\partial y_2} x_2 & \\cdots &\\frac{\\partial L}{\\partial y_2} x_n \\\\\n",
"\\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
"\\frac{\\partial L}{\\partial y_m} x_1 & \\frac{\\partial L}{\\partial y_m} x_2 & \\cdots &\\frac{\\partial L}{\\partial y_m} x_n \\\\\n",
"\\end{bmatrix} \n",
"= \n",
"\\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial y_1} \\\\\n",
" \\frac{\\partial L}{\\partial y_2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial y_m}\n",
"\\end{bmatrix}\n",
"\\begin{bmatrix}\n",
" x_1 & x_2 & \\dots & x_n \\\\\n",
"\\end{bmatrix} = \\frac{\\partial L}{\\partial \\mathbf{y}} \\mathbf{x}^T\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "f4cfd831-3df1-4302-8a0a-84bb83beb60f",
"metadata": {},
"source": [
"Finally,\n",
"\n",
"$$\n",
" \\frac{\\partial L}{\\partial W} = \\frac{\\partial L}{\\partial \\mathbf{y}} \\mathbf{x}^T\n",
"$$"
]
},
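This outer-product rule can be sketched in NumPy (illustrative shapes and names, with the input treated as a column vector):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
x = rng.normal(size=(n, 1))        # input treated as a column vector
dL_dy = rng.normal(size=(m, 1))    # incoming gradient dL/dy from the next layer

# dL/dW = (dL/dy) x^T : an (m, 1) column times a (1, n) row -> (m, n),
# matching the shape of W itself.
dL_dW = dL_dy @ x.T
assert dL_dW.shape == (m, n)
```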
{
"cell_type": "markdown",
"id": "33ec7f99-3f41-4b66-ac43-e2cdebab0d51",
"metadata": {},
"source": [
"#### 1.5.2 Compute $\\frac{\\partial L}{\\partial \\mathbf{x}}$"
]
},
{
"cell_type": "markdown",
"id": "edf39607-1ad2-42f7-a0c9-0a42aad038e9",
"metadata": {},
"source": [
    "With the same logic as before, let's compute the derivative of $L$ with respect to the input vector $\\mathbf{x}$, i.e., $\\frac{\\partial L}{\\partial \\mathbf{x}}$."
]
},
{
"cell_type": "markdown",
"id": "43a44656-993b-4fd5-a046-1bde6f823e56",
"metadata": {},
"source": [
    "For a given $i \\in \\{1, ..., n\\}$:\n",
"\n",
"$$\n",
    "\\frac{\\partial L}{\\partial x_i} = \\frac{\\partial L}{\\partial y_1} \\underbrace{\\frac{\\partial y_1}{\\partial x_i}}_{W_{1i}} + \\dots + \\frac{\\partial L}{\\partial y_j} \\underbrace{\\frac{\\partial y_j}{\\partial x_i}}_{W_{ji}} + \\dots + \\frac{\\partial L}{\\partial y_m} \\underbrace{\\frac{\\partial y_m}{\\partial x_i}}_{W_{mi}}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "01fa24d2-4045-4a01-b15a-e5838c45e2d0",
"metadata": {},
"source": [
"Because we have:\n",
"\n",
"$$\n",
    "y_j = W_{j1}x_1 + \\dots + W_{ji}x_i + \\dots + W_{jn}x_n\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "c858a466-0088-4912-b509-b6aa89fa7952",
"metadata": {},
"source": [
"Thus, \n",
"\n",
"$$\n",
    "\\frac{\\partial L}{\\partial x_i} = \\frac{\\partial L}{\\partial y_1}W_{1i} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{ji} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{mi}\n",
"$$"
]
},
{
"cell_type": "markdown",
"id": "fe997753-95af-4976-9646-c020a9532e3e",
"metadata": {},
"source": [
"Using this formula to fill the vector $\\frac{\\partial L}{\\partial \\mathbf{x}}$:\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial \\mathbf{x}} = \\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial x_1} \\\\\n",
" \\frac{\\partial L}{\\partial x_2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial x_n}\n",
"\\end{bmatrix} = \\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial y_1}W_{11} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{j1} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{m1} \\\\\n",
" \\frac{\\partial L}{\\partial y_1}W_{12} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{j2} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{m2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial y_1}W_{1n} + \\dots + \\frac{\\partial L}{\\partial y_j}W_{jn} + \\dots + \\frac{\\partial L}{\\partial y_m} W_{mn}\n",
"\\end{bmatrix} = \\begin{bmatrix}\n",
" W_{11} & W_{21} & \\dots & W_{m1} \\\\\n",
" W_{12} & W_{22} & \\dots & W_{m2} \\\\\n",
" \\vdots & \\vdots & \\ddots & \\vdots \\\\\n",
" W_{1n} & W_{2n} & \\dots & W_{mn}\n",
"\\end{bmatrix} \\begin{bmatrix}\n",
" \\frac{\\partial L}{\\partial y_1} \\\\\n",
" \\frac{\\partial L}{\\partial y_2} \\\\\n",
" \\vdots \\\\\n",
" \\frac{\\partial L}{\\partial y_m}\n",
"\\end{bmatrix} = W^T \\frac{\\partial L}{\\partial \\mathbf{y}}\n",
"$$\n"
]
},
{
"cell_type": "markdown",
"id": "964c42f8-80dc-4a16-9efb-4a5c201f7c38",
"metadata": {},
"source": [
"Finally,\n",
"\n",
"$$\n",
"\\frac{\\partial L}{\\partial \\mathbf{x}} = W^T \\frac{\\partial L}{\\partial \\mathbf{y}}\n",
"$$"
]
},
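As with the weight gradient, this rule is a single matrix product in NumPy (illustrative sketch; names and dimensions are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.normal(size=(m, n))        # weight matrix of the layer
dL_dy = rng.normal(size=(m, 1))    # incoming gradient dL/dy

# dL/dx = W^T (dL/dy): shapes (n, m) @ (m, 1) -> (n, 1),
# matching the shape of the input x.
dL_dx = W.T @ dL_dy
assert dL_dx.shape == (n, 1)
```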
{
"cell_type": "markdown",
"id": "77be53d3-860a-4f70-8ec2-899d5c0210c1",
"metadata": {},
"source": [
"<div class=\"admonition tip\" markdown=\"\">\n",
"<p class=\"admonition-title\">Rules to Compute Gradients</p>\n",
"<p>The layer receives the gradient flow $\\frac{\\partial L}{\\partial \\mathbf{y}}$. Therefore, the gradients can be computed as follows:</p>\n",
"<p>\n",
"$$\n",
" \\frac{\\partial L}{\\partial W} = \\frac{\\partial L}{\\partial \\mathbf{y}} \\mathbf{x}^T\n",
"$$\n",
"</p>\n",
"<p>\n",
"$$\n",
" \\frac{\\partial L}{\\partial \\mathbf{x}} = W^T \\frac{\\partial L}{\\partial \\mathbf{y}}\n",
"$$\n",
"</p>\n",
"</div>\n"
]
},
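Both rules can be sanity-checked numerically with finite differences. The loss below, $L = \|\mathbf{y}\|^2/2$ (so that $\partial L/\partial \mathbf{y} = \mathbf{y}$), is an arbitrary choice made for this check, not part of the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 4
W = rng.normal(size=(m, n))
x = rng.normal(size=(n, 1))

def loss(W, x):
    y = W @ x
    return 0.5 * float(np.sum(y ** 2))  # L = ||y||^2 / 2, so dL/dy = y

y = W @ x
dL_dy = y                  # gradient of this particular loss w.r.t. y
dL_dW = dL_dy @ x.T        # rule 1: dL/dW = (dL/dy) x^T
dL_dx = W.T @ dL_dy        # rule 2: dL/dx = W^T (dL/dy)

# Forward finite differences, one entry at a time.
eps = 1e-6
num_dW = np.zeros_like(W)
for i in range(m):
    for j in range(n):
        Wp = W.copy()
        Wp[i, j] += eps
        num_dW[i, j] = (loss(Wp, x) - loss(W, x)) / eps

num_dx = np.zeros_like(x)
for i in range(n):
    xp = x.copy()
    xp[i, 0] += eps
    num_dx[i, 0] = (loss(W, xp) - loss(W, x)) / eps

assert np.allclose(num_dW, dL_dW, atol=1e-4)
assert np.allclose(num_dx, dL_dx, atol=1e-4)
```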
{
"cell_type": "markdown",
"id": "db454d75-1697-49f4-9e45-ad84545a149b",
"metadata": {},
"source": [
    "## 02. Implementation - Build a Fully Connected Layer from Scratch"
]
},
{
"cell_type": "markdown",
"id": "30936b63-9f00-42ce-a597-7e0cba4fc0d5",
"metadata": {},
"source": [
    "At this stage, we've covered everything we need to implement the fully connected layer using only **NumPy**.\n",
    "\n",
    "The layer has two pass modes. First, the forward pass computes the layer's output. Second, the backward pass computes the gradients, which are used to update the model's parameters so it makes more accurate predictions."
]
},
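A minimal sketch of such a layer is shown below. The class and method names are illustrative, not necessarily NanoTorch's actual API, and the bias is kept as a separate vector here rather than folded into $W$ as in the earlier matrix form:

```python
import numpy as np

class Linear:
    """Fully connected layer sketch: y = W x + b, NumPy only."""

    def __init__(self, in_features: int, out_features: int):
        # Small random weights and a zero bias column.
        self.W = 0.01 * np.random.randn(out_features, in_features)
        self.b = np.zeros((out_features, 1))

    def forward(self, x: np.ndarray) -> np.ndarray:
        self.x = x                   # cache the input for the backward pass
        return self.W @ x + self.b

    def backward(self, dL_dy: np.ndarray, lr: float = 0.1) -> np.ndarray:
        dL_dW = dL_dy @ self.x.T     # dL/dW = (dL/dy) x^T
        dL_dx = self.W.T @ dL_dy     # dL/dx = W^T (dL/dy), before the update
        # Plain gradient-descent update; dL/db = dL/dy since y = W x + b.
        self.W -= lr * dL_dW
        self.b -= lr * dL_dy
        return dL_dx                 # gradient passed to the previous layer


layer = Linear(4, 3)
y = layer.forward(np.ones((4, 1)))    # forward pass: output of shape (3, 1)
dx = layer.backward(np.ones((3, 1)))  # backward pass: gradient of shape (4, 1)
```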
{
"cell_type": "code",
"execution_count": null,
"id": "3ed00c8d-9f14-4b7a-99aa-fd06dd3e5464",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
Binary file added docs/images/logo.png
