Add codespell support (config, workflow to detect/not fix) and make it fix few typos (#207)
yarikoptic authored Jul 20, 2024
1 parent e6d27f3 commit 64cfcb6
Showing 11 changed files with 70 additions and 37 deletions.
25 changes: 25 additions & 0 deletions .github/workflows/codespell.yml
@@ -0,0 +1,25 @@
# Codespell configuration is within pyproject.toml
---
name: Codespell

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

permissions:
  contents: read

jobs:
  codespell:
    name: Check for spelling errors
    runs-on: ubuntu-latest

    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Annotate locations with typos
        uses: codespell-project/codespell-problem-matcher@v1
      - name: Codespell
        uses: codespell-project/actions-codespell@v2
2 changes: 1 addition & 1 deletion demo/docs/demo-script.py
@@ -43,7 +43,7 @@
#
# ### More markdown things
#
# > Pellentesque pretium euismod laoreet. Nullam eget mauris ut tellus vehicula consequat. In sed molestie metus. Nulla at varius nunc, sit amet semper arcu. Integer tristique augue eget auctor aliquam. Donec ornare consectetur lectus et viverra. Duis vel elit ac lectus accumsan gravida non ac erat.
# > Pellentesque pretium euismod laoreet. Nullam eget mauris ut tellus vehicula consequat. In sed molestie metus. Nulla at various nunc, sit amet semper arcu. Integer tristique augue eget auctor aliquam. Donec ornare consectetur lectus et viverra. Duis vel elit ac lectus accumsan gravida non ac erat.
#
# Ut in ipsum id neque pellentesque iaculis. Pellentesque massa erat, rhoncus id auctor vel, tempor id neque. Nunc nec iaculis enim. Duis eget tincidunt tellus. Proin vitae ultrices velit.
#
14 changes: 7 additions & 7 deletions demo/docs/variational-inference-nb.ipynb
@@ -22,7 +22,7 @@
"\n",
"### Bayesian Networks\n",
"\n",
"Bayesian Networks are graph based representations to acccount for randomness while modelling our data. The nodes of the graph are random variables and the connections between nodes denote the direct influence from parent to child.\n",
"Bayesian Networks are graph based representations to account for randomness while modelling our data. The nodes of the graph are random variables and the connections between nodes denote the direct influence from parent to child.\n",
"\n",
"### Bayesian Network Example\n",
"\n",
@@ -40,7 +40,7 @@
" <summary>Extra: Proof of decomposition</summary>\n",
" <p><br>First, let's recall conditional probability,<br>\n",
" $$P\\left (A|B\\right ) = \\frac{P\\left (A, B\\right )}{P\\left (B\\right )}$$\n",
" The above equation is so derived because of reduction of sample space of $A$ when $B$ has already occured.\n",
" The above equation is so derived because of reduction of sample space of $A$ when $B$ has already occurred.\n",
" Now, adjusting terms -<br>\n",
" $$P\\left (A, B\\right ) = P\\left (A|B\\right )*P\\left (B\\right )$$\n",
" This equation is called chain rule of probability. Let's generalize this rule for Bayesian Networks. The ordering of names of nodes is such that parent(s) of nodes lie above them (Breadth First Ordering).<br>\n",
@@ -105,11 +105,11 @@
"The negative sign in the formula has high intuitive meaning. In words, it signifies whenever the probability of certain events is high, the related information is less and vica versa. For example -\n",
"\n",
"1. Consider the statement - It never snows in deserts. The probability of this statement being true is significantly high because we already know that it is hardly possible to snow in deserts. So, the related information is very small.\n",
"2. Now consider - There was a snowfall in Sahara Desert in late December 2019. Wow, thats a great news because some unlikely event occured (probability was less). In turn, the information is high.\n",
"2. Now consider - There was a snowfall in Sahara Desert in late December 2019. Wow, that's a great news because some unlikely event occurred (probability was less). In turn, the information is high.\n",
"\n",
"### Entropy\n",
"\n",
"Entropy quantifies how much **average** Information is present in occurence of events. It is denoted by $H$. It is named Differential Entropy in case of Real Continuous Domain.\n",
"Entropy quantifies how much **average** Information is present in occurrence of events. It is denoted by $H$. It is named Differential Entropy in case of Real Continuous Domain.\n",
"\n",
"$$\n",
"H = E_{P\\left (X\\right )} \\left [-\\log\\left (P\\left (X\\right )\\right )\\right ]\\\\\n",
@@ -118,7 +118,7 @@
"\n",
"### Entropy of Normal Distribution\n",
"\n",
"As an exercise, let's calculate entropy of Normal Distribution. Let's denote $\\mu$ as mean nd $\\sigma$ as standard deviation of Normal Distribution. Remember the results, we will need them further.\n",
"As an exercise, let's calculate entropy of Normal Distribution. Let's denote $\\mu$ as mean and $\\sigma$ as standard deviation of Normal Distribution. Remember the results, we will need them further.\n",
"\n",
"$$\n",
"X \\sim Normal\\left (\\mu, \\sigma^2\\right )\\\\\n",
@@ -253,7 +253,7 @@
" <p><br>To simplify notations, let's use $Y=T(X)$ instead of $\\zeta=T(\\theta)$. After reaching the results, we will put the values back. Also, let's denote cummulative distribution function (cdf) as $F$. There are two cases which respect to properties of function $T$.<br><br><strong>Case 1</strong> - When $T$ is an increasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y)\\\\\n",
" = P\\left(X <= T^{-1}(y) \\right) = F_X\\left(T^{-1}(y) \\right)\\\\\n",
" F_Y(y) = F_X\\left(T^{-1}(y) \\right)$$Let's differentiate with respect to $y$ both sides - $$\\frac{\\mathrm{d} (F_Y(y))}{\\mathrm{d} y} = \\frac{\\mathrm{d} (F_X\\left(T^{-1}(y) \\right))}{\\mathrm{d} y}\\\\\n",
" P_Y(y) = P_X\\left(T^{-1}(y) \\right) \\frac{\\mathrm{d} (T^{-1}(y))}{\\mathrm{d} y}$$<strong>Case 2</strong> - When $T$ is a descreasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y) = P\\left(X >= T^{-1}(y) \\right)\\\\\n",
" P_Y(y) = P_X\\left(T^{-1}(y) \\right) \\frac{\\mathrm{d} (T^{-1}(y))}{\\mathrm{d} y}$$<strong>Case 2</strong> - When $T$ is a decreasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y) = P\\left(X >= T^{-1}(y) \\right)\\\\\n",
" = 1-P\\left(X < T^{-1}(y) \\right) = 1-P\\left(X <= T^{-1}(y) \\right) = 1-F_X\\left(T^{-1}(y) \\right)\\\\\n",
" F_Y(y) = 1-F_X\\left(T^{-1}(y) \\right)$$Let's differentiate with respect to $y$ both sides - $$\\frac{\\mathrm{d} (F_Y(y))}{\\mathrm{d} y} = \\frac{\\mathrm{d} (1-F_X\\left(T^{-1}(y) \\right))}{\\mathrm{d} y}\\\\\n",
" P_Y(y) = (-1) P_X\\left(T^{-1}(y) \\right) (-1) \\frac{\\mathrm{d} (T^{-1}(y))}{\\mathrm{d} y}\\\\\n",
@@ -339,7 +339,7 @@
" trans.plot(zeta, p_zeta, color='blue', lw=2)\n",
" trans.set_xlabel(r\"$\\zeta$\")\n",
" trans.set_ylabel(r\"$P(\\zeta)$\")\n",
" trans.set_title(\"Transfomed Space\");\n"
" trans.set_title(\"Transformed Space\");\n"
]
},
{
16 changes: 8 additions & 8 deletions demo/docs/variational-inference-script.py
@@ -31,7 +31,7 @@
#
# ### Bayesian Networks
#
# Bayesian Networks are graph based representations to acccount for randomness while modelling our data. The nodes of the graph are random variables and the connections between nodes denote the direct influence from parent to child.
# Bayesian Networks are graph based representations to account for randomness while modelling our data. The nodes of the graph are random variables and the connections between nodes denote the direct influence from parent to child.
#
# ### Bayesian Network Example
#
@@ -49,7 +49,7 @@
# <summary>Extra: Proof of decomposition</summary>
# <p><br>First, let's recall conditional probability,<br>
# $$P\left (A|B\right ) = \frac{P\left (A, B\right )}{P\left (B\right )}$$
# The above equation is so derived because of reduction of sample space of $A$ when $B$ has already occured.
# The above equation is so derived because of reduction of sample space of $A$ when $B$ has already occurred.
# Now, adjusting terms -<br>
# $$P\left (A, B\right ) = P\left (A|B\right )*P\left (B\right )$$
# This equation is called chain rule of probability. Let's generalize this rule for Bayesian Networks. The ordering of names of nodes is such that parent(s) of nodes lie above them (Breadth First Ordering).<br>
@@ -114,11 +114,11 @@
# The negative sign in the formula has high intuitive meaning. In words, it signifies whenever the probability of certain events is high, the related information is less and vica versa. For example -
#
# 1. Consider the statement - It never snows in deserts. The probability of this statement being true is significantly high because we already know that it is hardly possible to snow in deserts. So, the related information is very small.
# 2. Now consider - There was a snowfall in Sahara Desert in late December 2019. Wow, thats a great news because some unlikely event occured (probability was less). In turn, the information is high.
# 2. Now consider - There was a snowfall in Sahara Desert in late December 2019. Wow, that's a great news because some unlikely event occurred (probability was less). In turn, the information is high.
#
# ### Entropy
#
# Entropy quantifies how much **average** Information is present in occurence of events. It is denoted by $H$. It is named Differential Entropy in case of Real Continuous Domain.
# Entropy quantifies how much **average** Information is present in occurrence of events. It is denoted by $H$. It is named Differential Entropy in case of Real Continuous Domain.
#
# $$
# H = E_{P\left (X\right )} \left [-\log\left (P\left (X\right )\right )\right ]\\
@@ -127,7 +127,7 @@
#
# ### Entropy of Normal Distribution
#
# As an exercise, let's calculate entropy of Normal Distribution. Let's denote $\mu$ as mean nd $\sigma$ as standard deviation of Normal Distribution. Remember the results, we will need them further.
# As an exercise, let's calculate entropy of Normal Distribution. Let's denote $\mu$ as mean and $\sigma$ as standard deviation of Normal Distribution. Remember the results, we will need them further.
#
# $$
# X \sim Normal\left (\mu, \sigma^2\right )\\
@@ -259,10 +259,10 @@
#
# <details class="tip">
# <summary>Extra: Proof of transformation equation</summary>
# <p><br>To simplify notations, let's use $Y=T(X)$ instead of $\zeta=T(\theta)$. After reaching the results, we will put the values back. Also, let's denote cummulative distribution function (cdf) as $F$. There are two cases which respect to properties of function $T$.<br><br><strong>Case 1</strong> - When $T$ is an increasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y)\\
# <p><br>To simplify notations, let's use $Y=T(X)$ instead of $\zeta=T(\theta)$. After reaching the results, we will put the values back. Also, let's denote cumulative distribution function (cdf) as $F$. There are two cases which respect to properties of function $T$.<br><br><strong>Case 1</strong> - When $T$ is an increasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y)\\
# = P\left(X <= T^{-1}(y) \right) = F_X\left(T^{-1}(y) \right)\\
# F_Y(y) = F_X\left(T^{-1}(y) \right)$$Let's differentiate with respect to $y$ both sides - $$\frac{\mathrm{d} (F_Y(y))}{\mathrm{d} y} = \frac{\mathrm{d} (F_X\left(T^{-1}(y) \right))}{\mathrm{d} y}\\
# P_Y(y) = P_X\left(T^{-1}(y) \right) \frac{\mathrm{d} (T^{-1}(y))}{\mathrm{d} y}$$<strong>Case 2</strong> - When $T$ is a descreasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y) = P\left(X >= T^{-1}(y) \right)\\
# P_Y(y) = P_X\left(T^{-1}(y) \right) \frac{\mathrm{d} (T^{-1}(y))}{\mathrm{d} y}$$<strong>Case 2</strong> - When $T$ is a decreasing function $$F_Y(y) = P(Y <= y) = P(T(X) <= y) = P\left(X >= T^{-1}(y) \right)\\
# = 1-P\left(X < T^{-1}(y) \right) = 1-P\left(X <= T^{-1}(y) \right) = 1-F_X\left(T^{-1}(y) \right)\\
# F_Y(y) = 1-F_X\left(T^{-1}(y) \right)$$Let's differentiate with respect to $y$ both sides - $$\frac{\mathrm{d} (F_Y(y))}{\mathrm{d} y} = \frac{\mathrm{d} (1-F_X\left(T^{-1}(y) \right))}{\mathrm{d} y}\\
# P_Y(y) = (-1) P_X\left(T^{-1}(y) \right) (-1) \frac{\mathrm{d} (T^{-1}(y))}{\mathrm{d} y}\\
@@ -343,7 +343,7 @@ def plot_transformation(theta, zeta, p_theta, p_zeta):
    trans.plot(zeta, p_zeta, color="blue", lw=2)
    trans.set_xlabel(r"$\zeta$")
    trans.set_ylabel(r"$P(\zeta)$")
    trans.set_title("Transfomed Space")
    trans.set_title("Transformed Space")


# ### Transformed Space Example-1
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -109,3 +109,11 @@ build-backend = "hatchling.build"
Documentation = "https://github.com/danielfrg/mkdocs-jupyter#readme"
Issues = "https://github.com/danielfrg/mkdocs-jupyter/issues"
Source = "https://github.com/danielfrg/mkdocs-jupyter"

[tool.codespell]
# Ref: https://github.com/codespell-project/codespell#using-a-config-file
skip = '.git*,*-lock.yaml,*.lock,*.css'
check-hidden = true
# image embeddings and all generally too long lines in "", and all urls
ignore-regex = '(^\s*"(image/\S+": "|.{300,}).*|https?://\S+)'
# ignore-words-list = ''
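
With this [tool.codespell] table in place, the check that the new CI workflow runs can also be reproduced locally. A minimal sketch, assuming codespell is installed and can read pyproject.toml (it picks up the [tool.codespell] section automatically, per the config-file reference linked in the comment above; on Python < 3.11 this additionally needs the tomli package):

    pip install codespell tomli    # tomli only required on Python < 3.11
    codespell                      # report typos using the settings above (what CI does)
    codespell -w                   # optionally write the suggested fixes in place

The workflow added in .github/workflows/codespell.yml only detects and annotates typos; writing fixes with -w is purely a local convenience.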
2 changes: 1 addition & 1 deletion src/mkdocs_jupyter/templates/mkdocs_html/notebook.html.j2
@@ -1,5 +1,5 @@
{#
Ovewrites: https://github.com/jupyter/nbconvert/blob/main/share/templates/lab/index.html.j2
Overwrites: https://github.com/jupyter/nbconvert/blob/main/share/templates/lab/index.html.j2
Look for CHANGE comments to see what we changed
#}

6 changes: 3 additions & 3 deletions src/mkdocs_jupyter/tests/mkdocs/docs/backquote_toc_test.ipynb
@@ -6,7 +6,7 @@
"source": [
"# Multiple Backquote tests\n",
"<!-- # Comment Test -->\n",
"<!-- These comments must be ommited -->\n",
"<!-- These comments must be omitted -->\n",
"\n",
"## Multiple Backquote test #1\n",
"test text 1 start\n",
@@ -33,7 +33,7 @@
"metadata": {},
"outputs": [],
"source": [
"# This code node must be ommited during traversal"
"# This code node must be omitted during traversal"
]
},
{
@@ -42,7 +42,7 @@
"source": [
"# Single Backquote tests\n",
"<!-- # Comment Test -->\n",
"<!-- These comments must be ommited -->\n",
"<!-- These comments must be omitted -->\n",
"\n",
"## Single Backquote test #1\n",
"test text 1 start\n",
2 changes: 1 addition & 1 deletion src/mkdocs_jupyter/tests/mkdocs/docs/demo-script.py
@@ -43,7 +43,7 @@
#
# ### More markdown things
#
# > Pellentesque pretium euismod laoreet. Nullam eget mauris ut tellus vehicula consequat. In sed molestie metus. Nulla at varius nunc, sit amet semper arcu. Integer tristique augue eget auctor aliquam. Donec ornare consectetur lectus et viverra. Duis vel elit ac lectus accumsan gravida non ac erat.
# > Pellentesque pretium euismod laoreet. Nullam eget mauris ut tellus vehicula consequat. In sed molestie metus. Nulla at various nunc, sit amet semper arcu. Integer tristique augue eget auctor aliquam. Donec ornare consectetur lectus et viverra. Duis vel elit ac lectus accumsan gravida non ac erat.
#
# Ut in ipsum id neque pellentesque iaculis. Pellentesque massa erat, rhoncus id auctor vel, tempor id neque. Nunc nec iaculis enim. Duis eget tincidunt tellus. Proin vitae ultrices velit.
#
2 changes: 1 addition & 1 deletion src/mkdocs_jupyter/tests/mkdocs/docs/demo.ipynb
@@ -2399,7 +2399,7 @@
" <summary>Extra: Proof of decomposition</summary>\n",
" <p><br>First, let's recall conditional probability,<br>\n",
" $$P\\left (A|B\\right ) = \\frac{P\\left (A, B\\right )}{P\\left (B\\right )}$$\n",
" The above equation is so derived because of reduction of sample space of $A$ when $B$ has already occured.\n",
" The above equation is so derived because of reduction of sample space of $A$ when $B$ has already occurred.\n",
" Now, adjusting terms -<br>\n",
" $$P\\left (A, B\\right ) = P\\left (A|B\\right )*P\\left (B\\right )$$\n",
" This equation is called chain rule of probability. Let's generalize this rule for Bayesian Networks. The ordering of names of nodes is such that parent(s) of nodes lie above them (Breadth First Ordering).<br>\n",