from IPython.display import Image
Image(filename = 'HNN.jpeg')


Hacking Neural Networks

Author: Kevin Thomas

Welcome to the Hacking Neural Networks notebook! This notebook is designed to provide a detailed, step-by-step walkthrough of the inner workings of a simple neural network. The goal is to demystify the calculations behind neural networks by breaking them down into understandable components, including forward propagation, backpropagation, gradient calculations, and parameter updates.

Overview

The focus of this repository is to make neural networks accessible and hackable. We emphasize hands-on exploration of every step in the network's operations, highlighting the following:

  • STEP 1: FORWARD PASS: How the network computes outputs.
  • STEP 2: BACK PROPAGATION: Deriving gradients for each parameter.
  • STEP 3: OPTIMIZATION / GRADIENT DESCENT: Updating weights and biases using gradient descent.

Network Architecture

The neural network we analyze consists of the following components:

  • Inputs: two input neurons, $x_1$ and $x_2$.
  • Weights: $w_1$ and $w_2$, which connect the inputs to the output neuron.
  • Bias: $b$, an additive term in the linear combination.
  • Activation Function: $\tanh$, which introduces non-linearity.
  • Output: $o$, the result of the activation function.
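
These components combine into the forward pass and loss below. Backpropagation then derives the gradients via the chain rule (using $\frac{d}{dn}\tanh(n) = 1 - \tanh^2(n)$), which are exactly the formulas printed in the training log later in this notebook:

$$
\begin{aligned}
n &= x_1 w_1 + x_2 w_2 + b, \qquad o = \tanh(n), \qquad L = \tfrac{1}{2}\,(o - t)^2 \\
\frac{\partial L}{\partial b} &= (o - t)\,(1 - o^2), \qquad
\frac{\partial L}{\partial w_1} = x_1 \, \frac{\partial L}{\partial b}, \qquad
\frac{\partial L}{\partial w_2} = x_2 \, \frac{\partial L}{\partial b}
\end{aligned}
$$

where $t$ is the target value.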

Initial Setup

Image(filename = 'ae0.jpg')


Install Libraries

!pip install torch
Requirement already satisfied: torch in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (2.5.1)
Requirement already satisfied: filelock in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (3.13.1)
Requirement already satisfied: typing-extensions>=4.8.0 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (4.12.2)
Requirement already satisfied: networkx in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (3.3)
Requirement already satisfied: jinja2 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (3.1.4)
Requirement already satisfied: fsspec in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (2024.6.1)
Requirement already satisfied: setuptools in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (75.1.0)
Requirement already satisfied: sympy==1.13.1 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from torch) (1.13.1)
Requirement already satisfied: mpmath<1.4,>=1.1.0 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from sympy==1.13.1->torch) (1.3.0)
Requirement already satisfied: MarkupSafe>=2.0 in /opt/anaconda3/envs/prod/lib/python3.12/site-packages (from jinja2->torch) (2.1.3)

Imports

import torch

Set Random Seed / Reproducibility

torch.manual_seed(42)
<torch._C.Generator at 0x10ff09170>

Define Input x1 w/ Gradients Enabled

x1 = torch.tensor([2.0], dtype=torch.double, requires_grad=True) 
x1
tensor([2.], dtype=torch.float64, requires_grad=True)
x1.ndim
1
x1.shape
torch.Size([1])

Define Input x2 w/ Gradients Enabled

x2 = torch.tensor([0.0], dtype=torch.double, requires_grad=True)
x2
tensor([0.], dtype=torch.float64, requires_grad=True)
x2.ndim
1
x2.shape
torch.Size([1])

Define Initial Weight w1 w/ Gradients Enabled

w1 = torch.tensor([-3.0], dtype=torch.double, requires_grad=True)
w1
tensor([-3.], dtype=torch.float64, requires_grad=True)
w1.ndim
1
w1.shape
torch.Size([1])

Define Initial Weight w2 w/ Gradients Enabled

w2 = torch.tensor([1.0], dtype=torch.double, requires_grad=True) 
w2
tensor([1.], dtype=torch.float64, requires_grad=True)
w2.ndim
1
w2.shape
torch.Size([1])

Define Initial Bias b w/ Gradients Enabled

b = torch.tensor([6.8814], dtype=torch.double, requires_grad=True)
b
tensor([6.8814], dtype=torch.float64, requires_grad=True)
b.ndim
1
b.shape
torch.Size([1])

Define Learning Rate

learning_rate = 0.01  # step size for gradient descent
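
Each parameter $\theta \in \{w_1, w_2, b\}$ will be nudged against its gradient by this step size, i.e. plain gradient descent:

$$
\theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta}, \qquad \eta = 0.01
$$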

Define Number of Epochs

epochs = 10  # number of iterations

Define Target Value

target = torch.tensor([0.0], dtype=torch.double)  # desired output value
target
tensor([0.], dtype=torch.float64)
target.ndim
1
target.shape
torch.Size([1])

Training Loop

# iterate through 10 epochs
for epoch in range(epochs):
    
    # display the corresponding network diagram image
    display(Image(filename=f"ae{epoch}.jpg"))

    # forward pass: calculate n (linear combination) and o (output using tanh)
    n = x1 * w1 + x2 * w2 + b  # n = x1*w1 + x2*w2 + b
    o = torch.tanh(n)  # o = tanh(n)
    
    # calculate the loss: half the squared error between the output and the target
    loss = 0.5 * (o - target).pow(2)  # loss = (1/2) * (o - target)^2

    # perform backward propagation to calculate gradients
    loss.backward()  # compute gradients for w1, w2, and b
    
    # log the current values and gradients
    print(f"EPOCH {epoch + 1}")
    if epoch + 1 == 10:
        print("--------")
    else:
        print("-------")
    print("STEP 0: INITIAL VALUES")
    print(f"  initial values:")
    print(f"    x1 = {x1.item()}, x2 = {x2.item()}")
    print(f"    w1 = {w1.item():.6f}, w2 = {w2.item():.6f}, b = {b.item():.6f}")
    print("STEP 1: FORWARD PASS")
    print(f"  forward pass:")
    print(f"    n = {n.item():.6f} (n = {x1.item()}*{w1.item()} + {x2.item()}*{w2.item()} + {b.item()})")
    print(f"    o = {o.item():.6f} (o = tanh({n.item()}))")
    print(f"    loss = {loss.item():.6f} (Loss = 0.5 * ({o.item()} - {target.item()})^2)")
    print("STEP 2: BACK PROPAGATION")
    print(f"  gradients (calculated during backward pass):")
    print(f"    w1.grad = {w1.grad.item():.6f} (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))")
    print(f"    w2.grad = {w2.grad.item():.6f} (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))")
    print(f"    b.grad = {b.grad.item():.6f} (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))")
    
    # update weights and bias using gradient descent
    with torch.no_grad():

        print("STEP 3: OPTIMIZATION / GRADIENT DESCENT")
        # explicitly show the gradient descent calculation, using the pre-update parameter values
        w1_old, w2_old, b_old = w1.item(), w2.item(), b.item()  # capture values before the update
        w1 -= learning_rate * w1.grad  # update w1
        print(f"    updated w1 = {w1.item():.6f} (w1 = w1 - lr * w1.grad = {w1_old} - {learning_rate} * {w1.grad.item()})")
        w2 -= learning_rate * w2.grad  # update w2
        print(f"    updated w2 = {w2.item():.6f} (w2 = w2 - lr * w2.grad = {w2_old} - {learning_rate} * {w2.grad.item()})")
        b -= learning_rate * b.grad    # update b
        print(f"    updated b = {b.item():.6f} (b = b - lr * b.grad = {b_old} - {learning_rate} * {b.grad.item()})")

        # zero gradients for the next iteration
        w1.grad.zero_()
        w2.grad.zero_()
        b.grad.zero_()
        x1.grad.zero_()
        x2.grad.zero_()
    
    # log the updated weights and bias
    print(f"  updated parameters:")
    print(f"    w1 = {w1.item():.6f}, w2 = {w2.item():.6f}, b = {b.item():.6f}\n")


EPOCH 1
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.000000, w2 = 1.000000, b = 6.881400
STEP 1: FORWARD PASS
  forward pass:
    n = 0.881400 (n = 2.0*-3.0 + 0.0*1.0 + 6.8814)
    o = 0.707120 (o = tanh(0.8814000000000002))
    loss = 0.250009 (Loss = 0.5 * (0.7071199874301227 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.707094 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.353547 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.007071 (w1 = w1 - lr * w1.grad = -3.0 - 0.01 * 0.7070935742030305)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.877865 (b = b - lr * b.grad = 6.8814 - 0.01 * 0.35354678710151527)
  updated parameters:
    w1 = -3.007071, w2 = 1.000000, b = 6.877865


EPOCH 2
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.007071, w2 = 1.000000, b = 6.877865
STEP 1: FORWARD PASS
  forward pass:
    n = 0.863723 (n = 2.0*-3.00707093574203 + 0.0*1.0 + 6.877864532128985)
    o = 0.698171 (o = tanh(0.8637226606449246))
    loss = 0.243721 (Loss = 0.5 * (0.6981707141514888 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.715705 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.357853 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.014228 (w1 = w1 - lr * w1.grad = -3.00707093574203 - 0.01 * 0.7157054865360251)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.874286 (b = b - lr * b.grad = 6.877864532128985 - 0.01 * 0.35785274326801253)
  updated parameters:
    w1 = -3.014228, w2 = 1.000000, b = 6.874286


EPOCH 3
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.014228, w2 = 1.000000, b = 6.874286
STEP 1: FORWARD PASS
  forward pass:
    n = 0.845830 (n = 2.0*-3.0142279906073903 + 0.0*1.0 + 6.874286004696305)
    o = 0.688885 (o = tanh(0.8458300234815246))
    loss = 0.237281 (Loss = 0.5 * (0.6888846949435228 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.723932 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.361966 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.021467 (w1 = w1 - lr * w1.grad = -3.0142279906073903 - 0.01 * 0.7239322233178187)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.870666 (b = b - lr * b.grad = 6.874286004696305 - 0.01 * 0.36196611165890935)
  updated parameters:
    w1 = -3.021467, w2 = 1.000000, b = 6.870666


EPOCH 4
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.021467, w2 = 1.000000, b = 6.870666
STEP 1: FORWARD PASS
  forward pass:
    n = 0.827732 (n = 2.0*-3.0214673128405685 + 0.0*1.0 + 6.870666343579716)
    o = 0.679256 (o = tanh(0.8277317178985788))
    loss = 0.230694 (Loss = 0.5 * (0.6792561658373667 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.731710 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.365855 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.028784 (w1 = w1 - lr * w1.grad = -3.0214673128405685 - 0.01 * 0.7317097685784673)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.867008 (b = b - lr * b.grad = 6.870666343579716 - 0.01 * 0.36585488428923363)
  updated parameters:
    w1 = -3.028784, w2 = 1.000000, b = 6.867008


EPOCH 5
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.028784, w2 = 1.000000, b = 6.867008
STEP 1: FORWARD PASS
  forward pass:
    n = 0.809439 (n = 2.0*-3.0287844105263533 + 0.0*1.0 + 6.867007794736823)
    o = 0.669281 (o = tanh(0.8094389736841165))
    loss = 0.223968 (Loss = 0.5 * (0.6692806538042507 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.738971 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.369485 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.036174 (w1 = w1 - lr * w1.grad = -3.0287844105263533 - 0.01 * 0.7389707152116205)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.863313 (b = b - lr * b.grad = 6.867007794736823 - 0.01 * 0.36948535760581025)
  updated parameters:
    w1 = -3.036174, w2 = 1.000000, b = 6.863313


EPOCH 6
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.036174, w2 = 1.000000, b = 6.863313
STEP 1: FORWARD PASS
  forward pass:
    n = 0.790965 (n = 2.0*-3.0361741176784696 + 0.0*1.0 + 6.863312941160765)
    o = 0.658955 (o = tanh(0.7909647058038258))
    loss = 0.217111 (Loss = 0.5 * (0.6589551923491251 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.745645 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.372822 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.043631 (w1 = w1 - lr * w1.grad = -3.0361741176784696 - 0.01 * 0.7456447734284608)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.859585 (b = b - lr * b.grad = 6.863312941160765 - 0.01 * 0.3728223867142304)
  updated parameters:
    w1 = -3.043631, w2 = 1.000000, b = 6.859585


EPOCH 7
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.043631, w2 = 1.000000, b = 6.859585
STEP 1: FORWARD PASS
  forward pass:
    n = 0.772324 (n = 2.0*-3.0436305654127542 + 0.0*1.0 + 6.859584717293623)
    o = 0.648279 (o = tanh(0.7723235864681142))
    loss = 0.210133 (Loss = 0.5 * (0.6482785444319854 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.751659 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.375830 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.051147 (w1 = w1 - lr * w1.grad = -3.0436305654127542 - 0.01 * 0.7516594316354793)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.855826 (b = b - lr * b.grad = 6.859584717293623 - 0.01 * 0.37582971581773966)
  updated parameters:
    w1 = -3.051147, w2 = 1.000000, b = 6.855826


EPOCH 8
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.051147, w2 = 1.000000, b = 6.855826
STEP 1: FORWARD PASS
  forward pass:
    n = 0.753532 (n = 2.0*-3.051147159729109 + 0.0*1.0 + 6.855826420135445)
    o = 0.637251 (o = tanh(0.7535321006772273))
    loss = 0.203045 (Loss = 0.5 * (0.6372514280663818 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.756941 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.378470 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.058717 (w1 = w1 - lr * w1.grad = -3.051147159729109 - 0.01 * 0.7569407781987396)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.852042 (b = b - lr * b.grad = 6.855826420135445 - 0.01 * 0.3784703890993698)
  updated parameters:
    w1 = -3.058717, w2 = 1.000000, b = 6.852042


EPOCH 9
-------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.058717, w2 = 1.000000, b = 6.852042
STEP 1: FORWARD PASS
  forward pass:
    n = 0.734609 (n = 2.0*-3.0587165675110963 + 0.0*1.0 + 6.852041716244451)
    o = 0.625877 (o = tanh(0.7346085812222585))
    loss = 0.195861 (Loss = 0.5 * (0.6258767388376627 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.761414 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.380707 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.066331 (w1 = w1 - lr * w1.grad = -3.0587165675110963 - 0.01 * 0.7614144871604955)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.848235 (b = b - lr * b.grad = 6.852041716244451 - 0.01 * 0.38070724358024777)
  updated parameters:
    w1 = -3.066331, w2 = 1.000000, b = 6.848235


EPOCH 10
--------
STEP 0: INITIAL VALUES
  initial values:
    x1 = 2.0, x2 = 0.0
    w1 = -3.066331, w2 = 1.000000, b = 6.848235
STEP 1: FORWARD PASS
  forward pass:
    n = 0.715573 (n = 2.0*-3.0663307123827015 + 0.0*1.0 + 6.848234643808649)
    o = 0.614160 (o = tanh(0.7155732190432458))
    loss = 0.188596 (Loss = 0.5 * (0.6141597625050071 - 0.0)^2)
STEP 2: BACK PROPAGATION
  gradients (calculated during backward pass):
    w1.grad = 0.765007 (gradient of loss w.r.t w1: dL/dw1 = x1 * (o - target) * (1 - o^2))
    w2.grad = 0.000000 (gradient of loss w.r.t w2: dL/dw2 = x2 * (o - target) * (1 - o^2))
    b.grad = 0.382503 (gradient of loss w.r.t b: dL/db = (o - target) * (1 - o^2))
STEP 3: OPTIMIZATION / GRADIENT DESCENT
    updated w1 = -3.073981 (w1 = w1 - lr * w1.grad = -3.0663307123827015 - 0.01 * 0.765006964019203)
    updated w2 = 1.000000 (w2 = w2 - lr * w2.grad = 1.0 - 0.01 * 0.0)
    updated b = 6.844410 (b = b - lr * b.grad = 6.848234643808649 - 0.01 * 0.3825034820096015)
  updated parameters:
    w1 = -3.073981, w2 = 1.000000, b = 6.844410
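
Two details in the log above are worth calling out. First, $w_2$ never changes because $x_2 = 0$, so $\frac{\partial L}{\partial w_2} = x_2 \cdot (o - t)(1 - o^2) = 0$ at every step. Second, the gradients PyTorch reports can be reproduced by hand from the chain-rule formulas; below is a minimal sketch for epoch 1, with the initial values hard-coded and no autograd involved (the underscored names are just illustrative stand-ins for the tensors above):

import math

# epoch-1 values, hard-coded from the initial setup above
x1_, x2_, w1_, w2_, b_, t_ = 2.0, 0.0, -3.0, 1.0, 6.8814, 0.0

n_ = x1_ * w1_ + x2_ * w2_ + b_       # linear combination: 0.8814
o_ = math.tanh(n_)                    # activation: ~0.707120
dL_db = (o_ - t_) * (1 - o_ ** 2)     # ~0.353547
dL_dw1 = x1_ * dL_db                  # ~0.707094
dL_dw2 = x2_ * dL_db                  # 0.0

print(f"n = {n_:.6f}, o = {o_:.6f}, loss = {0.5 * (o_ - t_) ** 2:.6f}")
print(f"dL/dw1 = {dL_dw1:.6f}, dL/dw2 = {dL_dw2:.6f}, dL/db = {dL_db:.6f}")

The printed values should match the EPOCH 1 gradients above (0.707094, 0.000000, 0.353547).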

Final Output

# display the corresponding network diagram image
display(Image(filename=f"ae10.jpg"))

# final Output
print("\nFinal Output:")
print(f"  Final n = {n.item():.6f}")
print(f"  Final o = {o.item():.6f} (o = tanh({n.item()}))")
print(f"  Final Loss = {loss.item():.6f}")
print(f"Final Parameters:")
print(f"  w1 = {w1.item():.6f}, w2 = {w2.item():.6f}, b = {b.item():.6f}")


Final Output:
  Final n = 0.715573
  Final o = 0.614160 (o = tanh(0.7155732190432458))
  Final Loss = 0.188596
Final Parameters:
  w1 = -3.073981, w2 = 1.000000, b = 6.844410
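
For completeness, the manual update in the training loop (subtract learning_rate * grad under torch.no_grad(), then zero the gradients) is exactly what vanilla torch.optim.SGD does. Below is a minimal sketch of the same loop written with an optimizer, reusing the tensors and hyperparameters defined above and omitting the logging; it is an illustrative rewrite, not part of the original notebook:

optimizer = torch.optim.SGD([w1, w2, b], lr=learning_rate)

for epoch in range(epochs):
    # forward pass and loss, same as before
    n = x1 * w1 + x2 * w2 + b
    o = torch.tanh(n)
    loss = 0.5 * (o - target).pow(2)

    optimizer.zero_grad()   # clear old gradients
    loss.backward()         # backpropagation
    optimizer.step()        # theta -= lr * theta.grad for each parameter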
