Neural Networks

Origins

  • Neural networks (NNs) are algorithms designed to mimic the brain.
  • Popular in the 1980s and early 1990s, but popularity waned in the late 1990s.
  • Recent resurgence as a state-of-the-art technique in various applications.
  • Artificial neural networks are far simpler than the brain’s structure.

The Brain

  • Composition: Networks of neurons.
  • Function: Brain activity occurs due to neuron firing.
  • Neurons:
    • Connect through synapses, propagating action potentials (electrical impulses).
    • Synapses release neurotransmitters, which can be:
      • Excitatory: Increase potential.
      • Inhibitory: Decrease potential.
    • Learning: Synapses exhibit plasticity, enabling long-term changes in connection strength.
  • Scale:
    • ~10¹¹ neurons.
    • ~10¹⁴ synapses.

Neural Networks and the Brain

  • Neural networks consist of computational models of neurons called perceptrons.

The Perceptron

  • A threshold unit:
    • Fires if the weighted sum of inputs exceeds a threshold.
    • Analogous to a threshold gate in Boolean circuits.
```mermaid
graph TD
    Input1 -->|Weight1| Perceptron
    Input2 -->|Weight2| Perceptron
    Perceptron --> Output
```
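
As a concrete illustration (not from the slides), here is a minimal Python sketch of a threshold perceptron; the weights and threshold are made-up values chosen so that it behaves like a Boolean AND gate:

```python
def perceptron(inputs, weights, threshold):
    """Threshold unit: fires (outputs 1) if the weighted sum of inputs exceeds the threshold."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# Example: with these (made-up) weights the unit acts like a Boolean AND gate
print(perceptron([1, 1], weights=[0.6, 0.6], threshold=1.0))  # 1
print(perceptron([1, 0], weights=[0.6, 0.6], threshold=1.0))  # 0
```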

Soft Perceptron (Logistic)

  • Replaces threshold with a sigmoid activation function.
    • A “squashing” function that produces a continuous output.
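
A minimal sketch of the sigmoid squashing function in Python (illustrative values only):

```python
import math

def sigmoid(z):
    """Sigmoid activation: squashes any real-valued input into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1
print(sigmoid(-4.0), sigmoid(0.0), sigmoid(4.0))  # ~0.018, 0.5, ~0.982
```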

Structure of Neural Networks

  • Composed of nodes (units) connected by links.
    • Each link has a numeric weight; each node has an activation level.
    • Each node has (see the sketch below):
      • Input function: sums the weighted inputs.
      • Activation function: transforms the summed input.
      • Output.

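A minimal sketch of a single node, separating the input function from the activation function as described above (illustrative Python; the weights and inputs are made-up values):

```python
import math

def input_function(activations, weights):
    """Input function: weighted sum of the incoming activations."""
    return sum(w * a for w, a in zip(weights, activations))

def activation_function(z):
    """Activation function: a sigmoid transform of the summed input."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(activations, weights):
    """Output of the node: activation applied to the weighted sum."""
    return activation_function(input_function(activations, weights))

print(node_output([0.5, 1.0], weights=[0.4, -0.2]))  # 0.5
```
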
Multi-Layer Perceptron

  • Feed-Forward Process:
    1. Input layer units are activated by external stimuli (e.g., sensors).
    2. The input function computes input values by summing weighted activations.
    3. Activation function applies a non-linear transformation (e.g., sigmoid).
  • Inputs: Real or Boolean.
  • Outputs: Real or Boolean; a network can produce multiple outputs for each input.
```mermaid
graph LR
    Input1 --> Hidden1
    Input2 --> Hidden1
    Hidden1 --> Output
    Hidden1 --> Hidden2
    Hidden2 --> Output
```
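
To make the feed-forward process concrete, here is a minimal Python sketch for a small 2-input, 2-hidden-unit, 1-output network (the same layout assumed in the exercise below); the weights and inputs are made-up values:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def feed_forward(x1, x2, W1, W2, W3, W4, W5, W6):
    """Feed-forward pass through a 2-2-1 network with sigmoid activations.

    h1 = sigmoid(W1*x1 + W2*x2), h2 = sigmoid(W3*x1 + W4*x2),
    out = sigmoid(W5*h1 + W6*h2).
    """
    h1 = sigmoid(W1 * x1 + W2 * x2)   # hidden unit 1
    h2 = sigmoid(W3 * x1 + W4 * x2)   # hidden unit 2
    out = sigmoid(W5 * h1 + W6 * h2)  # output unit
    return h1, h2, out

# Example with arbitrary weights
print(feed_forward(1.0, 0.5, 0.1, 0.2, -0.3, 0.4, 0.7, -0.5))
```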

Summary

  • Neural networks are inspired by biological neurons but are far less complex.
  • Their structure and functionality involve input, activation, and output, mimicking a simplified version of brain processes.
  • Feed-forward networks like multi-layer perceptrons are foundational to many applications.

Slides Ex.2

Network setup (implied by the gradient formulas in Section 3 below): inputs $X_1, X_2$; hidden units $h_1 = f(W_1 X_1 + W_2 X_2)$ and $h_2 = f(W_3 X_1 + W_4 X_2)$; output $out = f(W_5 h_1 + W_6 h_2)$, where $f$ is the sigmoid.

Error Function:

$$\large E = \frac{1}{2}(y - out)^2$$

1. Gradients for Output Layer Weights ($W_5$, $W_6$):

Let's first compute the error gradient for $W_5$:

Step 1: Compute $\frac{\partial E}{\partial out}$:

$$\large \frac{\partial E}{\partial out} = -(y - out)$$

Step 2: Derivative of output activation:

For sigmoid (with $z = W_5 h_1 + W_6 h_2$, the output unit's weighted input):

$$\large out = f(z) = \frac{1}{1 + e^{-z}}$$

$$\large f'(z) = out(1 - out)$$
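
As a quick check (standard calculus, not shown on the slide), this identity follows from differentiating the sigmoid directly:

$$\large f'(z) = \frac{e^{-z}}{(1 + e^{-z})^2} = \frac{1}{1 + e^{-z}} \cdot \frac{e^{-z}}{1 + e^{-z}} = out(1 - out)$$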

Step 3: Chain rule application:

For $W_5$:

$$\large \frac{\partial E}{\partial W_5} = \frac{\partial E}{\partial out} \cdot \frac{\partial out}{\partial z} \cdot \frac{\partial z}{\partial W_5}$$

Now substituting the three factors ($\frac{\partial z}{\partial W_5} = h_1$):

$$\large \frac{\partial E}{\partial W_5} = -(y - out)(out)(1 - out)(h_1)$$

Similarly, for $W_6$:

$$\large \frac{\partial E}{\partial W_6} = -(y - out)(out)(1 - out)(h_2)$$


2. Gradients for Hidden Layer Weights ($W_1$–$W_4$):

For $W_1$ (connected to $h_1$):

We need to backpropagate through $h_1$:

$$\large \frac{\partial E}{\partial W_1} = \frac{\partial E}{\partial out} \cdot \frac{\partial out}{\partial z} \cdot \frac{\partial z}{\partial h_1} \cdot \frac{\partial h_1}{\partial z_{h_1}} \cdot \frac{\partial z_{h_1}}{\partial W_1}$$

Breaking it down:

  1. From earlier (the output error term): $\frac{\partial E}{\partial out} \cdot \frac{\partial out}{\partial z} = -(y - out)(out)(1 - out)$

  2. From hidden layer unit $h_1$: $\frac{\partial z}{\partial h_1} = W_5$, and since $h_1$ is also a sigmoid of its weighted input $z_{h_1} = W_1 X_1 + W_2 X_2$, $\frac{\partial h_1}{\partial z_{h_1}} = h_1(1 - h_1)$.

  3. Weight contribution: $\frac{\partial z_{h_1}}{\partial W_1} = X_1$

Combining:

$$\large \frac{\partial E}{\partial W_1} = -(y - out)(out)(1 - out)(W_5)(h_1)(1 - h_1)(X_1)$$

Similarly, for $W_2$, $W_3$, and $W_4$: $W_2$ uses input $X_2$ on the same path, while $W_3$ and $W_4$ backpropagate through $h_2$ (with $W_6$ and $h_2(1 - h_2)$ in place of $W_5$ and $h_1(1 - h_1)$); the resulting formulas are listed below.


3. Final Gradient Formulas:

  • Output Layer:

$$\large \frac{\partial E}{\partial W_5} = -(y - out)(out)(1 - out)(h_1)$$

$$\large \frac{\partial E}{\partial W_6} = -(y - out)(out)(1 - out)(h_2)$$

  • Hidden Layer:

$$\large \frac{\partial E}{\partial W_1} = -(y - out)(out)(1 - out)(W_5)(h_1)(1 - h_1)(X_1)$$

$$\large \frac{\partial E}{\partial W_2} = -(y - out)(out)(1 - out)(W_5)(h_1)(1 - h_1)(X_2)$$

$$\large \frac{\partial E}{\partial W_3} = -(y - out)(out)(1 - out)(W_6)(h_2)(1 - h_2)(X_1)$$

$$\large \frac{\partial E}{\partial W_4} = -(y - out)(out)(1 - out)(W_6)(h_2)(1 - h_2)(X_2)$$
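
As a cross-check, here is a minimal Python sketch (my own illustration, with made-up numbers) that evaluates all six gradient formulas for the 2-2-1 network and compares them against finite differences:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, x1, x2):
    """Forward pass of the 2-2-1 network: returns h1, h2, out."""
    W1, W2, W3, W4, W5, W6 = W
    h1 = sigmoid(W1 * x1 + W2 * x2)
    h2 = sigmoid(W3 * x1 + W4 * x2)
    out = sigmoid(W5 * h1 + W6 * h2)
    return h1, h2, out

def error(W, x1, x2, y):
    """Squared error E = 1/2 (y - out)^2."""
    _, _, out = forward(W, x1, x2)
    return 0.5 * (y - out) ** 2

def gradients(W, x1, x2, y):
    """Backprop gradients dE/dW1..dE/dW6 using the formulas above."""
    W1, W2, W3, W4, W5, W6 = W
    h1, h2, out = forward(W, x1, x2)
    delta_out = -(y - out) * out * (1 - out)   # output error term
    dW5 = delta_out * h1
    dW6 = delta_out * h2
    dW1 = delta_out * W5 * h1 * (1 - h1) * x1
    dW2 = delta_out * W5 * h1 * (1 - h1) * x2
    dW3 = delta_out * W6 * h2 * (1 - h2) * x1
    dW4 = delta_out * W6 * h2 * (1 - h2) * x2
    return [dW1, dW2, dW3, dW4, dW5, dW6]

# Made-up example values
W = [0.15, 0.20, 0.25, 0.30, 0.40, 0.45]
x1, x2, y = 0.05, 0.10, 0.99

analytic = gradients(W, x1, x2, y)

# Finite-difference check: perturb each weight and compare
eps = 1e-6
for i, g in enumerate(analytic):
    Wp = W.copy(); Wp[i] += eps
    Wm = W.copy(); Wm[i] -= eps
    numeric = (error(Wp, x1, x2, y) - error(Wm, x1, x2, y)) / (2 * eps)
    print(f"W{i+1}: analytic={g:.6f}  numeric={numeric:.6f}")
```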