A neural network is only as powerful as its individual units. The Perceptron provides the structure, and Activation Functions provide the intelligence.
1. The Building Block
Every massive neural network, from ChatGPT to image generators, is constructed from huge numbers of tiny, structurally identical units called Perceptrons (or Artificial Neurons), whose connections carry billions of learned weights.
Inspired by biological neurons in the human brain, a perceptron takes in multiple numerical inputs, processes them, and produces a single output signal. It acts as a micro-decision maker. By chaining millions of these simple decisions together, a network can exhibit incredibly complex, 'intelligent' behavior.
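To make the "micro-decision maker" idea concrete, here is a minimal sketch of a classic perceptron. It assumes the original step-function output, in which the unit either "fires" (1) or stays silent (0). The function name `perceptron` is illustrative. The weights are hand-picked so that the unit behaves like a logical AND gate; they are not taken from any trained model. The weighted sum inside it is explained in step 2.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Classic perceptron: weighted sum followed by a hard 0/1 threshold."""
    z = np.dot(inputs, weights) + bias  # combine all inputs into one number
    return 1 if z > 0 else 0            # the single yes/no decision

# Hypothetical hand-picked parameters: fire only when BOTH inputs are 1 (logical AND).
and_weights = np.array([1.0, 1.0])
and_bias = -1.5

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron(np.array([a, b]), and_weights, and_bias))
```

Only the input pair (1, 1) pushes the sum above zero (1 + 1 - 1.5 = 0.5), so it is the only pair that produces a 1.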
"""
[Input 1] --\
[Input 2] ----> [Perceptron] ---> [Output]
[Input 3] --/
"""
2. The Weighted Sum
Inside the perceptron, the first step is calculating the Weighted Sum.
Every input has an associated 'Weight' that determines its importance. For example, if you're predicting house prices, a trained model will typically give the 'square footage' input far more influence than the 'color of the front door'. The perceptron multiplies each input by its weight, adds the products together, and then adds a 'Bias' (a constant offset that shifts the result up or down). Mathematically, this is a dot product of the input and weight vectors, plus the bias.
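Written out for n inputs x_i with weights w_i and bias b, the weighted sum is shown below. A worked example follows, using three made-up inputs chosen only to keep the arithmetic easy to follow.

```latex
z = \sum_{i=1}^{n} w_i x_i + b = \mathbf{w} \cdot \mathbf{x} + b

% Worked example with x = (2, 1, 3), w = (0.5, -1, 0.25), b = 0.1:
z = (0.5)(2) + (-1)(1) + (0.25)(3) + 0.1 = 1 - 1 + 0.75 + 0.1 = 0.85
```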
import numpy as np
def weighted_sum(inputs, weights, bias):
    # z = sum(input_i * weight_i) + bias
    return np.dot(inputs, weights) + bias
3. The Need for Non-Linearity
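If all we do is calculate weighted sums, our neural network can only represent a linear function, making it no more expressive than Linear Regression. No matter how many layers you add, a linear function of a linear function is still linear. The result is always a straight line (or, with many inputs, a flat plane).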
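A quick numerical check of this claim follows. It stacks two hypothetical linear layers with random weights and no activation function. It then collapses them into a single equivalent layer with W = W2 @ W1 and b = W2 @ b1 + b2. The layer sizes and the random seed are arbitrary choices for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, for reproducibility

# Two hypothetical linear layers: 3 inputs -> 4 hidden -> 2 outputs, no activation.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

def two_linear_layers(x):
    hidden = W1 @ x + b1       # layer 1: weighted sums only
    return W2 @ hidden + b2    # layer 2: weighted sums of weighted sums

# The same computation collapsed into ONE linear layer.
W = W2 @ W1
b = W2 @ b1 + b2

def one_linear_layer(x):
    return W @ x + b

x = rng.normal(size=3)
print(np.allclose(two_linear_layers(x), one_linear_layer(x)))  # True
```

The algebra behind it is W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2). The extra layer adds parameters but no new expressive power.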
To solve real-world problems, like distinguishing between a picture of a dog and a cat, we need our model to learn complex, curved, non-linear boundaries. We achieve this by passing the weighted sum through an Activation Function.
# Linear + Linear = Still Linear
# Linear + Non-Linear = COMPLEX PATTERNS
# Activation functions provide the 'curve'.
4. The Sigmoid Function
Historically, the Sigmoid function was the most popular activation function.
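Sigmoid takes any number (from negative infinity to positive infinity) and squashes it into the range between 0 and 1, never quite reaching either end. The result is a smooth 'S-shaped' curve. Because its output lies between 0 and 1, Sigmoid is well suited to outputting *probabilities*. However, it has a serious flaw in deep networks: the 'Vanishing Gradient' problem. Its slope is at most 0.25 and is nearly 0 for large positive or negative inputs. During training, gradients are multiplied back through many sigmoid layers, so they shrink toward zero and learning in the early layers slows to a crawl.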
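The shrinking gradient can be seen directly from Sigmoid's derivative, sigmoid(x) * (1 - sigmoid(x)). The sketch below uses a hypothetical helper, `sigmoid_derivative`. It makes a simplifying assumption: it ignores the weights (treating them as 1) and shows only the best-case factor that the activation contributes per layer.

```python
import numpy as np

def sigmoid_derivative(x):
    s = 1 / (1 + np.exp(-x))  # the sigmoid itself
    return s * (1 - s)        # peaks at 0.25 when x = 0

print(sigmoid_derivative(0.0))   # 0.25, the largest possible value
print(sigmoid_derivative(10.0))  # about 4.5e-05: a saturated unit passes almost no gradient

# Best case: every layer sits at x = 0, and (simplifying) every weight is 1.
for depth in (1, 5, 10, 20):
    print(depth, sigmoid_derivative(0.0) ** depth)
```

Even in this best case, ten layers scale the gradient by 0.25 ** 10, roughly one millionth.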
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
# Used primarily in the FINAL layer
# for binary classification (Yes/No).
5. ReLU: The Modern Standard
Today, the default activation function for the hidden layers of a neural network is ReLU (Rectified Linear Unit).
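ReLU is incredibly simple: if the input is positive, it passes it through unchanged; if the input is negative, it outputs zero. Despite this simplicity, the single 'bend' at zero supplies enough non-linearity for networks built from ReLU units to approximate very complex functions. Its slope is exactly 1 for positive inputs and 0 for negative ones, so gradients flowing through active units are not shrunk the way Sigmoid's are. This greatly reduces (though does not fully eliminate) the vanishing gradient problem. Combined with how cheap ReLU is to compute, it typically makes training much faster than with Sigmoid. One trade-off: a unit whose input stays negative receives no gradient at all and can stop learning (the so-called 'dying ReLU' problem).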
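The slope claim can be checked with a hypothetical `relu_derivative` helper. ReLU is not differentiable at exactly 0, so the sketch assumes the common convention of using a slope of 0 there.

```python
import numpy as np

def relu_derivative(x):
    # Slope is 1 for positive inputs and 0 otherwise (0 at x == 0 by convention).
    return (np.asarray(x) > 0).astype(float)

xs = np.array([-5.0, -0.1, 0.0, 0.1, 10.0])
print(relu_derivative(xs))  # 0 for the non-positive inputs, 1 for the positive ones

# Through 20 active (positive-input) layers, the activation's own factor stays exactly 1.
print(relu_derivative(1.0) ** 20)
```

Compare this with Sigmoid, whose per-layer factor is at most 0.25.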
def relu(x):
    return np.maximum(0, x)
# Input: -5 -> Output: 0
# Input: 10 -> Output: 10
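To see that ReLU's single bend really adds expressive power, here is a hand-built two-layer network that computes XOR, which outputs 1 when exactly one input is 1. No single weighted sum plus threshold can represent XOR, because its two classes cannot be separated by a straight line. The weights are hand-picked for illustration rather than learned, and the name `xor_net` is hypothetical.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

# Hand-picked (not trained) parameters: 2 inputs -> 2 hidden units -> 1 output.
W_hidden = np.array([[1.0, 1.0],    # hidden unit 1: x1 + x2
                     [1.0, 1.0]])   # hidden unit 2: x1 + x2 - 1 (via its bias)
b_hidden = np.array([0.0, -1.0])
w_out = np.array([1.0, -2.0])       # output: h1 - 2 * h2

def xor_net(x):
    h = relu(W_hidden @ x + b_hidden)  # the ReLU "bend" happens here
    return float(w_out @ h)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(np.array([x1, x2], dtype=float)))
```

Removing the `relu` call makes the network linear again. The output collapses to the single weighted sum -x1 - x2 + 2, which gives 2 for the input (0, 0) instead of XOR's 0.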