Why do we add a 'Bias' to the weighted sum?

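Think of bias like the y-intercept (the 'b' in y = mx + b). It shifts the weighted sum up or down, which in turn moves the point where the activation function 'switches on' left or right along the input axis. Without a bias, the weighted sum is always zero when every input is zero, so the neuron's decision boundary is forced to pass through the origin (0,0), which severely limits the model's ability to fit the data.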
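A minimal sketch of this effect, assuming a single-input neuron with a sigmoid activation (the `neuron` helper and the numbers are illustrative, not part of the lesson):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b=0.0):
    # Weighted sum plus bias, then sigmoid activation.
    return sigmoid(w * x + b)

# Without a bias, an input of 0 always gives sigmoid(0) = 0.5,
# no matter what the weight is.
no_bias_outputs = [neuron(0.0, w) for w in (-3.0, 0.5, 10.0)]

# With a bias, the curve shifts: the output crosses 0.5 where
# w * x + b = 0, i.e. at x = -b / w.
w, b = 2.0, -4.0
crossing_x = -b / w                      # 2.0
at_crossing = neuron(crossing_x, w, b)   # 0.5
```

Changing `b` moves `crossing_x` without touching the weight, which is the 'shift left or right' described above.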

What is the 'Dying ReLU' problem?

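Because ReLU outputs exactly zero for any negative input, a neuron whose weighted sum is negative for every training example (for instance, after an update leaves it with a large negative bias) outputs zero all the time. While its output is zero, the gradient flowing back through it is also zero, so its weights and bias stop updating and it usually cannot recover. It becomes a 'dead' neuron. Variants like Leaky ReLU (which keeps a small non-zero slope for negative inputs) are used to mitigate this.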
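The sketch below illustrates a dead neuron under made-up conditions: the inputs, weights, bias, and the `alpha=0.01` slope for Leaky ReLU are all assumptions chosen for the example.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    # Slope of ReLU: 1 where z > 0, otherwise 0.
    return (z > 0).astype(float)

def leaky_relu_grad(z, alpha=0.01):
    # Slope of Leaky ReLU: 1 where z > 0, otherwise alpha.
    return np.where(z > 0, 1.0, alpha)

# Hypothetical neuron: small weights and a large negative bias.
inputs = np.array([[0.2, 0.7], [0.9, 0.1], [0.5, 0.5]])  # 3 samples
weights = np.array([0.3, -0.2])
bias = -5.0

z = inputs @ weights + bias        # negative for every sample
dead_outputs = relu(z)             # all zeros
dead_grads = relu_grad(z)          # all zeros: no learning signal
leaky_grads = leaky_relu_grad(z)   # all 0.01: small but non-zero
```

With plain ReLU every sample contributes a zero gradient, so nothing pulls the bias back up. The leaky variant still passes a small signal.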

Should I use Sigmoid in the hidden layers?

Generally, no. ReLU (or a variant like GELU) is the usual choice for the hidden layers of a deep network. Sigmoid is mostly reserved for the final output layer of a binary classification model, where it turns the raw score into a probability between 0 and 1.
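As a rough illustration of that layout, here is a tiny forward pass with ReLU in the hidden layer and Sigmoid only at the output. The layer sizes, random weights, and the `predict_proba` name are assumptions for the sketch; a real model would learn its weights.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)   # fixed seed for repeatability

# Hypothetical shapes: 4 input features, 8 hidden units, 1 output.
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def predict_proba(x):
    hidden = relu(x @ W1 + b1)   # ReLU in the hidden layer
    logit = hidden @ W2 + b2     # raw score
    return sigmoid(logit)        # Sigmoid only at the output

x = rng.normal(size=(5, 4))      # batch of 5 samples
probs = predict_proba(x)         # one value between 0 and 1 per sample
```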


Perceptrons & Activation Functions

Master the mechanics of a single neuron. Understand how weights and biases form the weighted sum, and why non-linear functions like ReLU and Sigmoid are essential for building deep learning models that actually learn.



A neural network is only as powerful as its individual units. The Perceptron provides the structure, and Activation Functions provide the intelligence.

1. The Building Block

Every large neural network, from ChatGPT to image generators, is built from enormous numbers of simple units called Artificial Neurons. The Perceptron is the original and simplest form of this unit.

Inspired by biological neurons in the human brain, a perceptron takes in multiple numerical inputs, processes them, and produces a single output signal. It acts as a micro-decision maker. By chaining millions of these simple decisions together, a network can exhibit incredibly complex, 'intelligent' behavior.
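To make the 'micro-decision maker' idea concrete, here is a sketch of a classic perceptron with a step activation (output 1 if the weighted sum is positive, otherwise 0). The step rule and the hand-picked weights that make it behave like a logical AND are assumptions for illustration, not values from the lesson.

```python
import numpy as np

def perceptron(inputs, weights, bias):
    # Weighted sum followed by a step activation: fire (1) or not (0).
    z = np.dot(inputs, weights) + bias
    return 1 if z > 0 else 0

# Hand-picked (not learned) parameters that act as logical AND.
weights = np.array([1.0, 1.0])
bias = -1.5

results = {(a, b): perceptron(np.array([a, b]), weights, bias)
           for a in (0, 1) for b in (0, 1)}
# Only the input (1, 1) pushes the weighted sum above zero.
```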


"""
[Input 1] --\
[Input 2] ----> [Perceptron] ---> [Output]
[Input 3] --/
"""


2. The Weighted Sum

Inside the perceptron, the first step is calculating the Weighted Sum.

Every input has an associated 'Weight' that determines its importance. For example, if you're predicting house prices, the 'square footage' input will typically end up with far more influence than the 'color of the front door'. The perceptron multiplies every input by its weight, adds the products together, and then adds a 'Bias' (a constant baseline). Mathematically, the multiply-and-add step is a dot product of the input and weight vectors, with the bias added to the result.


import numpy as np

def weighted_sum(inputs, weights, bias):
    # z = sum(input_i * weight_i) + bias
    return np.dot(inputs, weights) + bias
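A worked example with made-up numbers may help. The feature encoding (square footage in thousands, a numeric door colour code) and every value below are assumptions for illustration only.

```python
import numpy as np

def weighted_sum(inputs, weights, bias):
    return np.dot(inputs, weights) + bias

# Made-up features: [square footage in thousands, door colour code]
inputs = np.array([2.0, 1.0])
weights = np.array([150.0, 0.5])   # size matters far more than door colour
bias = 50.0                        # baseline value

z = weighted_sum(inputs, weights, bias)
# 2.0 * 150.0 + 1.0 * 0.5 + 50.0 = 350.5
```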


3. The Need for Non-Linearity

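If all we do is calculate a weighted sum, each layer is just a linear (strictly, affine) function, and the whole network is no more expressive than Linear Regression. No matter how many layers you add, a linear function of a linear function is still linear: a straight line in one dimension, a flat plane in higher dimensions.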

To solve real-world problems—like distinguishing between a picture of a dog and a cat—we need our model to learn complex, curved, non-linear boundaries. We achieve this by passing the weighted sum through an Activation Function.


# Linear + Linear = Still Linear
# Linear + Non-Linear = COMPLEX PATTERNS

# Activation functions provide the 'curve'.
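The collapse of stacked linear layers can be checked numerically. The sketch below uses random matrices of arbitrary, assumed sizes. It shows that two layers without an activation equal a single merged layer, and that placing ReLU between them breaks that equivalence.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=(10, 3))

W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=4)
W2, b2 = rng.normal(size=(4, 2)), rng.normal(size=2)

# Two stacked layers with no activation in between...
two_layers = (x @ W1 + b1) @ W2 + b2

# ...are equal to one layer with merged weights and bias.
W_merged = W1 @ W2
b_merged = b1 @ W2 + b2
one_layer = x @ W_merged + b_merged

same = np.allclose(two_layers, one_layer)

# Putting ReLU between the layers breaks the equivalence.
with_relu = np.maximum(0, x @ W1 + b1) @ W2 + b2
differs = not np.allclose(with_relu, one_layer)
```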


4. The Sigmoid Function

Historically, the Sigmoid function was one of the most popular activation functions.

Sigmoid takes any number (from negative infinity to positive infinity) and squashes it into the range between 0 and 1. This creates a smooth 'S-shaped' curve. Because its output is between 0 and 1, Sigmoid is well suited for outputting *probabilities*. However, it has a serious drawback in deep networks: the 'Vanishing Gradient' problem. Its slope is never greater than 0.25 and approaches zero for large positive or negative inputs, so the gradient shrinks each time it passes back through a Sigmoid layer, and learning in the early layers slows to a crawl.


def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Used primarily in the FINAL layer
# for binary classification (Yes/No).
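The vanishing gradient effect follows from Sigmoid's slope, which is sigmoid(x) * (1 - sigmoid(x)). The sketch below evaluates it at two points and shows how repeated multiplication shrinks it. The `sigmoid_grad` helper and the 10-layer depth are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1 - s)

peak = sigmoid_grad(0.0)         # 0.25, the largest slope Sigmoid can have
saturated = sigmoid_grad(10.0)   # tiny: the curve is almost flat here

# Backpropagation multiplies in one such factor per Sigmoid layer,
# so even the best case shrinks geometrically with depth
# (ignoring the weights, which also enter the product).
best_case_after_10_layers = peak ** 10   # 0.25 ** 10, below 1e-6
```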


5. ReLU: The Modern Standard

Today, the default activation function for the hidden layers of a neural network is ReLU (Rectified Linear Unit).

ReLU is incredibly simple: if the input is positive, it passes it through unchanged. If the input is negative, it outputs zero. Despite its simplicity, this 'bend' at zero is enough non-linearity for a network to model complex functions. Furthermore, because its slope is exactly 1 for positive inputs, gradients pass through active neurons without shrinking, which greatly reduces the vanishing gradient problem. ReLU is also very cheap to compute, which helps training speed. The slope of 0 for negative inputs is the trade-off: it is what makes 'dead' neurons possible.


def relu(x):
    return np.maximum(0, x)

# Input: -5 -> Output: 0
# Input: 10 -> Output: 10
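The slope behaviour can be sketched the same way. The `relu_grad` helper is an assumption for illustration, and using a slope of 0 at exactly x = 0 is a convention rather than a mathematical necessity.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_grad(x):
    # Slope is 1 for positive inputs and 0 otherwise
    # (the value at exactly 0 is a convention; 0 is used here).
    return (np.asarray(x) > 0).astype(float)

x = np.array([-5.0, 0.0, 10.0])
outputs = relu(x)       # [0., 0., 10.]
slopes = relu_grad(x)   # [0., 0., 1.]

# Along a path of 10 active (positive) units the slopes multiply to 1,
# so the activation itself does not shrink the gradient with depth.
active_path_product = relu_grad(np.full(10, 3.0)).prod()   # 1.0
```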



Pascual Vila

Frontend Instructor // Code Syllabus

Lesson Glossary

[01] Perceptron

The simplest type of artificial neuron: it computes a weighted sum of its inputs plus a bias, then applies an activation function.

Code Preview: The Base Unit

[02] Activation Function

A function applied to a neuron's weighted sum that introduces non-linearity and determines how strongly the neuron 'fires'.

Code Preview: Logic Gate

[03] Sigmoid

An activation function that maps any real value into the range between 0 and 1, which makes it convenient for outputting probabilities.

Code Preview: 1 / (1 + e^-x)

[04] ReLU

Rectified Linear Unit: outputs the input if it is positive, otherwise outputs zero.

Code Preview: max(0, x)

[05] Dot Product

The sum of the products of corresponding entries of two equal-length sequences of numbers (here, inputs and weights).

Code Preview: np.dot(X, W)
