Logistic Regression is a foundational algorithm for binary classification. It transforms linear outputs into probabilities using the Sigmoid function.
1. The Classification Engine
Despite its confusing name, Logistic Regression is used for classification, not for predicting continuous quantities. The 'regression' in the name refers to the linear model it fits to the log-odds of the positive class. We use it when we want to predict the probability, between 0 and 1, of a yes/no outcome, such as whether a user will click an ad, whether an email is spam, or whether a transaction is fraudulent.
It takes the core mathematics of linear regression and adapts them to answer 'Yes/No' questions rather than predicting continuous quantities.
from sklearn.linear_model import LogisticRegression
# Initialize the classification engine
model = LogisticRegression()
print("Ready for binary classification.")

2. The Sigmoid Function
The secret mathematical sauce of Logistic Regression is the Sigmoid Function. Linear regression can output any number from negative infinity to positive infinity. Sigmoid takes that raw number and squashes it into a strict range between 0 and 1.
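This squashing creates an 'S-Curve'. Because the output is bounded between 0 and 1, we can easily interpret it as a probability. A large positive input maps to a value close to 1 (an input of 7 gives about 0.999), and a large negative input maps to a value close to 0 (an input of -7 gives about 0.001). An input of exactly 0 maps to 0.5.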
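To make the squashing concrete, here is a minimal sketch in plain Python (standard library only) that evaluates the sigmoid at a few inputs. The sample inputs are arbitrary values chosen for illustration, and the two-branch form is just one common way to avoid overflow for very negative inputs, not the only way to write it.

```python
import math

def sigmoid(z: float) -> float:
    """Map any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # For negative z, use an algebraically equivalent form
    # so that math.exp is never called with a large positive argument.
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Illustrative inputs (arbitrary): large negative, zero, large positive.
for z in (-7.0, 0.0, 7.0):
    print(f"sigmoid({z:+.1f}) = {sigmoid(z):.4f}")
```

The curve is symmetric around 0: sigmoid(-z) and sigmoid(z) always add up to 1.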
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
# Any input is squashed to a probability

3. Decision Thresholds
Once we have our probability, we need to make a final decision. We do this using a Decision Threshold, which is typically set at 0.5.
If the model predicts a probability greater than or equal to 0.5, we assign it to Class 1 (e.g., 'Spam'). If it's less than 0.5, we assign it to Class 0 ('Not Spam'). In high-stakes environments like medical screening, you might lower this threshold so that fewer true cases are missed, accepting more false alarms in exchange.
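The thresholding step itself is simple enough to sketch in plain Python. The helper name `classify` and the probability values below are hypothetical, chosen only to show how moving the threshold changes the final labels.

```python
def classify(probabilities, threshold=0.5):
    """Return 1 where P(class 1) >= threshold, otherwise 0."""
    return [1 if p >= threshold else 0 for p in probabilities]

# Hypothetical P(class 1) values for four examples.
p_class1 = [0.92, 0.15, 0.50, 0.40]

default_labels = classify(p_class1)        # default threshold of 0.5
cautious_labels = classify(p_class1, 0.3)  # lower threshold flags more cases

print(default_labels)
print(cautious_labels)
```

With the lower threshold, the example scored at 0.40 switches from Class 0 to Class 1, which is the trade-off a more cautious setting makes.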
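# Assumes X_train, y_train, and X_test are already defined
model.fit(X_train, y_train)
# Get per-class probabilities instead of hard class labels
probs = model.predict_proba(X_test)
# Each row is [P(class 0), P(class 1)], e.g., [[0.08, 0.92], [0.85, 0.15]]

4. Log Loss (Cross-Entropy)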
Linear Regression evaluates its mistakes using Mean Squared Error. Logistic Regression uses Log Loss (also known as Cross-Entropy).
Log Loss penalizes the model based on how confident it was in a wrong answer. If the actual answer is 1 and the model predicted a probability of 0.001, the penalty is large: -ln(0.001) is about 6.9. If it predicted 0.49, the penalty is much smaller: -ln(0.49) is about 0.71. The model learns by iteratively adjusting its weights to minimize this loss.
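The per-example penalty can be computed directly. The sketch below uses the standard binary cross-entropy formula with the natural logarithm; the function name `log_loss_single` is a hypothetical helper, and the predicted probability is assumed to lie strictly between 0 and 1.

```python
import math

def log_loss_single(y_true: int, p_pred: float) -> float:
    """Log loss for one example.

    y_true is the actual class (0 or 1); p_pred is the predicted
    P(class 1) and must be strictly between 0 and 1.
    """
    return -(y_true * math.log(p_pred) + (1 - y_true) * math.log(1.0 - p_pred))

# Actual class is 1 in all three cases; only the prediction changes.
confident_wrong = log_loss_single(1, 0.001)
unsure_wrong = log_loss_single(1, 0.49)
confident_right = log_loss_single(1, 0.999)

print(f"predicted 0.001: {confident_wrong:.3f}")
print(f"predicted 0.49:  {unsure_wrong:.3f}")
print(f"predicted 0.999: {confident_right:.3f}")
```

Averaging this quantity over every training example gives the loss that the model tries to minimize.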
# Log Loss Concept:
# If actual is 1 but predicted 0.001,
# the penalty is massive due to the Log curve.
# Goal: Minimize Log Loss.

5. The Confusion Matrix
To evaluate how well our classification model performs in the real world, we use a Confusion Matrix. This breaks down our predictions into four distinct categories.
It shows True Positives (correctly identified Spam) and True Negatives (correctly identified Not Spam). Crucially, it highlights the errors: False Positives (flagging a normal email as Spam) and False Negatives (letting a Spam email through). Understanding these trade-offs is essential for deploying ML safely.
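To see where the four counts come from, here is a small hand-rolled sketch in plain Python. The function name `confusion_counts` and the label lists are made up for illustration; the result is laid out with actual classes as rows and predicted classes as columns.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels (0 or 1).

    Returns [[TN, FP], [FN, TP]]: rows are actual class 0 then 1,
    columns are predicted class 0 then 1.
    """
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1  # Spam correctly flagged
        elif actual == 0 and predicted == 0:
            tn += 1  # normal email correctly passed
        elif actual == 0 and predicted == 1:
            fp += 1  # normal email wrongly flagged
        else:
            fn += 1  # Spam let through
    return [[tn, fp], [fn, tp]]

# Hypothetical labels: 1 = Spam, 0 = Not Spam.
y_true_demo = [1, 0, 1, 1, 0, 0]
y_pred_demo = [1, 0, 0, 1, 1, 0]

matrix = confusion_counts(y_true_demo, y_pred_demo)
print(matrix)
```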
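from sklearn.metrics import confusion_matrix
# Assumes model is fitted and X_test, y_test are already defined
y_pred = model.predict(X_test)
# Prints a 2x2 matrix (rows = actual class, columns = predicted class):
# [[True Negatives, False Positives],
#  [False Negatives, True Positives]]
print(confusion_matrix(y_test, y_pred))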