Logistic Regression: The Art of Classification
When Linear Regression fails to capture binary outcomes, Logistic Regression steps in. By mapping continuous inputs to probabilities between 0 and 1, it has become a backbone of modern classification tasks.
Why Not Linear Regression?
Linear Regression fits a straight line through data. This works well for predicting a continuous variable (like a house's price). However, if you want to classify an email as Spam (1) or Not Spam (0), a straight line will produce predictions below 0 or above 1, which make no sense as probabilities.
The Sigmoid Function
To fix this, Logistic Regression wraps the linear equation in the Sigmoid Function (or Logistic Function). This creates an S-shaped curve that squashes any real number down into a valid probability range between 0 and 1.
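The squashing behavior described above can be sketched directly. Below is a minimal NumPy implementation of the sigmoid, σ(z) = 1 / (1 + e⁻ᶻ); the function name and the sample inputs are illustrative, not from the original text:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))    # exactly 0.5 -- the midpoint of the S-curve
print(sigmoid(10.0))   # very close to 1
print(sigmoid(-10.0))  # very close to 0
```

Note that a linear input of 0 maps to a probability of exactly 0.5, which is why the default decision threshold sits there.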
The Decision Boundary
Once the Sigmoid function outputs a probability (e.g., 0.82), we apply a threshold. By default, Scikit-Learn uses a threshold of 0.5.
- If P(y=1|X) ≥ 0.5: Classify as 1 (Positive class).
- If P(y=1|X) < 0.5: Classify as 0 (Negative class).
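The two-branch rule above reduces to a single comparison. Here is a small sketch of thresholding with NumPy; the `classify` helper and the example probabilities are hypothetical:

```python
import numpy as np

def classify(probabilities, threshold=0.5):
    # Decision rule: P(y=1|X) >= threshold -> class 1, else class 0.
    return (np.asarray(probabilities) >= threshold).astype(int)

print(classify([0.82, 0.49, 0.5, 0.13]))  # [1 0 1 0]
```

Note that 0.5 lands in the positive class because the rule uses ≥, matching the bullets above.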
Implementation Tip
Always scale your data. Because Logistic Regression uses regularization by default in Scikit-Learn (via the `C` parameter), features that are on wildly different scales can negatively impact the model. Always use `StandardScaler` on your `X_train` data before applying `.fit()`.
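One way to guarantee the scaler only ever sees `X_train` is to chain it with the model in a pipeline. A minimal sketch, assuming scikit-learn and a synthetic dataset from `make_classification` (the data itself is illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data for illustration.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The pipeline fits StandardScaler on X_train only, then applies the
# same learned scaling automatically at prediction time.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

Using a pipeline also prevents data leakage: the test set never influences the scaling statistics.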
❓ Frequently Asked Questions
What is the difference between Linear and Logistic Regression?
Linear Regression: Used for regression problems where the output is a continuous number (e.g., predicting temperature or price). It fits a straight line.
Logistic Regression: Used for classification problems where the output is categorical (e.g., True/False, Dog/Cat). It fits an S-curve to output probabilities.
Is Logistic Regression a classification or regression algorithm?
Despite the word "regression" in its name, Logistic Regression is fundamentally a classification algorithm. It is used to categorize data into discrete classes (most commonly binary classes).
How do I view the actual probabilities in Scikit-Learn?
While `model.predict(X_test)` returns the final class labels (0 or 1), you can call `model.predict_proba(X_test)` to view the underlying probabilities the model calculated before applying the 0.5 threshold.
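To see the two methods side by side, here is a short sketch using synthetic data from `make_classification` (the dataset and sample slice are illustrative). Each row of `predict_proba` holds `[P(y=0), P(y=1)]` and sums to 1:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=4, random_state=42)
model = LogisticRegression().fit(X, y)

labels = model.predict(X[:3])       # hard class labels: 0 or 1
proba = model.predict_proba(X[:3])  # one row per sample: [P(y=0), P(y=1)]
print(labels)
print(proba)
```

The predicted label for each sample is simply whichever column of `proba` is larger, i.e. the 0.5 threshold applied to `P(y=1)`.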
