
Logistic Regression

Categorize the world. Master binary classification, the Sigmoid function, and Scikit-Learn implementation to build powerful predictive models.


Lead Dev: Linear Regression predicts continuous numbers. But what if we want to predict a category, e.g., Spam or Not Spam? Enter Logistic Regression.



Concept: Binary Classes

Unlike continuous regression, classification outputs discrete values (e.g., 0 or 1, True or False).



Logistic Regression: The Art of Classification

Author

Pascual Vila

Lead AI Instructor // Code Syllabus

When Linear Regression fails to capture binary outcomes, Logistic Regression steps in. By mapping continuous inputs to strict probability boundaries, it becomes the backbone of modern categorization tasks.

Why Not Linear Regression?

Linear Regression attempts to fit a straight line through data. This works well for predicting a continuous variable (like a house's price). However, if you want to classify an email as Spam (1) or Not Spam (0), a straight line will produce predictions below 0 or above 1, which make no sense as probabilities.
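To see the problem concretely, here is a minimal sketch (with made-up toy data, not from the lesson) fitting a straight line to 0/1 labels:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy spam data: feature = count of suspicious words, label = spam (1) or not (0).
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

line = LinearRegression().fit(X, y)

# Predicting outside the training range yields values below 0 or above 1,
# which cannot be interpreted as probabilities.
print(line.predict([[20], [-5]]))
```

On this data the line predicts well above 1 for `x = 20` and below 0 for `x = -5`, illustrating why a probability-bounded model is needed instead.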

The Sigmoid Function

To fix this, Logistic Regression wraps the linear equation in the Sigmoid Function (also called the Logistic Function). This creates an S-shaped curve that squashes any real number into the open interval (0, 1), so every output is a valid probability.

$\sigma(z) = \frac{1}{1 + e^{-z}}$
[Image of sigmoid function graph]
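The formula translates directly into code; a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(z):
    """Map any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Large negative inputs approach 0, large positive inputs approach 1,
# and z = 0 lands exactly on 0.5.
print(sigmoid(np.array([-10.0, 0.0, 10.0])))
```

Note the symmetry: $\sigma(-z) = 1 - \sigma(z)$, which is why 0.5 (at $z = 0$) is the natural midpoint for a decision threshold.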

The Decision Boundary

Once the Sigmoid function outputs a probability (e.g., 0.82), we apply a threshold. By default, Scikit-Learn uses a threshold of 0.5.

  • If P(y=1|X) ≥ 0.5: Classify as 1 (Positive class).
  • If P(y=1|X) < 0.5: Classify as 0 (Negative class).
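The threshold rule above is just an elementwise comparison; a quick sketch with made-up probabilities:

```python
import numpy as np

# Hypothetical sigmoid outputs for five samples.
probs = np.array([0.82, 0.31, 0.50, 0.07, 0.95])

# Apply the default 0.5 threshold: P >= 0.5 -> class 1, otherwise class 0.
labels = (probs >= 0.5).astype(int)
print(labels)  # → [1 0 1 0 1]
```

Because the comparison is `>=`, a probability of exactly 0.5 falls into the positive class.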
Implementation Tip

Always scale your data. Scikit-Learn's `LogisticRegression` applies L2 regularization by default (its strength is controlled by the `C` parameter), and the penalty treats every coefficient equally, so features on wildly different scales are regularized unevenly and can degrade the model. Fit a `StandardScaler` on your `X_train` data before calling `.fit()`.
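One way to follow this tip (a sketch with synthetic data, not the lesson's dataset) is to chain the scaler and the model in a scikit-learn pipeline, which fits the scaler on `X_train` only:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary dataset (illustrative values only).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# The pipeline scales inside .fit(), so the scaler only ever sees X_train,
# avoiding leakage of test-set statistics into training.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)
print(model.score(X_test, y_test))
```

The pipeline also applies the same learned scaling automatically at prediction time, so you cannot forget to transform `X_test`.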

Frequently Asked Questions

What is the difference between Linear and Logistic Regression?

Linear Regression: Used for regression problems where the output is a continuous number (e.g., predicting temperature or price). It fits a straight line.

Logistic Regression: Used for classification problems where the output is categorical (e.g., True/False, Dog/Cat). It fits an S-curve to output probabilities.

Is Logistic Regression a classification or regression algorithm?

Despite the word "regression" in its name, Logistic Regression is fundamentally a classification algorithm. It is used to categorize data into discrete classes (most commonly binary classes).

How do I view the actual probabilities in Scikit-Learn?

While `model.predict(X_test)` returns the final class labels (0 or 1), you can use `model.predict_proba(X_test)` to view the underlying probability percentages that the model calculated before applying the 0.5 threshold.
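A short sketch (on made-up toy data) contrasting the two methods:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Tiny illustrative dataset: one feature, binary label.
X = np.array([[0], [1], [2], [8], [9], [10]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)
X_test = np.array([[1], [5], [9]])

print(model.predict(X_test))        # hard 0/1 labels after thresholding
print(model.predict_proba(X_test))  # one row per sample: [P(class 0), P(class 1)]
```

Each row of `predict_proba` sums to 1, and `predict` is equivalent to picking the class whose probability column is largest.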

Architecture Glossary

Binary Classification
A predictive modeling task where the algorithm categorizes data into exactly two discrete classes (e.g., Yes/No, 1/0).
Sigmoid Function
A mathematical function having a characteristic S-shaped curve that maps any real value into a number between 0 and 1.
LogisticRegression()
The Scikit-Learn class used to instantiate a logistic regression model.
model.fit()
The method used to train the model, optimizing weights based on the provided features and labels.
model.predict()
Generates class predictions (e.g., strictly 0 or 1) for the input samples.
model.predict_proba()
Returns the raw probability estimates for all classes rather than the hard classification label.
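Putting the glossary entries together, a minimal end-to-end sketch on synthetic data (the dataset and names are illustrative, not from the lesson):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; any binary-labeled dataset works the same way.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression()          # LogisticRegression(): instantiate
model.fit(X_train, y_train)           # model.fit(): optimize the weights
labels = model.predict(X_test)        # model.predict(): hard 0/1 labels
probs = model.predict_proba(X_test)   # model.predict_proba(): per-class probabilities

print(labels[:5], probs[:5].round(3))
```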