MACHINE LEARNING /// CONFUSION MATRIX /// PRECISION /// RECALL /// F1 SCORE /// SCIKIT-LEARN ///

EVALUATE MODELS

Don't be fooled by high accuracy. Understand where your algorithm fails by mastering the Confusion Matrix, Precision, and Recall.


Tutor: Is 99% accuracy always good? Not if 99% of emails are normal and you missed the 1% that are phishing! Welcome to the Confusion Matrix.


Evaluation Matrix

UNLOCK NODES BY MASTERING METRICS.

The Matrix Grid

The core component mapping Actual Truths against Model Predictions.

Evaluation Check

Which type of error occurs when a spam filter sends a crucial job offer email to the spam folder?


Community Holo-Net

Share Your Models


Struggling with imbalanced datasets? Share your confusion matrices and get feedback from fellow Data Scientists!

Confusion Matrix: Looking Beyond Accuracy

On its own, accuracy can be a vanity metric. If you are predicting rare events like fraud or terminal illness, evaluating your model purely on accuracy can lead to disastrous real-world outcomes.

Anatomy of the Matrix

A Confusion Matrix is an N x N grid used for evaluating the performance of a classification model, where N is the number of target classes. For binary classification, it splits predictions into four distinct quadrants:

  • True Positives (TP): The model predicted 'Yes', and the actual label was 'Yes'.
  • True Negatives (TN): The model predicted 'No', and the actual label was 'No'.
  • False Positives (FP): The model predicted 'Yes', but the actual label was 'No' (Type I Error).
  • False Negatives (FN): The model predicted 'No', but the actual label was 'Yes' (Type II Error).
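The four quadrants above can be pulled straight out of scikit-learn's `confusion_matrix`. A minimal sketch, using made-up toy labels for illustration:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical binary labels: 1 = positive class, 0 = negative class
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# scikit-learn orders rows by actual class and columns by predicted class:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tp, tn, fp, fn)  # 3 3 1 1
```

Note the row/column ordering: scikit-learn puts True Negatives in the top-left, not True Positives, which trips up many readers comparing against textbook diagrams.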

The Big Three: Precision, Recall, and F1

Using the four quadrants, we derive deeper metrics that tell us exactly *how* the model is failing.

Precision (Quality)

Measures the accuracy of positive predictions. Optimize this when False Positives are costly (e.g., falsely flagging a normal email as spam).

$\text{Precision} = \frac{TP}{TP + FP}$

Recall / Sensitivity (Quantity)

Measures the ability to find all positive instances. Optimize this when False Negatives are costly (e.g., missing a cancerous tumor).

$\text{Recall} = \frac{TP}{TP + FN}$

F1-Score (Balance)

The harmonic mean of Precision and Recall. It is the go-to metric when dealing with imbalanced datasets.

$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
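As a sanity check, the three formulas can be computed by hand from the quadrant counts and compared against scikit-learn's helpers. The toy labels below are made up for illustration; they yield TP=3, FP=1, FN=1:

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

precision = 3 / (3 + 1)  # TP / (TP + FP)
recall    = 3 / (3 + 1)  # TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

# Hand-computed values agree with scikit-learn
assert precision == precision_score(y_true, y_pred)
assert recall == recall_score(y_true, y_pred)
assert abs(f1 - f1_score(y_true, y_pred)) < 1e-12
```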

Frequently Asked Questions (ML Evaluation)

Why is accuracy a bad metric for imbalanced datasets?

If your dataset has 990 healthy patients and 10 sick patients, a "dumb" model that predicts everyone is healthy will achieve 99% accuracy. However, it fails entirely at its core task (finding sick patients). In this scenario, evaluating the model via a Confusion Matrix to observe Recall and Precision is absolutely necessary.
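The 990-healthy / 10-sick scenario above is easy to reproduce. A minimal sketch of the "dumb" model that predicts everyone is healthy:

```python
from sklearn.metrics import accuracy_score, recall_score

# 990 healthy patients (0) and 10 sick patients (1)
y_true = [0] * 990 + [1] * 10
y_pred = [0] * 1000  # "dumb" model: predicts healthy for everyone

acc = accuracy_score(y_true, y_pred)  # 99% accuracy...
rec = recall_score(y_true, y_pred)    # ...but it finds 0% of sick patients
print(acc, rec)  # 0.99 0.0
```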

When should I prioritize Recall over Precision?

Prioritize Recall (minimizing False Negatives) in life-or-death, security, or safety scenarios. For example, in cancer screening, it is better to have a False Positive (causing the patient to get a secondary checkup) than a False Negative (sending a sick patient home undiagnosed).

What does the classification_report in Scikit-Learn do?

The classification_report() function builds a text report showing the main classification metrics (Precision, Recall, F1-Score, and Support) for each distinct class in your dataset, offering a holistic view of where your model excels or struggles.
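A short sketch of `classification_report` in action, reusing made-up toy labels (the class names passed via `target_names` are arbitrary):

```python
from sklearn.metrics import classification_report

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

# Human-readable table: precision, recall, F1, and support per class
print(classification_report(y_true, y_pred,
                            target_names=["negative", "positive"]))

# output_dict=True returns the same numbers as a nested dict,
# handy for programmatic checks; keys are the class labels as strings
report = classification_report(y_true, y_pred, output_dict=True)
print(report["1"]["precision"])  # precision for the positive class
```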

Metrics Glossary

True Positive (TP)
Model correctly predicted the positive class.

True Negative (TN)
Model correctly predicted the negative class.

False Positive (FP)
Model incorrectly predicted the positive class (Type I Error).

False Negative (FN)
Model incorrectly predicted the negative class (Type II Error).

Precision
Proportion of positive identifications that were actually correct.

Recall
Proportion of actual positives that were correctly identified.

F1-Score
Harmonic mean of precision and recall. Best for imbalanced classes.