Polynomial Regression: Capturing the Curve

AI Faculty
Lead Data Scientist // Build Apps with AI
Not everything is a straight line. By adding polynomial features, we can use the exact same Linear Regression math to fit complex, curving datasets. It is the bridge between simple linearity and deep complexity.
The Math Intuition
A simple linear regression equation looks like this: y = θ₀ + θ₁x₁
If our data curves, this straight line will produce high errors (underfitting). Instead of abandoning the linear model, we engineer new features. We add powers of our original feature: y = θ₀ + θ₁x₁ + θ₂x₁² + θ₃x₁³ + ...
Because the equation is still linear in the coefficients (θ), the model remains a Linear Regression model; it is just operating in a higher-dimensional feature space!
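To make this concrete, here is a small sketch (with made-up toy data and coefficients) showing that adding x² as a hand-built column lets plain Linear Regression fit a curve exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from a known quadratic: y = 2 + 3x + 0.5x^2
# (coefficients chosen purely for illustration)
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(100, 1))
y = 2 + 3 * x[:, 0] + 0.5 * x[:, 0] ** 2

# Engineer the new feature by hand: columns [x, x^2]
X_poly = np.hstack([x, x ** 2])

# Fitting is still ordinary Linear Regression -- the model is
# linear in its coefficients theta, not in x.
model = LinearRegression()
model.fit(X_poly, y)

print(model.intercept_)  # recovers ~2.0
print(model.coef_)       # recovers ~[3.0, 0.5]
```

Because the toy data is noiseless, the fit recovers the generating coefficients almost exactly.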
Implementation in Scikit-Learn
We do not calculate squares and cubes manually. Scikit-Learn provides a preprocessing class called PolynomialFeatures.
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# 1. Instantiate the transformer
poly = PolynomialFeatures(degree=2, include_bias=False)

# 2. Transform the original X into X_poly
X_poly = poly.fit_transform(X)

# 3. Fit a normal Linear Regression on the new data
model = LinearRegression()
model.fit(X_poly, y)
Model Architecture FAQ
Is Polynomial Regression a Linear or Non-Linear model?
It is a Linear model. The term "Linear" refers to the model's coefficients (weights), not the features. Since the equation is a linear combination of the coefficients, it solves exactly the same way under the hood.
How do I choose the correct 'degree'?
Choosing the degree is a classic bias-variance tradeoff.
- Degree 1: Simple straight line (High Bias / Underfitting).
- Degree 2 or 3: Captures generic curves well.
- Degree 10+: The curve will touch every single training point but fail completely on new data (High Variance / Overfitting). Use cross-validation to find the sweet spot!
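The degree search above can be sketched with a Pipeline and cross_val_score. The synthetic data here is an assumption (a noisy quadratic), so the underfitting of degree 1 is visible in the scores:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Hypothetical data: a quadratic plus noise, so the "true" degree is 2
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] ** 2 + X[:, 0] + 2 + rng.normal(0, 1, size=200)

scores = {}
for degree in range(1, 11):
    pipe = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    # Mean 5-fold cross-validated R^2; higher is better
    scores[degree] = cross_val_score(pipe, X, y, cv=5).mean()

best_degree = max(scores, key=scores.get)
print(best_degree)
```

On data like this, degree 1 scores clearly worse than degree 2, while very high degrees gain nothing out-of-fold: exactly the bias-variance tradeoff described above.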