Hyperparameter Tuning & Grid Search
"A machine learning model without tuned hyperparameters is like a high-performance sports car running on regular fuel. It works, but it's far from optimal."
Parameters vs. Hyperparameters
In machine learning, parameters are learned automatically by the algorithm during the training process (like the slope and intercept in linear regression). Hyperparameters, on the other hand, are the structural settings of the model that you, the engineer, must specify before training begins (such as the depth of a decision tree or the learning rate).
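The distinction can be seen directly in code. A minimal sketch using Scikit-Learn's Ridge regressor (chosen here for illustration; any estimator shows the same split):

```python
# Parameters are learned from data; hyperparameters are chosen beforehand.
import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.0, 4.0, 6.0, 8.0])

# alpha is a hyperparameter: we pick it before training begins.
model = Ridge(alpha=0.1)
model.fit(X, y)

# coef_ and intercept_ are parameters: learned automatically during fit().
print(model.coef_, model.intercept_)
```

Nothing in `fit()` changes `alpha`; tuning it is our job, which is exactly what Grid Search automates.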
What is Grid Search?
GridSearchCV (Grid Search with Cross-Validation) is a tool provided by Scikit-Learn that lets you specify a dictionary mapping hyperparameter names to the values you want to test. The algorithm then methodically builds and evaluates a model for every possible combination of those values, and reports the best-performing combination within that grid.
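A minimal sketch of the workflow, using a decision tree on the built-in iris dataset (the grid values here are arbitrary examples, not recommendations):

```python
# GridSearchCV trains and scores one model per combination in param_grid.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_grid = {
    "max_depth": [2, 3, 4],        # 3 values
    "min_samples_split": [2, 5],   # x 2 values = 6 combinations tested
}

search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                      param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)              # winning combination
print(round(search.best_score_, 3))     # its mean cross-validated score
```

After fitting, `search.best_estimator_` is a model refit on the full training data with the winning hyperparameters.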
The Role of Cross-Validation
Tuning a model on your training set directly often leads to overfitting: the model memorizes the data instead of learning general patterns. Grid Search uses Cross-Validation (the 'CV' in GridSearchCV) to split the training data into multiple folds, ensuring the hyperparameter combination performs consistently across different subsets of data.
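The folding can be seen in isolation with `cross_val_score`, which is what Grid Search runs under the hood for each combination. A sketch with 5 folds:

```python
# cv=5 splits the data into 5 folds: the model is trained on 4 folds
# and scored on the held-out fifth, rotating until every fold is used.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = cross_val_score(
    DecisionTreeClassifier(max_depth=3, random_state=0), X, y, cv=5)

print(scores)         # one accuracy score per fold
print(scores.mean())  # the average Grid Search would use for ranking
```

A combination that scores well on every fold, not just one lucky split, is far more likely to generalize.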
Frequently Asked Questions
Grid Search vs. Random Search?
Grid Search tests *every single* combination. Random Search (RandomizedSearchCV) tests a random sample of combinations from the grid. For very large grids, Random Search is faster and often finds a near-optimal solution with drastically less computational cost.
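The API is nearly identical; the key difference is the `n_iter` budget. A sketch sampling 4 of the 9 possible combinations (the grid values are again arbitrary examples):

```python
# RandomizedSearchCV samples n_iter combinations instead of trying them all.
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

param_distributions = {
    "max_depth": [2, 3, 4],
    "min_samples_split": [2, 5, 10],   # 3 x 3 = 9 possible combinations
}

search = RandomizedSearchCV(DecisionTreeClassifier(random_state=0),
                            param_distributions, n_iter=4, cv=5,
                            random_state=0)
search.fit(X, y)

print(len(search.cv_results_["params"]))  # only 4 combinations were tried
print(search.best_params_)
```

Values can also be given as continuous distributions (e.g. from `scipy.stats`) rather than lists, which Grid Search cannot do.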
What are common hyperparameters to tune?
Random Forests: n_estimators, max_depth.
SVM: C, kernel, gamma.
Logistic Regression: C, penalty.
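As dictionaries, these translate into grids like the following; the value ranges below are common starting points for illustration, not definitive recommendations:

```python
# Illustrative param_grid dictionaries for the models listed above.
rf_grid = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 5, 10],
}
svm_grid = {
    "C": [0.1, 1, 10],
    "kernel": ["linear", "rbf"],
    "gamma": ["scale", 0.01, 0.1],
}
logreg_grid = {
    "C": [0.01, 0.1, 1, 10],
    "penalty": ["l2"],  # "l1" needs a compatible solver, e.g. liblinear
}

# Grid size multiplies quickly: even this modest RF grid is 3 x 3 = 9 fits
# per CV fold, which is why Random Search helps as grids grow.
print(len(rf_grid["n_estimators"]) * len(rf_grid["max_depth"]))
```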