Why can a model trained on dogs and cars (ImageNet) help me detect cancer cells?

Because the early layers of a Convolutional Neural Network don't learn 'dogs'; they learn lines, edges, curves, and textures. These low-level geometric features transfer well across most visual data, including medical images.
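To make the idea of an edge feature concrete, here is a minimal sketch that applies a hand-written Sobel kernel to a synthetic image. The kernel and the image are illustrative assumptions, not filters taken from ResNet; learned first-layer filters often resemble edge detectors like this one, but they are found by training rather than written by hand.

```python
import torch
import torch.nn.functional as F

# Synthetic 8x8 grayscale image: left half dark (0), right half bright (1).
image = torch.zeros(1, 1, 8, 8)
image[:, :, :, 4:] = 1.0

# Hand-written Sobel kernel that responds to vertical edges (illustrative only).
sobel_x = torch.tensor([[-1.0, 0.0, 1.0],
                        [-2.0, 0.0, 2.0],
                        [-1.0, 0.0, 1.0]]).reshape(1, 1, 3, 3)

# conv2d slides the kernel over the image (no padding, so the output is 6x6).
response = F.conv2d(image, sobel_x)

# The response is non-zero only in the output columns that straddle the edge.
edge_columns = response[0, 0].abs().sum(dim=0).nonzero().flatten().tolist()
print(edge_columns)  # [2, 3]
```

The same kernel would respond to a vertical boundary in a photo of a dog or in a microscopy slide, which is why such features carry over between domains.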

Do I always have to freeze the base layers?

Not always. If you have a very small dataset, freezing is usually the safer choice: training every layer on a handful of examples tends to overfit quickly and can overwrite the pre-trained weights. If you have a massive dataset, you might skip freezing and 'fine-tune' the entire network with a very small learning rate.

Why does the new classification head have requires_grad=True automatically?

In PyTorch, whenever you instantiate a new layer (like `nn.Linear`), it is created with trainable weights by default. Since you added this layer after your freezing loop, it remains trainable while everything else is locked.
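The order of the two steps matters. The sketch below uses a tiny stand-in network (a hypothetical `build_backbone` helper, not a real pre-trained model) to show what happens when the head is replaced before the freezing loop instead of after it.

```python
import torch.nn as nn

def build_backbone():
    # Tiny stand-in for a pre-trained network (hypothetical, for illustration only).
    return nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 4))

# Order A: freeze first, then replace the last layer. The new head stays trainable.
model_a = build_backbone()
for param in model_a.parameters():
    param.requires_grad = False
model_a[2] = nn.Linear(4, 2)

# Order B: replace the last layer first, then freeze. The head is frozen too.
model_b = build_backbone()
model_b[2] = nn.Linear(4, 2)
for param in model_b.parameters():
    param.requires_grad = False

trainable_a = [name for name, p in model_a.named_parameters() if p.requires_grad]
trainable_b = [name for name, p in model_b.named_parameters() if p.requires_grad]
print(trainable_a)  # ['2.weight', '2.bias']
print(trainable_b)  # []
```

With order B nothing is left to train, so a model built that way would never learn the new task.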

Transfer Learning in AI & Artificial Intelligence

Learn how to stand on the shoulders of giants. Explore the mechanics of pre-trained architectures like ResNet and VGG, master the art of layer freezing, and implement custom classification heads to build world-class computer vision models with minimal data and compute.

Transfer Learning is the practice of taking a model trained on one task and repurposing it for another. It is a standard starting point for high-performance computer vision, especially when labeled data is limited.

1. The Ultimate Shortcut

Training a Deep CNN from scratch takes massive data and days of GPU time. Why do that when you can borrow the brain of a model that already knows how to see?

This is the core philosophy of Transfer Learning. First, we load a model like ResNet or VGG that was pre-trained on ImageNet. It already knows how to recognize basic shapes, textures, and objects, acting as an incredibly powerful feature extractor right out of the box.

import torchvision.models as models
import torch.nn as nn

# Load ResNet18 with ImageNet weights.
# torchvision >= 0.13 uses the `weights` argument; the older `pretrained=True` is deprecated.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

2. Preserving Knowledge (Freezing)

We don't want to destroy the pre-trained weights during training. If we pass gradients all the way back through the entire network, the large, noisy updates produced by a small dataset and a randomly initialized head can overwrite the carefully learned ImageNet features.

To prevent this, we 'freeze' the base layers by setting `requires_grad` to False on their parameters. This locks the weights in place, so the model retains its foundational vision capabilities, and it reduces the computation required because no gradients are computed or stored for the frozen layers.

# Freeze all parameters in the base model
for param in model.parameters():
    param.requires_grad = False

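One detail worth checking when freezing ResNet-style models: `requires_grad = False` stops gradient updates, but BatchNorm layers also keep running statistics that are updated by forward passes in training mode. The minimal sketch below, using a standalone `nn.BatchNorm2d` layer and made-up input data, illustrates this behavior.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm2d(3)
for param in bn.parameters():
    param.requires_grad = False  # freezes the learnable weight and bias only

x = torch.randn(4, 3, 8, 8) + 5.0  # made-up inputs with a mean far from 0

bn.train()
bn(x)  # a forward pass in train mode updates running_mean and running_var
mean_after_train = bn.running_mean.clone()

bn.eval()
bn(x)  # a forward pass in eval mode leaves the running statistics alone
mean_after_eval = bn.running_mean.clone()

print(mean_after_train)  # no longer all zeros
print(torch.equal(mean_after_train, mean_after_eval))  # True
```

If the frozen layers should stay exactly as pre-trained, one option is to call `.eval()` on the frozen BatchNorm modules during training; whether that helps depends on how different the new data is from ImageNet.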

3. Replacing the Head

Now, we replace the final classification layer. If ImageNet has 1000 classes but we only need 2 (for example, a simple Cat vs. Dog classifier), we swap the 'head' of the model.

We grab the number of input features going into the final layer, and then overwrite that layer with a brand new, randomly initialized Linear layer mapped to our specific number of output classes.

num_ftrs = model.fc.in_features
# Replace last layer with a new linear layer
model.fc = nn.Linear(num_ftrs, 2)

# New layer has requires_grad=True by default


4. Targeted Fine-Tuning

By training only this new layer, we leverage the 'vision' of the original model while adapting it to our specific task with very little data.

The optimizer will only update the weights of our new classification head because the rest of the model is frozen. Once the head is stable, we could unfreeze the last few base layers (those closest to the head) and 'fine-tune' them with a low learning rate, but often just training the new head is enough for good results.

# Model is ready for Fine-Tuning
print('Classification head replaced.')

# Only the new fc layer weights will update during training


Frequently Asked Questions

Pascual Vila

Frontend Instructor // Code Syllabus

Lesson Glossary

[01] ImageNet

A massive dataset of over 14 million labeled images and a standard benchmark in computer vision. Most pre-trained vision models are trained on its 1,000-class subset (ImageNet-1k, about 1.28 million training images).

[02] Freezing

The process of preventing weight updates in specific layers during the training process. In PyTorch this is done by setting `requires_grad = False` on the layer's parameters.

[03] Classification Head

The final layer of a neural network (the decision layer) that converts abstract features into specific category predictions.

[04] Fine-Tuning

Unfreezing some base layers and training with a very low learning rate to refine the weights of a pre-trained model for a new task.

[05] Pre-trained Model

A model whose weights have already been optimized on a large, general dataset.
