1. Epochs and Batches
A PyTorch training script has two nested loops. The outer loop is for 'Epochs' (one full sweep through the entire dataset). The inner loop iterates over the DataLoader to get 'Batches'. If you have 1,000 images and a batch size of 100, it takes 10 batch iterations to complete 1 Epoch.
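The arithmetic above can be checked with a minimal sketch. The random tensors stand in for the 1,000 images, and the names `X`, `y`, `num_epochs`, and `batches_seen` are illustrative assumptions, not part of any particular script.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-in data: 1,000 samples with 8 features each and a binary label.
X = torch.randn(1000, 8)
y = torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(X, y), batch_size=100, shuffle=True)

num_epochs = 3
batches_seen = 0
for epoch in range(num_epochs):        # outer loop: one full sweep per epoch
    for X_batch, y_batch in loader:    # inner loop: one batch per iteration
        batches_seen += 1

# len(loader) is the number of batches in one epoch: 1000 / 100 = 10.
batches_per_epoch = len(loader)
```

If the dataset size is not a multiple of the batch size, the last batch is smaller unless `drop_last=True` is passed to the `DataLoader`.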
2. The Sacred 5 Steps
Inside the batch loop, you execute the 5 steps:
1. Forward Pass: pred = model(X)
2. Calculate Loss: loss = loss_fn(pred, y)
3. Zero Gradients: optimizer.zero_grad()
4. Backward Pass: loss.backward()
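5. Update Weights: optimizer.step()
The ordering matters. The forward pass must come before the loss, the loss before loss.backward(), and loss.backward() before optimizer.step(). optimizer.zero_grad() must run before loss.backward() and never between loss.backward() and optimizer.step(); many scripts call it at the very top of the loop instead. If you skip it, gradients from earlier batches accumulate and the weight updates are wrong.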
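The five steps above can be sketched for a single batch as follows. The tiny linear model, the MSE loss, the SGD optimizer, and the random data are assumptions chosen only to keep the example self-contained; in a real script these lines sit inside the batch loop.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical model, loss, and optimizer; any nn.Module and optimizer follow the same pattern.
model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# One made-up batch of 16 samples.
X = torch.randn(16, 4)
y = torch.randn(16, 1)

weights_before = model.weight.detach().clone()

pred = model(X)            # 1. forward pass
loss = loss_fn(pred, y)    # 2. calculate loss
optimizer.zero_grad()      # 3. clear gradients left over from the previous batch
loss.backward()            # 4. backward pass: fills .grad on every parameter
optimizer.step()           # 5. update weights using those gradients

weights_changed = not torch.equal(weights_before, model.weight.detach())
```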
3. model.train() vs model.eval()
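Some layers behave differently during training and evaluation. 'Dropout', for example, randomly zeroes activations during training to reduce overfitting, but it should be inactive when you evaluate on test data so that the whole network contributes to each prediction. Batch normalization layers are another common case. Call model.train() before the training loop and model.eval() before the testing loop; if you alternate training and evaluation within each epoch, switch modes each time.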
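A minimal sketch of the mode switch, assuming a small hypothetical network with one Dropout layer. Note that model.eval() only changes layer behavior; it does not turn off gradient tracking, so evaluation code commonly wraps the forward pass in `torch.no_grad()` as well.

```python
import torch
from torch import nn

# Hypothetical network containing a Dropout layer.
model = nn.Sequential(
    nn.Linear(4, 8),
    nn.ReLU(),
    nn.Dropout(p=0.5),
    nn.Linear(8, 2),
)
X = torch.randn(5, 4)

model.train()                  # Dropout active: activations are randomly zeroed
is_training = model.training

model.eval()                   # Dropout inactive: the layer passes inputs through unchanged
with torch.no_grad():          # skip gradient bookkeeping during evaluation
    out_a = model(X)
    out_b = model(X)

# With Dropout disabled, repeated forward passes on the same input agree.
outputs_match = torch.equal(out_a, out_b)
```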
Frequently Asked Questions
Which Optimizer should I use?
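`torch.optim.Adam` (Adaptive Moment Estimation) is a sensible default for most projects. It keeps running estimates of the mean and the squared magnitude of each parameter's gradient and uses them to scale the step size per weight, which usually makes training less sensitive to the learning-rate choice than plain SGD.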
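As a rough illustration, the sketch below builds an Adam optimizer for a hypothetical linear model, takes one step on a made-up loss, and then inspects the per-parameter state that Adam maintains. The learning rate of 1e-3 is only a common starting point, not a recommendation for any specific task.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(4, 1)   # hypothetical model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One step on a made-up loss so the optimizer populates its internal state.
loss = model(torch.randn(8, 4)).pow(2).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

# Adam stores running averages of the gradient and of the squared gradient,
# each with the same shape as the parameter it belongs to.
state = optimizer.state[model.weight]
first_moment = state["exp_avg"]
second_moment = state["exp_avg_sq"]
```

Switching to another optimizer such as `torch.optim.SGD` only changes the constructor line; the training loop itself stays the same.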
How do I monitor training?
A common approach is to add `loss.item()` to a running total after each batch and, at the end of each epoch, print the average loss to the terminal so you can watch the error decrease over time.
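The monitoring pattern described above might look like the following sketch. The synthetic regression data, the linear model, the learning rate, and the names `running_loss` and `epoch_losses` are all assumptions made for illustration.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Synthetic regression problem: targets are a fixed linear function of the inputs.
X = torch.randn(200, 4)
y = X @ torch.tensor([[1.0], [-2.0], [0.5], [3.0]])
loader = DataLoader(TensorDataset(X, y), batch_size=20, shuffle=True)

model = nn.Linear(4, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

epoch_losses = []
for epoch in range(5):
    model.train()
    running_loss = 0.0
    for X_batch, y_batch in loader:
        pred = model(X_batch)
        loss = loss_fn(pred, y_batch)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        running_loss += loss.item()    # .item() converts the 0-dim tensor to a Python float

    avg_loss = running_loss / len(loader)    # average over the batches in this epoch
    epoch_losses.append(avg_loss)
    print(f"epoch {epoch + 1}: avg loss {avg_loss:.4f}")
```

Using `loss.item()` rather than accumulating the `loss` tensor itself avoids holding on to the computation graph of every batch.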
