1. The Computation Graph
When you set requires_grad=True on a tensor, PyTorch records every operation performed on it in a Directed Acyclic Graph (DAG), built dynamically as your code runs. Each time you add, multiply, or pass that tensor through a function, PyTorch adds a node to the graph and attaches it to the result as its grad_fn. The graph captures the exact sequence of mathematical operations, which lets PyTorch traverse it backward and apply the chain rule to compute gradients.
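To see the graph being recorded, you can inspect the grad_fn attribute of each result. The snippet below is a minimal sketch; the variable names x, y, and z are illustrative and not from any particular model.

```python
import torch

# A leaf tensor: created directly by the user and tracked by autograd.
x = torch.tensor(3.0, requires_grad=True)

# Each operation records a node that knows how to compute its own derivative.
y = x * 2    # node for the multiplication
z = y + 1    # node for the addition

# Leaf tensors have no grad_fn; results of tracked operations do.
print(x.is_leaf, x.grad_fn)
print(y.is_leaf, type(y.grad_fn).__name__)
print(z.is_leaf, type(z.grad_fn).__name__)
```

Printing the grad_fn of an intermediate result is a quick way to check whether a tensor is still attached to the graph.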
2. The Backward Pass
Once your data reaches the end of the network, you compute a loss (a single number measuring the error). Calling loss.backward() makes PyTorch walk backward through the computation graph and compute the gradient of the loss with respect to every tensor that has requires_grad=True, which includes every weight. These gradients are stored in the .grad attribute of each such tensor. The optimizer then uses them to adjust the weights and improve the model.
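A one-weight example makes the backward pass easy to check by hand. This is a sketch under simple assumptions: a single scalar weight w, a squared-error loss, and torch.optim.SGD standing in for whichever optimizer you actually use.

```python
import torch

w = torch.tensor(2.0, requires_grad=True)   # a single "weight"
x = torch.tensor(3.0)                       # input, no gradient needed
target = torch.tensor(10.0)

optimizer = torch.optim.SGD([w], lr=0.01)

pred = w * x                     # forward pass: 2 * 3 = 6
loss = (pred - target) ** 2      # squared error: (6 - 10)^2 = 16

loss.backward()                  # fills w.grad with d(loss)/dw

# Chain rule by hand: 2 * (pred - target) * x = 2 * (-4) * 3 = -24
grad_after_backward = w.grad.item()
print(grad_after_backward)

optimizer.step()                 # w <- w - lr * w.grad = 2 - 0.01 * (-24)
print(w.item())
```

The input x has no .grad after the call, because it was never marked with requires_grad=True.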
3. The Accumulation Trap
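A common gotcha in PyTorch is that .backward() accumulates gradients: each call adds the new gradients to whatever is already stored in .grad instead of overwriting it. If you run five training iterations without clearing them, the fifth optimizer step uses the sum of all five iterations' gradients. The updates no longer match the current loss, and training typically becomes unstable or diverges. Unless you are accumulating on purpose (for example, to simulate a larger batch), call optimizer.zero_grad() once per iteration, before loss.backward().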
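The accumulation is easy to reproduce with a toy loss whose gradient is always the same. The sketch below assumes a scalar weight and a made-up loss of w * 5, so the true gradient is 5 on every iteration.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# Without zeroing: every backward() call adds to w.grad.
for _ in range(3):
    loss = w * 5              # d(loss)/dw is always 5
    loss.backward()
accumulated = w.grad.item()   # 5 + 5 + 5

# With zeroing: w.grad reflects only the current iteration.
w.grad = None                 # start from a clean slate
optimizer = torch.optim.SGD([w], lr=0.1)
for _ in range(3):
    optimizer.zero_grad()     # clear gradients left over from the last iteration
    loss = w * 5
    loss.backward()
fresh = w.grad.item()

print(accumulated, fresh)
```

The optimizer step is left out of the second loop on purpose, so the only difference between the two loops is the call to zero_grad().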
Frequently Asked Questions
What is `with torch.no_grad():`?
Building the computation graph costs memory, because PyTorch has to keep the intermediate results needed for the backward pass. When you are deploying your model or running validation, you aren't training, so you don't need gradients. Wrapping your code in `with torch.no_grad():` tells PyTorch to stop recording operations, which saves memory and speeds up execution.
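You can confirm that tracking is switched off by comparing the same forward pass inside and outside the context manager. This sketch assumes a small torch.nn.Linear layer as a stand-in for a real model; the layer sizes and batch shape are arbitrary.

```python
import torch

model = torch.nn.Linear(4, 2)     # stand-in for a trained model
batch = torch.randn(8, 4)         # stand-in for validation data

# Normal forward pass: the output is attached to a graph.
tracked = model(batch)

# Inference-style forward pass: nothing is recorded.
with torch.no_grad():
    untracked = model(batch)

print(tracked.requires_grad, tracked.grad_fn is not None)
print(untracked.requires_grad, untracked.grad_fn)
```

Both passes produce the same values; only the bookkeeping differs.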
Can I manually change a tensor that has `requires_grad=True`?
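Not directly. An in-place operation on a tensor you created with `requires_grad=True` raises a RuntimeError, because PyTorch could no longer compute correct gradients for it. If you must change one manually (e.g., to reset a weight), do it inside `with torch.no_grad():` or modify the tensor returned by `.detach()`, which shares memory with the original but is not tracked.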
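The sketch below shows the failing case and both workarounds on a small tensor. The names w, raised, and view are illustrative only.

```python
import torch

w = torch.ones(3, requires_grad=True)

# A direct in-place change on a tracked leaf tensor is rejected.
try:
    w.mul_(0.5)
    raised = False
except RuntimeError:
    raised = True

# Workaround 1: make the change while tracking is switched off.
with torch.no_grad():
    w.mul_(0.5)               # w is now [0.5, 0.5, 0.5]

# Workaround 2: modify a detached view, which shares memory with w.
view = w.detach()
view[0] = 0.0                 # this also changes w[0]

print(raised, w)
```

In both workarounds w keeps requires_grad=True, so it continues to receive gradients in later backward passes.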
