01. CPU vs GPU
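A Central Processing Unit (CPU) has a small number of powerful cores designed to execute complex, largely sequential tasks rapidly. A Graphics Processing Unit (GPU) was originally built to render millions of pixels simultaneously for video games, so it has thousands of simpler cores that apply the same operation to many pieces of data at once. It turns out that rendering graphics and multiplying Neural Network weights rely on the same kind of math: enormous numbers of independent multiply-and-add operations, organized as Matrix Multiplication. The AI boom was largely triggered by researchers realizing they could repurpose gaming GPUs for this math.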
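To make the "independent math" idea concrete, here is a plain-Python sketch of Matrix Multiplication (no GPU and no PyTorch involved). Each output cell C[i][j] is its own dot product of row i of A with column j of B, so the cells can be computed in any order, or all at once; a GPU exploits exactly this by handing cells (or tiles of cells) to thousands of threads. The function names are hypothetical, and this is an illustration of the principle, not how PyTorch actually implements matmul.

```python
import random


def matmul_entry(A, B, i, j):
    """One output cell: the dot product of row i of A with column j of B."""
    return sum(A[i][k] * B[k][j] for k in range(len(B)))


def matmul_any_order(A, B, seed=0):
    """Multiply A (n x k) by B (k x m), filling output cells in a shuffled order."""
    n, m = len(A), len(B[0])
    C = [[0] * m for _ in range(n)]
    cells = [(i, j) for i in range(n) for j in range(m)]
    # The order is irrelevant because no cell reads another cell's result;
    # that independence is what lets a GPU compute many cells at the same time.
    random.Random(seed).shuffle(cells)
    for i, j in cells:
        C[i][j] = matmul_entry(A, B, i, j)
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_any_order(A, B))  # [[19, 22], [43, 50]]
```

For scale, a single 4096 x 4096 product has about 16.8 million such independent cells, each requiring 4,096 multiply-adds.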
02. The NVIDIA Monopoly
NVIDIA created CUDA, a software platform that lets ordinary programmers run general-purpose computations (not just graphics) on NVIDIA GPUs. Because PyTorch's GPU support was built primarily on CUDA, NVIDIA GPUs became the de facto gold standard for Deep Learning. If your machine has an NVIDIA card, moving your model and tensors to `device = 'cuda'` can speed up training by an order of magnitude or more for large, matrix-heavy models, though the exact gain depends on the model, batch size, and hardware.
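A common pattern is to pick the device once and fall back gracefully when no GPU is present. The sketch below assumes a reasonably recent PyTorch; the helper name `pick_device` is hypothetical, and it imports PyTorch lazily only so that it returns `'cpu'` on a machine without PyTorch installed. It prefers CUDA, then Apple's MPS backend (described in the FAQ), then the CPU.

```python
def pick_device() -> str:
    """Return 'cuda', 'mps', or 'cpu', preferring the fastest available backend."""
    try:
        import torch  # lazy import: lets this helper return 'cpu' when PyTorch is absent
    except ImportError:
        return "cpu"
    if torch.cuda.is_available():  # NVIDIA GPU (AMD GPUs on ROCm builds also report here)
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # torch.backends.mps exists in PyTorch >= 1.12
    if mps is not None and mps.is_available():  # Apple Silicon GPU via Metal
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print(f"Using device: {pick_device()}")
```

With a real model you would then call `model.to(device)` and move every input batch with `batch.to(device)`; PyTorch raises an error if a single operation mixes tensors that live on different devices.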
03. The VRAM Wall
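When you run `.to('cuda')` on a tensor, PyTorch copies it from your computer's main RAM over the PCIe bus into the GPU's dedicated Video RAM (VRAM). VRAM has far higher bandwidth than system RAM but is much smaller (e.g., 8GB or 16GB on consumer cards). If your model's weights, gradients, optimizer state, and the activations of the current batch together exceed your VRAM, PyTorch raises the dreaded "CUDA out of memory" error. Datasets themselves usually stay in system RAM or on disk and are sent to the GPU one batch at a time. Managing VRAM is a massive part of Deep Learning engineering.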
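A quick back-of-the-envelope estimate shows why models hit the VRAM wall. The sketch below is a rough rule of thumb, not a PyTorch API: weights take 4 bytes per parameter in float32 (2 in float16 or bfloat16), and plain float32 training with the Adam optimizer needs about 16 bytes per parameter (weights, gradients, and Adam's two moment buffers). It deliberately ignores activations, the CUDA context, and allocator overhead, so real usage is higher. The helper names are hypothetical, and GB here means 10^9 bytes.

```python
# Bytes needed to store one parameter in each storage dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weights_gb(num_params: int, dtype: str = "float32") -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def adam_training_gb(num_params: int) -> float:
    """Rough float32 Adam footprint: weights + gradients + two moment buffers.

    Excludes activations, which grow with batch size and often dominate.
    """
    bytes_per_param = 4 + 4 + 4 + 4  # weights, gradients, exp_avg, exp_avg_sq
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    for n in (1_000_000_000, 7_000_000_000):
        print(
            f"{n / 1e9:.0f}B params: weights {weights_gb(n):.0f} GB in float32, "
            f"{weights_gb(n, 'float16'):.0f} GB in float16; "
            f"Adam training needs ~{adam_training_gb(n):.0f} GB before activations"
        )
```

By this estimate, even a 1-billion-parameter model needs roughly 16 GB for float32 Adam training before counting activations, which already fills a 16GB card.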
Frequently Asked Questions
Can I use AMD GPUs with PyTorch?
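Yes, through PyTorch's ROCm builds, which support a specific list of AMD GPUs. The ROCm build reuses the `torch.cuda` API and the `'cuda'` device name, so code written for NVIDIA cards typically runs unchanged. However, the ecosystem and community support around ROCm are much smaller than NVIDIA's CUDA, which makes issues harder to debug.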
What is Apple MPS?
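Metal Performance Shaders (MPS) is Apple's framework of GPU-accelerated compute kernels built on Metal, playing roughly the role on Macs that CUDA plays on NVIDIA hardware. Since Apple Silicon (M1/M2/M3) chips include capable integrated GPUs, PyTorch worked with Apple to add an `mps` device (starting with PyTorch 1.12), so `.to('mps')` brings hardware acceleration to Apple Silicon MacBooks.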
