Most neural networks are full of redundant information. Pruning is the surgical removal of unnecessary connections to create leaner, faster models.
1. Magnitude-Based Pruning
The most common technique is Magnitude-Based Pruning. It assumes that weights with small absolute values (close to zero) contribute the least to the model's final prediction. Setting these weights to zero produces a Sparse Weight Matrix. The tensor shapes, and therefore the stored parameter count, stay the same, but the zeros pay off in two ways: the weight file compresses much better (e.g., with gzip), and sparse-aware kernels or specialized hardware can skip the zeros, reducing the amount of data moved between memory and the processor.
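The idea can be sketched in a few lines of NumPy. This is a minimal illustration, not a library routine: the helper name `magnitude_prune` and the random 4x5 weight matrix are assumptions made for the example.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Return a copy of `weights` with the smallest-magnitude entries set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    k = int(round(sparsity * weights.size))  # number of weights to zero out
    if k == 0:
        return weights.copy()
    # Find the indices of the k smallest |w| in the flattened array.
    flat_magnitudes = np.abs(weights).ravel()
    prune_idx = np.argpartition(flat_magnitudes, k - 1)[:k]
    pruned = weights.copy().ravel()
    pruned[prune_idx] = 0.0
    return pruned.reshape(weights.shape)

# Hypothetical example weights: a small random layer.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 5)).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.5)

print("shape unchanged:", w_sparse.shape == w.shape)
print("fraction of zeros:", np.mean(w_sparse == 0.0))
```

The pruned matrix has the same shape as the original; only the values change, which is why the savings show up in compression and sparse kernels rather than in the tensor dimensions.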
# The Complexity Problem
# Total Parameters: 1,000,000
# Active Connections: 100%
2. The Prune-and-Fine-tune Cycle
Pruning isn't a one-step process. If you remove 50% of a model's weights at once, its accuracy will likely crash. The common workflow is the Prune-and-Fine-tune Cycle: you increase the sparsity gradually during training, following a Sparsity Schedule, and keep training between pruning steps. This lets the remaining 'Active' weights adapt and take over the features previously handled by the removed connections, effectively 'concentrating' the intelligence into a smaller subset of the network.
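To make the schedule concrete, here is a small sketch of a polynomial sparsity ramp in plain Python. The helper `polynomial_sparsity` is hypothetical and written for illustration; its exponent is an assumption of this sketch. TFMOT's `PolynomialDecay` accepts a comparable `power` argument, so check its documentation for the default before comparing numbers. With an exponent of 1 the ramp is linear (about 5% at step 100 and 25% at step 500 for a 0% to 50% schedule over 1000 steps); larger exponents prune more aggressively early on and flatten out toward the end.

```python
def polynomial_sparsity(step, initial_sparsity=0.0, final_sparsity=0.5,
                        begin_step=0, end_step=1000, power=1.0):
    """Target sparsity at `step` for a polynomial ramp (illustrative helper)."""
    # Fraction of the schedule completed, clamped to [0, 1].
    progress = (step - begin_step) / (end_step - begin_step)
    progress = min(max(progress, 0.0), 1.0)
    # Interpolate from initial to final sparsity along a polynomial curve.
    return final_sparsity + (initial_sparsity - final_sparsity) * (1.0 - progress) ** power

for step in (0, 100, 500, 1000):
    linear = polynomial_sparsity(step, power=1.0)
    cubic = polynomial_sparsity(step, power=3.0)
    print(f"step {step:4d}: linear {linear:.2%}, cubic {cubic:.2%}")
```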
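import tensorflow_model_optimization as tfmot

# `model` is assumed to be an existing tf.keras model defined elsewhere.
# Define a pruning schedule: ramp sparsity from 0% to 50% over steps 0..1000
pruning_params = {
    'pruning_schedule': tfmot.sparsity.keras.PolynomialDecay(
        initial_sparsity=0.0,
        final_sparsity=0.50,
        begin_step=0,
        end_step=1000
    )
}
# Wrap the model for pruning
# (when training, pass tfmot.sparsity.keras.UpdatePruningStep() in the
# callbacks so the pruning step advances)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, **pruning_params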
)
3. Structured vs. Unstructured
Pruning can be Unstructured (removing individual weights anywhere) or Structured (removing entire neurons, channels, or layers). Unstructured pruning usually reaches the highest sparsity for a given accuracy, but the tensors keep their shape, so it needs sparse-aware software or hardware to deliver a speedup. Structured pruning directly reduces the dimensions of the tensors, so the model becomes physically smaller and runs faster on a standard CPU or GPU without special sparse-math support.
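The difference is easiest to see on a toy two-layer network. The sketch below is illustrative only: the layer sizes, the median threshold, and the choice of L2 norm as the importance score for a hidden unit are all assumptions, not a prescribed method.

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(size=(8, 6))   # layer 1: 8 inputs -> 6 hidden units
w2 = rng.normal(size=(6, 3))   # layer 2: 6 hidden units -> 3 outputs

# Unstructured: zero the half of the weights with the smallest magnitude.
# The matrix keeps its (8, 6) shape; it just contains zeros.
threshold = np.median(np.abs(w1))
w1_unstructured = np.where(np.abs(w1) > threshold, w1, 0.0)

# Structured: drop the 3 hidden units whose incoming weights have the
# smallest L2 norm, which removes whole columns of w1.
unit_norms = np.linalg.norm(w1, axis=0)        # one norm per hidden unit
keep = np.sort(np.argsort(unit_norms)[3:])     # indices of the 3 strongest units
w1_structured = w1[:, keep]                    # shape (8, 3)
w2_structured = w2[keep, :]                    # matching rows removed, shape (3, 3)

# The smaller network runs with ordinary dense matrix multiplies.
x = rng.normal(size=(2, 8))
y = np.maximum(x @ w1_structured, 0.0) @ w2_structured
print(w1_unstructured.shape, w1_structured.shape, w2_structured.shape, y.shape)
```

Note that removing a hidden unit means deleting both its column in the first layer and its row in the next one; that bookkeeping is what makes structured pruning shrink the tensors themselves.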
>> Starting Pruning Training...
>> Step 100: Sparsity 5%
>> Step 500: Sparsity 25%
>> Step 1000: Sparsity 50%
--- COMPRESSION RESULTS ---
Raw Size: 4.2 MB
Zipped Sparse Size: 1.8 MB
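The compression effect can be checked on synthetic data. The sketch below is an assumption-laden stand-in for a real model: it uses 100,000 random float32 weights and a median threshold, so the byte counts it prints will not match the figures above, which depend on the actual model.

```python
import gzip
import numpy as np

rng = np.random.default_rng(0)
dense = rng.normal(size=100_000).astype(np.float32)

# Zero roughly the 50% of weights with the smallest magnitude.
threshold = np.median(np.abs(dense))
sparse = np.where(np.abs(dense) > threshold, dense, 0.0).astype(np.float32)

# Both arrays occupy the same number of raw bytes; only the compressed
# sizes differ, because runs of zero bytes compress well.
dense_size = len(gzip.compress(dense.tobytes()))
sparse_size = len(gzip.compress(sparse.tobytes()))

print(f"raw bytes:   {dense.nbytes}")
print(f"gzip dense:  {dense_size} bytes")
print(f"gzip sparse: {sparse_size} bytes")
```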