The world's biggest networks — like the Google Knowledge Graph or Pinterest's product graph — have billions of nodes and trillions of edges. They don't fit in GPU memory. To solve them, we must rethink how we sample data.
1The Subgraph Strategy (GraphSAINT)
Node-wise sampling (like GraphSAGE) prevents full-graph memory issues by only looking at a fixed number of neighbors per node. However, it still suffers from Neighbor Explosion: a 3-layer GraphSAGE network with a sample size of 10 must load 1,000 nodes just to compute the embedding for a *single* target node.
GraphSAINT solves this by flipping the paradigm. Instead of starting at a target node and expanding outwards, GraphSAINT uses a random walk to extract a dense, self-contained Subgraph from the massive master graph. It then loads this subgraph onto the GPU and runs a standard, full-batch GCN over it. Because every node in the subgraph can serve as both a target and a neighbor for other nodes in the same batch, computation is heavily shared, completely eliminating neighbor explosion.
// GraphSAINT: Subgraph Sampling
function graphSAINT_Epoch(MasterGraph, steps=100) {
let total_loss = 0;
for (let b = 0; b < num_batches; b++) {
// 1. Extract subgraph via random walk
const Subgraph = randomWalk(MasterGraph, steps);
// 2. Compute normal GCN entirely in-memory
const predictions = GCN(Subgraph);
// 3. Apply Bias Correction (Importance Sampling)
const loss = calculateLoss(predictions);
total_loss += applyNormalization(loss, Subgraph);
updateWeights(total_loss);
}
}2Clustering and Computation (ClusterGCN)
ClusterGCN takes a deterministic approach to the same problem. Instead of random walks, it uses graph partitioning algorithms (like METIS) to break the massive master graph into distinct, densely connected clusters of nodes.
During training, a batch is formed by picking one (or a few) of these pre-computed clusters. Because the clusters are partitioned to minimize the number of cut edges between them, the vast majority of a node's neighbors reside inside the same batch. This guarantees high computational efficiency and preserves local structural motifs perfectly. To prevent the model from overfitting to the isolated clusters (ignoring the cross-cluster edges), ClusterGCN often uses a stochastic multiple-clustering approach, where clusters are randomly merged into larger batches during training.
// ClusterGCN: Partition-based Training
// 1. Preprocessing: Partition Graph (CPU)
const clusters = runMetis(MasterGraph, num_parts=1000);
// 2. Training Loop (GPU)
for (let epoch = 0; epoch < MAX_EPOCHS; epoch++) {
// Stochastic batching: mix 5 random clusters
const batch_clusters = sampleRandom(clusters, 5);
const Subgraph = mergeClusters(batch_clusters);
// Cross-cluster edges within the batch are restored
trainGNN(Subgraph);
}