How do you represent a complex, million-node graph with a single vector? Pooling is the critical bridge from local node-level features to global, graph-level intelligence.
1Global Readout: The Flat Summary
For graph-level tasks — like predicting whether a molecule is toxic, or classifying a sub-reddit's community type — you eventually need to compress all node embeddings into a single fixed-size vector. This operation is called the Readout Layer or Global Pooling.
The simplest approaches are order-invariant aggregations over the entire graph: Global Mean, Global Sum, and Global Max. Each has different properties. Mean calculates the average feature distribution but is blind to graph size (a graph with 10 carbon atoms looks identical to a graph with 100 carbon atoms). Sum preserves graph volume, which is critical for chemistry tasks where atom count dictates physical properties. Max extracts the strongest isolated signals (useful for detecting a single toxic substructure). However, all global readouts are 'flat' — they instantly destroy all community and motif structures. You compress a complex topology into a single dense vector in one brutal step.
// Simple Global Readout
// h_nodes shape: [num_nodes, features]
function graphReadout(h_nodes, method) {
if (method === 'sum') {
return sumCols(h_nodes); // Volume aware
}
if (method === 'mean') {
return meanCols(h_nodes); // Distribution
}
if (method === 'max') {
return maxCols(h_nodes); // Strongest signal
}
}
// Result: single [features] vector
// Topology is completely forgotten.2Hierarchical Pooling (DiffPool & SAGPool)
Hierarchical Pooling attempts to mirror the natural organization of networks, much like a CNN pools pixels into edges, edges into textures, and textures into objects. A graph is pooled down progressively across multiple layers.
DiffPool (Differentiable Pooling) trains a secondary GNN to learn an assignment matrix *S*. This matrix maps the N nodes of the current graph into C clusters (where C < N). The features of the nodes in a cluster are aggregated to form a new super-node, and the edges between clusters form a new coarsened adjacency matrix. This preserves structural motifs (like functional groups in a molecule) at intermediate layers.
SAGPool (Self-Attention Graph Pooling) takes a different approach: node dropping. It uses a GNN to calculate an attention score for every node. It simply sorts the nodes by score, drops the bottom 50% (the 'noise'), and keeps the top 50% (the 'signal'), preserving the edges between the retained nodes. This is much more memory efficient than DiffPool, which requires dense N×C assignment matrices.
// SAGPool: Top-K Node Dropping
function sagPool(X, A, ratio = 0.5) {
// 1. Score nodes using a GCN
const scores = sigmoid(A_norm @ X @ W_att);
// 2. Find cut-off threshold
const k = Math.floor(X.length * ratio);
const topKIndices = argsort(scores).slice(0, k);
// 3. Drop uninformative nodes
const X_pooled = X[topKIndices] * scores[topKIndices];
const A_pooled = A[topKIndices][topKIndices];
return { X: X_pooled, A: A_pooled };
}