Intelligence in a graph is distributed. Message passing is the mechanism by which nodes share their internal states to build a collective understanding of the network β and every GNN you will ever use is built on top of this single loop.
1The Message-Aggregate-Update Loop
Every GNN layer performs a three-step dance that repeats for every node in the graph. In Step 1 β Message, each neighbor j sends a message to node i. This message can be as simple as h_j (the neighbor's raw features) or as complex as a learned function of both nodes and the edge between them. In Step 2 β Aggregate, all incoming messages are collapsed into a single fixed-size vector using a permutation-invariant function. This step is critical: it must handle 1 neighbor or 10,000 neighbors with the same operation. In Step 3 β Update, the node combines its current state h_i with the aggregated message using a neural network (typically an MLP), producing the next-layer embedding h_i^(k+1).
This three-step process is repeated for every layer in your GNN. After K layers, each node's embedding encodes not just its own features but a rich summary of its K-hop neighborhood β the context grows outward with each pass. This is directly analogous to how a CNN's receptive field grows with depth, except that here the 'pixels' are nodes and the 'grid' is the arbitrary topology of the graph.
// Message Passing Loop (K layers)
for (let k = 0; k < K; k++) {
const msgs = {};
// Step 1: GENERATE messages
for (const [u, v] of edgeList) {
msgs[v] = msgs[v] || [];
msgs[v].push(MSG_FN(h[u], h[v]));
}
// Step 2: AGGREGATE messages
const agg = {};
for (const v in msgs) {
agg[v] = SUM(msgs[v]);
}
// Step 3: UPDATE node state
for (const v of nodes) {
h[v] = relu(MLP([h[v], agg[v]]));
}
}2Aggregation Choices and the Over-Smoothing Cliff
The choice of aggregation function has deep theoretical consequences. SUM is the most expressive β it preserves multi-set information and is used by GIN (Graph Isomorphism Network), which is provably as powerful as the Weisfeiler-Lehman test for graph isomorphism. MEAN normalizes by degree, making it robust when comparing nodes with very different connectivity. MAX pools the strongest signal from the neighborhood. However, both Mean and Max are 'non-injective' β they can map different neighborhoods to the same embedding, causing information loss.
Layer depth introduces a critical trade-off. More layers give each node a wider view of the graph, but past 5β6 layers, a pathological phenomenon called Over-Smoothing emerges. Because each node averages the features of its expanding neighborhood, eventually every node's embedding converges to the same global mean β the model loses all ability to distinguish between nodes. In practice, most production GNNs use 2β3 layers. Techniques like Jumping Knowledge Networks (which concatenate intermediate layer representations) and Residual Connections help push this limit further.
// Aggregation Functions Compared
const nbrs = [[1,2],[3,0],[0,4]];
// SUM β most expressive (GIN)
const sumAgg = nbrs.reduce(
(s, h) => s.map((v,i) => v + h[i]), [0,0]
); // β [4, 6]
// MEAN β degree-normalized
const meanAgg = sumAgg.map(
v => v / nbrs.length
); // β [1.33, 2.0]
// MAX β strongest signal
const maxAgg = nbrs.reduce(
(m, h) => m.map((v,i) => Math.max(v,h[i])),
[-Infinity,-Infinity]
); // β [3, 4]