The most important edges are the ones that don't exist yet. Link prediction is the science of forecasting future relationships — and it powers every recommender system, social discovery feed, and knowledge graph completion task in production.
1The Decoder: Scoring Node Pairs
Link prediction decouples into two phases: an Encoder (your GNN message passing layers that produce node embeddings) and a Decoder (the scoring function that maps a pair of embeddings to an edge probability). The simplest decoder is the Dot Product: score(u, v) = sigmoid(h_uᵀ h_v). It's fast, differentiable, and works surprisingly well when embeddings have been trained to be geometrically meaningful. A Bilinear Decoder adds a learnable relation matrix R: score(u, v) = h_uᵀ R h_v, allowing it to model asymmetric relationships (user likes product, but product doesn't like user). The most expressive decoder is an MLP over the concatenation of both embeddings, but it's also the most expensive.
The choice of decoder interacts with your loss function. For binary edges (exist or not), Binary Cross-Entropy over positive and negative pairs is standard. For weighted edges (a rating, a transaction value), Mean Squared Error on the predicted score works better. The AUC-ROC metric — the probability that a randomly chosen positive edge is scored higher than a randomly chosen negative edge — is the most common evaluation metric for binary link prediction tasks.
// Three Link Prediction Decoders
// 1. Dot Product (fastest)
function dotDecoder(h_u, h_v) {
return sigmoid(dot(h_u, h_v));
}
// 2. Bilinear (handles asymmetry)
function bilinearDecoder(h_u, h_v, R) {
return sigmoid(h_u.T @ R @ h_v);
}
// 3. MLP (most expressive)
function mlpDecoder(h_u, h_v) {
return sigmoid(mlp([...h_u, ...h_v]));
}2Contrastive Training and Ranking Metrics
Training link prediction requires Negative Sampling because your dataset only contains edges that exist — it has no explicit non-edges. For every true edge (u, v), you sample k negative pairs (u, v') where v' is a node randomly chosen from outside u's neighborhood. The model trains to push the positive score above all negative scores. The ratio k is a critical hyperparameter: too few negatives and the model never learns to discriminate; too many and it sees unrealistically hard examples early in training.
For Knowledge Graph Completion (predicting missing triples in databases like Wikidata), standard metrics are Hits@1, Hits@10, and MRR (Mean Reciprocal Rank). For each test triple (head, relation, tail), you corrupt the tail with every other entity, rank all candidates by score, and measure where the true tail falls. An MRR of 0.5 means the true answer is typically ranked 2nd. Hits@10 of 0.9 means 90% of the time the true answer is in the top 10 candidates — exactly the precision needed for a usable autocomplete or fact-check system.
// Contrastive Training Step
function trainStep(u, v_pos, graph) {
const pos_score = decoder(embed(u), embed(v_pos));
// Sample k=5 random non-neighbors
const negScores = sampleNegatives(u, graph, 5)
.map(v_neg => decoder(embed(u), embed(v_neg)));
// BCE Loss: maximize gap
const loss = -log(pos_score)
- negScores.map(s => log(1 - s)).sum();
loss.backward();
}
// Aim: pos_score >> neg_scores