Why does link prediction require negative sampling?

A real-world graph only records edges that exist — it has no explicit negative examples. If you only train on positive examples, the model has no signal for what a non-connection looks like and will score all pairs as highly likely. Negative sampling creates artificial non-edges by randomly pairing nodes that have no observed connection. The model learns to assign low scores to these pairs, giving it the discriminative ability it needs to separate true edges from noise.

What is the difference between MRR and Hits@K?

Both measure ranking quality. MRR (Mean Reciprocal Rank) is the average of 1/rank across all test examples — it rewards being close to position 1 and is sensitive to the exact rank. Hits@K is binary: it's 1 if the correct answer appears within the top K results, 0 otherwise. MRR is more informative for small K (e.g., autocomplete), while Hits@K is more practical for recommendation (does the user see the right item on the first page?). Both are reported together in most knowledge graph benchmarks.

How do I handle the 'false negative' problem in link prediction training?

When you randomly sample negative pairs, some of them may actually be true edges that weren't observed (false negatives). This is common in sparse real-world graphs like social networks. Solutions include: (1) Filtering: remove any sampled negative that appears in the observed edge set. (2) Hard negative mining: sample negatives from nodes that are 2-3 hops away (close enough to be plausible, far enough to be true negatives). (3) Curriculum learning: start with easy random negatives and gradually switch to harder ones as the model matures.

🚀 LEVEL UP TO SENIOR:Unlock 500+ Advanced Practical Challenges & Exercises.

🎓 COURSERA PARTNER:Earn professional Google, Meta, and IBM certificates to supercharge your resume.

Tutorials

HTML MASTER CLASS /// LEARN TAGS /// BUILD STRUCTURE /// SEMANTIC WEB /// HTML MASTER CLASS /// LEARN TAGS ///

⚡ Total XP: 0|💻 artificialintelligence XP: 0

Link Prediction in AI & Artificial Intelligence

Master the full pipeline of relational discovery. Learn how to implement dot-product and MLP edge scoring functions, understand contrastive training with negative sampling, and explore ranking metrics like MRR and Hits@K that measure whether your model actually surfaces relevant connections. Identify architectural patterns used in production recommender systems at scale.

LOADING ENGINE...

Skill Matrix

UNLOCK NODES BY LEARNING NEW TAGS.

Link Hub

Affinity logic.

Quick Quiz //

What is the output of a Link Prediction model?

The most important edges are the ones that don't exist yet. Link prediction is the science of forecasting future relationships — and it powers every recommender system, social discovery feed, and knowledge graph completion task in production.

1The Decoder: Scoring Node Pairs

Link prediction decouples into two phases: an Encoder (your GNN message passing layers that produce node embeddings) and a Decoder (the scoring function that maps a pair of embeddings to an edge probability). The simplest decoder is the Dot Product: score(u, v) = sigmoid(h_uᵀ h_v). It's fast, differentiable, and works surprisingly well when embeddings have been trained to be geometrically meaningful. A Bilinear Decoder adds a learnable relation matrix R: score(u, v) = h_uᵀ R h_v, allowing it to model asymmetric relationships (user likes product, but product doesn't like user). The most expressive decoder is an MLP over the concatenation of both embeddings, but it's also the most expensive.

The choice of decoder interacts with your loss function. For binary edges (exist or not), Binary Cross-Entropy over positive and negative pairs is standard. For weighted edges (a rating, a transaction value), Mean Squared Error on the predicted score works better. The AUC-ROC metric — the probability that a randomly chosen positive edge is scored higher than a randomly chosen negative edge — is the most common evaluation metric for binary link prediction tasks.

—

// Three Link Prediction Decoders

// 1. Dot Product (fastest)
function dotDecoder(h_u, h_v) {
  return sigmoid(dot(h_u, h_v));
}

// 2. Bilinear (handles asymmetry)
function bilinearDecoder(h_u, h_v, R) {
  return sigmoid(h_u.T @ R @ h_v);
}

// 3. MLP (most expressive)
function mlpDecoder(h_u, h_v) {
  return sigmoid(mlp([...h_u, ...h_v]));
}

localhost:3000

localhost:3000/decoder-comparison

AUC-ROC Comparison

MLP Decoder: 0.951 (slowest)

Bilinear: 0.944

Dot Product: 0.921 (fastest) ✓

2Contrastive Training and Ranking Metrics

Training link prediction requires Negative Sampling because your dataset only contains edges that exist — it has no explicit non-edges. For every true edge (u, v), you sample k negative pairs (u, v') where v' is a node randomly chosen from outside u's neighborhood. The model trains to push the positive score above all negative scores. The ratio k is a critical hyperparameter: too few negatives and the model never learns to discriminate; too many and it sees unrealistically hard examples early in training.

For Knowledge Graph Completion (predicting missing triples in databases like Wikidata), standard metrics are Hits@1, Hits@10, and MRR (Mean Reciprocal Rank). For each test triple (head, relation, tail), you corrupt the tail with every other entity, rank all candidates by score, and measure where the true tail falls. An MRR of 0.5 means the true answer is typically ranked 2nd. Hits@10 of 0.9 means 90% of the time the true answer is in the top 10 candidates — exactly the precision needed for a usable autocomplete or fact-check system.

—

// Contrastive Training Step
function trainStep(u, v_pos, graph) {
  const pos_score = decoder(embed(u), embed(v_pos));
  // Sample k=5 random non-neighbors
  const negScores = sampleNegatives(u, graph, 5)
    .map(v_neg => decoder(embed(u), embed(v_neg)));
  // BCE Loss: maximize gap
  const loss = -log(pos_score)
    - negScores.map(s => log(1 - s)).sum();
  loss.backward();
}
// Aim: pos_score >> neg_scores

localhost:3000

localhost:3000/link-training

Epoch 100 Scores

Positive edge (A,B): 0.91 ✓

Negative edge (A,Z): 0.06 ✓

BCE Loss: 0.142 ↓ converging