Before writing a single line of GNN code, you need to answer one question: what are you predicting? GNNs operate at three distinct levels of granularity — the node, the edge, and the entire graph. Choosing the right level determines your model's output layer, your loss function, and how you evaluate success.
1. Node-Level and Edge-Level Prediction
Node-level tasks are the most common starting point. After message passing, each node has a learned embedding that reflects its own features plus the context of its neighborhood. You pass this embedding through a linear classifier or MLP to get a label. A classic example is semi-supervised node classification on the Cora citation network, where the goal is to categorize academic papers (nodes) into research topics using only a small number of labeled examples and the graph's citation structure.
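The "small number of labeled examples" part of semi-supervised training is easy to miss: every node gets a prediction, but only labeled nodes contribute to the loss. The following is a minimal sketch of that idea, not a reference implementation. The names `maskedCrossEntropy`, `exampleLogits`, and `exampleLabels`, and the convention of marking unlabeled nodes with `-1`, are assumptions made for illustration.

```javascript
// Softmax over one node's raw class scores (logits), numerically stabilized.
function softmax(logits) {
  const max = Math.max(...logits);
  const exps = logits.map((z) => Math.exp(z - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// Mean cross-entropy over labeled nodes only.
// logits: one array of class scores per node
// labels: class index per node, or -1 for an unlabeled node (assumed convention)
function maskedCrossEntropy(logits, labels) {
  let loss = 0;
  let count = 0;
  for (let i = 0; i < logits.length; i++) {
    if (labels[i] < 0) continue; // unlabeled node: contributes no loss term
    const probs = softmax(logits[i]);
    loss -= Math.log(probs[labels[i]]);
    count += 1;
  }
  return count === 0 ? 0 : loss / count;
}

// Toy example: 4 nodes, 2 classes, only nodes 0 and 3 carry labels.
const exampleLogits = [[2.0, 0.0], [0.3, 0.1], [-1.0, 1.0], [0.0, 0.0]];
const exampleLabels = [0, -1, -1, 1];
const exampleLoss = maskedCrossEntropy(exampleLogits, exampleLabels);
```

The unlabeled nodes still matter: their features flow into the labeled nodes' embeddings during message passing, which is how the citation structure helps even when labels are scarce.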
Edge-level tasks (Link Prediction) focus on the relationships between pairs of nodes. The model takes the embeddings of two nodes, combines them (via a dot product, or by concatenating them and feeding the result to an MLP), and outputs a probability score for whether an edge should exist. This is the mechanism behind many 'People You May Know' features and product recommendation engines. You train with Negative Sampling: for every real edge, you sample several node pairs that are not connected and teach the model to distinguish them. Without negative samples, the model only ever sees positive examples, so predicting every pair as connected would minimize the loss.
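The negative-sampling step described above can be sketched as follows. This is one simple way to do it, under stated assumptions: negatives are drawn uniformly at random, the graph is undirected, and the loss is plain binary cross-entropy. The helper names (`sampleNegatives`, `linkLoss`) and the toy data are hypothetical.

```javascript
// Dot product of two equal-length embedding vectors.
function dot(a, b) {
  return a.reduce((s, v, i) => s + v * b[i], 0);
}

function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Draw k node pairs that are NOT in the edge list (uniform sampling, an assumption).
// edges: array of [u, v] pairs, treated as undirected
// rand: function returning a number in [0, 1)
// Assumes at least one non-edge exists; otherwise the loop would never finish.
function sampleNegatives(numNodes, edges, k, rand = Math.random) {
  const key = (u, v) => (u < v ? `${u}-${v}` : `${v}-${u}`);
  const known = new Set(edges.map(([u, v]) => key(u, v)));
  const negatives = [];
  while (negatives.length < k) {
    const u = Math.floor(rand() * numNodes);
    const v = Math.floor(rand() * numNodes);
    if (u === v || known.has(key(u, v))) continue; // skip self-pairs and real edges
    negatives.push([u, v]);
  }
  return negatives;
}

// Binary cross-entropy: real edges have target 1, sampled non-edges target 0.
function linkLoss(embeds, positives, negatives) {
  const eps = 1e-12; // guards against log(0)
  let loss = 0;
  for (const [u, v] of positives) {
    loss -= Math.log(sigmoid(dot(embeds[u], embeds[v])) + eps);
  }
  for (const [u, v] of negatives) {
    loss -= Math.log(1 - sigmoid(dot(embeds[u], embeds[v])) + eps);
  }
  return loss / (positives.length + negatives.length);
}

// Toy graph: 4 nodes, 2 real edges, 2 negatives per real edge.
const toyEmbeds = [[1, 0], [1, 0.2], [-1, 0], [-1, 0.1]];
const toyEdges = [[0, 1], [2, 3]];
const toyNegatives = sampleNegatives(4, toyEdges, 2 * toyEdges.length);
const toyLoss = linkLoss(toyEmbeds, toyEdges, toyNegatives);
```

In this toy setup the connected pairs have similar embeddings and the unconnected pairs point in opposite directions, so the loss comes out below that of an uninformed model that outputs 0.5 for every pair.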
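// Node Classification Head
function nodeClassifier(h_i, W, b) {
  // h_i = node embedding after the message-passing layers
  // W, b = weights and bias of the linear layer (one row of W per class)
  const logits = W.map((row, c) => row.reduce((s, w, j) => s + w * h_i[j], b[c]));
  const max = Math.max(...logits); // subtract the max for numerical stability
  const exps = logits.map((z) => Math.exp(z - max));
  const total = exps.reduce((a, x) => a + x, 0);
  return exps.map((e) => e / total); // softmax
  // → [P(bot), P(human), ...]
}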
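// Link Prediction: Dot-Product Decoder
function linkPredictor(h_u, h_v) {
  // During training, (u, v) is either a real edge (target 1)
  // or a negatively sampled non-edge (target 0)
  const score = h_u.reduce((s, x, i) => s + x * h_v[i], 0); // dot product
  return 1 / (1 + Math.exp(-score)); // sigmoid
  // → P(edge exists between u and v)
}
2. Graph-Level Tasks and the Readout Layer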
For graph-level tasks, we need a single fixed-size vector that represents the entire graph, regardless of how many nodes it contains. This is the job of the Readout or Global Pooling layer, the GNN's counterpart to the global pooling layer that sits in front of the classifier head in a CNN.
The simplest readout operations are Global Mean and Global Sum over all node embeddings. These are differentiable and cheap, but they treat the node embeddings as an unordered bag, so any structure the message-passing layers did not already encode is lost. Global Mean also discards graph size: two graphs whose node embeddings have the same average look identical to it, even if one has twice as many nodes. For tasks where structure matters, such as classifying different types of chemical compounds, more powerful methods like Global Attention Pooling (which learns to weight important nodes) or Hierarchical Pooling (DiffPool, which progressively clusters nodes into super-nodes) are often preferred. Both regression (predicting a molecule's boiling point) and classification (predicting toxicity) can be applied at the graph level using the same readout architecture; only the final head and the loss change.
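To make "learns to weight important nodes" concrete, here is a minimal sketch of attention-style pooling. It assumes the simplest possible gate, a single scoring vector whose dot product with each node embedding is passed through a softmax. In a real model the gate would be learned; here `attnGate` is a fixed, hypothetical value, and `globalAttentionPool` is an illustrative name rather than any library's API.

```javascript
// Numerically stabilized softmax over an array of scores.
function softmax(scores) {
  const max = Math.max(...scores);
  const exps = scores.map((s) => Math.exp(s - max));
  const total = exps.reduce((a, b) => a + b, 0);
  return exps.map((e) => e / total);
}

// nodeEmbeds: array of N embedding vectors of equal length
// gateWeights: scoring vector of the same length (learned in practice, fixed here)
function globalAttentionPool(nodeEmbeds, gateWeights) {
  // One scalar score per node: how "important" the gate considers it.
  const scores = nodeEmbeds.map((h) =>
    h.reduce((s, v, i) => s + v * gateWeights[i], 0)
  );
  const alpha = softmax(scores); // attention weights, sum to 1
  const dim = nodeEmbeds[0].length;
  const pooled = new Array(dim).fill(0);
  nodeEmbeds.forEach((h, n) => {
    for (let i = 0; i < dim; i++) pooled[i] += alpha[n] * h[i];
  });
  return { pooled, alpha };
}

// Toy graph with 3 nodes and 2-dimensional embeddings.
const attnEmbeds = [[1, 0], [0, 1], [3, 0]];
const attnGate = [1, 0]; // hypothetical gate that favors the first feature
const attnResult = globalAttentionPool(attnEmbeds, attnGate);
```

With this gate, the third node receives the largest weight, so the pooled vector sits much closer to its embedding than a plain mean would. Setting all gate weights to zero makes every score equal and recovers Global Mean.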
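// Global Readout for Graph Classification
function globalMeanPool(nodeEmbeds, dim) {
  const N = nodeEmbeds.length;
  // Sum the embeddings element-wise, then divide by the node count
  const sum = nodeEmbeds.reduce(
    (s, h) => s.map((v, i) => v + h[i]),
    new Array(dim).fill(0)
  );
  return sum.map(v => v / N);
}
// Classify the entire graph:
// (embeddings = per-node GNN outputs; classifier and sigmoid are assumed defined elsewhere)
const graphVec = globalMeanPool(embeddings, 64);
const toxicity = sigmoid(classifier(graphVec));
// → P(toxic); a value below 0.5 reads as 'TOXIC: false'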
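The choice between mean and sum readout is not cosmetic. As a small illustration, the sketch below pools two toy graphs where the second simply contains every node embedding of the first twice. The function names `sumReadout` and `meanReadout` and the example values are made up for this demonstration.

```javascript
// Element-wise sum of all node embeddings.
function sumReadout(nodeEmbeds) {
  const dim = nodeEmbeds[0].length;
  return nodeEmbeds.reduce(
    (s, h) => s.map((v, i) => v + h[i]),
    new Array(dim).fill(0)
  );
}

// Element-wise mean: the sum divided by the number of nodes.
function meanReadout(nodeEmbeds) {
  return sumReadout(nodeEmbeds).map((v) => v / nodeEmbeds.length);
}

// Two graphs of different size whose node embeddings have the same average.
const smallGraph = [[1, 2], [3, 4]];
const largeGraph = [[1, 2], [3, 4], [1, 2], [3, 4]];

const meanSmall = meanReadout(smallGraph); // → [2, 3]
const meanLarge = meanReadout(largeGraph); // → [2, 3], identical to meanSmall
const sumSmall = sumReadout(smallGraph);   // → [4, 6]
const sumLarge = sumReadout(largeGraph);   // → [8, 12], differs from sumSmall
```

Mean pooling maps both graphs to the same vector, so no downstream classifier can separate them. Sum pooling keeps them apart, at the cost of output magnitudes that grow with graph size.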