Graph Convolutional Networks: Learning on Structures
"While Convolutional Neural Networks (CNNs) revolutionized image processing, they fail on data that isn't laid out in a neat grid. Graph Convolutional Networks (GCNs) generalize the concept of convolution to non-Euclidean data—graphs."
The Spectral Approach (Kipf & Welling)
Introduced by Thomas Kipf and Max Welling in 2016, the modern GCN is a first-order approximation of spectral graph convolutions. In simple terms, it creates a mechanism where a node updates its own feature representation by aggregating features from its immediate neighbors.
The magic formula is: H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l))
- Ã (A-tilde): The Adjacency Matrix with added self-loops (Ã = A + I, where I is the identity matrix). The self-loops ensure a node considers its own current features, not just its neighbors'.
- D̃ (D-tilde): The Diagonal Node Degree Matrix of Ã. We use it to symmetrically normalize Ã, which keeps high-degree nodes from blowing up the feature scale (and, with it, the gradients).
- W^(l): The learnable weight matrix for layer l.
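To make the update rule concrete, here is a minimal NumPy sketch of a single GCN layer on dense matrices. The function name `gcn_layer` and the toy graph are illustrative, not taken from the paper's reference code:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D̃^(-1/2) Ã D̃^(-1/2) H W)."""
    A_tilde = A + np.eye(A.shape[0])            # Ã: adjacency with self-loops
    d = A_tilde.sum(axis=1)                     # degrees of Ã
    D_inv_sqrt = np.diag(d ** -0.5)             # D̃^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetrically normalized adjacency
    return np.maximum(0, A_hat @ H @ W)         # σ = ReLU

# Tiny 3-node path graph, 2 input features per node, 4 hidden units
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
print(gcn_layer(A, H, W).shape)  # (3, 4)
```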
Spatial vs Spectral
While the GCN formula is derived from graph signal processing (Spectral domain), its actual execution behaves like a Spatial approach: Message Passing. Every layer propagates messages (features) exactly one hop away. A two-layer GCN allows a node to gather information from its neighbors' neighbors.
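Reusing the `gcn_layer` sketch above, stacking two layers gives each node a two-hop receptive field (`W1` and `W2` are hypothetical weight matrices):

```python
W1 = np.random.randn(2, 4)  # layer-1 weights (hypothetical shapes)
W2 = np.random.randn(4, 4)  # layer-2 weights
H1 = gcn_layer(A, H, W1)    # hop 1: each node aggregates its immediate neighbors
H2 = gcn_layer(A, H1, W2)   # hop 2: information arrives from neighbors' neighbors
```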
The Oversmoothing Problem
Why don't we build 100-layer GCNs? If you apply too many graph convolutional layers, the representations of all nodes in a connected component converge toward the same vector and become indistinguishable. This phenomenon is known as oversmoothing. In practice, 2 to 4 layers are usually sufficient for GCNs, often augmented with residual/skip connections.
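The collapse is easy to see even without weights or nonlinearities: repeatedly applying the normalized adjacency alone pushes all rows of the feature matrix toward the same direction. A toy illustration (not a formal proof):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)      # 3-node path graph
A_tilde = A + np.eye(3)
d = A_tilde.sum(axis=1)
A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)

X = np.random.randn(3, 2)
for _ in range(50):                          # simulate 50 rounds of propagation
    X = A_hat @ X
print(X / np.linalg.norm(X, axis=1, keepdims=True))  # rows are now nearly identical
```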
❓ Frequently Asked Questions
What is a Graph Convolutional Network (GCN)?
A Graph Convolutional Network (GCN) is a type of neural network designed to work directly on graphs and take advantage of their structural information. It operates by passing messages (node features) between connected nodes. Over several layers, a GCN creates node embeddings that encapsulate both local graph topology and node features.
How does symmetric normalization work in GCNs?
In a GCN, simply summing neighbors' features via the Adjacency matrix causes high-degree nodes to end up with vastly larger representations than low-degree nodes. Kipf & Welling solved this via symmetric normalization: multiplying the (self-loop-augmented) Adjacency matrix by the inverse square root of its Degree matrix on both sides (D̃^(-1/2) Ã D̃^(-1/2)). This keeps the scale of the feature vectors stable regardless of node degree.
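A quick way to see the effect is a toy star graph, where the hub has degree 4 and the leaves have degree 1 (the numbers in the comments are just what this arithmetic produces):

```python
import numpy as np

n = 5
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1                      # star graph: node 0 is the hub
H = np.ones((n, 1))                          # constant features for clarity

A_tilde = A + np.eye(n)
d = A_tilde.sum(axis=1)
A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)

print((A_tilde @ H).ravel())  # raw sums: hub gets 5.0, leaves get 2.0
print((A_hat @ H).ravel())    # normalized: ~1.46 vs ~0.82, a comparable scale
```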
What is the difference between GCN and GAT?
A GCN assigns fixed edge weights based purely on the graph structure (specifically, the degrees of the two endpoints). Every neighbor is aggregated with this fixed structural weight.
A Graph Attention Network (GAT), on the other hand, learns attention weights dynamically based on the actual features of the nodes. In a GAT, a node can decide to "pay more attention" to one neighbor over another, regardless of their degrees.
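For intuition, here is a minimal single-head sketch contrasting the two weighting schemes. The names `gcn_edge_weight` and `gat_attention` are illustrative, and `W` and `a` stand in for learned parameters:

```python
import numpy as np

def gcn_edge_weight(deg_i, deg_j):
    # GCN: a fixed weight from structure alone, 1 / sqrt(d_i * d_j)
    return 1.0 / np.sqrt(deg_i * deg_j)

def gat_attention(h_i, neighbor_feats, W, a):
    # GAT: scores computed from features, LeakyReLU(a^T [W h_i || W h_j]),
    # then softmax-normalized over the neighborhood
    scores = []
    for h_j in neighbor_feats:
        z = np.concatenate([W @ h_i, W @ h_j])
        s = a @ z
        scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU, slope 0.2
    scores = np.array(scores)
    e = np.exp(scores - scores.max())
    return e / e.sum()

h = np.random.randn(3, 2)                       # node i and two neighbors
W = np.random.randn(2, 2)
a = np.random.randn(4)
print(gat_attention(h[0], [h[1], h[2]], W, a))  # depends on features, sums to 1
print(gcn_edge_weight(2, 3))                    # depends only on degrees
```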
