Graph Convolutional Networks: Learning on Structures
"While Convolutional Neural Networks (CNNs) revolutionized image processing, they fail on data that isn't laid out in a neat grid. Graph Convolutional Networks (GCNs) generalize the concept of convolution to non-Euclidean data—graphs."
The Spectral Approach (Kipf & Welling)
Introduced by Thomas Kipf and Max Welling in 2016, the modern GCN is a first-order approximation of spectral graph convolutions. In simple terms, it creates a mechanism where a node updates its own feature representation by aggregating features from its immediate neighbors.
The magic formula is: H^(l+1) = σ(D̃^(-1/2) Ã D̃^(-1/2) H^(l) W^(l))
- Ã (A-tilde): The Adjacency Matrix with added self-loops (Ã = A + I, where I is the identity matrix). The self-loops ensure a node considers its own current features, not just its neighbors'.
- D̃ (D-tilde): The Diagonal Node Degree Matrix of Ã. We use it to symmetrically normalize Ã, which keeps high-degree nodes from blowing up the feature scale (and, with it, the gradients).
- W^(l): The learnable weight matrix for layer l.
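To make the update rule concrete, here is a minimal NumPy sketch of a single GCN layer on dense matrices. The function name `gcn_layer` and the toy graph are illustrative, not taken from the paper's reference code:

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN propagation step: ReLU(D̃^(-1/2) Ã D̃^(-1/2) H W)."""
    A_tilde = A + np.eye(A.shape[0])            # Ã: adjacency with self-loops
    d = A_tilde.sum(axis=1)                     # degrees of Ã
    D_inv_sqrt = np.diag(d ** -0.5)             # D̃^(-1/2)
    A_hat = D_inv_sqrt @ A_tilde @ D_inv_sqrt   # symmetrically normalized adjacency
    return np.maximum(0, A_hat @ H @ W)         # σ = ReLU

# Tiny 3-node path graph, 2 input features per node, 4 hidden units
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
H = np.random.randn(3, 2)
W = np.random.randn(2, 4)
print(gcn_layer(A, H, W).shape)  # (3, 4)
```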
Spatial vs Spectral
While the GCN formula is derived from graph signal processing (Spectral domain), its actual execution behaves like a Spatial approach: Message Passing. Every layer propagates messages (features) exactly one hop away. A two-layer GCN allows a node to gather information from its neighbors' neighbors.
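Reusing the `gcn_layer` sketch above, stacking two layers gives each node a two-hop receptive field (`W1` and `W2` are hypothetical weight matrices):

```python
W1 = np.random.randn(2, 4)  # layer-1 weights (hypothetical shapes)
W2 = np.random.randn(4, 4)  # layer-2 weights
H1 = gcn_layer(A, H, W1)    # hop 1: each node aggregates its immediate neighbors
H2 = gcn_layer(A, H1, W2)   # hop 2: information arrives from neighbors' neighbors
```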
The Oversmoothing Problem
Why don't we build 100-layer GCNs? If you apply too many graph convolutional layers, the representations of all nodes in a connected component converge toward the same vector and become indistinguishable. This phenomenon is known as oversmoothing. In practice, 2 to 4 layers are usually sufficient for GCNs, often augmented with residual/skip connections.
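The collapse is easy to see even without weights or nonlinearities: repeatedly applying the normalized adjacency alone pushes all rows of the feature matrix toward the same direction. A toy illustration (not a formal proof):

```python
import numpy as np

A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)      # 3-node path graph
A_tilde = A + np.eye(3)
d = A_tilde.sum(axis=1)
A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)

X = np.random.randn(3, 2)
for _ in range(50):                          # simulate 50 rounds of propagation
    X = A_hat @ X
print(X / np.linalg.norm(X, axis=1, keepdims=True))  # rows are now nearly identical
```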
❓ Frequently Asked Questions
What is a Graph Convolutional Network (GCN)?
A Graph Convolutional Network (GCN) is a type of neural network designed to work directly on graphs and take advantage of their structural information. It operates by passing messages (node features) between connected nodes. Over several layers, a GCN creates node embeddings that encapsulate both local graph topology and node features.
How does symmetric normalization work in GCNs?
In a GCN, simply summing neighbors' features via the Adjacency matrix causes high-degree nodes to end up with vastly larger representations than low-degree nodes. Kipf & Welling solved this via symmetric normalization: multiplying the (self-loop-augmented) Adjacency matrix by the inverse square root of its Degree matrix on both sides (D̃^(-1/2) Ã D̃^(-1/2)). This keeps the scale of the feature vectors stable regardless of node degree.
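A quick way to see the effect is a toy star graph, where the hub has degree 4 and the leaves have degree 1 (the numbers in the comments are just what this arithmetic produces):

```python
import numpy as np

n = 5
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1                      # star graph: node 0 is the hub
H = np.ones((n, 1))                          # constant features for clarity

A_tilde = A + np.eye(n)
d = A_tilde.sum(axis=1)
A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)

print((A_tilde @ H).ravel())  # raw sums: hub gets 5.0, leaves get 2.0
print((A_hat @ H).ravel())    # normalized: ~1.46 vs ~0.82, a comparable scale
```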
What is the difference between GCN and GAT?
A GCN assigns fixed edge weights based purely on the graph structure (specifically, the degrees of the two endpoints). Every neighbor is aggregated with this fixed structural weight.
A Graph Attention Network (GAT), on the other hand, learns attention weights dynamically based on the actual features of the nodes. In a GAT, a node can decide to "pay more attention" to one neighbor over another, regardless of their degrees.
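For intuition, here is a minimal single-head sketch contrasting the two weighting schemes. The names `gcn_edge_weight` and `gat_attention` are illustrative, and `W` and `a` stand in for learned parameters:

```python
import numpy as np

def gcn_edge_weight(deg_i, deg_j):
    # GCN: a fixed weight from structure alone, 1 / sqrt(d_i * d_j)
    return 1.0 / np.sqrt(deg_i * deg_j)

def gat_attention(h_i, neighbor_feats, W, a):
    # GAT: scores computed from features, LeakyReLU(a^T [W h_i || W h_j]),
    # then softmax-normalized over the neighborhood
    scores = []
    for h_j in neighbor_feats:
        z = np.concatenate([W @ h_i, W @ h_j])
        s = a @ z
        scores.append(s if s > 0 else 0.2 * s)  # LeakyReLU, slope 0.2
    scores = np.array(scores)
    e = np.exp(scores - scores.max())
    return e / e.sum()

h = np.random.randn(3, 2)                       # node i and two neighbors
W = np.random.randn(2, 2)
a = np.random.randn(4)
print(gat_attention(h[0], [h[1], h[2]], W, a))  # depends on features, sums to 1
print(gcn_edge_weight(2, 3))                    # depends only on degrees
```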
