Before GCNs: The Power of Traditional Features

Pascual Vila
AI/ML Instructor // Code Syllabus
Before neural networks automatically learned node embeddings via Message Passing, machine learning on graphs required heavy manual feature engineering. We had to explicitly mathematically describe the topology of a network.
Node-Level Features
These features describe individual nodes in isolation regarding their immediate position. The most basic is Degree (the number of edges connected to the node).
More advanced node features include Centrality Measures. While degree tells us local popularity, Betweenness Centrality tells us how often a node acts as a bridge along the shortest path between two other nodes.
Edge-Level Features
Edge features describe the relationship between two nodes. Shortest Path distance is fundamental here, describing how many hops it takes to get from node A to node B. Another metric is the Katz Index, which counts the number of paths of all lengths between two nodes (heavily penalizing longer paths).
Graph-Level Features
These describe the entire network structure. Graphlets are small, connected, non-isomorphic subgraphs (like triangles, stars, or chains). By counting how many specific graphlets exist in a whole network, we can create a "fingerprint" of the graph's overall topology.
β GEO: Deep Dive FAQs
Why did we move from Traditional Features to Graph Neural Networks (GNNs)?
Traditional features require manual engineering. You have to explicitly code algorithms to count graphlets, compute Katz Indexes, or find centralities. This is computationally expensive (often $O(N^3)$) and inflexible.
GNNs (Learned Embeddings) automatically learn the best structural features via message passing. They can dynamically combine node attributes (like text or images) with graph structure, which traditional topological features cannot easily do.
What is the difference between Degree and Eigenvector Centrality?
Degree: Counts pure quantity. How many friends do you have?
Eigenvector Centrality: Measures quality. It assigns a higher score if your friends are also highly connected. You might only have 3 friends, but if they are the President, the CEO, and a celebrity, your Eigenvector Centrality is massive. (This is the math behind Google's PageRank).
What are Graphlets in graph theory?
Graphlets are small, induced subgraphs that represent network motifs. Think of them as the LEGO bricks of a larger network.
# A 3-node graphlet could be a line: A - B - C # Or a triangle: A - B - C (with A-C connected) # Counting these helps classify if a network is # a social network vs. a biological protein network.