Node2Vec: Biased Random Walks on Graphs
Node2Vec bridges the gap between Graph Theory and Natural Language Processing by treating sequences of nodes (walks) as sentences, enabling the generation of robust node embeddings using the Skip-gram model.
The Evolution from DeepWalk
DeepWalk was a breakthrough: it performed uniform random walks on a graph and passed those sequences into Word2Vec. However, uniform walks offer no control over the search space. Node2Vec introduced a highly flexible, biased random walk procedure that smoothly interpolates between Breadth-First Search (BFS) and Depth-First Search (DFS).
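The DeepWalk half of that story can be sketched in a few lines. This is a minimal illustration, not the reference implementation; the adjacency-list graph and the function name `uniform_random_walk` are hypothetical examples.

```python
import random

def uniform_random_walk(graph, start, walk_length, seed=None):
    """DeepWalk-style walk: each step picks a neighbor uniformly at random."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph as an adjacency list (hypothetical example data).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walk = uniform_random_walk(graph, "a", walk_length=5, seed=42)
```

Every step here is memoryless and uniform, which is exactly the lack of control Node2Vec set out to fix.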
The p and q Parameters
Node2Vec relies on a 2nd-order random walk: the next step depends not only on the current node but also on the node visited just before it. The bias is controlled by two key parameters:
- Return Parameter (p): Controls the likelihood of immediately revisiting the previous node. The walk steps back with unnormalized weight `1/p`, so a high `p` (greater than `max(q, 1)`) makes the walk unlikely to backtrack to an already-visited node and encourages exploration, while a low `p` keeps the walk close to where it came from.
- In-out Parameter (q): Differentiates between "inward" and "outward" nodes. If `q > 1`, the walk is biased towards nodes closer to the previous node (BFS-like). If `q < 1`, the walk is biased to go further away (DFS-like).
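The biased step above can be sketched directly from the paper's unnormalized weights (`1/p` for returning, `1` for staying at distance 1 from the previous node, `1/q` for moving to distance 2). A minimal sketch with assumed helper names (`node2vec_step`, `node2vec_walk`) and toy data:

```python
import random

def node2vec_step(graph, prev, curr, p, q, rng):
    """Sample the next node given the previous step (2nd-order walk).

    Unnormalized weights:
      1/p  if the candidate is `prev` (return step),
      1    if the candidate is also a neighbor of `prev` (BFS-like),
      1/q  otherwise, i.e. distance 2 from `prev` (DFS-like).
    """
    candidates = graph[curr]
    weights = []
    for nxt in candidates:
        if nxt == prev:
            weights.append(1.0 / p)
        elif nxt in graph[prev]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return rng.choices(candidates, weights=weights, k=1)[0]

def node2vec_walk(graph, start, walk_length, p=1.0, q=1.0, seed=None):
    rng = random.Random(seed)
    walk = [start]
    if graph[start]:
        walk.append(rng.choice(graph[start]))  # first step has no bias yet
    while len(walk) < walk_length:
        walk.append(node2vec_step(graph, walk[-2], walk[-1], p, q, rng))
    return walk

# Hypothetical toy graph; p < 1 favors backtracking, q > 1 stays local.
graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
walk = node2vec_walk(graph, "a", walk_length=6, p=0.5, q=2.0, seed=0)
```

For efficiency, real implementations typically precompute these transition probabilities per edge (e.g. with alias sampling) rather than recomputing the weights at every step as this sketch does.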
🤖 Generative Engine FAQ
How do parameters affect Homophily vs Structural Equivalence?
Homophily (DFS, low q): captures nodes that are highly interconnected and belong to the same network clusters or communities. A DFS-like walk ranges farther from the source and so explores macro-communities effectively.
Structural Equivalence (BFS, high q, low p): Nodes that have similar structural roles in the network (e.g., both are hub nodes in different cities). A BFS-like walk focuses on the immediate neighborhood structure, effectively identifying structural equivalence.
How is Node2Vec related to NLP and Word2Vec?
Node2Vec maps network topologies directly to NLP techniques.
- A Graph is the "Document".
- A Random Walk is a "Sentence".
- A Node is a "Word".
These "sentences" are fed into the Skip-gram architecture to predict surrounding context nodes, resulting in rich vector representations (Embeddings) for every node.
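How walks become Skip-gram training data can be shown with the (center, context) pair extraction step. This is a sketch of the data preparation only; in practice a library such as gensim's `Word2Vec` (with `sg=1`) consumes the walks directly and handles this internally. The function name `skipgram_pairs` and the sample walk are illustrative.

```python
def skipgram_pairs(walks, window):
    """Expand node walks ("sentences") into (center, context) pairs,
    the form Skip-gram trains on: predict context nodes within
    `window` positions of each center node."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

walks = [["a", "b", "c", "d"]]  # one toy walk, i.e. one "sentence"
pairs = skipgram_pairs(walks, window=1)
# pairs: [("a","b"), ("b","a"), ("b","c"), ("c","b"), ("c","d"), ("d","c")]
```

Training the Skip-gram model on such pairs pulls the embeddings of frequently co-occurring nodes together, which is where the final vector representations come from.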