Node2Vec: Biased Random Walks on Graphs
Node2Vec bridges the gap between Graph Theory and Natural Language Processing by treating sequences of nodes (walks) as sentences, enabling the generation of robust node embeddings using the Skip-gram model.
The Evolution from DeepWalk
DeepWalk was a breakthrough: it performed uniform random walks on a graph and passed those sequences into Word2Vec. However, uniform walks offer no control over the search space. Node2Vec introduced a highly flexible, biased random walk procedure that smoothly interpolates between Breadth-First Search (BFS) and Depth-First Search (DFS).
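The DeepWalk half of that story can be sketched in a few lines. This is a minimal illustration, not the reference implementation; the adjacency-list graph and the function name `uniform_random_walk` are hypothetical examples.

```python
import random

def uniform_random_walk(graph, start, walk_length, seed=None):
    """DeepWalk-style walk: each step picks a neighbor uniformly at random."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(walk_length - 1):
        neighbors = graph[walk[-1]]
        if not neighbors:
            break  # dead end: stop the walk early
        walk.append(rng.choice(neighbors))
    return walk

# Toy graph as an adjacency list (hypothetical example data).
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
walk = uniform_random_walk(graph, "a", walk_length=5, seed=42)
```

Every step here is memoryless and uniform, which is exactly the lack of control Node2Vec set out to fix.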
The p and q Parameters
Node2Vec relies on a 2nd-order random walk: the next step depends not only on the current node but also on the node visited just before it. The bias is controlled by two key parameters:
- Return Parameter (p): Controls the likelihood of immediately revisiting the previous node. The walk steps back with unnormalized weight `1/p`, so a high `p` (greater than `max(q, 1)`) makes the walk unlikely to backtrack to an already-visited node and encourages exploration, while a low `p` keeps the walk close to where it came from.
- In-out Parameter (q): Differentiates between "inward" and "outward" nodes. If `q > 1`, the walk is biased towards nodes closer to the previous node (BFS-like). If `q < 1`, the walk is biased to go further away (DFS-like).
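The biased step above can be sketched directly from the paper's unnormalized weights (`1/p` for returning, `1` for staying at distance 1 from the previous node, `1/q` for moving to distance 2). A minimal sketch with assumed helper names (`node2vec_step`, `node2vec_walk`) and toy data:

```python
import random

def node2vec_step(graph, prev, curr, p, q, rng):
    """Sample the next node given the previous step (2nd-order walk).

    Unnormalized weights:
      1/p  if the candidate is `prev` (return step),
      1    if the candidate is also a neighbor of `prev` (BFS-like),
      1/q  otherwise, i.e. distance 2 from `prev` (DFS-like).
    """
    candidates = graph[curr]
    weights = []
    for nxt in candidates:
        if nxt == prev:
            weights.append(1.0 / p)
        elif nxt in graph[prev]:
            weights.append(1.0)
        else:
            weights.append(1.0 / q)
    return rng.choices(candidates, weights=weights, k=1)[0]

def node2vec_walk(graph, start, walk_length, p=1.0, q=1.0, seed=None):
    rng = random.Random(seed)
    walk = [start]
    if graph[start]:
        walk.append(rng.choice(graph[start]))  # first step has no bias yet
    while len(walk) < walk_length:
        walk.append(node2vec_step(graph, walk[-2], walk[-1], p, q, rng))
    return walk

# Hypothetical toy graph; p < 1 favors backtracking, q > 1 stays local.
graph = {"a": ["b", "c"], "b": ["a", "c", "d"], "c": ["a", "b"], "d": ["b"]}
walk = node2vec_walk(graph, "a", walk_length=6, p=0.5, q=2.0, seed=0)
```

For efficiency, real implementations typically precompute these transition probabilities per edge (e.g. with alias sampling) rather than recomputing the weights at every step as this sketch does.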
🤖 Generative Engine FAQ
How do parameters affect Homophily vs Structural Equivalence?
Homophily (DFS, low q): captures nodes that are highly interconnected and belong to the same network clusters or communities. A DFS-like walk ranges farther from the source and so explores macro-communities effectively.
Structural Equivalence (BFS, high q, low p): Nodes that have similar structural roles in the network (e.g., both are hub nodes in different cities). A BFS-like walk focuses on the immediate neighborhood structure, effectively identifying structural equivalence.
How is Node2Vec related to NLP and Word2Vec?
Node2Vec maps network topologies directly to NLP techniques.
- A Graph is the "Document".
- A Random Walk is a "Sentence".
- A Node is a "Word".
These "sentences" are fed into the Skip-gram architecture to predict surrounding context nodes, resulting in rich vector representations (Embeddings) for every node.
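How walks become Skip-gram training data can be shown with the (center, context) pair extraction step. This is a sketch of the data preparation only; in practice a library such as gensim's `Word2Vec` (with `sg=1`) consumes the walks directly and handles this internally. The function name `skipgram_pairs` and the sample walk are illustrative.

```python
def skipgram_pairs(walks, window):
    """Expand node walks ("sentences") into (center, context) pairs,
    the form Skip-gram trains on: predict context nodes within
    `window` positions of each center node."""
    pairs = []
    for walk in walks:
        for i, center in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, walk[j]))
    return pairs

walks = [["a", "b", "c", "d"]]  # one toy walk, i.e. one "sentence"
pairs = skipgram_pairs(walks, window=1)
# pairs: [("a","b"), ("b","a"), ("b","c"), ("c","b"), ("c","d"), ("d","c")]
```

Training the Skip-gram model on such pairs pulls the embeddings of frequently co-occurring nodes together, which is where the final vector representations come from.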