GloVe & FastText: Beyond Word2Vec
While Word2Vec revolutionized NLP by utilizing local context windows, two later architectures solved its core limitations: GloVe capitalized on global corpus statistics, and FastText conquered the dreaded "Out-Of-Vocabulary" problem via subword embeddings.
GloVe: Global Vectors for Word Representation
Developed by researchers at Stanford, GloVe addresses a key flaw in local-context models like Word2Vec: they poorly utilize the vast statistics of the corpus as a whole.
GloVe constructs a massive Word-Word Co-occurrence Matrix. It counts how frequently every word appears near every other word across the entire training corpus. By training on the non-zero entries of this matrix, GloVe forces the model to learn word vectors such that their dot product (plus per-word bias terms) approximates the logarithm of their co-occurrence count. This yields remarkably consistent semantic mappings (e.g., man is to king as woman is to queen).
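The first step of GloVe, building the co-occurrence matrix, can be sketched in a few lines. This is a minimal stdlib-only illustration, not the reference implementation; the 1/distance weighting of context words follows the original GloVe paper, and the function name is my own. The learned vectors would then be fit so that w_i · w̃_j + b_i + b̃_j ≈ log X_ij for every non-zero entry X_ij.

```python
from collections import defaultdict

def cooccurrence_counts(corpus, window=2):
    """Count how often each ordered word pair co-occurs within a symmetric window.

    Nearer neighbors contribute more, weighted by 1/distance (as in the GloVe paper).
    """
    counts = defaultdict(float)
    for sentence in corpus:
        for i, word in enumerate(sentence):
            lo, hi = max(0, i - window), min(len(sentence), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[(word, sentence[j])] += 1.0 / abs(i - j)
    return counts

corpus = [["the", "cat", "sat", "on", "the", "mat"]]
X = cooccurrence_counts(corpus)
print(X[("cat", "sat")])  # -> 1.0 (adjacent words, distance weight 1/1)
```

On a real corpus this matrix is huge but extremely sparse, which is why GloVe trains only on its non-zero entries.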
FastText: Enriching Vectors with Subwords
Standard embedding models (Word2Vec, GloVe) assign a distinct vector to every word in their vocabulary. If your model encounters a word it wasn't trained on (an OOV - Out-Of-Vocabulary word), the lookup fails or falls back to a generic "unknown" vector that carries no meaning.
Facebook's AI Research lab developed FastText to fix this. Instead of treating words as atomic units, FastText represents each word as a bag of character n-grams (subwords).
- Example: With n=3, the word "apple" is padded with boundary markers to "<apple>" and represented as the sum of the vectors for <ap, app, ppl, ple, and le>, plus a vector for the whole word.
- The Magic: If the model later encounters the typo "appple", it has never seen the full word. But it has seen the subwords "app", "ple", etc. It sums those up, generating a highly accurate guess for the unknown word.
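The subword decomposition above is easy to reproduce. A minimal sketch (function name is mine; real FastText uses a range of n-gram sizes, typically 3 to 6, and hashes the n-grams into buckets):

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers, FastText-style."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

print(char_ngrams("apple"))  # -> ['<ap', 'app', 'ppl', 'ple', 'le>']

# The typo "appple" shares almost all of its subwords with "apple",
# so the summed subword vectors land close to the correct word vector.
shared = set(char_ngrams("apple")) & set(char_ngrams("appple"))
print(sorted(shared))  # -> ['<ap', 'app', 'le>', 'ple', 'ppl']
```

The boundary markers < and > matter: they let the model distinguish a prefix like "<ap" from the same letters in the middle of another word.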
🤖 Architecture FAQs
What is the main difference between GloVe and Word2Vec?
Word2Vec is a predictive model that uses a local window of context (e.g., 5 words around the target word) to update vectors. GloVe is a count-based model that first builds a giant matrix of how often words co-occur across the whole corpus, then factorizes it into low-dimensional vectors. GloVe often captures global analogies better.
Why is FastText better for morphologically rich languages?
Languages like Turkish, Finnish, or German create new words by combining roots and suffixes. A standard vocabulary dictionary cannot hold all combinations. Because FastText learns vectors for subwords (prefixes, roots, suffixes), it can understand complex compound words it has never seen before by summing the subword vectors it already knows.
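To make the compound-word claim concrete, here is a toy check (my own example, not from the FastText paper): the German compound "Handschuh" (glove) shares most of its character trigrams with "Hand" (hand) and "Schuh" (shoe), so even an unseen compound inherits meaning from its parts.

```python
def char_ngrams(word, n=3):
    """Character n-grams of a word wrapped in boundary markers, FastText-style."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n] for i in range(len(wrapped) - n + 1)]

compound = set(char_ngrams("handschuh"))
parts = set(char_ngrams("hand")) | set(char_ngrams("schuh"))

# 7 of the compound's 9 trigrams already occur in its two parts,
# so its composed vector sits near the "hand" + "schuh" region.
print(len(compound & parts), "of", len(compound))  # -> 7 of 9
```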
When should I NOT use FastText?
FastText models require significantly more RAM than Word2Vec or GloVe because they must store vectors for millions of subword n-grams, rather than just whole words. If memory is heavily constrained and typos/OOV words aren't a concern, standard Word2Vec or GloVe is more efficient.