BERT & Contextual Embeddings: Deep Context
Before BERT, a word like 'bank' had one static numeric vector, whether it meant a financial institution or the side of a river. BERT changed everything by looking at the entire sentence bidirectionally before assigning meaning.
The Problem with Word2Vec
Traditional word embeddings (like Word2Vec or GloVe) produce a static lookup table: every word in the vocabulary is mapped to a single dense vector. While this works well for capturing overall semantic similarity (e.g., King - Man + Woman ≈ Queen), the approach fails completely at polysemy (words with multiple meanings).
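To make the failure concrete, here is a minimal sketch using a made-up 3-dimensional embedding table (the vectors are illustrative, not trained): because lookup is per-word, 'bank' receives exactly the same vector in a financial and a riverside sentence.

```python
# Toy static embedding table (hypothetical, untrained 3-d vectors).
static_embeddings = {
    "bank":  [0.8, 0.1, 0.3],
    "river": [0.1, 0.9, 0.2],
    "money": [0.7, 0.2, 0.4],
}

def embed(sentence):
    """Look up each known word in the static table -- context is ignored."""
    return [static_embeddings[w] for w in sentence.split() if w in static_embeddings]

financial = embed("deposit money at the bank")
riverside = embed("sat on the river bank")

# The vector for 'bank' is identical in both contexts:
assert financial[-1] == riverside[-1]
```

A contextual model like BERT would instead produce two different vectors for 'bank' here, one per sentence.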
Enter Bidirectional Transformers
BERT utilizes the Encoder stack of the Transformer architecture. Unlike previous models (like early RNNs) that read text sequentially (left-to-right), BERT reads the entire sequence of words at once. This mechanism is known as bidirectional, though it's more accurately described as non-directional.
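The "non-directional" point can be seen in the core self-attention computation: each token's new representation is a similarity-weighted average over every position in the sequence, left and right alike. The sketch below is a stripped-down, single-head, unscaled dot-product attention in pure Python (no learned projections), just to show the information flow:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(vectors):
    """Each output vector is a weighted average over ALL positions
    (left AND right) -- there is no notion of reading direction."""
    out = []
    for q in vectors:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in vectors]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, vectors))
                    for d in range(len(q))])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(tokens)

# The first token's output already mixes in tokens to its RIGHT:
assert contextual[0] != tokens[0]
```

A left-to-right model could only have produced the first output from the first input; here every output depends on the whole sequence.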
❓ NLP FAQ
What is Masked Language Modeling (MLM)?
MLM is BERT's primary pre-training objective. During training, 15% of the input tokens are randomly selected for prediction; in the original recipe, 80% of those are replaced with a `[MASK]` token, 10% are swapped for a random token, and 10% are left unchanged. The objective is to predict the original vocabulary ID of each selected token based ONLY on its context. This forces the model to learn deep bidirectional representations.
What is Next Sentence Prediction (NSP)?
NSP is a binary classification task used alongside MLM. BERT is fed pairs of sentences (A and B). 50% of the time, B is the actual sentence that follows A (labeled IsNext); the other 50% of the time, it is a random sentence from the corpus (labeled NotNext). This helps BERT learn relationships between sentences, which is crucial for downstream tasks like Question Answering.
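The pair-construction logic can be sketched as follows (the function and label strings mirror the description above; this is an illustrative data-prep helper, not BERT's actual preprocessing code):

```python
import random

def make_nsp_pairs(sentences, rng):
    """Build (sentence_a, sentence_b, label) pairs: half the time B is the
    true next sentence ('IsNext'), otherwise a random one ('NotNext')."""
    pairs = []
    for i in range(len(sentences) - 1):
        a = sentences[i]
        if rng.random() < 0.5:
            pairs.append((a, sentences[i + 1], "IsNext"))
        else:
            b = rng.choice(sentences)
            while b == sentences[i + 1]:       # resample if we accidentally
                b = rng.choice(sentences)      # picked the true next sentence
            pairs.append((a, b, "NotNext"))
    return pairs

corpus = ["BERT reads text.", "It uses attention.",
          "NSP pairs sentences.", "Labels are binary."]
pairs = make_nsp_pairs(corpus, random.Random(42))
```

Each pair is then packed as `[CLS] A [SEP] B [SEP]`, and the `[CLS]` representation is fed to the binary classifier.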