Seq2Seq Models: Mapping Meaning
Many NLP tasks—like translation, summarization, and dialogue generation—require mapping an input sequence of varying length to an output sequence of varying length. The Sequence-to-Sequence (Seq2Seq) framework was the breakthrough that made this possible.
The Architecture: Encoder and Decoder
A standard Seq2Seq model consists of two recurrent neural networks (typically LSTMs or GRUs):
- The Encoder: It processes the input sequence (e.g., "Hello world") token by token. At each step, it updates its internal hidden state. After reading the entire sequence, its final hidden state acts as a summary of the input.
- The Decoder: It is trained to generate the output sequence. It initializes its hidden state with the Encoder's final state and begins generating tokens one by one, usually starting with a special [SOS] (Start of Sequence) token.
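The Encoder's role can be sketched with a toy vanilla RNN in NumPy. All names and sizes here are hypothetical, and the randomly initialized matrices stand in for weights a real model would learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
VOCAB, EMBED, HIDDEN = 10, 8, 16

# Random stand-ins for learned parameters.
E = rng.normal(size=(VOCAB, EMBED))        # embedding table
W_xh = rng.normal(size=(EMBED, HIDDEN))    # input -> hidden weights
W_hh = rng.normal(size=(HIDDEN, HIDDEN))   # hidden -> hidden weights

def encode(token_ids):
    """Run a vanilla RNN over the input; return the final hidden state."""
    h = np.zeros(HIDDEN)
    for t in token_ids:
        # Update the hidden state once per input token.
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h  # the final state is the context vector

context = encode([3, 1, 4])   # hypothetical token ids for "Hello world </s>"
print(context.shape)          # (16,) regardless of input length
```

Note that the returned vector always has `HIDDEN` dimensions, whether the input is three tokens or three hundred; this fixed budget is exactly the bottleneck discussed next.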
The Context Vector (The Bottleneck)
The link between the Encoder and Decoder is the Context Vector: a single fixed-length vector that must represent the entire input sentence.
While revolutionary, this architecture has a fatal flaw: the bottleneck. Because the context vector has a fixed size (e.g., 256 or 512 dimensions), the model struggles to remember the beginning of very long sentences by the time it reaches the end.
Training Tip: Teacher Forcing
Teacher Forcing: Left to itself, the Decoder predicts each word based on its own previous prediction, so an error early in the sequence derails everything that follows. Teacher Forcing is a training technique in which the actual target word is passed as the next input, rather than the Decoder's guess, which drastically speeds up convergence.
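The difference can be sketched as a toy decoder loop in NumPy. Everything here is hypothetical (random weights, made-up sizes); the point is only the `teacher_forcing` branch that chooses the next input:

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, HIDDEN = 10, 16
SOS = 0  # hypothetical Start-of-Sequence token id

# Random stand-ins for learned decoder parameters.
E = rng.normal(size=(VOCAB, HIDDEN))      # target-token embeddings
W_hh = rng.normal(size=(HIDDEN, HIDDEN))  # hidden -> hidden weights
W_hy = rng.normal(size=(HIDDEN, VOCAB))   # hidden -> vocab logits

def decode_step(h, token_id):
    """One decoder step: consume a token, emit new state and logits."""
    h = np.tanh(E[token_id] + h @ W_hh)
    return h, h @ W_hy

def decode(context, target, teacher_forcing=True):
    """Generate len(target) tokens, starting from the encoder's context."""
    h, prev, out = context, SOS, []
    for gold in target:
        h, logits = decode_step(h, prev)
        pred = int(np.argmax(logits))
        out.append(pred)
        # Teacher forcing: feed the gold token back in, not our own guess.
        prev = gold if teacher_forcing else pred
    return out

context = np.zeros(HIDDEN)               # stand-in for the encoder's final state
print(decode(context, target=[4, 2, 7]))  # predictions are untrained noise
```

With `teacher_forcing=False`, each step consumes the model's own (possibly wrong) prediction, which is how the Decoder actually runs at inference time.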
❓ Frequently Asked Questions (Seq2Seq)
What is a Sequence-to-Sequence (Seq2Seq) model?
A Sequence-to-Sequence (Seq2Seq) model is a machine learning architecture designed to transform an input sequence (like a sentence in English) into an output sequence (like a sentence in French). It consists of an Encoder that reads the input and a Decoder that generates the output.
What is the role of the Context Vector in Seq2Seq?
The Context Vector is the final hidden state produced by the Encoder. It acts as a dense, mathematical summary of the entire input sequence. The Decoder uses this vector as its initial state to begin generating the output sequence.
Why do basic Seq2Seq models struggle with long paragraphs?
Basic Seq2Seq models suffer from the "bottleneck problem." Because the Context Vector is a fixed size, forcing the Encoder to compress a massive amount of information (like a long paragraph) into it results in data loss, causing the model to forget earlier parts of the text. This limitation led to the development of Attention Mechanisms.
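The core idea behind attention can be sketched in a few lines: instead of one fixed context vector, the Decoder rebuilds a fresh context at every step as a weighted average of all encoder states. This is a minimal dot-product-attention sketch with hypothetical shapes, not a full implementation:

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score every encoder state, softmax, average."""
    scores = encoder_states @ decoder_state   # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over input positions
    return weights @ encoder_states           # weighted context vector

enc = np.random.default_rng(2).normal(size=(5, 16))  # 5 input positions
ctx = attention_context(enc[4], enc)
print(ctx.shape)  # (16,): rebuilt at every step, so no single bottleneck
```

Because the weights are recomputed at each decoding step, long inputs no longer have to be squeezed through one vector; the Decoder can look back at whichever input positions are currently relevant.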
