Seq2Seq Models: Mapping Meaning
Many NLP tasks—like translation, summarization, and dialogue generation—require mapping an input sequence of varying length to an output sequence of varying length. The Sequence-to-Sequence (Seq2Seq) framework was the breakthrough that made this possible.
The Architecture: Encoder and Decoder
A standard Seq2Seq model consists of two recurrent neural networks (typically LSTMs or GRUs):
- The Encoder: It processes the input sequence (e.g., "Hello world") token by token. At each step, it updates its internal hidden state. After reading the entire sequence, its final hidden state acts as a summary of the input.
- The Decoder: It is trained to generate the output sequence. It initializes its hidden state with the Encoder's final state and begins generating tokens one by one, usually starting with a special [SOS] (Start of Sequence) token.
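The Encoder's role can be sketched with a toy vanilla RNN in NumPy. All names and sizes here are hypothetical, and the randomly initialized matrices stand in for weights a real model would learn:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes for illustration only.
VOCAB, EMBED, HIDDEN = 10, 8, 16

# Random stand-ins for learned parameters.
E = rng.normal(size=(VOCAB, EMBED))        # embedding table
W_xh = rng.normal(size=(EMBED, HIDDEN))    # input -> hidden weights
W_hh = rng.normal(size=(HIDDEN, HIDDEN))   # hidden -> hidden weights

def encode(token_ids):
    """Run a vanilla RNN over the input; return the final hidden state."""
    h = np.zeros(HIDDEN)
    for t in token_ids:
        # Update the hidden state once per input token.
        h = np.tanh(E[t] @ W_xh + h @ W_hh)
    return h  # the final state is the context vector

context = encode([3, 1, 4])   # hypothetical token ids for "Hello world </s>"
print(context.shape)          # (16,) regardless of input length
```

Note that the returned vector always has `HIDDEN` dimensions, whether the input is three tokens or three hundred; this fixed budget is exactly the bottleneck discussed next.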
The Context Vector (The Bottleneck)
The link between the Encoder and Decoder is the Context Vector: a single fixed-length vector that must represent the entire input sentence.
While revolutionary, this architecture has a fatal flaw: the bottleneck. Because the context vector has a fixed size (e.g., 256 or 512 dimensions), the model struggles to remember the beginning of very long sentences by the time it reaches the end.
Training Tip: Teacher Forcing
Teacher Forcing: Left to itself, the Decoder predicts each word based on its own previous prediction, so an error early in the sequence derails everything that follows. Teacher Forcing is a training technique in which the actual target word is passed as the next input, rather than the Decoder's guess, which drastically speeds up convergence.
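The difference can be sketched as a toy decoder loop in NumPy. Everything here is hypothetical (random weights, made-up sizes); the point is only the `teacher_forcing` branch that chooses the next input:

```python
import numpy as np

rng = np.random.default_rng(1)
VOCAB, HIDDEN = 10, 16
SOS = 0  # hypothetical Start-of-Sequence token id

# Random stand-ins for learned decoder parameters.
E = rng.normal(size=(VOCAB, HIDDEN))      # target-token embeddings
W_hh = rng.normal(size=(HIDDEN, HIDDEN))  # hidden -> hidden weights
W_hy = rng.normal(size=(HIDDEN, VOCAB))   # hidden -> vocab logits

def decode_step(h, token_id):
    """One decoder step: consume a token, emit new state and logits."""
    h = np.tanh(E[token_id] + h @ W_hh)
    return h, h @ W_hy

def decode(context, target, teacher_forcing=True):
    """Generate len(target) tokens, starting from the encoder's context."""
    h, prev, out = context, SOS, []
    for gold in target:
        h, logits = decode_step(h, prev)
        pred = int(np.argmax(logits))
        out.append(pred)
        # Teacher forcing: feed the gold token back in, not our own guess.
        prev = gold if teacher_forcing else pred
    return out

context = np.zeros(HIDDEN)               # stand-in for the encoder's final state
print(decode(context, target=[4, 2, 7]))  # predictions are untrained noise
```

With `teacher_forcing=False`, each step consumes the model's own (possibly wrong) prediction, which is how the Decoder actually runs at inference time.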
❓ Frequently Asked Questions (Seq2Seq)
What is a Sequence-to-Sequence (Seq2Seq) model?
A Sequence-to-Sequence (Seq2Seq) model is a machine learning architecture designed to transform an input sequence (like a sentence in English) into an output sequence (like a sentence in French). It consists of an Encoder that reads the input and a Decoder that generates the output.
What is the role of the Context Vector in Seq2Seq?
The Context Vector is the final hidden state produced by the Encoder. It acts as a dense, mathematical summary of the entire input sequence. The Decoder uses this vector as its initial state to begin generating the output sequence.
Why do basic Seq2Seq models struggle with long paragraphs?
Basic Seq2Seq models suffer from the "bottleneck problem." Because the Context Vector is a fixed size, forcing the Encoder to compress a massive amount of information (like a long paragraph) into it results in data loss, causing the model to forget earlier parts of the text. This limitation led to the development of Attention Mechanisms.
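The core idea behind attention can be sketched in a few lines: instead of one fixed context vector, the Decoder rebuilds a fresh context at every step as a weighted average of all encoder states. This is a minimal dot-product-attention sketch with hypothetical shapes, not a full implementation:

```python
import numpy as np

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score every encoder state, softmax, average."""
    scores = encoder_states @ decoder_state   # one score per input position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax over input positions
    return weights @ encoder_states           # weighted context vector

enc = np.random.default_rng(2).normal(size=(5, 16))  # 5 input positions
ctx = attention_context(enc[4], enc)
print(ctx.shape)  # (16,): rebuilt at every step, so no single bottleneck
```

Because the weights are recomputed at each decoding step, long inputs no longer have to be squeezed through one vector; the Decoder can look back at whichever input positions are currently relevant.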
