Perception is everything. In AI, a response that starts appearing within 1 second feels 'faster' than one that appears all at once after 5 seconds.
1. The Streaming Pattern
Streaming delivers a large HTTP response incrementally, as a series of small 'chunks', instead of as one complete payload. For an LLM, each chunk typically carries one or more 'tokens'. As each chunk arrives from the AI provider, the frontend UI updates immediately. This creates the 'typewriter effect' that has become the gold standard for LLM interaction.
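As a minimal sketch of the pattern described above, the following TypeScript reads a streamed body using the standard fetch, ReadableStream, and TextDecoder APIs. The endpoint /api/chat, the request shape, and the names readTextStream, streamChat, and onUpdate are illustrative assumptions, not any particular provider's API. Real providers often wrap tokens in server-sent events or JSON lines, which would need extra parsing on top of this.

```typescript
// Reads a streamed body chunk by chunk and reports the text accumulated so far.
// `onUpdate` is a hypothetical callback; in a UI it would update component state.
async function readTextStream(
  body: ReadableStream<Uint8Array>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  const reader = body.getReader();
  const decoder = new TextDecoder("utf-8");
  let text = "";
  try {
    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true buffers a multi-byte character that is split across chunks.
      text += decoder.decode(value, { stream: true });
      onUpdate(text);
    }
    // Flush any bytes the decoder is still holding.
    const tail = decoder.decode();
    if (tail) {
      text += tail;
      onUpdate(text);
    }
  } finally {
    reader.releaseLock();
  }
  return text;
}

// Hypothetical usage: "/api/chat" and the JSON body are assumed, not a real API.
async function streamChat(
  prompt: string,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  const response = await fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ prompt }),
  });
  if (!response.ok || !response.body) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return readTextStream(response.body, onUpdate);
}
```

Passing the full text so far to onUpdate, rather than only the newest chunk, keeps the UI side simple: each update can replace the displayed message outright.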
2. Markdown & Formatting
AI models often output Markdown-formatted text such as bold text, lists, and code blocks. Rendering this in real time requires a robust library such as react-markdown that can handle 'broken' or incomplete syntax while it is still being generated, so the UI remains stable throughout generation.
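To illustrate what 'incomplete syntax' looks like mid-stream, here is a rough sketch of an optional pre-processing step that could run before the partial text is handed to a renderer. It is not part of react-markdown and is not required by it. The function name closeOpenInlineMarkers and its heuristics are assumptions made for this example: it only inspects the last line, and it only handles bold markers and inline code.

```typescript
// Three backticks, built with repeat() so the example is easy to embed in Markdown.
const FENCE = "`".repeat(3);

// True when the text ends inside a fenced code block that has not been closed yet.
function isInsideOpenFence(text: string): boolean {
  const fenceLines = text
    .split("\n")
    .filter((line) => line.trimStart().startsWith(FENCE));
  return fenceLines.length % 2 === 1;
}

// Counts non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  return haystack.split(needle).length - 1;
}

// Hypothetical helper: temporarily closes a dangling bold or inline-code marker
// on the line currently being streamed, so it is less likely to show as raw symbols.
function closeOpenInlineMarkers(partial: string): string {
  // Inside an open code fence, asterisks and backticks are literal text.
  if (isInsideOpenFence(partial)) return partial;

  const lastLine = partial.slice(partial.lastIndexOf("\n") + 1);

  // An odd number of backticks means an inline code span is still open.
  if (countOccurrences(lastLine, "`") % 2 === 1) return partial + "`";

  // Ignore completed inline code spans when looking for bold markers.
  const withoutCode = lastLine.replace(/`[^`]*`/g, "");
  if (countOccurrences(withoutCode, "**") % 2 === 1) {
    // The opener just arrived with no content yet: hide it for now.
    if (partial.endsWith("**")) return partial.slice(0, -2);
    // Trim trailing spaces so the closing marker sits directly after the text.
    return partial.trimEnd() + "**";
  }
  return partial;
}
```

The original streamed text should stay untouched in state. Only the value passed to the renderer goes through the helper, so the temporary closing markers disappear as soon as the real ones arrive.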
