An LLM is like a person with a 10-second memory. To have a real conversation, you must write everything down and read it back to them every time you speak.
1. The Stateless Nature of LLMs
Most chat-completion APIs are stateless: each request is processed on its own, and nothing from it is retained once the response is returned. To build an interactive chat application, your code has to store the Conversation History and resend it with every request.
The history is sent as a JSON array of message objects, in chronological order. Each object has a 'role' and its content: system for core instructions, user for human input, and assistant for the AI's replies.
// Standard Message History Array
const history = [
  {
    role: "system",
    content: "You are a senior developer tutoring a junior."
  },
  {
    role: "user",
    content: "What is statelessness?"
  },
  {
    role: "assistant",
    content: "It means the API has no memory of past requests."
  }
];
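To make the resend-everything pattern concrete, here is a minimal sketch of a conversation loop. The helper names (createConversation, takeTurn) and the fakeModel function are assumptions made up for illustration; fakeModel stands in for a real API call and simply reports how many messages it received.

```javascript
// Minimal sketch: the caller owns the history and resends all of it each turn.
// `callModel` is a hypothetical stand-in for a real chat API client.
function createConversation(systemPrompt) {
  return [{ role: "system", content: systemPrompt }];
}

// Returns a new history with the user message and the assistant reply appended.
function takeTurn(history, userInput, callModel) {
  const withUser = [...history, { role: "user", content: userInput }];
  const reply = callModel(withUser); // the full history is sent every time
  return [...withUser, { role: "assistant", content: reply }];
}

// Fake model that just reports how many messages it was shown.
const fakeModel = (messages) => `I can see ${messages.length} messages.`;

let convo = createConversation("You are a senior developer tutoring a junior.");
convo = takeTurn(convo, "What is statelessness?", fakeModel);
convo = takeTurn(convo, "Can you give an example?", fakeModel);
console.log(convo[convo.length - 1].content); // "I can see 4 messages."
```

The second reply sees four messages because the request carries the system prompt, both user messages, and the first assistant reply. If the history were not resent, the model would only ever see the latest message.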
2. Storage & Session Management
In production, histories are commonly kept in a fast in-memory store such as Redis. Every time a user hits 'send', your backend reads the stored history for that session, appends the new message, and sends the whole array to the AI. Once the reply arrives, it writes the updated history back.
Because active sessions accumulate quickly, you should implement Session Management using Time-To-Live (TTL) settings so that abandoned chats expire automatically. Note that a Redis TTL deletes the key outright; if old chats must be kept, they have to be archived before they expire.
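// Fetch the stored history from Redis before calling the API
const chatId = 'session_123';
// Redis stores strings, so the history is kept as a JSON-encoded array
const stored = await redis.get(chatId);
const activeHistory = stored ? JSON.parse(stored) : [];
const userMessage = { role: "user", content: userInput };
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [...activeHistory, userMessage]
});
// Append the user message and the assistant reply, then save and reset the TTL
const assistantMessage = { role: "assistant", content: response.choices[0].message.content };
const newHistory = [...activeHistory, userMessage, assistantMessage];
await redis.set(chatId, JSON.stringify(newHistory), { EX: 60 * 60 * 24 }); // 24 hours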
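The TTL behaviour described above can be illustrated without a Redis server. The following is a sketch only: SessionStore is a hypothetical class that uses a Map in place of Redis and an injectable clock so expiry can be observed. It models a sliding expiration, where every write pushes the deadline forward, which is what resetting the TTL on each save achieves.

```javascript
// Sketch of sliding expiration, using a Map as a stand-in for Redis.
// In Redis, the equivalent is setting an expiry (SET with EX, or EXPIRE) on every write.
class SessionStore {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, so expiry can be tested
    this.sessions = new Map(); // chatId -> { history, expiresAt }
  }

  // Returns the stored history, or an empty array if missing or expired.
  get(chatId) {
    const entry = this.sessions.get(chatId);
    if (!entry) return [];
    if (entry.expiresAt <= this.now()) {
      this.sessions.delete(chatId);
      return [];
    }
    return entry.history;
  }

  // Saves the history and resets the expiry window.
  set(chatId, history) {
    this.sessions.set(chatId, { history, expiresAt: this.now() + this.ttlMs });
  }
}

let clock = 0;
const store = new SessionStore(1000, () => clock);
store.set("session_123", [{ role: "user", content: "Hi" }]);
clock = 900;
store.set("session_123", store.get("session_123")); // activity resets the TTL
clock = 1800;
const stillActive = store.get("session_123").length; // 1: now expires at 1900
clock = 2000;
const afterExpiry = store.get("session_123").length; // 0: abandoned session dropped
```

An active chat never expires mid-conversation because each message extends the window, while a chat with no activity for a full TTL period disappears on its own.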
3. Hybrid Archival & Threading
A more advanced architecture uses a Hybrid Storage Pattern. The active session lives in an in-memory Redis cache for low-latency reads. When a session goes idle, for example after the user closes the browser tab, a background worker copies the full chat history into a cheaper PostgreSQL database for permanent archival and then removes it from Redis.
As your product matures, you will also need Threading: letting a single user maintain multiple conversation threads at once, each with its own isolated history.
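One simple way to keep threads isolated is to give each one its own storage key. The sketch below assumes a key format of chat:userId:threadId, which is an illustrative choice rather than a standard, and uses a Map in place of Redis so it runs on its own.

```javascript
// Sketch: one history per (user, thread) pair, using a Map in place of Redis.
// The key format is an assumption chosen for illustration.
function threadKey(userId, threadId) {
  return `chat:${userId}:${threadId}`;
}

const threads = new Map(); // key -> array of message objects

function appendMessage(userId, threadId, message) {
  const key = threadKey(userId, threadId);
  const history = threads.get(key) || [];
  threads.set(key, [...history, message]);
}

function getHistory(userId, threadId) {
  return threads.get(threadKey(userId, threadId)) || [];
}

appendMessage("user_42", "vacation-plan", { role: "user", content: "Ideas for Lisbon?" });
appendMessage("user_42", "code-review", { role: "user", content: "Review my PR." });
appendMessage("user_42", "vacation-plan", { role: "user", content: "And Porto?" });
```

Because each thread resolves to a different key, a request for the vacation thread sends only that thread's messages to the model, and the code-review thread is unaffected.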
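// Hybrid Archival Worker (Cron Job)
// Note: Redis deletes a key once its TTL expires, so the worker has to archive
// sessions that are idle but not yet expired. findStaleSessions() is a
// hypothetical helper (for example, backed by a record of last-activity times).
async function archiveStaleSessions() {
  const staleSessions = await findStaleSessions();
  for (const session of staleSessions) {
    // 1. Move to cheap, long-term SQL storage
    //    (postgres.insert is a placeholder for your SQL client's insert call)
    await postgres.insert('chat_archives', session.data);
    // 2. Delete from expensive Redis memory, only after the insert succeeded
    await redis.del(session.id);
  }
}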
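The archival worker needs some way to find sessions that have gone idle before Redis expires them. One possible approach, sketched here with plain in-memory structures, is to record a last-activity timestamp per session and scan for old ones. All names below (touchSession, findIdleSessions, archiveIdleSessions, the archive callback) are hypothetical; in Redis itself, a sorted set scored by timestamp (ZADD and ZRANGEBYSCORE) could play the role of the lastActivity map.

```javascript
// Sketch: track last activity per session so a worker can find idle ones.
// Maps stand in for Redis; this is not a production implementation.
const lastActivity = new Map(); // sessionId -> timestamp in ms

function touchSession(sessionId, nowMs) {
  lastActivity.set(sessionId, nowMs);
}

// Returns ids of sessions with no activity for at least idleMs.
function findIdleSessions(nowMs, idleMs) {
  const idle = [];
  for (const [sessionId, seenAt] of lastActivity) {
    if (nowMs - seenAt >= idleMs) idle.push(sessionId);
  }
  return idle;
}

// Archive each idle session, then drop it from the hot store.
// `archive` is a hypothetical callback that writes to long-term storage.
function archiveIdleSessions(nowMs, idleMs, hotStore, archive) {
  for (const sessionId of findIdleSessions(nowMs, idleMs)) {
    archive(sessionId, hotStore.get(sessionId) || []);
    hotStore.delete(sessionId);
    lastActivity.delete(sessionId);
  }
}

const hot = new Map([
  ["a", [{ role: "user", content: "Hi" }]],
  ["b", []]
]);
const archived = [];
touchSession("a", 0);
touchSession("b", 5000);
archiveIdleSessions(6000, 3000, hot, (id, history) => archived.push({ id, history }));
```

After the run, session "a" (idle for 6000 ms) has been handed to the archive callback and removed from the hot store, while session "b" (idle for 1000 ms) stays in memory.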
