A chatbot without memory is just a search engine in a chat window. True conversational AI requires context. In this lesson, you'll learn how to architect stateful memory for your automated assistants.
1. The Stateless Problem
By default, LLM APIs (like OpenAI's chat endpoint) are completely Stateless. Each API call is a blank slate: the model has no idea what was said in the previous call. This is by design; it makes the API simpler and cheaper to run. But it also means that remembering the conversation is your problem to solve.
The consequence is immediate: if a user says 'My name is Alex' in turn 1, and then asks 'What is my name?' in turn 3, a stateless integration will answer 'I don't know'. From the model's perspective, it genuinely doesn't. It never saw turn 1.
This is the most common mistake in AI chatbot development: people assume the model 'remembers'. It doesn't. Your workflow is the memory. The model is just a stateless function that takes input and returns output. Making it conversational is entirely your responsibility as the builder.
import OpenAI from 'openai';
const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// WRONG: Stateless call (AI forgets everything)
const statelessResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is my name?' } // AI has no idea!
  ]
});

// CORRECT: Stateful call (AI has history)
const statefulResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'My name is Alex' },
    { role: 'assistant', content: 'Nice to meet you, Alex!' },
    { role: 'user', content: 'What is my name?' }
  ]
});

2. Building Stateful Memory
To make an AI conversation stateful, your workflow must manage a Conversation Array — a structured list of every message exchanged, in order, with role labels (user or assistant). On each new message, you read the stored history, append the new user message, call the API with the full array, then append the AI's response and save it back.
For multi-user bots, you need Session IDs to keep histories separate. A Session ID can be the user's phone number, a browser cookie, or a database-generated UUID. Every database write and read uses this ID as a key. Without it, every user would see a shared, mixed-up history — a critical privacy bug.
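The sketch below illustrates how a Session ID keeps histories apart. It is a minimal illustration, not a production design: the `Map` is a hypothetical in-memory stand-in for a real database, and the helper names `getHistory` and `appendMessage` are invented for this example.

```javascript
// Hypothetical in-memory store keyed by session ID.
// A real bot would back this with Redis, Postgres, or similar.
const sessions = new Map();

// Return the history for one session, creating an empty one on first contact.
function getHistory(sessionId) {
  if (!sessions.has(sessionId)) {
    sessions.set(sessionId, []);
  }
  return sessions.get(sessionId);
}

// Append a message to exactly one session's history.
function appendMessage(sessionId, role, content) {
  getHistory(sessionId).push({ role, content });
}

appendMessage('user-a', 'user', 'My name is Alex');
appendMessage('user-b', 'user', 'My name is Sam');

console.log(getHistory('user-a').length);     // 1
console.log(getHistory('user-b')[0].content); // My name is Sam
```

Because every read and write goes through the session key, Alex's messages never appear in Sam's array.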
The data must live in external storage (Redis, Postgres, Supabase), not inside the n8n workflow itself. Workflow executions are ephemeral: when an execution ends, its local data evaporates. Your database is the only thing that survives between runs.
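// n8n Code Node: Memory write/read
// Assumes `db` and `callLLM` are helpers you provide, and that the incoming
// item carries `userId` and `message` fields.
const sessionId = $input.item.json.userId;
const newMessage = $input.item.json.message;

// 1. Read existing history from database (empty array for a new session)
const history = (await db.getHistory(sessionId)) ?? [];

// 2. Add new user message
history.push({ role: 'user', content: newMessage });

// 3. Call AI with full context
const aiReply = await callLLM(history);

// 4. Append AI reply and save
history.push({ role: 'assistant', content: aiReply });
await db.saveHistory(sessionId, history);

// Pass the reply on to the next node
return { json: { reply: aiReply } };

3. Window Buffer & Summarization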
Every LLM has a hard Context Window limit — the maximum number of tokens it can process in one request. GPT-4o's limit is 128,000 tokens. Sounds big until you realize a busy support chatbot conversation can grow to millions of tokens over days. Send too much and the API throws a context_length_exceeded error.
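To get a feel for how close a history is to the limit, a rough estimate can be computed before each call. The sketch below assumes about 4 characters per token, a common rule of thumb for English text and not an exact count; the function names and the `reservedForReply` parameter are hypothetical. A proper tokenizer would be needed for precise numbers.

```javascript
// Assumption: ~4 characters per token is a rough rule of thumb for English text,
// not an exact tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Sum the estimate over every message in the conversation array.
function estimateHistoryTokens(history) {
  return history.reduce((sum, msg) => sum + estimateTokens(msg.content), 0);
}

// True when the history, plus room reserved for the reply, fits in the window.
function fitsContextWindow(history, contextLimit, reservedForReply = 1000) {
  return estimateHistoryTokens(history) + reservedForReply <= contextLimit;
}

const sample = [
  { role: 'user', content: 'a'.repeat(400) },      // ~100 tokens
  { role: 'assistant', content: 'b'.repeat(800) }, // ~200 tokens
];

console.log(estimateHistoryTokens(sample));     // 300
console.log(fitsContextWindow(sample, 128000)); // true
console.log(fitsContextWindow(sample, 1200));   // false
```

Reserving space for the reply matters because the model's output counts against the same window as the input.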
The simple fix is a Window Buffer: only keep the last N messages (e.g., 20). When the array exceeds that limit, drop the oldest entries, for example by calling history.shift() until it fits or with a single history.splice(). Simple, but it causes the AI to 'forget' important early context.
The advanced fix is Summarization Memory: use a cheap, fast model (such as GPT-4o-mini) to compress old messages into a short paragraph before discarding them. That summary gets injected into the System Prompt as 'background context'. The AI keeps the gist of the older turns in compressed form, although fine details can still be lost.
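// Window Buffer implementation
const MAX_MESSAGES = 20;
if (history.length > MAX_MESSAGES) {
  // Remove the oldest messages so only the last MAX_MESSAGES remain
  const overflow = history.splice(0, history.length - MAX_MESSAGES);
  // Optional: summarize the removed messages instead of discarding them outright
  // (`summarize` is a helper you provide; `systemPrompt` must be declared with `let`)
  const summary = await summarize(overflow);
  systemPrompt += `\n\nEarlier context: ${summary}`;
}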
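One possible refinement, sketched below under stated assumptions, is to rebuild the system prompt from a fixed base on every turn instead of appending to it, so the prompt does not keep growing if it is stored between turns. The helpers `splitOverflow` and `buildMessages` are hypothetical names, and the summary here is a hard-coded string standing in for the output of a summarization model call.

```javascript
// Split a history into the messages to compress and the messages to keep verbatim.
function splitOverflow(history, maxMessages) {
  const cut = Math.max(0, history.length - maxMessages);
  return { overflow: history.slice(0, cut), recent: history.slice(cut) };
}

// Build the array to send: base system prompt, plus the summary if one exists,
// followed by the recent messages.
function buildMessages(basePrompt, summary, recent) {
  const system = summary
    ? `${basePrompt}\n\nEarlier context: ${summary}`
    : basePrompt;
  return [{ role: 'system', content: system }, ...recent];
}

// Demo data: five alternating user/assistant messages.
const demoHistory = Array.from({ length: 5 }, (_, i) => ({
  role: i % 2 === 0 ? 'user' : 'assistant',
  content: `message ${i + 1}`,
}));

const { overflow, recent } = splitOverflow(demoHistory, 3);
// Stand-in summary; in practice this would come from a model call on `overflow`.
const demoSummary = 'User introduced themselves.';
const messages = buildMessages('You are a support bot.', demoSummary, recent);

console.log(overflow.length, recent.length, messages.length); // 2 3 4
```

In this arrangement the summary would be saved alongside the history under the same Session ID, and the next summarization pass would fold the previous summary in together with the newly overflowed messages.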