Why can't I store memory inside the n8n node itself?

Automation workflows run in ephemeral executions. When the execution ends, all local variables are gone. You must use an external database (Redis, Postgres, Supabase) or n8n's specific Memory nodes to persist data between runs.

Does sending the full conversation history cost more money?

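Yes, significantly. Every token in the history counts toward your usage bill, and the full history is resent on every turn, so each call costs more than the one before it. By turn 100, a single call can carry on the order of a hundred times the input tokens of the first message. This is why Window Buffering and Summarization Memory aren't optional; they're essential for cost control.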
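To make that growth concrete, here is a rough back-of-the-envelope sketch. It assumes every message is about the same size (the 50-token figure is an illustrative assumption, not a measurement) and counts input tokens only, ignoring the system prompt and output pricing.

```javascript
// Rough estimate of input tokens billed when the full history is resent
// on every turn. TOKENS_PER_MESSAGE is an illustrative assumption.
const TOKENS_PER_MESSAGE = 50;

// Input tokens for the call made on a given turn (1-based).
// By turn t the history holds t user messages and t - 1 assistant replies.
function inputTokensForTurn(turn) {
  return (2 * turn - 1) * TOKENS_PER_MESSAGE;
}

// Total input tokens across every call in a conversation of `turns` turns.
function totalInputTokens(turns) {
  let total = 0;
  for (let t = 1; t <= turns; t++) {
    total += inputTokensForTurn(t);
  }
  return total;
}

console.log(inputTokensForTurn(1));   // 50
console.log(inputTokensForTurn(100)); // 9950
console.log(totalInputTokens(100));   // 500000
```

Under these assumptions the 100th call alone carries 199 times the input of the first, and the total across the conversation grows with the square of the turn count.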

What happens when the context window is exceeded?

The API call fails immediately with a context_length_exceeded error. Unless your workflow catches that error, the execution stops and the user gets no response. Implement Window Buffer limits before you hit production, not after.
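As a safety net on top of buffer limits, you could catch the error and retry once with a shorter history. This is a minimal sketch, not a definitive pattern: `callLLM` is a hypothetical function you supply, and the assumption that the thrown error exposes a `code` property equal to `context_length_exceeded` should be checked against the client library you actually use.

```javascript
// Sketch: retry once with a shorter history when the context limit is hit.
// `callLLM` is a hypothetical async function (history) => reply.
// Assumption: the thrown error carries code === 'context_length_exceeded'.
async function callWithContextGuard(callLLM, history, keepLast = 20) {
  try {
    return await callLLM(history);
  } catch (err) {
    if (err && err.code === 'context_length_exceeded') {
      // Keep only the most recent messages and try one more time.
      const trimmed = history.slice(-keepLast);
      return await callLLM(trimmed);
    }
    throw err; // Unrelated errors are not swallowed.
  }
}
```

If the retry also fails, the error propagates, so the workflow still needs an error branch that sends the user a fallback message.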

Chatbot Memory & Context Architecture

A chatbot without memory is just a search engine in a chat window. True conversational AI requires context. In this lesson, you'll learn how to architect stateful memory for your automated assistants.

1. The Stateless Problem

By default, LLM APIs (like OpenAI's chat endpoint) are completely Stateless. Each API call is a blank slate — the model has no idea what was said in the previous call. This is by design; it makes the API simpler and cheaper to run. But it's your problem to solve.

The consequence is immediate: if a user says 'My name is Alex' in turn 1, and then asks 'What is my name?' in turn 3, a stateless integration will answer 'I don't know'. From the model's perspective, it genuinely doesn't. It never saw turn 1.

This is the most common mistake in AI chatbot development: people assume the model 'remembers'. It doesn't. Your workflow is the memory. The model is just a stateless function that takes input and returns output. Making it conversational is entirely your responsibility as the builder.

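// Assumes the official `openai` Node package and an OPENAI_API_KEY env var.
import OpenAI from 'openai';
const openai = new OpenAI();

// WRONG: Stateless call (the model sees only this one message)
const statelessResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'What is my name?' } // No earlier turns: the model cannot know
  ]
});

// CORRECT: Stateful call (earlier turns are resent as history)
const statefulResponse = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [
    { role: 'user', content: 'My name is Alex' },
    { role: 'assistant', content: 'Nice to meet you, Alex!' },
    { role: 'user', content: 'What is my name?' }
  ]
});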

2. Building Stateful Memory

To make an AI conversation stateful, your workflow must manage a Conversation Array — a structured list of every message exchanged, in order, with role labels (user or assistant). On each new message, you read the stored history, append the new user message, call the API with the full array, then append the AI's response and save it back.

For multi-user bots, you need Session IDs to keep histories separate. A Session ID can be the user's phone number, a browser cookie, or a database-generated UUID. Every database write and read uses this ID as a key. Without it, every user would see a shared, mixed-up history — a critical privacy bug.

The data must live in external storage (Redis, Postgres, Supabase), not inside the n8n workflow itself. Workflows are ephemeral: when an execution ends, all local data evaporates. Your database is the only thing that survives between runs.
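To show what session-keyed storage looks like, here is a minimal sketch that uses an in-memory Map as a stand-in for the external database. The Map would not survive between executions, so treat it purely as an illustration of the interface; the `createHistoryStore`, `getHistory`, and `saveHistory` names are assumptions, not part of any library.

```javascript
// In-memory stand-in for an external store, keyed by session ID.
// In production the Map would be replaced by Redis, Postgres, or similar.
// All names here are illustrative assumptions.
function createHistoryStore() {
  const sessions = new Map();
  return {
    // Returns a copy of the array so callers cannot reorder or
    // truncate the stored history by accident.
    async getHistory(sessionId) {
      return [...(sessions.get(sessionId) ?? [])];
    },
    async saveHistory(sessionId, history) {
      sessions.set(sessionId, [...history]);
    },
  };
}

// Usage: two sessions never see each other's messages.
async function demo() {
  const db = createHistoryStore();
  await db.saveHistory('user-a', [{ role: 'user', content: 'My name is Alex' }]);
  await db.saveHistory('user-b', [{ role: 'user', content: 'My name is Sam' }]);
  const a = await db.getHistory('user-a');
  const unknown = await db.getHistory('user-c');
  return { first: a[0].content, unknownLength: unknown.length };
}
```

Because every read and write goes through the session ID, a new user simply starts with an empty history instead of inheriting someone else's.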

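// n8n Code node ("Run Once for Each Item" mode): memory read/write.
// `db` and `callLLM` are placeholder helpers you must provide yourself;
// the `message` field name is an assumption about the incoming payload.
const sessionId = $input.item.json.userId;
const newMessage = $input.item.json.message;

// 1. Read existing history from database (empty array for a new session)
const history = (await db.getHistory(sessionId)) ?? [];

// 2. Add new user message
history.push({ role: 'user', content: newMessage });

// 3. Call AI with full context
const aiReply = await callLLM(history);

// 4. Append AI reply and save
history.push({ role: 'assistant', content: aiReply });
await db.saveHistory(sessionId, history);

// 5. Hand the reply to the next node
return { json: { reply: aiReply } };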

3. Window Buffer & Summarization

Every LLM has a hard Context Window limit — the maximum number of tokens it can process in one request. GPT-4o's limit is 128,000 tokens. Sounds big until you realize a busy support chatbot conversation can grow to millions of tokens over days. Send too much and the API throws a context_length_exceeded error.
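You can get an early warning before the API rejects a request by estimating the size of the history yourself. The sketch below uses the common rule of thumb of roughly 4 characters per token for English text, which is only an approximation; use a real tokenizer when accuracy matters. The reply reserve is an illustrative assumption.

```javascript
// Rough token estimate using the ~4 characters per token rule of thumb.
const CHARS_PER_TOKEN = 4;        // heuristic, not exact
const CONTEXT_LIMIT = 128000;     // GPT-4o figure quoted above
const RESERVED_FOR_REPLY = 4000;  // illustrative assumption

// Sums the character length of every message and converts to tokens.
function estimateTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / CHARS_PER_TOKEN);
}

// True when the estimated history plus the reply reserve fits the limit.
function fitsInContext(messages) {
  return estimateTokens(messages) + RESERVED_FOR_REPLY <= CONTEXT_LIMIT;
}
```

A check like `fitsInContext(history)` before each call lets the workflow trim or summarize proactively instead of reacting to a failed request.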

The simple fix is a Window Buffer: only keep the last N messages (e.g., 20). When the array exceeds that limit, drop the oldest entries from the front: history.shift() removes one message, and history.splice() removes several at once. Simple, but it causes the AI to 'forget' important early context.

The advanced fix is Summarization Memory: use a cheap, fast model (GPT-4o-mini) to compress old messages into a short paragraph before discarding them. That summary gets injected into the System Prompt as 'background context'. The AI doesn't lose the information — it gets a compressed version instead.

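// Window Buffer implementation
// Assumes `history` (message array), `systemPrompt` (a `let` string), and an
// async `summarize` helper are defined elsewhere in your workflow.
const MAX_MESSAGES = 20;

if (history.length > MAX_MESSAGES) {
  // Remove the oldest messages so only the newest MAX_MESSAGES remain
  const overflow = history.splice(0, history.length - MAX_MESSAGES);

  // Optional: summarize overflow before discarding
  const summary = await summarize(overflow);
  systemPrompt += `\n\nEarlier context: ${summary}`;
}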
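The `summarize` helper in the snippet above is left undefined. One possible shape is sketched below. Unlike the one-argument call above, this version takes the model call as a second parameter, `llm`, a hypothetical async function that accepts a messages array and resolves to the reply text, so you can point it at a cheap model or swap in a stub for testing. The prompt wording is an assumption you should tune.

```javascript
// Builds the request that asks a model to compress old messages.
function buildSummaryRequest(overflow) {
  const transcript = overflow
    .map((m) => `${m.role}: ${m.content}`)
    .join('\n');
  return [
    {
      role: 'system',
      content:
        'Summarize the conversation below in one short paragraph. ' +
        'Keep names, preferences, and open questions.',
    },
    { role: 'user', content: transcript },
  ];
}

// `llm` is a hypothetical async function: (messages) => reply text.
async function summarize(overflow, llm) {
  if (overflow.length === 0) return ''; // Nothing to compress.
  return llm(buildSummaryRequest(overflow));
}
```

Keeping the request builder separate from the model call makes the prompt easy to inspect and the helper easy to test without spending tokens.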
