Vector Databases: Giving AI a Memory

Pascual Vila
AI Architect // Code Syllabus
"LLMs are brilliant but amnesiac. Without a vector database, your AI is trapped entirely within its training cutoff date."
Understanding Embeddings
Before we can search unstructured data (like text, images, or audio), we must convert it into a format machines can compare: numerical arrays called embeddings. An embedding encodes the "semantic meaning" of the data as a point in a high-dimensional vector space, where items with similar meanings sit closer together.
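A toy sketch of that idea, using hand-made 3-dimensional vectors (real embedding models produce hundreds or thousands of dimensions; these values are purely illustrative):

```python
import math

# Toy 3-dimensional "embeddings". Real models output hundreds or
# thousands of dimensions; these numbers are made up to illustrate
# the geometry, not produced by any actual model.
embeddings = {
    "dog":    [0.9, 0.1, 0.0],
    "canine": [0.8, 0.2, 0.1],
    "banana": [0.0, 0.1, 0.9],
}

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Semantically related words are closer together in the space.
d_related = euclidean(embeddings["dog"], embeddings["canine"])
d_unrelated = euclidean(embeddings["dog"], embeddings["banana"])
assert d_related < d_unrelated
```

The only thing that matters here is the relative geometry: "dog" and "canine" land near each other, while "banana" lands far away.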
Why Not PostgreSQL?
Standard relational databases match on exact keywords or patterns (e.g., SELECT * FROM docs WHERE text LIKE '%dog%'). If the user searches for "canine", that SQL query finds nothing. Vector databases instead compute a distance metric (such as cosine similarity) between the user's query vector and the stored vectors, allowing them to recognize that "canine" and "dog" are mathematically close.
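A minimal sketch of that comparison as a brute-force scan over a tiny in-memory store (the vectors are invented for illustration; a real system would get them from an embedding model and use an index rather than scanning):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vector store: text chunks paired with made-up embeddings.
store = [
    ("The dog barked all night.",      [0.9, 0.1, 0.0]),
    ("Bananas are rich in potassium.", [0.0, 0.1, 0.9]),
    ("Cats sleep most of the day.",    [0.5, 0.6, 0.1]),
]

# Pretend this is the embedding of the query "canine".
query = [0.8, 0.2, 0.1]

# Brute-force scan: score every stored vector against the query.
best_text, best_score = max(
    ((text, cosine_similarity(query, vec)) for text, vec in store),
    key=lambda pair: pair[1],
)
print(best_text)  # the "dog" sentence wins despite zero keyword overlap
```

Note that a `LIKE '%canine%'` query against these three sentences would return nothing, while the vector comparison surfaces the dog sentence anyway.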
❓ AI Engineering FAQ
What is the difference between a relational database and a vector database?
Relational Databases (like PostgreSQL, MySQL) store data in rows and columns and rely on exact keyword or pattern matches. They are excellent for structured, tabular data.
Vector Databases (like Pinecone, Weaviate, Qdrant) store high-dimensional arrays (embeddings). They use algorithms like Approximate Nearest Neighbor (ANN) to find data based on semantic similarity, making them essential for AI context retrieval.
What is Cosine Similarity?
Cosine similarity is a mathematical metric used to determine how similar two vectors are, irrespective of their magnitude. It measures the cosine of the angle between the two vectors in the embedding space.
- 1.0: Vectors point in exactly the same direction (perfect semantic match).
- 0.0: Vectors are orthogonal (unrelated).
- -1.0: Vectors point in opposite directions (opposite meaning).
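The three reference values above can be reproduced directly with a few lines of standard-library Python (the vectors are arbitrary examples chosen to hit each case):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

same = cosine_similarity([1, 2], [2, 4])        # same direction  -> 1.0
orthogonal = cosine_similarity([1, 0], [0, 1])  # right angle     -> 0.0
opposite = cosine_similarity([1, 2], [-1, -2])  # opposite        -> -1.0
```

Because the metric ignores magnitude, `[1, 2]` and `[2, 4]` still score a perfect 1.0 even though one vector is twice as long as the other.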
How do Vector Databases work with LLMs in RAG?
RAG (Retrieval-Augmented Generation) connects LLMs to custom data. The workflow is:
- The user sends a prompt.
- The application converts the prompt into an embedding vector.
- The application queries the vector database using that vector.
- The database returns the top-K most semantically similar text chunks.
- The application injects those text chunks into the system prompt alongside the user's original query, allowing the LLM to generate an answer based on private data.
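The five steps above can be sketched end to end. Everything here is a hypothetical stand-in: `InMemoryVectorDB` mimics a vector database client, and `embed` is a crude character-hashing placeholder (not a semantic model), so the retrieved chunks exist to show the data flow, not real retrieval quality:

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(y * y for y in b)))

def embed(text):
    # Placeholder for a real embedding model: hashes characters into
    # a tiny fixed-size vector. A real system calls a model API here.
    vec = [0.0] * 8
    for i, ch in enumerate(text.lower()):
        vec[i % 8] += ord(ch) / 1000
    return vec

class InMemoryVectorDB:
    """Hypothetical stand-in for a vector database client."""
    def __init__(self):
        self.items = []  # list of (embedding, chunk) pairs

    def add(self, chunk):
        self.items.append((embed(chunk), chunk))

    def query(self, query_vec, top_k=2):
        # Steps 3-4: score every stored vector, return top-K chunks.
        scored = sorted(self.items,
                        key=lambda item: cosine_similarity(query_vec, item[0]),
                        reverse=True)
        return [chunk for _, chunk in scored[:top_k]]

db = InMemoryVectorDB()
for chunk in ["Refunds are processed within 5 days.",
              "Support is available 24/7 via chat.",
              "Our office is in Barcelona."]:
    db.add(chunk)

# Steps 1-2: take the user's prompt and convert it to a vector.
prompt = "How long do refunds take?"
context = db.query(embed(prompt), top_k=2)

# Step 5: inject the retrieved chunks alongside the original query.
augmented_prompt = (
    "Answer using only this context:\n"
    + "\n".join(f"- {c}" for c in context)
    + f"\n\nQuestion: {prompt}"
)
print(augmented_prompt)  # this string is what gets sent to the LLM
```

In production, `embed` would be a call to an embedding model, `InMemoryVectorDB` would be a managed service such as Pinecone, Weaviate, or Qdrant using an ANN index instead of a full scan, and `augmented_prompt` would be passed to the LLM as context.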