Knowledge Base: Auto-RAG

Building autonomous brains by syncing PDF and Notion data to Vector Stores.

📄 ➡️ 🧠 ➡️ 💬

LOG: Step 1: Load and Chunk. Large PDFs often exceed an LLM's context window (its token limit), so we split them into overlapping chunks.

Step 1: Recursive Chunking

Use a library such as LangChain to extract text while preserving metadata (page numbers, source URLs). That metadata is essential for source attribution: each retrieved chunk can be traced back to the page it came from.
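To make the idea concrete, here is a minimal, dependency-free sketch of recursive chunking with overlap and per-chunk metadata. It is not LangChain's implementation; in practice LangChain's RecursiveCharacterTextSplitter (with its chunk_size and chunk_overlap parameters) covers the same ground. The names Chunk, split_recursive, merge_with_overlap, and chunk_pages are hypothetical, and the sketch assumes the PDF text has already been extracted into a list of strings, one per page. Sizes are measured in characters, not tokens, and pieces are rejoined with single spaces, so the original whitespace is normalized.

```python
from dataclasses import dataclass

# Separators tried in order, from coarse (paragraphs) to fine (characters).
SEPARATORS = ("\n\n", "\n", " ", "")


@dataclass
class Chunk:
    text: str
    metadata: dict  # e.g. {"source": "...", "page": 3}


def split_recursive(text, max_len, separators=SEPARATORS):
    """Break text into pieces of at most max_len characters,
    preferring the coarsest separator that works."""
    if len(text) <= max_len:
        return [text] if text.strip() else []
    sep, rest = separators[0], separators[1:]
    if sep == "":
        # Last resort: hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]
    pieces = []
    for part in text.split(sep):
        pieces.extend(split_recursive(part, max_len, rest))
    return pieces


def merge_with_overlap(pieces, chunk_size, overlap):
    """Greedily pack pieces into chunks of at most chunk_size characters,
    carrying up to `overlap` trailing characters into the next chunk."""
    chunks, current = [], []
    for piece in pieces:
        if current and len(" ".join(current + [piece])) > chunk_size:
            chunks.append(" ".join(current))
            # Keep whole trailing pieces that fit inside the overlap budget.
            tail, size = [], 0
            for prev in reversed(current):
                if size + len(prev) > overlap:
                    break
                tail.insert(0, prev)
                size += len(prev) + 1  # +1 for the joining space
            current = tail
        current.append(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


def chunk_pages(pages, source, chunk_size=200, overlap=40):
    """Chunk each page separately so every chunk keeps its page number."""
    if not 0 <= overlap < chunk_size - 1:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size - 1")
    chunks = []
    for page_number, page_text in enumerate(pages, start=1):
        # Pieces are capped so that overlap + space + piece fits in chunk_size.
        pieces = split_recursive(page_text, chunk_size - overlap - 1)
        for text in merge_with_overlap(pieces, chunk_size, overlap):
            chunks.append(Chunk(text, {"source": source, "page": page_number}))
    return chunks
```

Chunking page by page keeps the page number exact for every chunk, at the cost of never overlapping across a page boundary. If cross-page context matters, one alternative is to chunk the whole document and record a page range per chunk instead.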

📄

Master PDF chunking & cleaning.

💾

Embed data into Pinecone/Supabase.

📝

Automate workspace ingestion via API.