Most RAG systems fail at this simple question: "What's the most common GitHub issue AND what are people saying about it?"

Vanilla RAG follows a simple pattern: query -> retrieve -> generate. It's effective for straightforward question-answering, but it struggles when tasks get complex.

Say you ask: "What's the most common GitHub issue from last month, and what are people saying about it in our internal chat?" Traditional RAG would try to match your entire query against one knowledge source. It might find something relevant, but probably not exactly what you need.

Agentic RAG works differently:

1. 𝗣𝗹𝗮𝗻𝗻𝗶𝗻𝗴: The agent breaks your query into subtasks (select a tool to query last month's GitHub issues, build a query to fetch the most common one, search the internal chat for mentions).

2. 𝗧𝗼𝗼𝗹 𝗨𝘀𝗲: It routes the first part to your GitHub database, gets results, then routes the second part to your chat system using context from the first search.

3. 𝗥𝗲𝗳𝗹𝗲𝗰𝘁𝗶𝗼𝗻: The agent validates the retrieved information and can re-query if something doesn't look right.

This is really promising for complex queries that need multiple data sources or multi-step reasoning.

𝗧𝗵𝗲 𝘁𝗿𝗮𝗱𝗲𝗼𝗳𝗳𝘀: Agentic RAG typically requires multiple LLM calls instead of one, which means added latency and cost. It is also much more complex to develop, deploy, and maintain.

Here's my recommendation: for many use cases, a simple RAG pipeline is sufficient. But if you are dealing with complex queries, response quality is very important, and your users can afford to wait a few extra seconds, an Agentic RAG workflow is probably better suited to your use case.

The architecture can be simple (a single router agent) or complex (multiple specialized agents coordinating). For example, one agent retrieves from your internal docs, another searches the web, and a coordinator decides which to use.
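The plan -> tool use -> reflect loop above can be sketched in a few lines of Python. This is a toy illustration, not a real implementation: the tool functions, data, and question-decomposition are hypothetical stand-ins, and in a real system the planning and reflection steps would each be LLM calls.

```python
# Hypothetical stand-ins for two retrieval backends.
GITHUB_ISSUES = {"timeout on login": 42, "broken dark mode": 17}
CHAT_MESSAGES = {
    "timeout on login": ["Seeing this daily", "Workaround: retry twice"],
}


def search_github_issues():
    """Tool 1: return the most frequently reported issue."""
    return max(GITHUB_ISSUES, key=GITHUB_ISSUES.get)


def search_chat(topic):
    """Tool 2: return internal chat messages mentioning the topic."""
    return CHAT_MESSAGES.get(topic, [])


def agentic_rag(question):
    # 1. Planning: decompose the query into ordered subtasks.
    #    (Here the plan is hard-coded; an agent would derive it via an LLM.)

    # 2. Tool use: run each subtask, feeding earlier results into later ones.
    top_issue = search_github_issues()
    mentions = search_chat(top_issue)

    # 3. Reflection: validate the retrieval and re-query or fall back if
    #    something looks wrong (here, simply an empty result).
    if not mentions:
        mentions = ["(no chat mentions found after re-querying)"]

    return {"issue": top_issue, "mentions": mentions}


result = agentic_rag(
    "What's the most common GitHub issue AND what are people saying about it?"
)
print(result["issue"])  # -> timeout on login
```

The key difference from vanilla RAG is visible in step 2: the output of the first retrieval becomes the input of the second, which a single query -> retrieve -> generate pass cannot do.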
For more information, my colleagues wrote a very nice blog post about the different Agentic workflows: https://lnkd.in/eS2mFxUF
I think that Agentic RAG can be used relatively cheaply if you use models smartly. Tiny fine-tuned models with a single responsibility can match multi-billion-parameter LLMs on their narrow task.
I forgot to mention it, but Weaviate Cloud does include an out-of-the-box Query Agent that does the heavy lifting for you :-) The Query Agent is aware of your schema and your data, and builds the most relevant queries automatically.

```python
qa = QueryAgent(
    client=client,
    collections=["GitHub_Issues", "GitHub_Chats"],
    system_prompt=system_prompt,
)
response = qa.ask("What's the most common GitHub issue AND what are people saying about it?")
```

Check out https://docs.weaviate.io/agents/query for more information :-)