Tips for Improving Retrieval with Agentic AI

Explore top LinkedIn content from expert professionals.

Summary

Agentic AI systems are designed to make decisions independently, retrieve relevant information, and act on context, making them a key component of advanced Retrieval-Augmented Generation (RAG) systems. Improving retrieval for these agents means designing smarter strategies so they access and use the most relevant data effectively, even across vast and complex datasets.

  • Focus on context management: Provide agents with deliberate and dynamic context by retrieving only relevant data, summarizing essential information, and clearing outdated data before progressing to the next step.
  • Adopt innovative retrieval techniques: Use methods like dynamic few-shot prompting, metadata-enriched chunks, and contextual summaries to make your agent smarter and reduce errors or irrelevant data handling.
  • Design modular agent workflows: Build retrieval pipelines with tools like vector search, summarization, and question-answering modules to handle specific tasks while ensuring flexibility to adapt to different use cases, corpora, or query types.
Summarized by AI based on LinkedIn member posts
  • View profile for Chip Huyen

    Building something new | AI x storytelling x education

    298,295 followers

    Very useful tips on tool use and memory from Manus's context engineering blog post. Key takeaways:

    1. Reversible compaction. Most models allow 128K context, which can easily fill up after a few turns when working with data like PDFs or web pages. When the context gets full, it has to be compacted. It's important to compact the context so that the compaction is reversible, e.g., dropping the content of a file or web page while keeping its path or URL.

    2. Tool use. Given how easy it is to add new tools (e.g., with MCP servers), the number of tools a user adds to an agent can explode. Too many tools make it easier for the agent to choose the wrong action, making it dumber. They caution against removing tools mid-iteration. Instead, you can force an agent to choose certain tools with response prefilling, e.g., starting the response with <|im_start|>assistant<tool_call>{"name": "browser_ forces the agent to pick a browser tool. Name your tools so that related tools share a prefix: browser tools should start with `browser_`, and command-line tools should start with `shell_`.

    3. Dynamic few-shot prompting. They caution against traditional few-shot prompting for agents: seeing the same few examples again and again causes the agent to overfit to them. For example, if you ask the agent to process a batch of 20 resumes, and one example in the prompt visits the job description, the agent might visit the same job description 20 times for those 20 resumes. Their solution is to introduce small structured variations each time an example is used: different phrasing, minor noise in formatting, etc.

    Link: https://lnkd.in/gHnWvvcZ #AIAgents #AIEngineering #AIApplications
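
    A minimal sketch of the "reversible compaction" idea from point 1, assuming a simple list-of-messages context; the message fields and helper names are illustrative, not Manus's actual implementation.

```python
# Hypothetical sketch of reversible compaction: when context grows too large,
# drop bulky observations (file/web-page contents) but keep the path/URL so the
# agent can re-fetch them later. Field names are illustrative only.

MAX_CONTEXT_CHARS = 400_000  # rough stand-in for a 128K-token budget

def compact_context(messages: list[dict]) -> list[dict]:
    """Replace large tool observations with a restorable reference."""
    compacted = []
    for msg in messages:
        if msg.get("role") == "tool" and len(msg.get("content", "")) > 2_000:
            source = msg.get("source")  # path or URL recorded when the tool ran
            compacted.append({
                "role": "tool",
                "source": source,
                # Content is dropped, but the reference lets the agent re-read it.
                "content": f"[content elided; re-fetch from {source} if needed]",
            })
        else:
            compacted.append(msg)
    return compacted

def maybe_compact(messages: list[dict]) -> list[dict]:
    total = sum(len(m.get("content", "")) for m in messages)
    return compact_context(messages) if total > MAX_CONTEXT_CHARS else messages

history = [
    {"role": "user", "content": "Summarize this PDF."},
    {"role": "tool", "source": "/tmp/report.pdf", "content": "x" * 500_000},
]
print(maybe_compact(history)[1]["content"])
```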

  • View profile for Daniel Svonava

    Build better AI Search with Superlinked | xYouTube

    38,362 followers

    Stop Wasting Time on Research Papers! Build an AI Agent That Gets It. 🧠⚡️

    This notebook shows how to build an AI agent that finds relevant AND recent papers, summarizes them, and answers your questions – without the usual search system headaches.

    The Problem 😩: Too many papers, too little time. Traditional search needs complex, slow reranking.

    The Fix (Superlinked Magic ✨): We combine:
    ▪️ What it's about: semantic search (TextSimilaritySpace)
    ▪️ When it was published: temporal relevance (RecencySpace with time penalties)
    ...into ONE smart vector index. Result? Accurate search that already considers recency. Bye-bye, reranking! 👋

    Here's the Playbook 🏗️:
    1️⃣ Prep Data: Load ArXiv paper info (title, summary, publish date).
    2️⃣ Define Search DNA: Tell Superlinked how to understand text + time using Schema and Spaces; use RecencySpace to encode time!
    3️⃣ Build the Index: Combine the spaces into one searchable Superlinked Index.
    4️⃣ Set Up Tools:
    ▪️ RetrievalTool: Finds papers using the index (balancing relevance & recency weights).
    ▪️ SummarizationTool: Condenses papers using an LLM.
    ▪️ QuestionAnsweringTool: Answers questions using paper context (or general knowledge if needed).
    5️⃣ Assemble the Agent: A KernelAgent smartly routes your query ("find," "summarize," "answer?") to the right tool, using an LLM for classification.

    Why This Rocks 🔥:
    ▪️ No More Reranking: Semantic + temporal search in one shot = accuracy without complexity.
    ▪️ Recency Matters: Time penalties automatically prioritize newer relevant papers.
    ▪️ Modular Power: Clean tools handle specific jobs. Easy to extend.
    ▪️ Flexible Search: Tune weights to favour relevance (1.0) or recency (0.5) as needed.
    ▪️ Doesn't Dead-End: QA tool uses paper context first, then general knowledge.

    That's the gist! 🚀 Dig into the notebook code to see it in action! 👇
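
    A framework-agnostic sketch of the core scoring idea (semantic relevance plus a time-decay penalty in one ranked pass, with tunable weights); this illustrates the concept only and is not Superlinked's actual API.

```python
import math
import time

# Toy corpus: each paper has a (precomputed) embedding and a publish timestamp.
papers = [
    {"title": "Paper A", "embedding": [0.1, 0.9], "published": time.time() - 30 * 86400},
    {"title": "Paper B", "embedding": [0.2, 0.8], "published": time.time() - 400 * 86400},
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def recency(published_ts, half_life_days=90):
    """Exponential time decay: newer papers score closer to 1."""
    age_days = (time.time() - published_ts) / 86400
    return 0.5 ** (age_days / half_life_days)

def search(query_embedding, relevance_weight=1.0, recency_weight=0.5):
    # Relevance and recency are combined into one score, so no rerank step is needed.
    scored = [
        (relevance_weight * cosine(query_embedding, p["embedding"])
         + recency_weight * recency(p["published"]), p["title"])
        for p in papers
    ]
    return sorted(scored, reverse=True)

print(search([0.15, 0.85]))
```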

  • View profile for Josh Reini

    Evals, OSS and AI @ ❄️

    5,001 followers

    Your retriever can't rank what it can't see. Add doc-level summaries to boost retrieval performance for RAG.

    If your retriever can't see document-level context, it will keep surfacing the wrong chunks (and your LLM will hallucinate to fill the gaps). Here are 2 simple ways to inject document-level summaries into retrieval and actually move the needle:

    1️⃣ Contextual Chunks (Prepend the summary)
    Attach the doc summary to every chunk before indexing. Now each chunk carries both local detail and global intent.
    👉 Use when: You want a single retrieval step with richer signals and no extra orchestration.

    2️⃣ Two-Stage Search (Doc → Chunk)
    First retrieve the right documents using only their summaries. Then search within those docs to find the exact chunks you need.
    👉 Use when: Your corpus is large, latency matters, and you want to slash noise before scoring chunks.

    📚 I'll drop the gist for setting this up in Snowflake in the comments. If you want to learn more, I'll also include a recent blog from Rajhans Samdani and team showing how contextual chunking boosted RAG performance for finance.
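
    A minimal sketch of option 1 (contextual chunks): prepend a document-level summary to each chunk before embedding. The summarize() and embed() helpers are hypothetical stand-ins for whatever LLM and embedding service you use, not the Snowflake setup from the gist.

```python
def summarize(document_text: str) -> str:
    # Stand-in: replace with an LLM call that produces a short document summary.
    return document_text[:200]

def embed(text: str) -> list[float]:
    # Stand-in: replace with a real embedding model.
    return [float(len(text) % 7), float(len(text) % 11)]

def chunk(document_text: str, size: int = 800) -> list[str]:
    return [document_text[i:i + size] for i in range(0, len(document_text), size)]

def index_document(doc_id: str, document_text: str) -> list[dict]:
    doc_summary = summarize(document_text)
    records = []
    for i, piece in enumerate(chunk(document_text)):
        # The embedding sees summary + chunk, so it carries global intent too.
        contextual_text = f"Document summary: {doc_summary}\n\nChunk: {piece}"
        records.append({
            "id": f"{doc_id}-{i}",
            "text": piece,                     # original chunk handed to the LLM
            "vector": embed(contextual_text),  # what the retriever actually scores
        })
    return records

recs = index_document("doc-1", "A long policy document " * 100)
print(len(recs), recs[0]["id"])
```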

  • View profile for Anastasiia S.

    Vice President | PHD | TOP AI Voice | Associate Prof. | GTM Strategist | AI startups marketing advisor | Helping AI startups with GTM | Operation Leader |GenAI community leader | @Generative AI | @GenAI.Works | @Wand AI

    36,685 followers

    Why 90% of AI Agents Break Beyond Demos. Building Production-Grade AI Agents: A 5-Step Roadmap (see the useful links in comments)

    Most AI agents look great in a demo… but the second they hit real users? They break. Edge cases. Scaling issues. Spaghetti prompts. Here is a 5-step roadmap to help teams and solo builders take agents from fragile prototypes to scalable, reliable systems.

    ◾ Step 1: Master Python for Production AI
    Core skills to master:
    - FastAPI: Build secure, lightweight endpoints for your agents.
    - Async Programming: Handle I/O-bound tasks efficiently (API calls, DB queries) without bottlenecks.
    - Pydantic: Ensure predictable, validated data flows in and out of your agent.

    ◾ Step 2: Make Your Agent Stable and Reliable
    Key practices:
    - Logging: Treat logs as your X-ray vision. Capture errors, edge cases, and unexpected behaviors.
    - Testing:
      - Unit tests for quick bug detection.
      - Integration tests to validate end-to-end flows, tools, prompts, and APIs.

    ◾ Step 3: Go Deep on Retrieval-Augmented Generation (RAG)
    Foundations:
    - Understand RAG: Learn its role in making agents context-aware.
    - Embeddings & Vector Stores: Store and retrieve knowledge based on relevance.
    - PostgreSQL Alternative: For simpler use cases, a well-indexed relational DB may outperform a vector database.
    Optimizations:
    - Chunking Strategies: Proper text splitting improves retrieval performance dramatically.
    - LangChain Integration: Orchestrate embeddings, retrieval, LLM calls, and responses.
    - Evaluation: Measure quality using precision, recall, and other metrics.

    ◾ Step 4: Define a Robust Agent Architecture (with GenAI AgentOS)
    An agent is more than a prompt. It's a system with state, structure, and control. To make that possible, leverage frameworks like GenAI AgentOS -> https://lnkd.in/dNnwrbFt
    It provides:
    - Agent registration and routing: Cleanly bind agents via decorators and manage how they communicate.
    - State and orchestration logic: Built-in handling for retries, context, and messaging between agents.
    - WebSocket and Dockerized backend: Smooth deployment and scalable real-time processing.
    TIP: Pair it with LangGraph, prompt engineering, and SQLAlchemy + Alembic.

    ◾ Step 5: Monitor, Learn, and Improve in Production (with GenAI AgentOS Hooks)
    Monitoring:
    - Use built-in logging and context features from AgentOS as a foundation.
    - Layer on tools like Langfuse or custom dashboards for deeper observability.
    - User Insights: Analyze interactions for confusion points and failure patterns.
    - Continuous Iteration: Refine prompts, update tools, and fix edge cases regularly.

    This isn't just about better engineering. It's about building agents that last: not just demos, but systems with memory, reasoning, and resilience. Commit to this, and your agents won't just survive in production, they'll thrive.

    #AI #MachineLearning #AIAgents #AgenticAI Credits: Paolo Perrone
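
    A minimal sketch of the Step 1 stack: an async FastAPI endpoint with Pydantic models validating what flows in and out of the agent. The run_agent() function and route path are hypothetical stand-ins for your own agent logic, not part of any named framework.

```python
from fastapi import FastAPI
from pydantic import BaseModel, Field

app = FastAPI()

class AgentQuery(BaseModel):
    user_id: str
    question: str = Field(min_length=1, max_length=2000)

class AgentAnswer(BaseModel):
    answer: str
    sources: list[str] = []

async def run_agent(question: str) -> AgentAnswer:
    # Placeholder: call your retrieval + LLM pipeline here (I/O-bound, so async).
    return AgentAnswer(answer=f"Echo: {question}", sources=[])

@app.post("/agent/query", response_model=AgentAnswer)
async def query_agent(payload: AgentQuery) -> AgentAnswer:
    # Pydantic has already validated the request body by the time we get here.
    return await run_agent(payload.question)
```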

  • View profile for Sophia Yang, Ph.D.

    Head of Developer Relations @ Mistral AI

    85,077 followers

    RAG faces a lot of challenges when it comes to effectively retrieving relevant information and generating high-quality responses. How can we improve RAG?

    One specific issue is that using the same big text chunk for both retrieval and synthesis is not optimal when the chunk contains a lot of filler text. The idea behind small-to-big retrieval is to use smaller text chunks during retrieval and then feed the larger chunk that the retrieved text belongs to into the large language model for synthesis. There are two primary techniques implemented in LlamaIndex:

    1. Smaller child chunks referring to bigger parent chunks: fetch smaller chunks during retrieval first, then follow their parent IDs and return the bigger chunks.
    2. Sentence window retrieval: fetch a single sentence during retrieval and return a window of text around the sentence.

    🔗 Blog: https://lnkd.in/gzzh2cMw 🔗 Video: https://lnkd.in/gYxjB_bm
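
    A framework-agnostic sketch of technique 1 (child chunks pointing at parent chunks); this shows the pattern only and is not LlamaIndex's actual API. The embed() helper and in-memory stores are toy stand-ins.

```python
import math

def embed(text: str) -> list[float]:
    # Stand-in for a real embedding model.
    return [float(len(text) % 5), float(len(text) % 13)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Parent chunks: large sections handed to the LLM for synthesis.
parents = {"p1": "Large section 1 ...", "p2": "Large section 2 ..."}

# Child chunks: small spans used only for retrieval, each pointing at its parent.
children = [
    {"text": "small span from section 1", "parent_id": "p1"},
    {"text": "small span from section 2", "parent_id": "p2"},
]
for c in children:
    c["vector"] = embed(c["text"])

def retrieve(query: str, top_k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(children, key=lambda c: cosine(qv, c["vector"]), reverse=True)
    # Deduplicate parents while preserving rank order, then return the big chunks.
    parent_ids = list(dict.fromkeys(c["parent_id"] for c in ranked[:top_k]))
    return [parents[pid] for pid in parent_ids]

print(retrieve("span from section 1"))
```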

  • View profile for Nate Herkelman

    Scale Without Increasing Headcount | Founder & CEO @ Uppit AI

    36,278 followers

    This is Why Your RAG Agents Suck

    Most people building RAG pipelines are missing one key ingredient: metadata.

    The problem with chunk-based retrieval is that the AI searches for relevant chunks but has no idea where they came from, when, or why they're relevant.
    → That's a huge problem. You're left with "facts," but no structure.

    Metadata isn't a 100% solve, but it helps close that gap and enriches your chunks with more context, leading to smarter RAG.

    I just dropped a 15-minute YouTube video where I walk through a no-code n8n RAG pipeline that uses YouTube transcripts as source material and stores them in Supabase with metadata like:
    → Video title
    → YouTube URL
    → Start/end timestamps of the transcript chunk

    Now when I ask a question, the AI not only gives me the answer, it cites the exact video and moment it came from. This is a huge step toward building intelligent, trustworthy agents. It doesn't solve all the challenges of chunk-based retrieval, but it gives your RAG system richer context, better relevance, and much more transparency.

    📺 Check out the full video here: https://lnkd.in/gYF77wKU
    📚 Join the #1 community for learning and mastering AI automations: https://lnkd.in/dqVsX4Ab
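
    An illustrative sketch of what a metadata-enriched transcript chunk might look like before it is stored; the field names and example URL are hypothetical and not the exact n8n/Supabase schema from the video.

```python
def build_chunk_records(video_title: str, video_url: str,
                        transcript_segments: list[dict]) -> list[dict]:
    """Each segment is expected as {"text": ..., "start_s": ..., "end_s": ...}."""
    records = []
    for seg in transcript_segments:
        records.append({
            "content": seg["text"],
            "metadata": {
                "video_title": video_title,
                "video_url": video_url,
                "start_s": seg["start_s"],
                "end_s": seg["end_s"],
                # A deep link lets the agent cite the exact moment in its answer.
                "citation": f"{video_url}&t={int(seg['start_s'])}s",
            },
        })
    return records

records = build_chunk_records(
    "Intro to RAG metadata",
    "https://www.youtube.com/watch?v=VIDEO_ID",
    [{"text": "Metadata gives chunks provenance.", "start_s": 61.0, "end_s": 74.5}],
)
print(records[0]["metadata"]["citation"])
```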

  • View profile for Josh Spilker

    Content & SEO @ AirOps | Ex-ClickUp / Toptal | Webinar Host

    11,748 followers

    Want LLMs to cite your content? Structure it like this:

    AI doesn't reward opinions. It rewards clarity, structure, and retrievability. Based on our analysis at AirOps, here's the 3-part framework to boost your AI visibility 👇

    🔹 1. Follow Sequential Headings
    → Clear H1 → H2 → H3 structure = 2.8x more citations.
    → LLMs parse predictable formats better than freestyle text.
    → Every heading is a chance to clarify what comes next.
    This is even more important for LLMs than for Google SERPs.

    🔹 2. Use Rich Schema
    → Pages with rich schema are 13% more likely to earn AI citations.
    → Add question-based headers, definitions, and semantic markup.
    → Think like a database: make your insights machine-readable.

    🔹 3. Write in Clear, Chunked Sections
    → Nearly 50% of all citations go to content with easy-to-read blocks.
    → Each section should answer one question, fully.
    → Use bullets, bolds, and spacing to aid comprehension (for humans and machines).

    LLMs synthesize, they don't search. If you don't structure for them, they'll skip you.

    More data & examples here → https://lnkd.in/ej7Q9JZK

  • View profile for Anita Kirkovska

    Head of Growth @ Vellum AI

    12,806 followers

    AI Agents don't need more tools. They need better context.

    Most failures we see in agents come down to one thing: the model is looking at the wrong information, or too much of it. More tools won't fix that. Neither will spinning up more agents to talk to each other.

    If you want effective agents, you need to do the hard part: manage what the model sees at every step; dynamically, deliberately, and with intent. With 1M+ context windows now available, this isn't optional. And the best teams we know treat context like a runtime system:
    - retrieve what matters
    - summarize what changed
    - clean up before moving on
    - repeat

    We worked with Lee Gaul to create the most practical guide on how Context Engineering is powering the most impactful agents (e.g. Devin, Deep Research) today, and the mindset your team needs to build faster and smarter. Read it here: https://lnkd.in/d6Tvzc-p

    🕭 Other useful resources
    Anthropic on "How they build multi-agent systems": https://lnkd.in/damcvbe7
    Cognition's "Don't build multi-agent systems" article: https://lnkd.in/dVF-Dpqe
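
    A hypothetical sketch of the "context as a runtime system" loop (retrieve, summarize, clean up, repeat); retrieve(), summarize(), and llm_step() are illustrative stand-ins, not any specific framework's API.

```python
def retrieve(task: str, step: int) -> list[str]:
    # Stand-in: pull only the facts relevant to this step from your stores/tools.
    return [f"fact relevant to '{task}' at step {step}"]

def summarize(observations: list[str]) -> str:
    # Stand-in: compress new observations with an LLM before carrying them forward.
    return " / ".join(observations)[:200]

def llm_step(task: str, context: list[str]) -> str:
    # Stand-in: the model acts on a deliberately curated context, not everything.
    return f"plan next action for '{task}' using {len(context)} context items"

def run_agent(task: str, max_steps: int = 3) -> list[str]:
    summaries: list[str] = []      # compact, durable memory across steps
    actions: list[str] = []
    for step in range(max_steps):
        fresh = retrieve(task, step)                  # retrieve what matters
        actions.append(llm_step(task, summaries + fresh))
        summaries.append(summarize(fresh))            # summarize what changed
        summaries = summaries[-5:]                    # clean up before moving on
    return actions

print(run_agent("compile a market report"))
```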

  • View profile for Sohrab Rahimi

    Partner at McKinsey & Company | Head of Data Science Guild in North America

    20,511 followers

    Retrieval-Augmented Generation (RAG) is at the forefront of business applications for Generative AI and LLMs, serving as a pivotal method for extracting knowledge from vast datasets. The design of RAG systems is crucial, with the retrieval component being paramount. Selecting the relevant information to feed into the generative process is vital, as it directly influences the accuracy and relevance of the output, making it a critical aspect of the system's efficacy.

    The paper "The Power of Noise: Redefining Retrieval for RAG Systems" investigates how to improve RAG systems by focusing on innovative retrieval strategies. Here are three practical takeaways for data scientists designing RAG systems:

    1. First and foremost, incorporating 'noisy' or seemingly irrelevant documents into the retrieval process of a RAG system can paradoxically improve its performance by more than 30%! The presence of such documents might introduce a wider variety of context and information, stimulating the generative model to produce more accurate or nuanced responses.

    2. Not surprisingly, the placement of the correct answer (gold document) among the search results significantly affects the accuracy of the model's answers. This suggests that ranking documents in a way that prioritizes relevant information can greatly influence the model's effectiveness.

    3. Performance varies with the type of retriever used (dense vs. sparse). Sparse retrievers like BM25 are often recommended when the search space is large and exact keyword matches are critical for retrieval. They are efficient and scalable, making them suitable for applications where interpretability and simplicity are valued. Dense methods, in contrast, excel in scenarios requiring understanding of complex semantic relationships beyond exact keyword matches.

    Paper: https://lnkd.in/evHJKpnJ
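
    A small sketch contrasting takeaway 3: a sparse retriever scored with BM25 (using the rank_bm25 package) versus a dense-style semantic score. The "dense" side here is a toy overlap measure standing in for a real embedding model, and the corpus and query are made up for illustration.

```python
from rank_bm25 import BM25Okapi  # pip install rank-bm25

docs = [
    "The operational risk guidelines were updated in March.",
    "Credit risk models require quarterly validation.",
    "Office kitchen rules: label your lunch.",
]
query = "recent operational risk guideline changes"

# Sparse: exact term matching with BM25, cheap and interpretable.
bm25 = BM25Okapi([d.lower().split() for d in docs])
sparse_scores = bm25.get_scores(query.lower().split())

# Dense (toy stand-in): in practice you would embed query and documents with a
# neural model and compare vectors, catching semantic matches with no shared keywords.
def dense_score(q: str, d: str) -> float:
    qv, dv = set(q.lower().split()), set(d.lower().split())
    return len(qv & dv) / (len(qv | dv) or 1)  # Jaccard overlap as a crude proxy

dense_scores = [dense_score(query, d) for d in docs]
print(list(zip(docs, sparse_scores, dense_scores)))
```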

  • View profile for Sangeetha Venkatesan

    NLP Engineer | Information Retrieval | Insurance domain | RAG

    4,607 followers

    🧢 Having started with intent classification and recommendation systems in conversational AI, working with semantic similarity and vectors in a semantic vector space was quite exciting. In retrieval, though, semantic search alone can't handle all the questions an agent needs to comprehend. Loosening the search to include more results across different aspects of the data adds a good balance, giving the reasoning model better recall.

    ⛵ Looking forward to seeing Cohere's take on embedding multi-aspect data: thinking about the contextual relationships within enterprise data such as claim documents, invoices, and guidelines. A typical RAG pipeline chunks the documents, creates overlaps, and indexes them, then retrieves chunks and reasons over them. Search is rarely spread across many factors (content, or topic combined with content); often it is search/filter plus metadata extraction. Cohere's representation approach:
    1) A set of documents with strong contextual relationships and dependencies is converted to JSON using Compass, in a form the embedding model can turn into vector representations.
    2) The JSON doc is then sent to the Compass embedding model, producing vectors that preserve both the data and its context, so search covers more aspects of the data source.

    📣 This is a good direction for enterprise data: beyond building a RAG system, optimizing it to cover both broad and narrow questions while keeping accuracy is often the harder part. It depends on the domain, the use case, and prompting specific to that use case.

    🔈 A good example is given in the Cohere blog (the first Cohere embeddings PR), covering a time aspect, a semantic aspect, and a type aspect. An agent can then decompose the search question into multiple sub-questions and orchestrate the chained results. Having this aspect decomposition in retrieval is great.

    🔍 Question: "Latest Operational Risk update". This query contains a time aspect (latest), a semantic subject (operational risk, referring to the Risk Assessment Guidelines for operational risk), and the type of content sought (update, implying the most recent guidelines or changes to them).

    As enterprise documents grow, there are different indices, different prompting for each index, agent workflows, and retrieval aspects. An orchestrator that connects the different RAG systems with minimal changes when new documents are added would be a great way to evolve RAG systems.
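
    A hypothetical sketch of aspect decomposition for the example query above: split "Latest Operational Risk update" into time, semantic, and type aspects, then combine a metadata filter with a semantic ranking. All helpers, field names, and the keyword-overlap scoring are illustrative stand-ins, not Cohere Compass.

```python
from dataclasses import dataclass

@dataclass
class QueryAspects:
    semantic_subject: str       # what the query is about
    time_aspect: str | None     # e.g. "latest"
    content_type: str | None    # e.g. "update", "guideline"

def decompose(query: str) -> QueryAspects:
    q = query.lower()
    return QueryAspects(
        semantic_subject=q.replace("latest", "").replace("update", "").strip(),
        time_aspect="latest" if "latest" in q else None,
        content_type="update" if "update" in q else None,
    )

def search(query: str, documents: list[dict]) -> list[dict]:
    aspects = decompose(query)
    candidates = documents
    if aspects.content_type:                      # type aspect: filter on metadata first
        candidates = [d for d in candidates if d["doc_type"] == aspects.content_type]
    terms = set(aspects.semantic_subject.split())  # semantic aspect: rank by relevance
    ranked = sorted(candidates,
                    key=lambda d: len(terms & set(d["text"].lower().split())),
                    reverse=True)
    if aspects.time_aspect == "latest":            # time aspect: prefer the newest
        ranked = sorted(ranked[:5], key=lambda d: d["published"], reverse=True)
    return ranked

docs = [
    {"text": "Operational risk guideline update", "doc_type": "update", "published": "2024-06-01"},
    {"text": "Operational risk assessment guideline", "doc_type": "guideline", "published": "2022-01-10"},
]
print(search("Latest Operational Risk update", docs)[0]["text"])
```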
