#TuesdayPaperThoughts Edition 66: CLaRa

This week's TuesdayPaperThoughts examines "CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning" from researchers at Apple and The University of Edinburgh.

While RAG has been the go-to solution for grounding LLMs since 2023, most systems still suffer from a fundamental architectural flaw: retrieval and generation operate in separate worlds. CLaRa proposes a rethink: what if they were unified?

Key Takeaways:

1️⃣ Salient Compressor Pretraining (SCP): The framework introduces QA-driven and paraphrase-based data synthesis to train the compressor. Rather than reconstructing text token by token, SCP focuses on semantic essentials. The compressed representations outperform text-based baselines by 2.36% on Mistral-7B despite using 16× less context.

2️⃣ Weakly Supervised Joint Training: CLaRa trains the query reasoner and generator end-to-end using only next-token prediction loss, with gradients flowing through a differentiable top-k estimator. No relevance labels are needed, and retrieval relevance is aligned directly with answer quality.

3️⃣ Strong Results: On HotpotQA with 4× compression, CLaRa achieves 96.21% Recall@5, exceeding the fully supervised BGE-Reranker (85.93%) by 10.28 points, showing that joint optimization with weak supervision can outperform explicit retrieval training.

Can CLaRa become a new standard for RAG architectures?

Research Credits: Jie He, He Bai, Sinead Williamson, Jeff Z. Pan, Navdeep Jaitly, Yizhe Zhang
Paper Link: In comments
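To build intuition for the "gradients flowing through a differentiable top-k estimator" idea: hard top-k selection is non-differentiable, so joint training needs a soft surrogate. Below is a minimal sketch of one simple relaxation (a sigmoid around the threshold between the k-th and (k+1)-th retrieval scores) — this is an illustrative stand-in, not the estimator used in the CLaRa paper, and the function name and temperature parameter are my own choices.

```python
import numpy as np

def soft_top_k_mask(scores, k, temp=0.1):
    """Differentiable relaxation of a hard top-k selection mask.

    Instead of a 0/1 mask over documents, return a sigmoid that is
    ~1 for scores above the top-k threshold and ~0 below it. Because
    the mask is a smooth function of the scores, a downstream
    generation loss can backpropagate into the retriever's scores.
    temp controls sharpness: smaller temp -> closer to a hard mask.
    """
    scores = np.asarray(scores, dtype=float)
    sorted_desc = np.sort(scores)[::-1]
    # Threshold halfway between the k-th and (k+1)-th largest scores
    # (assumes k < len(scores)).
    tau = (sorted_desc[k - 1] + sorted_desc[k]) / 2.0
    return 1.0 / (1.0 + np.exp(-(scores - tau) / temp))

# Toy usage: 4 candidate documents, keep the top 2.
scores = [3.0, 1.0, 0.5, 2.0]
mask = soft_top_k_mask(scores, k=2)
# Documents 0 and 3 get weights near 1; documents 1 and 2 near 0,
# yet the mask remains smooth in the scores.
```

With weights like these, the generator can consume a score-weighted mixture of document representations, so next-token prediction loss alone teaches the reasoner which documents matter — the "no relevance labels" property highlighted above.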