Towards Unified Ranking Foundation Model, Enabling Client-Side RAG with Efficient In-Browser ANNS, and More!

Vol.111 for Jun 30 - Jul 06, 2025

Jul 04, 2025

This week’s newsletter highlights the following research:

A Unified Ranking Foundation Model with Iterative Exclusion, from Feng et al.
Collaborative-Aware LLM Embeddings for Sequential Recommendation, from He et al.
Enabling Client-Side RAG with Efficient In-Browser ANNS, from Liu et al.
A Formal Framework for Agentic Paradigm in Recommendation, from Maragheh et al.
Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks, from Lyu et al.
Agentic Retrieval Augmented Generation for Personalized Recommendation, from Walmart
Chunk Autoregressive Modeling for Generative Recommendation with Semantic Integration, from Wang et al.
Bridging Agent Experience and Human Expertise for Complex Interactive Recommendations, from Yu et al.
Bridging Visual, Audio, and Text Modalities in Knowledge-Intensive Multimodal Graphs, from Park et al.
Resolving Knowledge Conflicts in Retrieval-Augmented Generation through Systematic Evidence Reconciliation, from Chen et al.

[1] IRanker: Towards Ranking Foundation Model

This paper from Feng et al. introduces IRanker, a 3B-parameter ranking foundation model that unifies diverse ranking tasks across recommendation systems, LLM routing, and passage ranking through reinforcement learning and iterative decoding. Unlike traditional approaches that require a separate model for each ranking domain, IRanker decomposes a complex ranking problem into an iterative exclusion process that eliminates the worst candidate one step at a time, which significantly shrinks the combinatorial output space and makes better use of the limited context length during training. The authors train IRanker using Proximal Policy Optimization (PPO) with step-wise rewards that encourage excluding negative candidates first, then construct the final ranking by reversing the exclusion order.

📚 https://arxiv.org/abs/2506.21638

👨🏽‍💻 https://github.com/ulab-uiuc/IRanker
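
To make the iterative exclusion idea behind IRanker concrete, here is a minimal sketch. It is not the authors' implementation: the `pick_worst` callable stands in for the trained model, and the toy relevance scores are made up for illustration.

```python
from typing import Callable, List, Sequence


def rank_by_iterative_exclusion(
    candidates: Sequence[str],
    pick_worst: Callable[[List[str]], str],
) -> List[str]:
    """Return candidates best-first by repeatedly excluding the worst one."""
    pool = list(candidates)
    excluded: List[str] = []  # filled worst-first
    while len(pool) > 1:
        worst = pick_worst(pool)  # hypothetical stand-in for one model call
        pool.remove(worst)
        excluded.append(worst)
    excluded.extend(pool)  # the last survivor is the best candidate
    return excluded[::-1]  # reversing the exclusion order gives best-first


# Toy stand-in for the model (assumption): lower score means less relevant.
toy_scores = {"doc_a": 0.2, "doc_b": 0.9, "doc_c": 0.5}


def toy_pick_worst(pool: List[str]) -> str:
    return min(pool, key=toy_scores.__getitem__)


ranking = rank_by_iterative_exclusion(list(toy_scores), toy_pick_worst)
print(ranking)  # ['doc_b', 'doc_c', 'doc_a']
```

Each step only has to name one candidate to drop, which is a much smaller decision than emitting a full permutation in one pass.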

[2] LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation

This paper from He et al. introduces LLM2Rec, an embedding model designed for sequential recommendation that addresses the inability of existing approaches to simultaneously capture semantic information and collaborative filtering (CF) signals. Traditional ID-based methods excel at encoding CF signals through co-occurrence patterns but lack generalizability to unseen domains, while text-based approaches using pre-trained language models provide strong semantic understanding but fail to capture crucial item correlations and user preference patterns. LLM2Rec employs a two-stage training framework: first, Collaborative Supervised Fine-Tuning (CSFT) adapts LLMs to infer item relationships from historical user interactions, enabling the LLM to capture CF signals; second, Item-level Embedding Modeling (IEM) transforms the specialized LLM into a structured embedding model through bidirectional attention, masked next token prediction, and item-level contrastive learning. Extensive experiments on both in-domain and out-of-domain datasets demonstrate that LLM2Rec consistently outperforms existing embedding models across multiple sequential recommenders.

📚 https://arxiv.org/abs/2506.21579

👨🏽‍💻 https://github.com/HappyPointer/LLM2Rec

[3] WebANNS: Fast and Efficient Approximate Nearest Neighbor Search in Web Browsers

This paper from Liu et al. presents WebANNS, an approximate nearest neighbor search (ANNS) engine specifically designed for web browsers to address critical limitations in existing in-browser ANNS solutions. The research identifies three key bottlenecks in current state-of-the-art engines like Mememo: computational overhead from JavaScript's interpreted nature (causing 100ms+ query delays), inefficient external storage access through IndexedDB (with over 80% redundant data loading), and poor memory utilization that fails to adapt to varying device capabilities. To solve these issues, WebANNS introduces a three-tier data management system leveraging WebAssembly for near-native computational speed, implements a phased lazy loading strategy that minimizes redundant IndexedDB accesses while maintaining correct query paths in the HNSW algorithm, and employs heuristic cache size optimization to adaptively determine optimal memory thresholds.

📚 https://arxiv.org/abs/2507.00521

👨🏽‍💻 https://github.com/morgen52/webanns
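
The caching side of this design can be illustrated with a small sketch. This is a plain LRU cache with load-on-miss, not WebANNS's phased lazy loading or its heuristic cache sizing, and it is written in Python rather than WebAssembly. The class name, the dictionary standing in for IndexedDB, and the fixed `capacity` are all assumptions for illustration.

```python
from collections import OrderedDict
from typing import Dict, List


class LazyVectorCache:
    """Bounded in-memory cache in front of a slow external store."""

    def __init__(self, external_store: Dict[int, List[float]], capacity: int):
        self.external_store = external_store  # stand-in for IndexedDB
        self.capacity = capacity  # fixed here; WebANNS tunes this adaptively
        self.cache: "OrderedDict[int, List[float]]" = OrderedDict()
        self.external_reads = 0

    def get(self, node_id: int) -> List[float]:
        if node_id in self.cache:
            self.cache.move_to_end(node_id)  # mark as most recently used
            return self.cache[node_id]
        # Cache miss: load from the slow store only when a node is needed.
        self.external_reads += 1
        vector = self.external_store[node_id]
        self.cache[node_id] = vector
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used
        return vector


store = {i: [float(i), float(i) + 0.5] for i in range(4)}
cache = LazyVectorCache(store, capacity=2)
for node in [0, 1, 0, 1, 2]:  # a graph traversal often revisits nodes
    cache.get(node)
print(cache.external_reads)  # 3
print(list(cache.cache))     # [1, 2]
```

Five lookups trigger only three external reads here, which is the kind of redundancy reduction the paper targets at much larger scale.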

[4] The Future is Agentic: Definitions, Perspectives, and Open Challenges of Multi-Agent Recommender Systems

This paper from Maragheh et al. presents a comprehensive framework for multi-agent recommender systems powered by LLMs, where multiple autonomous agents collaborate to deliver personalized, context-aware recommendations. The authors formalize the core components through mathematical definitions, including LLM agents as tuples comprising language models, tools, and hierarchical memory systems, and multi-agent systems as triples of agents, shared environments, and communication protocols. They demonstrate the framework's capabilities through four concrete use cases: interactive party planning (Mickey Mouse birthday), synthetic user simulation for offline evaluation, multi-modal furniture recommendation combining vision and text, and brand-aligned explanation generation. The paper identifies five critical challenge areas that emerge from agentic architectures: communication protocol complexity and standardization, scalability issues including latency and cost management, hallucination propagation across agent networks, emergent misalignment and potential collusion between autonomous agents, and maintaining brand consistency while preserving generative flexibility. For each challenge, the authors provide formal problem statements, review existing mitigation strategies, and outline open research questions.

📚 https://arxiv.org/abs/2507.02097
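
As a rough illustration of the tuple-style definitions summarized above, the structures can be written as plain data classes. The field names, the two-tier memory, and the toy message protocol are assumptions made for this sketch and are not taken from the paper's notation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    """Assumed two-tier stand-in for a hierarchical memory system."""
    working: List[str] = field(default_factory=list)
    long_term: List[str] = field(default_factory=list)


@dataclass
class LLMAgent:
    """An agent as a tuple of language model, tools, and memory."""
    name: str
    language_model: Callable[[str], str]
    tools: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    memory: Memory = field(default_factory=Memory)


@dataclass
class MultiAgentSystem:
    """A system as a triple of agents, shared environment, and protocol."""
    agents: List[LLMAgent]
    environment: Dict[str, str]
    protocol: Callable[[LLMAgent, LLMAgent, str], str]


def echo_model(prompt: str) -> str:
    return f"echo: {prompt}"  # placeholder for a real LLM call


def direct_message(sender: LLMAgent, receiver: LLMAgent, message: str) -> str:
    # Toy protocol: log the message in the receiver's memory, then respond.
    receiver.memory.working.append(f"{sender.name}: {message}")
    return receiver.language_model(message)


planner = LLMAgent("planner", echo_model)
retriever = LLMAgent("retriever", echo_model)
system = MultiAgentSystem(
    [planner, retriever], {"task": "party planning"}, direct_message
)
reply = system.protocol(planner, retriever, "find balloons")
print(reply)  # echo: find balloons
```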

[5] Frustratingly Simple Retrieval Improves Challenging, Reasoning-Intensive Benchmarks

This paper from Lyu et al. introduces COMPACTDS, a 380-billion-word datastore for RAG that challenges the prevailing view that retrieval doesn't benefit reasoning-intensive tasks. The authors constructed a compact, high-quality datastore from diverse sources including filtered web crawls, academic papers, textbooks, and specialized content (math, code, Q&A forums), combined with a two-stage retrieval system using approximate nearest neighbor search followed by exact search to achieve subsecond latency on a single node with 456GB RAM. When evaluated on challenging benchmarks like MMLU, MMLU Pro, GPQA, and MATH using minimal RAG (dense retrieval + generation), COMPACTDS consistently improved performance across all model sizes (8B-70B parameters). Crucially, the authors demonstrate that dataset diversity is critical; no single data source suffices alone, and their in-house datastore matches or outperforms commercial search engines like Google Search while maintaining reproducibility and self-containment.

📚 https://arxiv.org/abs/2507.01297

👨🏽‍💻 https://huggingface.co/datasets/alrope/CompactDS-102GB
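
The two-stage retrieval pattern described above (a cheap approximate pass followed by exact search over a shortlist) can be sketched as follows. This is not the COMPACTDS pipeline: the approximate stage here simply scores on the first few dimensions as a stand-in for a real ANN index, and the vectors and parameter names are invented for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def two_stage_search(
    query: Sequence[float],
    vectors: Dict[str, List[float]],
    coarse_dims: int,
    shortlist_size: int,
    k: int,
) -> List[Tuple[str, float]]:
    # Stage 1 (approximate): score with truncated vectors only.
    # Assumption: this stands in for an ANN index over compressed vectors.
    shortlist = sorted(
        vectors,
        key=lambda doc_id: dot(query[:coarse_dims], vectors[doc_id][:coarse_dims]),
        reverse=True,
    )[:shortlist_size]
    # Stage 2 (exact): rescore the shortlist with full vectors.
    rescored = sorted(
        ((dot(query, vectors[doc_id]), doc_id) for doc_id in shortlist),
        reverse=True,
    )
    return [(doc_id, score) for score, doc_id in rescored[:k]]


vectors = {
    "d1": [1.0, 0.0, 0.0, 0.9],
    "d2": [0.9, 0.1, 0.0, 0.0],
    "d3": [0.0, 1.0, 0.0, 0.0],
    "d4": [0.8, 0.0, 0.0, 1.0],
}
query = [1.0, 0.0, 0.0, 1.0]
results = two_stage_search(query, vectors, coarse_dims=2, shortlist_size=3, k=2)
print([doc_id for doc_id, _ in results])  # ['d1', 'd4']
```

In this toy data the coarse pass ranks `d2` above `d4`, and the exact pass corrects that, which is why the shortlist is kept larger than the final `k`.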

[6] ARAG: Agentic Retrieval Augmented Generation for Personalized Recommendation

This paper from Walmart introduces ARAG (Agentic Retrieval Augmented Generation), a multi-agent framework that enhances personalized recommendation systems by integrating specialized LLM-based agents into the traditional RAG pipeline. The system employs four collaborative agents: a User Understanding Agent that synthesizes user preferences from long-term and session contexts, a Natural Language Inference (NLI) Agent that evaluates semantic alignment between retrieved candidate items and user intent, a Context Summary Agent that condenses NLI findings into focused context, and an Item Ranker Agent that generates final ranked recommendations based on contextual fit. Unlike conventional RAG approaches that rely on static retrieval heuristics like cosine similarity, ARAG implements a blackboard-style multi-agent system where agents share structured memory and reasoning outcomes to refine an initial recall set into semantically grounded recommendations.

📚 https://arxiv.org/abs/2506.21931
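
The blackboard-style flow between the four agents can be sketched with simple stand-ins. In ARAG each agent is LLM-based; here every agent is replaced by a keyword-overlap function so the control flow is runnable, and all field names and data are assumptions for illustration.

```python
from typing import Callable, Dict, List, Set

Blackboard = Dict[str, object]  # shared structured memory


def user_understanding_agent(board: Blackboard) -> None:
    # Merge long-term and session signals into one intent description.
    board["intent"] = set(board["long_term"]) | set(board["session"])


def nli_agent(board: Blackboard) -> None:
    # Stand-in for NLI: count tags shared between each item and the intent.
    intent: Set[str] = board["intent"]
    board["alignment"] = {
        item: len(tags & intent) for item, tags in board["candidates"].items()
    }


def context_summary_agent(board: Blackboard) -> None:
    # Keep only items with some support, as a condensed context.
    board["context"] = {
        item: score for item, score in board["alignment"].items() if score > 0
    }


def item_ranker_agent(board: Blackboard) -> None:
    context: Dict[str, int] = board["context"]
    board["ranking"] = sorted(context, key=lambda item: (-context[item], item))


AGENTS: List[Callable[[Blackboard], None]] = [
    user_understanding_agent,
    nli_agent,
    context_summary_agent,
    item_ranker_agent,
]

board: Blackboard = {
    "long_term": ["running", "outdoor"],
    "session": ["shoes"],
    "candidates": {  # initial recall set from a retriever
        "trail shoes": {"running", "shoes", "outdoor"},
        "yoga mat": {"fitness", "indoor"},
        "rain jacket": {"outdoor", "apparel"},
    },
}
for agent in AGENTS:
    agent(board)  # each agent reads from and writes to the blackboard
print(board["ranking"])  # ['trail shoes', 'rain jacket']
```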

[7] Act-With-Think: Chunk Auto-Regressive Modeling for Generative Recommendation

This paper from Wang et al. introduces CAR (Chunk AutoRegressive Modeling), a generative recommendation framework that integrates semantic and behavioral information through an "act-with-think" paradigm. Traditional generative recommendation methods either focus solely on semantic item information, encoded as semantic IDs (SIDs), while missing collaborative patterns, or treat semantic and behavioral signals as independent features, failing to capture their inherent interdependence. CAR models each item as a unified chunk containing both semantic IDs (representing the "think" aspect of user decision-making) and a unique item ID (representing the "act" aspect), enabling chunk-level autoregressive prediction that learns both dimensions jointly rather than sequentially. The approach uses residual K-means clustering to generate non-unique semantic IDs and employs a dual-branch loss function with progressive context fusion to enhance representation learning. In their experiments, the authors show a scaling effect where increasing the number of semantic ID bits leads to better performance, suggesting the framework emulates a slow-thinking mechanism similar to reasoning processes in LLMs.

📚 https://arxiv.org/abs/2506.23643
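
To illustrate how residual quantization yields non-unique semantic IDs that are then paired with a unique item ID, here is a minimal sketch. It covers only the assignment step and assumes the per-level centroids were already fitted by K-means. The codebooks, item vectors, and the ordering of the chunk (semantic IDs first, item ID last) are assumptions for illustration, not details confirmed by the paper.

```python
from typing import List, Sequence


def nearest(vec: Sequence[float], centroids: Sequence[Sequence[float]]) -> int:
    """Index of the centroid closest to vec in squared Euclidean distance."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vec, centroids[i])),
    )


def semantic_ids(
    vec: Sequence[float], codebooks: Sequence[Sequence[Sequence[float]]]
) -> List[int]:
    """Quantize vec level by level, passing the residual to the next level."""
    residual = list(vec)
    ids: List[int] = []
    for centroids in codebooks:
        idx = nearest(residual, centroids)
        ids.append(idx)
        residual = [r - c for r, c in zip(residual, centroids[idx])]
    return ids


def build_chunk(item_id: int, vec: Sequence[float], codebooks) -> List[int]:
    # Assumed layout: "think" tokens (semantic IDs) then the "act" token.
    return semantic_ids(vec, codebooks) + [item_id]


# Hypothetical two-level codebooks with two centroids each.
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],
    [[-1.0, 0.0], [1.0, 0.0]],
]
items = {7: [5.0, 4.0], 9: [4.9, 4.1], 3: [-1.0, 0.2]}
chunks = [build_chunk(item_id, vec, codebooks) for item_id, vec in items.items()]
print(chunks)  # [[1, 1, 7], [1, 1, 9], [0, 0, 3]]
```

Items 7 and 9 share the same semantic IDs because their embeddings are close, and only the trailing item ID tells them apart.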

[8] Thought-Augmented Planning for LLM-Powered Interactive Recommender Agent

This paper from Yu et al. introduces TAIRA (Thought-Augmented Interactive Recommender Agent), a multi-agent system designed to handle complex and diverse user intents in interactive recommendation tasks. TAIRA employs a Manager Agent that orchestrates recommendation tasks through hierarchical planning, enhanced by Thought Pattern Distillation (TPD), a method that extracts high-level thought patterns from successful experiences of both agents and human experts to improve reasoning and planning abilities. The system consists of three main components: a thought-augmented Manager Agent for decomposing user needs and coordinating tasks, multiple Executor Agents (Searcher, Item Retriever, and Task Interpreter) for executing subtasks, and the TPD method that captures experiential guidance from agent successes, expert-corrected failures, and direct human expertise.

📚 https://arxiv.org/abs/2506.23485

👨🏽‍💻 https://github.com/Alcein/TAIRA

[9] VAT-KG: Knowledge-Intensive Multimodal Knowledge Graph Dataset for Retrieval-Augmented Generation

This paper from Park et al. introduces VAT-KG (Visual-Audio-Text Knowledge Graph), which the authors describe as the first concept-centric and knowledge-intensive multimodal knowledge graph to comprehensively cover visual, audio, and text modalities. It targets a limitation of existing Multimodal Large Language Models (MLLMs), which suffer from hallucinations due to incomplete knowledge. The researchers developed a four-stage construction pipeline involving multimodal alignment filtering, knowledge-intensive recaptioning using metadata from YouTube, multimodal triplet grounding with LLMs, and cross-modal description alignment with external knowledge bases like Wikipedia and Wiktionary. VAT-KG contains 102,203 unique concepts and 110,218 triplets, each linked to detailed descriptions and multimodal data (video, audio, text), constructed from filtered datasets. The authors also propose a multimodal RAG framework that retrieves semantically relevant knowledge from VAT-KG in response to queries from arbitrary modalities, incorporating a Retrieval Checker module to filter misaligned results.

📚 https://arxiv.org/abs/2506.21556

👨🏽‍💻 https://vatkg.github.io/

[10] Rethinking All Evidence: Enhancing Trustworthy Retrieval-Augmented Generation via Conflict-Driven Summarization

This paper from Chen et al. introduces CARE-RAG (Conflict-Aware and Reliable Evidence for RAG), a framework that enhances the trustworthiness of RAG systems by systematically addressing knowledge conflicts between LLMs' internal parametric knowledge and external retrieved content. The framework operates through a four-stage pipeline: (1) Parameter Record Comparison, which elicits diverse internal perspectives from the LLM to reduce hallucinations, (2) Retrieval Result Refinement, which filters and refines retrieved evidence to remove irrelevant noise, (3) Conflict-Driven Summarization, which uses a distilled 3B LLaMA3.2 model to detect and analyze conflicts between parametric and contextual evidence, and (4) CARE-RAG Generation, which synthesizes final answers by reconciling all available evidence. Additionally, the authors introduce a QA Repair mechanism to correct outdated or semantically inconsistent benchmark answers, ensuring more reliable evaluation.

📚 https://arxiv.org/abs/2507.01281

I hope this weekly roundup of top papers has provided you with valuable insights and a glimpse into the exciting advancements taking place in the field. Remember to look deeper into the papers that pique your interest.

I also blog about Machine Learning, Deep Learning, MLOps, and Software Engineering. I explore diverse topics such as Natural Language Processing, Large Language Models, and Recommendation Systems, and conduct in-depth analyses that draw insights from the latest research papers.

Check out my blog HERE!
