Agentic Information Retrieval, A Comparative Study of Semantic vs. Fixed-Size Chunking Strategies, and More!
Vol. 74 for Oct 14 - Oct 20, 2024
Stay Ahead of the Curve with the Latest Advancements and Discoveries in Information Retrieval.
This week’s newsletter highlights the following research:
Agentic Information Retrieval as the Next Generation of IR Systems, from SJTU
A Novel Approach to Extracting Rich Embeddings from Mixture-of-Experts LLMs, from UMD
Starbucks Method for Efficient and Effective Embedding Model Training, from Zhuang et al.
Enhancing Semi-Structured Retrieval with Knowledge-Aware Query Expansion, from Xia et al.
A Comparative Study of Semantic vs. Fixed-Size Chunking Strategies, from Qu et al.
Enhancing Encoder-Only Transformers for Next-Item Prediction in Session-Based Recommenders, from Redjdal et al.
A Framework for Integrating Visual, Textual, and Graph Data in LLM-Based Recommenders, from Walmart
Layer-of-Thoughts Framework for Advanced Information Retrieval, from ROIS-DS
Language Model Agents for a Healthier Digital Public Sphere, from Lazar et al.
Adaptive-Note RAG for Improved Information Integration, from Wang et al.
[1] Agentic Information Retrieval
This position paper from SJTU introduces the concept of Agentic Information Retrieval (Agentic IR), a novel paradigm in information retrieval shaped by the capabilities of large language model agents. Unlike traditional IR systems that rely on filtering predefined sets of candidate items, Agentic IR expands the scope of accessible tasks and employs a unified agent architecture to solve complex, multi-step information retrieval problems. The authors highlight three key differences between Agentic IR and traditional IR: a broader task scope, a flexible agent-based architecture, and new key methods including prompt engineering and retrieval-augmented generation. The paper discusses emerging applications of Agentic IR, such as life assistants, business assistants, and coding assistants.
📚 https://arxiv.org/abs/2410.09713
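Since this is a position paper, there is no reference implementation, but the multi-step agentic loop it envisions is easy to caricature. Below is a minimal Python sketch under that reading; every name here (llm, search_index, the SEARCH/ANSWER convention) is a hypothetical stand-in, not the authors' API.

```python
# Minimal sketch of an agentic IR loop: an LLM agent iteratively decides
# whether to retrieve more information or to answer. All names here are
# hypothetical stand-ins, not the paper's interface.

def llm(prompt: str) -> str:
    """Stand-in for a large language model call."""
    # A real system would call a hosted or local LLM here.
    return "ANSWER: (placeholder)" if "Observation" in prompt else "SEARCH: agentic IR"

def search_index(query: str) -> str:
    """Stand-in for a retrieval tool (search engine, vector store, API)."""
    return f"Top documents for '{query}' ..."

def agentic_ir(task: str, max_steps: int = 5) -> str:
    """Multi-step loop: the agent interleaves retrieval actions and reasoning
    until it decides it can produce a final answer."""
    context = f"Task: {task}\n"
    for _ in range(max_steps):
        action = llm(context)
        if action.startswith("ANSWER:"):
            return action.removeprefix("ANSWER:").strip()
        query = action.removeprefix("SEARCH:").strip()
        context += f"Action: search({query})\nObservation: {search_index(query)}\n"
    return "No answer within step budget."

print(agentic_ir("Plan a week of IR paper reading"))
```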
[2] Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free
This paper from UMD explores the potential of Mixture-of-Experts (MoE) large language models as effective embedding generators without additional training. The researchers discover that the expert routers in MoE LLMs can serve as off-the-shelf embedding models, producing routing weights (RW) that complement the widely used hidden state (HS) embeddings. They propose MoE Embedding (MoEE), a novel approach that combines RW and HS to create a more comprehensive embedding. The study reveals that RW captures high-level semantics and is more robust to prompt variations, while HS focuses on output-dependent information. Notably, a weighted sum of RW and HS similarities (MoEE (sum)) often outperforms simple concatenation. Extensive experiments on the Massive Text Embedding Benchmark (MTEB) demonstrate that MoEE consistently outperforms embeddings derived solely from HS or RW, particularly in tasks requiring deep input understanding such as semantic textual similarity, classification, and clustering.
📚 https://arxiv.org/abs/2410.10814
👨🏽💻 https://github.com/tianyi-lab/MoE-Embedding
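As a rough illustration of the MoEE (sum) idea described above, here is a numpy sketch that combines hidden-state and routing-weight similarities. The vector dimensions, the flattening of per-layer routing weights into one vector, and the weighting factor alpha are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal numpy sketch of MoEE (sum): combine hidden-state (HS) similarity
# with routing-weight (RW) similarity. Shapes and alpha are assumptions.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def moee_sum_similarity(hs_a, rw_a, hs_b, rw_b, alpha: float = 1.0) -> float:
    """MoEE (sum): a weighted sum of HS and RW similarities, which the paper
    reports often beats simple concatenation of the two embeddings."""
    return cosine(hs_a, hs_b) + alpha * cosine(rw_a, rw_b)

rng = np.random.default_rng(0)
# hs_*: last hidden state of the final token (e.g. 4096-dim);
# rw_*: per-layer expert routing weights flattened into one vector.
hs_a, hs_b = rng.normal(size=4096), rng.normal(size=4096)
rw_a, rw_b = rng.random(size=32 * 8), rng.random(size=32 * 8)  # 32 layers x 8 experts
print(moee_sum_similarity(hs_a, rw_a, hs_b, rw_b))
```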
[3] Starbucks: Improved Training for 2D Matryoshka Embeddings
This paper from Zhuang et al. introduces Starbucks, an approach to improve 2D Matryoshka representation learning for embedding models. The authors address a significant drawback of the existing 2D Matryoshka Sentence Embeddings (2DMSE) method, where embeddings generated from sub-layers and sub-dimensions were less effective than separately trained models. Starbucks comprises two key components: Starbucks Masked Autoencoding for pre-training and Starbucks Representation Learning for fine-tuning. The method computes model loss based on a predefined list of layer-dimension pairs, ranging from small to large sizes. This approach allows for the generation of embeddings with targeted layer-dimension sizes, similar to how Starbucks coffee offers various cup sizes. The researchers demonstrate that Starbucks significantly outperforms 2DMSE and achieves comparable or better effectiveness than separately trained models across semantic text similarity and retrieval tasks.
📚 https://arxiv.org/abs/2410.13230
👨🏽💻 https://github.com/ielab/Starbucks
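The core training trick, computing the loss over a predefined small-to-large list of layer-dimension pairs, fits in a few lines of PyTorch. The pair list SIZES, the pooling, and the in-batch contrastive loss below are assumptions for illustration; see the repository above for the authors' actual implementation.

```python
# Sketch of a Starbucks-style fine-tuning objective: sum a contrastive loss
# over a fixed list of (layer, dimension) pairs rather than random
# combinations as in 2DMSE. Pair list and loss are illustrative.
import torch
import torch.nn.functional as F

SIZES = [(2, 32), (4, 64), (6, 128), (8, 256), (10, 512), (12, 768)]  # (layer, dim)

def starbucks_loss(query_states: list[torch.Tensor],
                   positive_states: list[torch.Tensor]) -> torch.Tensor:
    """query_states[l] / positive_states[l]: [batch, dim_full] pooled embeddings
    taken at layer l for a query and its positive passage."""
    total = torch.tensor(0.0)
    for layer, dim in SIZES:
        q = F.normalize(query_states[layer][:, :dim], dim=-1)    # truncate dims
        p = F.normalize(positive_states[layer][:, :dim], dim=-1)
        logits = q @ p.T / 0.05                                  # in-batch negatives
        labels = torch.arange(q.size(0))
        total = total + F.cross_entropy(logits, labels)
    return total / len(SIZES)

# Toy usage: pooled embeddings from 13 "layers" for a batch of 4, dim 768.
states = [torch.randn(4, 768) for _ in range(13)]
pos = [torch.randn(4, 768) for _ in range(13)]
print(starbucks_loss(states, pos))
```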
[4] Knowledge-Aware Query Expansion with Large Language Models for Textual and Relational Retrieval
This paper from Xia et al. introduces a knowledge-aware query expansion framework designed to improve information retrieval for semi-structured queries that contain both textual and relational requirements. Unlike existing methods that primarily focus on enhancing textual similarities, this approach leverages knowledge graphs (KG) to capture and utilize document relations. The authors propose a novel technique called Knowledge-Aware Retrieval, which uses document texts as rich KG node representations and employs document-based relation filtering. This method aims to generate query expansions that are not only semantically relevant but also preserve user-specified document relations.
📚 https://arxiv.org/abs/2410.13765
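To make the relation-filtering idea concrete, here is a toy Python sketch in which a query is expanded only with text from knowledge-graph neighbors that satisfy the user-specified relation. The tiny graph and helper names are hypothetical, not the paper's data structures.

```python
# Hedged sketch of knowledge-aware query expansion: expand a query with text
# from KG-linked documents that satisfy the user's relational constraint.

DOC_TEXT = {
    "d1": "Attention Is All You Need",
    "d2": "BERT: Pre-training of Deep Bidirectional Transformers",
}
# Knowledge graph edges: (head_doc, relation, tail_doc)
KG = [("d2", "cites", "d1")]

def expand_query(query: str, seed_doc: str, relation: str) -> str:
    """Keep only neighbors connected to the seed document by the
    user-specified relation, and append their text to the query."""
    neighbors = [t for (h, r, t) in KG if h == seed_doc and r == relation]
    expansion = " ".join(DOC_TEXT[n] for n in neighbors)  # docs as node text
    return f"{query} {expansion}".strip()

print(expand_query("papers on pre-training", seed_doc="d2", relation="cites"))
```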
[5] Is Semantic Chunking Worth the Computational Cost?
This paper from Qu et al. presents a systematic evaluation of semantic chunking versus fixed-size chunking in Retrieval-Augmented Generation (RAG) systems. Despite the growing popularity of semantic chunking, which segments documents based on semantic similarity, the study finds that its benefits over simpler fixed-size chunking are not consistently significant. The researchers evaluated both methods using three proxy tasks: document retrieval, evidence retrieval, and answer generation. Their findings challenge prevailing assumptions about semantic chunking, revealing that its advantages are highly task-dependent and often do not justify the added computational costs. While semantic chunking showed some benefits in certain scenarios, particularly on stitched datasets with high topic diversity, fixed-size chunking often performed better on non-synthetic datasets that more closely resemble real-world documents. The study concludes that fixed-size chunking remains a more efficient and reliable choice for practical RAG applications, with factors like embedding quality often having a greater impact on performance than the chunking strategy itself.
📚 https://arxiv.org/abs/2410.13070
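The cost asymmetry the paper measures is easy to see in code: fixed-size chunking needs only a word counter, while semantic chunking needs an embedding and a similarity test at every sentence boundary. The sketch below uses a bag-of-words "embedding" as a stand-in for a real sentence encoder; the window size, overlap, and threshold are illustrative assumptions.

```python
# Toy contrast of the two chunking strategies. The Counter-based "embedding"
# stands in for a real sentence encoder (where the extra cost lives).
from collections import Counter
import math

def fixed_size_chunks(text: str, size: int = 20, overlap: int = 5) -> list[str]:
    """Sliding window of `size` words with `overlap` words shared."""
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, len(words), step)]

def embed(sentence: str) -> Counter:
    return Counter(sentence.lower().split())  # stand-in for a sentence encoder

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    return dot / (math.sqrt(sum(v * v for v in a.values()))
                  * math.sqrt(sum(v * v for v in b.values())) + 1e-9)

def semantic_chunks(sentences: list[str], threshold: float = 0.2) -> list[str]:
    """Start a new chunk wherever consecutive-sentence similarity drops."""
    chunks, current = [], [sentences[0]]
    for prev, cur in zip(sentences, sentences[1:]):
        if cosine(embed(prev), embed(cur)) < threshold:  # topic shift
            chunks.append(" ".join(current))
            current = []
        current.append(cur)
    chunks.append(" ".join(current))
    return chunks

sents = ["The cat sat on the mat.", "The cat slept.", "Stocks fell sharply today."]
print(fixed_size_chunks(" ".join(sents), size=6, overlap=2))
print(semantic_chunks(sents))
```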
[6] Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems
This paper from Redjdal et al. introduces Sequential Masked Modeling (SMM) for improving session-based recommendation using encoder-only transformer architectures. SMM combines data augmentation through window sliding with a unique penultimate token masking strategy to better capture sequential dependencies in user sessions. The authors evaluate their method on three widely used datasets (Yoochoose 1/64, Diginetica, and Tmall), comparing it to state-of-the-art single-session, cross-session, and multi-relation approaches. Their Transformer-SMM models, particularly BERT-SMM and DeBERTa-SMM, consistently outperform other models that rely on the same amount of information and even rival methods with access to more extensive user history.
📚 https://arxiv.org/abs/2410.11150
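A minimal sketch of the two SMM ingredients as summarized above, window sliding for augmentation and penultimate-item masking, might look like the following. The window length and mask handling are assumptions rather than the authors' exact recipe.

```python
# Illustrative sketch of SMM's two ingredients: window sliding for data
# augmentation and penultimate-item masking. Details are assumptions.
MASK = "[MASK]"

def window_slide(session: list[str], window: int = 4) -> list[list[str]]:
    """Augment one session into all contiguous sub-sessions of length `window`."""
    if len(session) <= window:
        return [session]
    return [session[i:i + window] for i in range(len(session) - window + 1)]

def mask_penultimate(seq: list[str]) -> tuple[list[str], str]:
    """Replace the second-to-last item with [MASK]; the model is trained to
    recover it, mimicking next-item prediction with bidirectional context."""
    masked = seq.copy()
    target = masked[-2]
    masked[-2] = MASK
    return masked, target

session = ["shoes", "socks", "shirt", "hat", "scarf"]
for sub in window_slide(session):
    print(mask_penultimate(sub))
```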
[7] Triple Modality Fusion: Aligning Visual, Textual, and Graph Data with Large Language Models for Multi-Behavior Recommendations
This paper from Walmart introduces the Triple Modality Fusion (TMF) framework, an approach to multi-behavior recommendation systems that leverages large language models to integrate visual, textual, and graph data. The authors argue that traditional recommendation models often fall short in capturing the complex nature of user behaviors and item features due to their reliance on single data sources. TMF addresses this limitation by aligning these three modalities within an LLM-based recommender. The framework initially warms up the LLM using natural language prompts, then employs a modality fusion module based on cross-attention and self-attention mechanisms to project different data types into a unified embedding space. This approach allows for a more comprehensive representation of user behaviors and item characteristics.
📚 https://arxiv.org/abs/2410.12228
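In the spirit of the fusion module described above, here is a hedged PyTorch sketch: the item's text embedding cross-attends to its visual and graph embeddings, a self-attention pass mixes the fused sequence, and a projection maps the result into a unified space. The dimensions and layer choices are illustrative, not Walmart's implementation.

```python
# Hedged sketch of a triple-modality fusion module: cross-attention followed
# by self-attention, projecting into one embedding space. All sizes are toys.
import torch
import torch.nn as nn

class TripleModalityFusion(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.proj = nn.Linear(dim, dim)  # into the recommender LLM's space

    def forward(self, text, vision, graph):
        # text / vision / graph: [batch, dim] embeddings from each encoder
        kv = torch.stack([vision, graph], dim=1)   # [batch, 2, dim]
        q = text.unsqueeze(1)                      # [batch, 1, dim]
        fused, _ = self.cross_attn(q, kv, kv)      # text attends to the others
        seq = torch.cat([fused, kv], dim=1)        # [batch, 3, dim]
        mixed, _ = self.self_attn(seq, seq, seq)   # mix the fused sequence
        return self.proj(mixed.mean(dim=1))        # unified item embedding

fusion = TripleModalityFusion()
out = fusion(torch.randn(2, 256), torch.randn(2, 256), torch.randn(2, 256))
print(out.shape)  # torch.Size([2, 256])
```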
[8] Layer-of-Thoughts Prompting (LoT): Leveraging LLM-Based Retrieval with Constraint Hierarchies
This paper from ROIS-DS introduces Layer-of-Thoughts (LoT) Prompting, an approach to information retrieval that leverages constraint hierarchies and Large Language Models to enhance the accuracy and explainability of search results. LoT extends the Graph-of-Thoughts concept by organizing reasoning steps into hierarchical layers, consisting of layer thoughts for conceptual steps and option thoughts for solution-finding. The framework initializes layer thoughts and processes them sequentially, generating partial solutions through option thoughts and passing aggregated outputs to subsequent layers. In retrieval tasks, LoT employs hierarchical levels to filter and rank documents based on relevance scores, providing clear explanations for document relevance.
📚 https://arxiv.org/abs/2410.12153
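A toy rendering of the layered filter-and-rank behavior might look like this; the concrete layer predicates are hypothetical, and a real system would implement each layer as an LLM prompt rather than a lambda.

```python
# Minimal sketch of the layered idea in LoT: each layer applies a
# coarser-to-finer constraint, filtering and re-scoring surviving documents.
from typing import Callable

Layer = Callable[[str], float]  # relevance score; <= 0 means filtered out

def layer_of_thoughts(docs: list[str], layers: list[Layer]) -> list[tuple[str, float]]:
    scored = [(d, 0.0) for d in docs]
    for layer in layers:
        # "option thoughts": score each surviving document under this layer,
        # then pass the aggregated partial solutions to the next (finer) layer
        scored = [(d, s + layer(d)) for d, s in scored if layer(d) > 0]
    return sorted(scored, key=lambda x: -x[1])

docs = ["deep retrieval survey", "cooking recipes", "retrieval with LLM agents"]
layers = [
    lambda d: 1.0 if "retrieval" in d else 0.0,  # layer 1: topical filter
    lambda d: 2.0 if "LLM" in d else 0.5,        # layer 2: finer preference
]
print(layer_of_thoughts(docs, layers))
```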
[9] The Moral Case for Using Language Model Agents for Recommendation
This paper from Lazar et al. argues for a paradigm shift in online content recommendation, proposing the use of Language Model (LM) agents as an alternative to current recommender systems. The authors contend that existing systems contribute to several problems in our digital communication environment, including reliance on mass surveillance, power concentration, narrow behaviorism, and compromised user agency. They suggest that LM agents, leveraging advanced natural language processing capabilities, could potentially address these issues by allowing users to express preferences in natural language, reducing dependence on surveillance, decentralizing power, and enhancing user agency. While acknowledging the challenges in implementing LM agent-based recommenders, including candidate generation and computational efficiency, the authors argue that this approach could lead to a healthier digital public sphere.
📚 https://arxiv.org/abs/2410.12123
[10] Retriever-and-Memory: Towards Adaptive Note-Enhanced Retrieval-Augmented Generation
This paper from Wang et al. introduces Adaptive Note-Enhanced RAG (Adaptive-Note), a novel approach to Retrieval-Augmented Generation for complex question-answering tasks. Unlike traditional RAG methods, which may struggle with insufficient information gathering and poor knowledge integration, Adaptive-Note employs a Retriever-and-Memory paradigm. The system features an Iterative Information Collector that accumulates knowledge in note form, and an Adaptive Memory Reviewer that determines when to stop information gathering based on knowledge saturation. This approach allows for more comprehensive information gathering and better integration of knowledge from multiple retrieval steps.
📚 https://arxiv.org/abs/2410.08821
👨🏽💻 https://github.com/thunlp/Adaptive-Note
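The Retriever-and-Memory loop reduces to a simple pattern: retrieve, merge the findings into a running note, and stop once the note stops growing. The sketch below uses an unchanged-note check as a crude stand-in for the paper's knowledge-saturation criterion; all helpers are hypothetical placeholders, so consult the repository above for the real pipeline.

```python
# Sketch of the Retriever-and-Memory loop: retrieve, merge new evidence into
# a note, stop when the note saturates. All helpers are placeholders.

def retrieve(query: str, step: int) -> str:
    corpus = ["fact A about the question", "fact B refining fact A", ""]
    return corpus[min(step, len(corpus) - 1)]

def update_note(note: str, evidence: str) -> str:
    """Stand-in for the LLM merging new evidence into the structured note."""
    return f"{note}\n- {evidence}".strip() if evidence else note

def adaptive_note_rag(question: str, max_steps: int = 5) -> str:
    note = ""
    for step in range(max_steps):
        evidence = retrieve(question, step)   # iterative information collector
        new_note = update_note(note, evidence)
        if new_note == note:                  # adaptive memory reviewer:
            break                             # note saturated, stop retrieving
        note = new_note
    return f"Answer synthesized from note:\n{note}"

print(adaptive_note_rag("What is fact A?"))
```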
Extras: Benchmarks
⏱ MAIR: A Massive Benchmark for Evaluating Instructed Retrieval
MAIR (Massive Instructed Retrieval Benchmark) is a comprehensive information retrieval (IR) benchmark designed to evaluate the latest instruction-tuned models. MAIR encompasses 126 distinct IR tasks across 6 domains, featuring 805 unique instructions that cover various query types, document formats, and relevance criteria. The benchmark includes 10,038 queries and over 4 million documents, sampled to balance evaluation accuracy and cost.
📝 https://arxiv.org/abs/2410.10127
👨🏽💻 https://github.com/sunnweiwei/Mair
⏱ MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems
MIRAGE-Bench from Thakur et al. is a multilingual Retrieval-Augmented Generation (RAG) benchmark covering 18 diverse languages. The authors propose a unique evaluation approach that combines the efficiency of heuristic-based metrics with the reliability of arena-based benchmarks. They achieve this by training a learning-to-rank model as a "surrogate judge" using RAG-based evaluation heuristics to produce a synthetic arena-based leaderboard. This method correlates highly with expensive LLM-based judgments (achieving a Kendall Tau of 0.909 with GPT-4 evaluations) while being more cost-effective and scalable. MIRAGE-Bench evaluates 19 multilingual LLMs using seven heuristic features and pairwise comparisons. The benchmark is built upon the MIRACL dataset, extending it for generation tasks.
📝 https://arxiv.org/abs/2410.13716
👨🏽💻 https://github.com/vectara/mirage-bench
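The surrogate-judge idea, fitting a cheap model on heuristic features so it mimics expensive pairwise LLM judgments, can be sketched with a simple classifier. The three features and synthetic labels below are toy assumptions; the paper trains a proper learning-to-rank model over seven heuristics.

```python
# Simplified sketch of a "surrogate judge": predict which of two RAG answers
# a strong LLM judge would prefer, from cheap heuristic features.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# One row per comparison: differences of heuristic scores between answer A
# and answer B (e.g. citation recall, fluency, support overlap).
X = rng.normal(size=(200, 3))
w_true = np.array([1.5, -0.5, 0.8])                 # pretend LLM-judge preferences
y = (X @ w_true + rng.normal(scale=0.3, size=200) > 0).astype(int)

judge = LogisticRegression().fit(X, y)              # the surrogate judge
new_pair = rng.normal(size=(1, 3))
print("P(A beats B) =", judge.predict_proba(new_pair)[0, 1])
# Pairwise win probabilities like this can then be aggregated into a
# synthetic arena-style leaderboard across models and languages.
```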
I hope this weekly roundup of top papers has provided you with valuable insights and a glimpse into the exciting advancements taking place in the field. Remember to dig deeper into the papers that pique your interest.
I also blog about the Machine Learning, Deep Learning, MLOps, and Software Engineering domains. I explore diverse topics such as Natural Language Processing, Large Language Models, and Recommendation Systems, and conduct in-depth analyses, drawing insights from the latest research papers.