A Practitioner's Guide to Generative Recommendation, A Comprehensive Survey of PLMs in Text Embedding, and More!
Vol.115 for Jul 28 - Aug 03, 2025
Stay Ahead of the Curve with the Latest Advancements and Discoveries in Information Retrieval.
This week’s newsletter highlights the following research:
Multi-Turn GraphRAG with End-to-End Reinforcement Learning, from Luo et al.
A Comprehensive Survey of Pretrained Language Models in Text Embedding, from Harbin Institute of Technology
A Practitioner's Guide to Generative Recommendation, from Snap Inc.
A Survey on Leveraging Large Language Models to Overcome Traditional RecSys Limitations, from Raja et al.
A Balanced Approach to Item Cold Start in Sequential Recommendations, from Sber AI Lab
Moving Beyond Nearest Neighbors for Diverse Vector Search, from Raja et al.
Enhancing RAG Robustness by Injecting External Passages into the Reasoning Process, from UCAS
RecGPT: LLM-Driven Intent-Centric Recommender Systems at Industrial Scale, from Alibaba
Label-Free Performance Estimation in Top-N Recommendation, from Tsinghua University
LLM-Enhanced Adversarial Learning for Diversity-Accuracy Trade-offs in Recommendations, from Amazon
[1] Graph-R1: Towards Agentic GraphRAG Framework via End-to-end Reinforcement Learning
This paper from Luo et al. introduces Graph-R1, an agentic GraphRAG framework that employs end-to-end reinforcement learning. Traditional RAG systems rely on chunk-based retrieval that lacks structural semantics, and existing GraphRAG methods face challenges including high construction costs, fixed one-time retrieval, and dependence on long-context reasoning. Graph-R1 addresses these issues by combining lightweight knowledge hypergraph construction with multi-turn agent-environment interaction. The framework models retrieval as an iterative "think-retrieve-rethink-generate" process within a knowledge hypergraph environment, optimized through Group Relative Policy Optimization (GRPO) with an outcome-directed reward mechanism that integrates generation quality, retrieval relevance, and structural reliability. The approach bridges the gap between graph-structured knowledge representation and natural language generation, with the end-to-end RL strategy enabling agents to learn generalizable graph reasoning strategies.
📚 https://arxiv.org/abs/2507.21892
👨🏽💻 https://github.com/LHRLAB/Graph-R1
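To make the interaction loop concrete, here is a minimal sketch of a multi-turn "think-retrieve-rethink-generate" agent; `llm_generate` and `retrieve_from_hypergraph` are hypothetical stand-ins, and the tag format is an illustrative assumption rather than Graph-R1's actual interface.

```python
# Minimal sketch of a multi-turn "think-retrieve-rethink-generate" loop.
# `llm_generate` and `retrieve_from_hypergraph` are hypothetical stand-ins,
# not Graph-R1's actual interfaces.

def agentic_graph_rag(question, llm_generate, retrieve_from_hypergraph, max_turns=4):
    trajectory = []  # (thought, query, retrieved) tuples collected over turns
    context = ""
    for turn in range(max_turns):
        # Think: the model reasons over what it knows so far and emits either
        # a retrieval query or a final answer.
        step = llm_generate(f"Question: {question}\nContext: {context}\nThink, then "
                            f"output <query>...</query> or <answer>...</answer>.")
        if "<answer>" in step:
            return step.split("<answer>")[1].split("</answer>")[0], trajectory
        query = step.split("<query>")[1].split("</query>")[0]
        # Retrieve: query the knowledge hypergraph environment.
        retrieved = retrieve_from_hypergraph(query)
        trajectory.append((step, query, retrieved))
        # Rethink: fold the new evidence into the context for the next turn.
        context += f"\n[Turn {turn}] {retrieved}"
    # Generate: fall back to answering with whatever was gathered.
    return llm_generate(f"Question: {question}\nContext: {context}\nAnswer:"), trajectory
```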
[2] On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
This paper from Harbin Institute of Technology presents a comprehensive survey examining the role of pretrained language models (PLMs) in general-purpose text embeddings (GPTE). GPTEs are dense vector representations used across diverse NLP tasks like retrieval, classification, and clustering. The authors systematically analyze how PLMs have transformed text embedding development through both fundamental contributions, including embedding extraction methods, contrastive learning objectives, and data synthesis, and advanced capabilities such as multilingual support, code understanding, and domain-specific adaptation. The survey reveals that modern GPTE typically follows a unified architecture where PLMs serve as backbone networks with pooling strategies to generate fixed-size embeddings, optimized through contrastive learning on large-scale paired datasets. The paper serves as both a comprehensive reference for newcomers to understand recent GPTE developments and a roadmap for established researchers to grasp the current landscape and future potential of this rapidly evolving field.
📚 https://arxiv.org/abs/2507.20783
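The unified recipe the survey describes, a PLM backbone with a pooling strategy trained contrastively on paired data, can be sketched as follows; the model choice, mean pooling, and temperature below are illustrative assumptions, not prescriptions from the survey.

```python
# Minimal sketch of the PLM-backbone + pooling + contrastive-learning recipe
# described in the survey. Model name and temperature are illustrative choices.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = encoder(**batch).last_hidden_state          # [B, T, H]
    mask = batch["attention_mask"].unsqueeze(-1)          # [B, T, 1]
    pooled = (hidden * mask).sum(1) / mask.sum(1)         # mean pooling
    return F.normalize(pooled, dim=-1)                    # fixed-size embeddings

def info_nce(query_emb, pos_emb, temperature=0.05):
    # In-batch negatives: each query's positive is the diagonal entry.
    logits = query_emb @ pos_emb.T / temperature
    labels = torch.arange(logits.size(0))
    return F.cross_entropy(logits, labels)

queries = embed(["what is dense retrieval?"])
passages = embed(["Dense retrieval encodes queries and documents into vectors."])
loss = info_nce(queries, passages)
```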
[3] Generative Recommendation with Semantic IDs: A Practitioner's Handbook
This paper from Snap Inc. introduces GRID, an open-source framework for Generative Recommendation with Semantic IDs. GRID modularizes the two-phase workflow of semantic ID-based generative recommendation: tokenization (converting item embeddings into discrete semantic IDs using quantizers like RQ-VAE, RVQ, or Residual K-Means) and generation (using sequential models to predict next items). Through experiments on Amazon datasets, the authors reveal several surprising insights that challenge conventional assumptions: simpler tokenization methods like Residual K-Means often outperform complex approaches like RQ-VAE despite requiring significantly less training; larger language models provide only marginal improvements in recommendation performance; user tokens (commonly used in existing literature) actually hurt performance and are best removed entirely; and encoder-decoder architectures substantially outperform decoder-only models. The framework successfully reproduces existing literature results while providing novel insights into the true drivers of generative recommendation performance.
📚 https://arxiv.org/abs/2507.22224
👨🏽💻 https://github.com/snap-research/GRID [not public as of 08/01]
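As a rough illustration of the tokenization phase, here is a minimal Residual K-Means sketch that turns item embeddings into hierarchical semantic IDs; the codebook size and depth are illustrative, and this is not GRID's implementation.

```python
# Minimal sketch of Residual K-Means tokenization: each level clusters the
# residual left by the previous level, and the sequence of cluster indices
# becomes the item's semantic ID. Codebook size and depth are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def residual_kmeans_ids(item_embeddings, levels=3, codebook_size=256, seed=0):
    residual = item_embeddings.copy()
    semantic_ids = []
    codebooks = []
    for _ in range(levels):
        km = KMeans(n_clusters=codebook_size, random_state=seed, n_init=10)
        codes = km.fit_predict(residual)                  # one token per item
        residual = residual - km.cluster_centers_[codes]  # quantization residual
        semantic_ids.append(codes)
        codebooks.append(km.cluster_centers_)
    # Shape [num_items, levels]: e.g. item i -> (c1, c2, c3)
    return np.stack(semantic_ids, axis=1), codebooks

embeddings = np.random.randn(10_000, 768).astype(np.float32)
ids, _ = residual_kmeans_ids(embeddings)
```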
[4] A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges
This survey paper from Raja et al. examines how LLMs can address persistent challenges in modern recommender systems, which traditionally rely on modular architectures with candidate generation, multi-stage ranking, and re-ranking components trained separately using supervised objectives. The authors systematically analyze six categories of industrial challenges: data-centric issues (cold start, data sparsity, noisy feedback, temporal drift, multimodal integration), modeling challenges (personalization vs. generalization, scalability, long-tail modeling), evaluation gaps (offline-online discrepancies, sparse labels, balancing engagement), system design considerations, privacy/security concerns, and organizational factors. They demonstrate how LLMs provide unified, language-native solutions through techniques like content-conditioned generation, RAG, zero-shot personalization, representation bootstrapping, conversational interfaces, and prompt-based conditioning. Unlike conventional collaborative filtering methods that depend on dense user-item interaction matrices, LLMs leverage semantic reasoning and contextual understanding to enable effective recommendations without extensive historical data, particularly excelling in cold-start scenarios where new users or items lack interaction history. The paper provides a structured framework for understanding the design space of LLM-enhanced recommenders, analyzing trade-offs between accuracy, scalability, and real-time performance.
📚 https://arxiv.org/abs/2507.21117
[5] Let It Go? Not Quite: Addressing Item Cold Start in Sequential Recommendations with Content-Based Initialization
This paper from Sber AI Lab addresses the item cold start problem in sequential recommender systems by proposing a content-based initialization approach that uses trainable delta vectors. The authors identify that while content-based embeddings can help with cold items (those with few or no interactions), directly using frozen content embeddings yields suboptimal performance since they don't adapt to the recommendation task, whereas fine-tuning them causes embeddings to drift too far from their original semantic structure, degrading cold-start performance. Their solution introduces a two-component embedding system: frozen content embeddings with fixed norm combined with small trainable delta vectors with bounded norm, allowing the model to adapt item representations while maintaining proximity to original content-based semantics.
📚 https://arxiv.org/abs/2507.19473
👨🏽💻 https://github.com/ArtemF42/let-it-go
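A minimal PyTorch sketch of the two-component embedding, a frozen content vector plus a norm-bounded trainable delta, might look like this; the norm bound and clipping rule are illustrative assumptions rather than the authors' exact parameterization.

```python
# Minimal PyTorch sketch of the two-component item embedding: a frozen,
# norm-fixed content embedding plus a small trainable delta with a bounded
# norm. The norm bound and dimensions are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContentInitItemEmbedding(nn.Module):
    def __init__(self, content_embeddings: torch.Tensor, max_delta_norm: float = 0.1):
        super().__init__()
        # Frozen content part, renormalized to unit norm.
        self.register_buffer("content", F.normalize(content_embeddings, dim=-1))
        # Trainable delta, initialized at zero so training starts from pure content.
        self.delta = nn.Parameter(torch.zeros_like(content_embeddings))
        self.max_delta_norm = max_delta_norm

    def forward(self, item_ids: torch.Tensor) -> torch.Tensor:
        delta = self.delta[item_ids]
        # Clip the delta norm so learned representations stay close to the
        # original content-based semantics.
        norm = delta.norm(dim=-1, keepdim=True).clamp(min=1e-8)
        scale = torch.clamp(self.max_delta_norm / norm, max=1.0)
        return self.content[item_ids] + delta * scale

emb = ContentInitItemEmbedding(torch.randn(1000, 128))
vectors = emb(torch.tensor([3, 17, 999]))
```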
[6] Beyond Nearest Neighbors: Semantic Compression and Graph-Augmented Retrieval for Enhanced Vector Search
This paper from Raja et al. introduces a retrieval paradigm called "semantic compression" that addresses the limitations of traditional approximate nearest neighbor (ANN) search in vector databases, which often produces semantically redundant results lacking the diversity needed for applications like RAG and multi-hop question answering. The authors formalize semantic compression using submodular optimization principles, creating an objective function that balances both coverage (ensuring retrieved vectors represent the broader semantic space) and diversity (preventing selection of overly similar items) through a greedy algorithm that achieves theoretical approximation guarantees. To operationalize this concept, they propose graph-augmented vector retrieval, which overlays semantic graphs constructed from k-nearest neighbor connections, external relations, or symbolic knowledge graphs onto embedding spaces, enabling multi-hop context-aware search through methods like Personalized PageRank that can discover semantically diverse but non-local results. Their hybrid scoring framework combines vector-based relevance with graph-based influence propagation, creating a tunable system that interpolates between pure ANN retrieval and graph-based exploration.
📚 https://arxiv.org/abs/2507.19715
👨🏽💻 https://github.com/rahulrj/icml_vecdb_experiments
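The coverage-versus-diversity trade-off can be illustrated with a simple greedy selection routine; the specific objective below (query relevance minus a redundancy penalty) and the weight are assumptions standing in for the paper's submodular formulation.

```python
# Minimal sketch of greedy selection under a coverage + diversity objective,
# in the spirit of the paper's submodular "semantic compression" idea.
# The objective and the weight lam are illustrative assumptions.
import numpy as np

def greedy_semantic_compression(candidates, query, k=5, lam=0.5):
    # candidates: [N, d] unit-normalized vectors; query: [d] unit-normalized.
    relevance = candidates @ query                      # similarity to the query
    sims = candidates @ candidates.T                    # pairwise similarities
    selected = []
    for _ in range(k):
        best, best_gain = None, -np.inf
        for i in range(len(candidates)):
            if i in selected:
                continue
            # Relevance gain minus redundancy with already-selected items.
            redundancy = max((sims[i, j] for j in selected), default=0.0)
            gain = relevance[i] - lam * redundancy
            if gain > best_gain:
                best, best_gain = i, gain
        selected.append(best)
    return selected

vecs = np.random.randn(200, 64)
vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)
print(greedy_semantic_compression(vecs, vecs[0], k=5))
```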
[7] Injecting External Knowledge into the Reasoning Process Enhances Retrieval-Augmented Generation
This paper from UCAS introduces "Passage Injection," a method to enhance RAG systems by explicitly incorporating retrieved passages into the reasoning process of LLMs rather than simply placing them in the input prompt. The authors leverage the self-reflection capabilities of reasoning-enhanced LLMs (like Qwen3 and DeepSeek-R1-Distill) to improve robustness against noisy or misleading retrieved passages. Through experiments on four factual QA datasets using BM25 retrieval, they demonstrate that Passage Injection consistently outperforms vanilla RAG across different model sizes, with particularly strong improvements on multi-hop questions requiring complex reasoning. The method shows enhanced robustness in two challenging noise scenarios: random irrelevant passages and counterfactual misleading contexts. Additionally, the approach reduces output length and computational overhead by mitigating overthinking, suggesting that integrating external knowledge directly into the step-by-step reasoning process represents a promising direction for building more reliable RAG systems.
📚 https://arxiv.org/abs/2507.19333
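The core idea can be sketched as a change in where retrieved passages enter the prompt; the tags and wording below are illustrative, not the paper's exact templates.

```python
# Minimal sketch of the idea behind Passage Injection: instead of placing
# retrieved passages only in the user prompt, they are inserted into the
# model's reasoning segment so its self-reflection step can vet them.
# The tags and wording are illustrative, not the paper's exact templates.

def vanilla_rag_prompt(question, passages):
    docs = "\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    return f"Refer to the passages below and answer the question.\n{docs}\nQuestion: {question}"

def passage_injection_prompt(question, passages):
    docs = "\n".join(f"[{i+1}] {p}" for i, p in enumerate(passages))
    # The passages are injected at the start of the reasoning block so the
    # model reasons over (and can discount) them step by step.
    return (f"Question: {question}\n"
            f"<think>\nHere is some external knowledge that may or may not be "
            f"reliable:\n{docs}\nLet me reason carefully about the question.")
```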
[8] RecGPT Technical Report
This paper from Alibaba presents RecGPT, a production-scale framework that integrates LLMs into industrial recommender systems to shift from traditional "log-fitting" approaches to intent-centric recommendation. The framework employs three specialized LLMs: LLMᵤᵢ for mining user interests from compressed behavioral sequences, LLMᵢₜ for predicting item tags based on inferred user preferences, and LLMᵣₑ for generating personalized explanations. RecGPT addresses challenges including context window limitations through hierarchical behavior compression, domain knowledge gaps via multi-stage task alignment (curriculum learning, reasoning-enhanced pre-alignment, and self-training evolution), and the semantic-collaborative gap through a user-item-tag tri-tower retrieval architecture that balances semantic relevance with collaborative filtering signals. The system incorporates rigorous data quality control using Human-LLM cooperative judges and incremental learning for continuous adaptation to evolving user preferences.
📚 https://arxiv.org/abs/2507.22879
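Purely as an illustration of the tri-tower idea, the sketch below fuses a collaborative user-item score with a semantic tag-item score; the shared item space, fusion rule, and weight are assumptions, not RecGPT's actual formulation.

```python
# Illustrative sketch of a user-item-tag tri-tower retrieval score that mixes
# a collaborative user-item signal with a semantic tag-item signal. The fusion
# rule and weight alpha are assumptions, not RecGPT's actual design.
import torch
import torch.nn.functional as F

def tri_tower_scores(user_vec, tag_vec, item_matrix, alpha=0.5):
    # user_vec: [d] collaborative user embedding
    # tag_vec:  [d] embedding of the LLM-predicted intent tag
    # item_matrix: [num_items, d] item embeddings shared by both signals
    user_vec, tag_vec = F.normalize(user_vec, dim=-1), F.normalize(tag_vec, dim=-1)
    items = F.normalize(item_matrix, dim=-1)
    collaborative = items @ user_vec      # user-item relevance
    semantic = items @ tag_vec            # tag-item relevance
    return alpha * collaborative + (1 - alpha) * semantic

scores = tri_tower_scores(torch.randn(64), torch.randn(64), torch.randn(5000, 64))
top_items = scores.topk(50).indices
```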
[9] Are Recommenders Self-Aware? Label-Free Recommendation Performance Estimation via Model Uncertainty
This paper from Tsinghua University investigates whether recommendation models can achieve “self-awareness” by quantifying their uncertainty as a means of estimating performance without requiring user feedback labels. The authors propose List Distribution uncertainty (LiDu), which measures the probability that a recommender will generate a specific ranking list based on prediction distributions of individual items, differing from traditional point-wise uncertainty methods by considering list-level uncertainty rather than individual item predictions. Through experiments on both synthetic tasks and real-world datasets using five recommendation algorithms (BPRMF, LightGCN, SimpleX, SASRec, TiMiRec), they demonstrate that LiDu exhibits strong negative correlation with recommendation performance across various scenarios. Beyond performance estimation, LiDu reveals insights into model behavior, showing that users with more dynamic interests and more diverse recommendation lists tend to exhibit higher uncertainty.
📚 https://arxiv.org/abs/2507.23208
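To give a flavor of list-level uncertainty, the sketch below scores the log-probability of a ranked list under a Plackett-Luce-style sequential model over item scores; this is a stand-in illustration, not the paper's exact LiDu estimator.

```python
# Illustrative sketch of a list-level uncertainty signal: the log-probability
# that a ranked list is generated from the model's item scores under a
# Plackett-Luce-style sequential sampling model. This is a stand-in for the
# paper's LiDu definition, not its exact estimator.
import torch

def list_log_probability(scores: torch.Tensor, ranked_list: torch.Tensor) -> torch.Tensor:
    # scores: [num_candidates] predicted relevance scores
    # ranked_list: [k] indices of items in the recommended order
    remaining = scores.clone()
    log_prob = torch.tensor(0.0)
    for item in ranked_list:
        log_prob = log_prob + remaining[item] - torch.logsumexp(remaining, dim=0)
        # Remove the chosen item before the next selection step.
        remaining[item] = float("-inf")
    return log_prob  # lower values indicate higher list-level uncertainty

scores = torch.randn(100)
top10 = scores.topk(10).indices
print(list_log_probability(scores, top10))
```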
[10] Large Language Model-Enhanced Reinforcement Learning for Diverse and Novel Recommendations
This paper from Amazon presents LAAC (LLM-guided Adversarial Actor Critic), a reinforcement learning approach that leverages LLMs as reference policies. The method formulates training as a bilevel optimization problem where an actor network learns to refine LLM suggestions using system-specific data, while a critic network provides selective evaluation that favors promising novel actions over purely popular items from the dataset. To prevent overestimation of unreliable LLM recommendations, LAAC incorporates two key regularization techniques: a temporal difference loss that enforces Bellman consistency for in-sample actions, and a grounding loss that constrains critic values for LLM-suggested items to remain close to reliably estimated dataset actions. The approach effectively integrates LLM knowledge without expensive fine-tuning.
📚 https://arxiv.org/abs/2507.21274
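The two critic-side regularizers can be sketched roughly as follows; the shapes, the squared-error form of each term, and the weighting are assumptions rather than LAAC's exact losses.

```python
# Illustrative sketch of the two critic regularizers described above: a TD
# (Bellman-consistency) loss on in-sample actions and a grounding loss that
# keeps values of LLM-suggested actions close to reliably estimated dataset
# actions. Shapes and the exact form of each term are assumptions.
import torch
import torch.nn.functional as F

def critic_loss(q_net, target_q_net, batch, llm_actions, gamma=0.99, beta=1.0):
    # TD loss on logged (in-sample) transitions.
    q_sa = q_net(batch["state"], batch["action"])
    with torch.no_grad():
        target = batch["reward"] + gamma * target_q_net(batch["next_state"], batch["next_action"])
    td_loss = F.mse_loss(q_sa, target)

    # Grounding loss: keep Q-values of LLM-suggested actions near the values
    # of in-sample actions so novel suggestions are not overestimated.
    q_llm = q_net(batch["state"], llm_actions)
    grounding_loss = F.mse_loss(q_llm, q_sa.detach())

    return td_loss + beta * grounding_loss
```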
Extras: Benchmarks
⏱️ PRGB Benchmark: A Robust Placeholder-Assisted Algorithm for Benchmarking Retrieval-Augmented Generation
The Placeholder-RAG-Benchmark (PRGB) is a fine-grained evaluation framework designed to assess the capabilities of LLMs within RAG systems. Unlike prior benchmarks that evaluate RAG performance holistically, PRGB isolates and tests specific generation abilities through three core dimensions: multi-level filtering, combination, and reference reasoning. To reduce reliance on a model's internal knowledge, PRGB introduces a placeholder-based method that substitutes key information in documents with variable placeholders.
📝 https://arxiv.org/abs/2507.22927
👨🏽💻 https://github.com/Alipay-Med/PRGB
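The placeholder idea can be illustrated with a trivial substitution routine; the entity list and placeholder format are made up for the example and are not PRGB's actual protocol.

```python
# Minimal sketch of placeholder-based substitution: key entities in a document
# are replaced with variable placeholders so the model must rely on the passage
# rather than its parametric knowledge. Entity list and placeholder format are
# illustrative, not PRGB's actual protocol.
def substitute_placeholders(document: str, key_entities: list[str]):
    mapping = {}
    for i, entity in enumerate(key_entities):
        placeholder = f"[ENTITY_{i}]"
        document = document.replace(entity, placeholder)
        mapping[placeholder] = entity
    return document, mapping

doc, mapping = substitute_placeholders(
    "Marie Curie won the Nobel Prize in 1903.",
    ["Marie Curie", "1903"],
)
# doc == "[ENTITY_0] won the Nobel Prize in [ENTITY_1]."
```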
I hope this weekly roundup of top papers has provided you with valuable insights and a glimpse into the exciting advancements taking place in the field. Remember to look deeper into the papers that pique your interest.
I also blog about Machine Learning, Deep Learning, MLOps, and Software Engineering domains. I explore diverse topics, such as Natural Language Processing, Large Language Models, Recommendation Systems, etc., and conduct in-depth analyses, drawing insights from the latest research papers.