Training Optimization for Complex Reasoning Tasks in ACN: Dynamic Batch-Aware Advantage Weighting for Agentic RAG

Release Date: 2026-06-26 Author: Chen Yu, Li Fan, Wu Jie, Gao Weipeng, Ouyang Ye

Abstract: With the emergence of AI-agent communication networks (ACN) in the 6G era, efficiently training agents for complex reasoning tasks has become a critical capability for scalable ACN deployment. Retrieval-augmented multi-hop question answering (e.g., agentic retrieval-augmented generation) is a representative complex reasoning task: it requires agents to perform multi-step reasoning through reflection, planning, and tool use. However, reinforcement learning training for such agents still suffers from reward sparsity and low sample efficiency, which limits how quickly agents can evolve and adapt. We propose dynamic batch-aware advantage weighting (DB-AW), which integrates two core components at the batch level. The difficulty-aware weighting component dynamically amplifies positive advantages based on long-term success rates, directing learning toward samples that are challenging yet learnable. The batch filtering component removes zero-variance groups, ensuring that every update contains non-zero gradient signals. Experiments show that DB-AW achieves relative improvements of 18%, 17%, and 15% on Qwen2.5-7B, Qwen2.5-3B, and LLaMA3.2-3B, respectively, and raises the effective update rate from 68% to 100%, significantly reducing agent training costs. As a lightweight, reusable algorithmic module, DB-AW can be readily integrated into methods such as group relative policy optimization (GRPO), providing a practical pathway for efficiently training complex reasoning agents in ACN.
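The abstract does not give DB-AW's exact formulas. The following Python sketch shows one plausible reading of its two components on top of standard GRPO group-relative advantages, where each group holds the rewards of several rollouts for the same question. It rests on several assumptions. Rewards are binary (1 for a correct answer). The "long-term success rate" is taken to be an exponential moving average per question. The weight 1 + alpha * (1 - rate) is hypothetical and is applied only to positive advantages. The names `SuccessTracker`, `db_aw_batch`, `momentum`, and `alpha` are illustrative and are not the authors' API.

```python
from collections import defaultdict
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group-relative advantages: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


class SuccessTracker:
    """Hypothetical long-term success-rate tracker: one EMA per question id."""

    def __init__(self, momentum=0.9, init=0.5):
        self.momentum = momentum
        self.rates = defaultdict(lambda: init)

    def update(self, qid, rewards):
        # Assumption: a rollout counts as a success when its reward is positive.
        batch_rate = mean(1.0 if r > 0 else 0.0 for r in rewards)
        self.rates[qid] = (self.momentum * self.rates[qid]
                           + (1 - self.momentum) * batch_rate)
        return self.rates[qid]


def db_aw_batch(batch, tracker, alpha=1.0):
    """batch maps question id -> rewards of its rollouts.

    Returns question id -> weighted advantages for the groups that survive filtering.
    """
    out = {}
    for qid, rewards in batch.items():
        # The success rate is updated for every group, including filtered ones.
        rate = tracker.update(qid, rewards)
        # Batch filtering: identical rewards give all-zero advantages, hence no gradient.
        if pstdev(rewards) == 0.0:
            continue
        # Hypothetical difficulty weight: a lower long-term success rate gives a larger boost.
        weight = 1.0 + alpha * (1.0 - rate)
        adv = group_advantages(rewards)
        # Only positive advantages are amplified; negative ones are left unchanged.
        out[qid] = [a * weight if a > 0 else a for a in adv]
    return out


if __name__ == "__main__":
    tracker = SuccessTracker()
    batch = {"q1": [1, 0, 0, 0], "q2": [1, 1, 1, 1], "q3": [0, 0, 0, 0]}
    adv = db_aw_batch(batch, tracker)
    print(sorted(adv))  # ['q1']
```

In this example, q2 (all rollouts correct) and q3 (all rollouts wrong) have zero reward variance and are dropped, so only q1 contributes a gradient. Its single positive advantage is scaled by about 1.5 because its tracked success rate (0.475) is low. A schedule that favours intermediate success rates would be an equally valid reading of "challenging yet learnable".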


Keywords: Agentic RAG; AI-agent communication network; batch filtering; difficulty-aware weighting; multi-hop question answering; reinforcement learning
