Why this is here: source-backed, 95% signal strength, recent this week, low-noise result.
VQV Signal
ATOD: Annealed Turn-aware On-policy Distillation for Multi-turn Autonomous Agents
Training small language-model agents for long-horizon interactive tasks requires both fast imitation and reward-driven improvement. On-policy distillation (OPD) provides dense teacher guidance and typically improves rapidly in the early stage, but its gains s...
Score: 69
Source type: arXiv
Reposts: 0
Topic quality: 62