Hasty Briefsbeta

Bilingual

Latent Agents: A Post-Training Procedure for Internalized Multi-Agent Debate

6 hours ago
  • #Multi-Agent Debate
  • #LLM Distillation
  • #Activation Steering
  • Proposes a framework to distill multi-agent debate into a single LLM via a two-stage fine-tuning pipeline, reducing token usage by up to 93%.
  • Internalizes debate through structure learning, dynamic reward scheduling, and length clipping, matching or exceeding explicit multi-agent debate performance.
  • Identifies agent-specific subspaces via activation steering, showing interpretable activation directions for different agent perspectives.
  • Demonstrates a practical application by instilling malicious agents and using negative steering to control harmful behaviors with less performance loss.
  • Provides code availability and insights for understanding and controlling internalized reasoning in distilled models.