Actual LLM agents are coming
17 days ago
- #AI Research
- #LLM Agents
- #Reinforcement Learning
- OpenAI released DeepResearch, a specialized model variant for web and document search that can plan search strategies and cross-reference sources.
- Claude 3.7 Sonnet applies similar advances to code, outperforming earlier models on complex programming tasks.
- Anthropic defines LLM agents as systems where LLMs dynamically direct their own processes and tool usage.
- Most current agentic systems instead orchestrate LLMs and tools through predefined code paths, which leaves them unable to plan, remember, or act effectively over the long term.
- The 'bitter lesson' suggests that hardcoding knowledge into models is suboptimal; scaling computation through search and learning is better.
- LLM agents are trained with reinforcement learning: verifiers check model outputs to assign rewards, and training often relies on generated drafts and multiple training steps.
- Training LLM agents involves generating large amounts of data through emulations or simulations, similar to game RL.
- Actual LLM agents can automate complex processes like search, network engineering, and financial tasks without predefined prompts.
- Big labs currently dominate LLM agent development due to their resources, but democratizing training and deployment is critical for broader adoption.
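
To make the training points above more concrete, here is a minimal sketch of the verifier-and-drafts idea on a toy simulated task. Everything in it is an assumption for illustration: the addition task, the `verifier`, `sample_drafts`, and `group_advantages` helpers, and the random stand-in for a model are all hypothetical. Normalizing rewards within a group of drafts is one common choice (in the style of GRPO), not something the summary specifies.

```python
import random
from statistics import mean, pstdev


def verifier(task, answer):
    """Hypothetical verifier: reward 1.0 if the answer is the correct sum, else 0.0."""
    a, b = task
    return 1.0 if answer == a + b else 0.0


def sample_drafts(task, n, rng, error_rate=0.5):
    """Stand-in for a policy model: returns n candidate answers, some off by one."""
    a, b = task
    drafts = []
    for _ in range(n):
        noise = rng.choice([-1, 1]) if rng.random() < error_rate else 0
        drafts.append(a + b + noise)
    return drafts


def group_advantages(rewards):
    """Score each draft relative to its group: (reward - mean) / std.

    Returns all zeros when every draft earned the same reward,
    since the group then carries no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]


rng = random.Random(0)

# A simulated environment produces the task, so data can be generated at will.
task = (rng.randint(0, 99), rng.randint(0, 99))

drafts = sample_drafts(task, 8, rng)
rewards = [verifier(task, d) for d in drafts]
advantages = group_advantages(rewards)

for draft, reward, adv in zip(drafts, rewards, advantages):
    print(f"draft={draft} reward={reward} advantage={adv:+.2f}")
```

In a real setup the advantages would weight a policy-gradient update on the model that produced the drafts; that step is omitted here because it depends on the training framework.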