GLM-5.2: Built for Long-Horizon Tasks
4 hours ago
- #LLM
- #Long-Context
- #Coding-Agent
- GLM-5.2 supports a 1M-token context window and is built for stable performance on long-horizon tasks, including complex coding and agentic scenarios.
- Introduces the IndexShare architecture, which reduces per-token FLOPs by 2.9x at 1M context, and improves the MTP (multi-token prediction) layer used for speculative decoding, increasing acceptance length by up to 20%.
- Offers advanced coding capabilities with adjustable thinking-effort levels that let users trade performance against latency, and includes an anti-hack module to prevent reward hacking in coding agents.
- Achieves strong benchmark results: highest-ranked open-source model on long-horizon coding benchmarks (e.g., FrontierSWE, PostTrainBench, SWE-Marathon) and top open-source performer on standard coding benchmarks (e.g., Terminal-Bench 2.1, SWE-bench Pro).
- Uses the slime infrastructure for efficient agentic RL post-training, supporting large-scale, parallel OPD training and flexible inference integration.
- Optimized for efficient 1M-context serving, with improvements to KV-cache management, kernel coordination, and CPU-side processing that raise throughput and scalability.
- Released under an MIT open-source license with no regional restrictions, and available via GLM Coding Plan, Z.ai chat, and local deployment through platforms like HuggingFace and ModelScope.
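
To give a rough sense of what the "up to 20%" longer acceptance length in the speculative-decoding point above could mean in practice, here is a minimal back-of-the-envelope sketch. It is not GLM-5.2's implementation. It assumes the common simplified model in which each draft token is accepted independently with probability `p`, and each verification step also emits one token from the target model. The function names and all numeric inputs are hypothetical, and the cost of running the draft (MTP) layer is ignored.

```python
def expected_accepted_length(p: float, k: int) -> float:
    """Expected number of accepted draft tokens per verification step.

    Assumption (not from the source): each of the k draft tokens is accepted
    independently with probability p, and acceptance stops at the first
    rejection, so draft token i survives with probability p**i.
    """
    return sum(p ** i for i in range(1, k + 1))


def tokens_per_step(accepted_length: float) -> float:
    """Tokens emitted per target-model forward pass.

    The target model always contributes one token of its own on top of
    whatever draft tokens were accepted.
    """
    return 1.0 + accepted_length


def relative_throughput_gain(baseline_accept: float, improvement: float) -> float:
    """Fractional gain in tokens per step when acceptance length grows.

    `improvement` is a fraction, e.g. 0.20 for a 20% longer acceptance length.
    Draft-model overhead is ignored, so this is an optimistic upper bound.
    """
    before = tokens_per_step(baseline_accept)
    after = tokens_per_step(baseline_accept * (1.0 + improvement))
    return after / before - 1.0


if __name__ == "__main__":
    # Hypothetical inputs chosen only for illustration.
    baseline = expected_accepted_length(p=0.8, k=4)
    gain = relative_throughput_gain(baseline, improvement=0.20)
    print(f"baseline acceptance length: {baseline:.2f} tokens")
    print(f"tokens-per-step gain from +20% acceptance: {gain:.1%}")
```

Under these assumptions, the gain in tokens per step is smaller than the gain in acceptance length: with a hypothetical baseline of 2.0 accepted tokens, a 20% increase moves tokens per step from 3.0 to 3.4, roughly 13%. Real end-to-end speedups depend on draft cost, batch size, and serving details that this sketch leaves out.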