GLM-4.5: Reasoning, Coding, and Agentic Abilities
9 months ago
- #AI Models
- #Machine Learning
- #Natural Language Processing
- Introduction of GLM-4.5 and GLM-4.5-Air, the latest flagship models in the GLM family.
- GLM-4.5 has 355B total parameters (32B active), while GLM-4.5-Air has 106B total parameters (12B active).
- Both models unify reasoning, coding, and agentic capabilities to meet complex application requirements.
- Hybrid reasoning models with thinking mode (complex reasoning) and non-thinking mode (instant responses).
- Available on Z.ai, the Z.ai API, and as open weights on Hugging Face and ModelScope.
- Comparison with models from OpenAI, Anthropic, Google DeepMind, etc., on 12 benchmarks.
- GLM-4.5 ranks 3rd overall, excelling in agentic tasks, reasoning, and coding.
- Agentic tasks: 128k context length, native function calling, and strong performance on benchmarks like TAU-bench and BFCL-v3.
- Web browsing performance: GLM-4.5 outperforms Claude-4-Opus and is close to o4-mini-high.
- Reasoning benchmarks: Strong performance in MMLU Pro, AIME24, MATH 500, and GPQA.
- Coding benchmarks: Excels on SWE-bench Verified and Terminal-Bench, with a high tool-calling success rate (90.6%).
- Full-stack development capabilities: Frontend, backend, and database management.
- Model architecture: MoE with loss-free balance routing, sigmoid gates, and Grouped-Query Attention.
- Training stages: Pre-training on 15T general tokens and 7T code & reasoning tokens, followed by domain-specific fine-tuning.
- Reinforcement Learning (RL) infrastructure 'slime' for efficient and scalable training.
- Post-training enhancements: Supervised fine-tuning and specialized RL for reasoning and agentic tasks.
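The hybrid thinking/non-thinking modes above can be sketched as a request payload. This is a hedged illustration only: the endpoint behavior, the model name string, and the exact shape of the `thinking` field are assumptions about the Z.ai API, not confirmed details.

```python
# Hypothetical sketch: selecting GLM-4.5's reasoning mode in an
# OpenAI-compatible chat-completion payload. Field names are assumed.

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat payload; `thinking` picks the hybrid reasoning mode."""
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Assumed switch: "enabled" asks for step-by-step (thinking-mode)
        # reasoning, "disabled" requests an instant, non-thinking response.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

deep = build_request("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_request("What is the capital of France?", thinking=False)
print(deep["thinking"]["type"])  # enabled
print(fast["thinking"]["type"])  # disabled
```

In practice the same conversation can switch modes per request, paying the latency of long reasoning only when the task warrants it.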
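The architecture bullet mentions sigmoid gates with loss-free balance routing. A minimal sketch of that idea, under the assumption (borrowed from published loss-free-balancing work, not from GLM-4.5's unreleased internals) that a per-expert bias steers which experts are selected while the mixing weights come from the unbiased sigmoid scores:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 16, 2

# Toy router parameters; real models learn W_gate during training.
W_gate = rng.standard_normal((d_model, n_experts)) * 0.1
# Balance bias: nudged up for under-loaded experts, down for over-loaded
# ones by a separate update rule (omitted here), instead of an aux loss.
bias = np.zeros(n_experts)

def route(x):
    scores = 1.0 / (1.0 + np.exp(-(x @ W_gate)))  # sigmoid gate per expert
    # The bias influences WHICH experts are picked...
    chosen = np.argsort(scores + bias)[-top_k:]
    # ...but mixing weights use the unbiased scores, renormalized.
    w = scores[chosen] / scores[chosen].sum()
    return chosen, w

x = rng.standard_normal(d_model)
experts, weights = route(x)
print(experts, weights.sum())  # ids of the top-2 experts; weights sum to 1
```

Because balancing happens through the selection bias rather than an auxiliary loss term, the gradient signal stays purely task-driven, which is the usual motivation for "loss-free" routing.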
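The same bullet lists Grouped-Query Attention. A small NumPy sketch of the mechanism in general (head counts and dimensions here are illustrative, not GLM-4.5's actual configuration): several query heads share one key/value head, shrinking the KV cache while keeping full query resolution.

```python
import numpy as np

rng = np.random.default_rng(1)
seq, head_dim = 4, 16
n_q_heads, n_kv_heads = 8, 2          # each KV head serves 4 query heads
group = n_q_heads // n_kv_heads

q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Broadcast each KV head across its group of query heads.
k_shared = np.repeat(k, group, axis=0)   # (n_q_heads, seq, head_dim)
v_shared = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention per query head.
scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)      # softmax over keys
out = attn @ v_shared
print(out.shape)  # (8, 4, 16)
```

Only the 2 KV heads need caching during generation, a 4x reduction versus standard multi-head attention with 8 KV heads, which matters at the 128k context lengths cited above.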