GLM-4.5: Reasoning, Coding, and Agentic Abilities
9 months ago
- #AI Models
- #Machine Learning
- #Natural Language Processing
- Introduction of GLM-4.5 and GLM-4.5-Air, the latest flagship models in the GLM family.
- GLM-4.5 has 355B total parameters (32B active), while GLM-4.5-Air has 106B total parameters (12B active).
- Both models unify reasoning, coding, and agentic capabilities to meet complex application requirements.
- Hybrid reasoning models with thinking mode (complex reasoning) and non-thinking mode (instant responses).
- Available on Z.ai, the Z.ai API, and as open weights on Hugging Face and ModelScope.
- Comparison with models from OpenAI, Anthropic, Google DeepMind, etc., on 12 benchmarks.
- GLM-4.5 ranks 3rd overall, excelling in agentic tasks, reasoning, and coding.
- Agentic tasks: 128k context length, native function calling, and strong performance on benchmarks like TAU-bench and BFCL-v3.
- Web browsing performance: GLM-4.5 outperforms Claude-4-Opus and is close to o4-mini-high.
- Reasoning benchmarks: Strong performance in MMLU Pro, AIME24, MATH 500, and GPQA.
- Coding benchmarks: Excels on SWE-bench Verified and Terminal-Bench, with a high tool-calling success rate (90.6%).
- Full-stack development capabilities: Frontend, backend, and database management.
- Model architecture: MoE with loss-free balance routing, sigmoid gates, and Grouped-Query Attention.
- Training stages: Pre-training on 15T general tokens and 7T code & reasoning tokens, followed by domain-specific fine-tuning.
- Reinforcement Learning (RL) infrastructure 'slime' for efficient and scalable training.
- Post-training enhancements: Supervised fine-tuning and specialized RL for reasoning and agentic tasks.
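The hybrid thinking/non-thinking modes above can be sketched as a request payload. This is a hedged illustration only: the endpoint behavior, the model name string, and the exact shape of the `thinking` field are assumptions about the Z.ai API, not confirmed details.

```python
# Hypothetical sketch: selecting GLM-4.5's reasoning mode in an
# OpenAI-compatible chat-completion payload. Field names are assumed.

def build_request(prompt: str, thinking: bool) -> dict:
    """Build a chat payload; `thinking` picks the hybrid reasoning mode."""
    return {
        "model": "glm-4.5",
        "messages": [{"role": "user", "content": prompt}],
        # Assumed switch: "enabled" asks for step-by-step (thinking-mode)
        # reasoning, "disabled" requests an instant, non-thinking response.
        "thinking": {"type": "enabled" if thinking else "disabled"},
    }

deep = build_request("Prove that sqrt(2) is irrational.", thinking=True)
fast = build_request("What is the capital of France?", thinking=False)
print(deep["thinking"]["type"])  # enabled
print(fast["thinking"]["type"])  # disabled
```

In practice the same conversation can switch modes per request, paying the latency of long reasoning only when the task warrants it.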
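The architecture bullet mentions sigmoid gates with loss-free balance routing. A minimal sketch of that idea, under the assumption (borrowed from published loss-free-balancing work, not from GLM-4.5's unreleased internals) that a per-expert bias steers which experts are selected while the mixing weights come from the unbiased sigmoid scores:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 8, 16, 2

# Toy router parameters; real models learn W_gate during training.
W_gate = rng.standard_normal((d_model, n_experts)) * 0.1
# Balance bias: nudged up for under-loaded experts, down for over-loaded
# ones by a separate update rule (omitted here), instead of an aux loss.
bias = np.zeros(n_experts)

def route(x):
    scores = 1.0 / (1.0 + np.exp(-(x @ W_gate)))  # sigmoid gate per expert
    # The bias influences WHICH experts are picked...
    chosen = np.argsort(scores + bias)[-top_k:]
    # ...but mixing weights use the unbiased scores, renormalized.
    w = scores[chosen] / scores[chosen].sum()
    return chosen, w

x = rng.standard_normal(d_model)
experts, weights = route(x)
print(experts, weights.sum())  # ids of the top-2 experts; weights sum to 1
```

Because balancing happens through the selection bias rather than an auxiliary loss term, the gradient signal stays purely task-driven, which is the usual motivation for "loss-free" routing.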
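The same bullet lists Grouped-Query Attention. A small NumPy sketch of the mechanism in general (head counts and dimensions here are illustrative, not GLM-4.5's actual configuration): several query heads share one key/value head, shrinking the KV cache while keeping full query resolution.

```python
import numpy as np

rng = np.random.default_rng(1)
seq, head_dim = 4, 16
n_q_heads, n_kv_heads = 8, 2          # each KV head serves 4 query heads
group = n_q_heads // n_kv_heads

q = rng.standard_normal((n_q_heads, seq, head_dim))
k = rng.standard_normal((n_kv_heads, seq, head_dim))
v = rng.standard_normal((n_kv_heads, seq, head_dim))

# Broadcast each KV head across its group of query heads.
k_shared = np.repeat(k, group, axis=0)   # (n_q_heads, seq, head_dim)
v_shared = np.repeat(v, group, axis=0)

# Standard scaled dot-product attention per query head.
scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
attn = np.exp(scores - scores.max(-1, keepdims=True))
attn /= attn.sum(-1, keepdims=True)      # softmax over keys
out = attn @ v_shared
print(out.shape)  # (8, 4, 16)
```

Only the 2 KV heads need caching during generation, a 4x reduction versus standard multi-head attention with 8 KV heads, which matters at the 128k context lengths cited above.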