Hasty Briefs (beta)


GLM-4.5: Reasoning, Coding, and Agentic Abilities

9 months ago
  • #AI Models
  • #Machine Learning
  • #Natural Language Processing
  • Introduction of GLM-4.5 and GLM-4.5-Air, the latest flagship models in the GLM family.
  • GLM-4.5 has 355B total parameters (32B active), while GLM-4.5-Air has 106B total parameters (12B active).
  • Both models unify reasoning, coding, and agentic capabilities to meet complex application requirements.
  • Hybrid reasoning models with thinking mode (complex reasoning) and non-thinking mode (instant responses).
  • Available on Z.ai, Z.ai API, and open-weights on HuggingFace and ModelScope.
  • Comparison with models from OpenAI, Anthropic, Google DeepMind, etc., on 12 benchmarks.
  • GLM-4.5 ranks 3rd overall, excelling in agentic tasks, reasoning, and coding.
  • Agentic tasks: 128k context length, native function calling, and strong performance on benchmarks like TAU-bench and BFCL-v3.
  • Web browsing performance: GLM-4.5 outperforms Claude-4-Opus and is close to o4-mini-high.
  • Reasoning benchmarks: Strong performance in MMLU Pro, AIME24, MATH 500, and GPQA.
  • Coding benchmarks: Excels in SWE-bench Verified and Terminal Bench, with a 90.6% tool-calling success rate.
  • Full-stack development capabilities: Frontend, backend, and database management.
  • Model architecture: MoE with loss-free balance routing, sigmoid gates, and Grouped-Query Attention.
  • Training stages: Pre-training on 15T general tokens and 7T code & reasoning tokens, followed by domain-specific fine-tuning.
  • Reinforcement Learning (RL) infrastructure 'slime' for efficient and scalable training.
  • Post-training enhancements: Supervised fine-tuning and specialized RL for reasoning and agentic tasks.
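The architecture bullet above mentions MoE routing with sigmoid gates. As a minimal sketch of what sigmoid-gated top-k expert routing looks like (all shapes, names, and the top-k normalization here are illustrative assumptions, not GLM-4.5's actual implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def moe_route(hidden, gate_weights, top_k=2):
    """Sigmoid-gated top-k routing sketch (illustrative, not GLM-4.5's code).

    hidden:       (d_model,) one token's representation
    gate_weights: (n_experts, d_model) router projection
    Returns the indices of the selected experts and their mixing weights.
    """
    scores = sigmoid(gate_weights @ hidden)      # per-expert affinity in (0, 1)
    top = np.argsort(scores)[::-1][:top_k]       # keep the k highest-scoring experts
    weights = scores[top] / scores[top].sum()    # renormalize over the selected experts
    return top, weights

rng = np.random.default_rng(0)
experts, weights = moe_route(rng.normal(size=64), rng.normal(size=(8, 64)), top_k=2)
```

Unlike softmax gating, sigmoid gates score each expert independently, which pairs naturally with the loss-free balance routing the post mentions (balancing load without an auxiliary loss term).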
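The architecture bullet also lists Grouped-Query Attention, where several query heads share one key/value head to shrink the KV cache. A small sketch under assumed head counts and dimensions (not the model's real configuration):

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """GQA sketch: query heads are partitioned into groups that share a K/V head.

    q: (n_q_heads, seq, d)    k, v: (n_kv_heads, seq, d), with n_kv_heads < n_q_heads
    """
    n_q_heads, _, d = q.shape
    group = n_q_heads // k.shape[0]                  # query heads per shared K/V head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                              # K/V head serving query head h
        scores = q[h] @ k[kv].T / np.sqrt(d)
        attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
        attn /= attn.sum(axis=-1, keepdims=True)     # row-wise softmax
        out[h] = attn @ v[kv]
    return out

rng = np.random.default_rng(1)
q = rng.normal(size=(8, 5, 16))                      # 8 query heads
k = rng.normal(size=(2, 5, 16))                      # only 2 K/V heads
v = rng.normal(size=(2, 5, 16))
out = grouped_query_attention(q, k, v)
```

With 8 query heads but only 2 K/V heads, the cached K/V tensors are 4x smaller than full multi-head attention, which matters at the 128k context length cited above.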