GLM-4.5V: An open-source multimodal large language model from Zhipu AI

9 months ago
  • #Open Source
  • #Multimodal AI
  • #Vision-Language Models
  • The GLM-4.5V and GLM-4.1V model series are open-sourced, strengthening vision-language model (VLM) reasoning capabilities.
  • GLM-4.5V offers significant improvements across multiple benchmarks and includes a desktop assistant app for debugging.
  • GLM-4.1V-9B-Thinking introduces a reasoning-centric training paradigm with Reinforcement Learning with Curriculum Sampling (RLCS), matching or outperforming much larger models on 18 benchmark tasks.
  • Both model series share the same multimodal preprocessing but use different conversation templates, so the two must not be mixed; a template-loading sketch appears after this list.
  • Installation and inference steps are provided for NVIDIA GPUs, with serving options for both SGLang and vLLM; see the inference sketch below.
  • Fine-tuning support is available via LLaMA-Factory, with dataset construction examples provided; a dataset-building sketch follows the list.
  • GLM-4.5V focuses on real-world usability, handling diverse visual content types and introducing a Thinking Mode switch for choosing between quick replies and deeper reasoning; a per-request toggle sketch is included below.
  • Known issues include frontend code reproduction errors, overthinking, and occasional answer restatement.
  • Citations and technical details are provided for academic use.
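
Since the two series share preprocessing but not conversation templates, a safe pattern is to let transformers load whichever template ships with the checkpoint instead of hard-coding one. A minimal sketch, assuming the Hugging Face model ID zai-org/GLM-4.5V (swap in the GLM-4.1V checkpoint to pick up that series' template):

```python
# A minimal sketch of loading a checkpoint's own conversation template
# via transformers. The model ID is an assumption; substitute the
# official repository name.
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("zai-org/GLM-4.5V")  # assumed model ID

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/photo.jpg"},
            {"type": "text", "text": "What is in this picture?"},
        ],
    }
]

# apply_chat_template renders the model-specific template, so loading a
# GLM-4.1V processor here would automatically use that series' template.
prompt = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=False
)
print(prompt)
```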
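For the vLLM route, one common pattern is to serve the model and then query it through the OpenAI-compatible API. A hedged sketch; the model ID, port, and launch command are assumptions, so check the official README for the exact serve flags:

```python
# A minimal sketch of querying a locally served GLM-4.5V endpoint.
# Assumes the model was launched with something like:
#   vllm serve zai-org/GLM-4.5V   (exact flags per the official README)
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zai-org/GLM-4.5V",  # assumed model ID
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/chart.png"},
                },
                {"type": "text", "text": "Describe this image."},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```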
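For fine-tuning, dataset construction typically targets LLaMA-Factory's sharegpt-style JSON with an "images" field. A minimal sketch; the file paths and example content are illustrative, and the exact schema should be confirmed against the LLaMA-Factory docs for your version:

```python
# A minimal sketch of building a LLaMA-Factory style multimodal SFT
# dataset. Paths and contents are illustrative assumptions.
import json

examples = [
    {
        "messages": [
            {"role": "user", "content": "<image>What product is shown here?"},
            {"role": "assistant", "content": "The image shows a red stand mixer."},
        ],
        "images": ["data/images/mixer_001.jpg"],
    }
]

with open("data/glm4v_sft.json", "w", encoding="utf-8") as f:
    json.dump(examples, f, ensure_ascii=False, indent=2)
```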
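When the model is served through vLLM, per-request chat-template kwargs can be passed via `extra_body`; whether GLM-4.5V's template exposes an `enable_thinking` flag under that exact name is an assumption to verify against the model's chat template:

```python
# A hedged sketch of toggling Thinking Mode per request on a vLLM endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="zai-org/GLM-4.5V",  # assumed model ID
    messages=[{"role": "user", "content": "What is 17 * 24?"}],
    # The chat_template_kwargs pass-through is a vLLM convention; the
    # `enable_thinking` flag name is an assumption, not confirmed here.
    extra_body={"chat_template_kwargs": {"enable_thinking": False}},
)
print(response.choices[0].message.content)
```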