Step 3.7 Flash – 198B-A11B MoE vision-language model

6 hours ago
  • #Model Deployment
  • #Multimodal AI
  • #AI Model
  • Step 3.7 Flash is a sparse Mixture-of-Experts (MoE) vision-language model with 198B total parameters, of which about 11B are active per token (the "A11B" in its name), and native image understanding.
  • It supports a 256k context window and three selectable reasoning levels for balancing speed, cost, and cognitive depth.
  • The model scores 79.2 on SimpleVQA and 67.1 on ClawEval-1.1, results presented as evidence of strong visual grounding and tool orchestration.
  • It can be deployed with Transformers, vLLM, SGLang, or llama.cpp, and local inference is supported on high-memory devices.
  • Input tokens are priced by cache status: $0.20/M on a cache miss and $0.04/M on a cache hit. Output tokens cost $1.15/M.
  • Step 3.7 Flash is available on the StepFun Open Platform, OpenRouter, and NVIDIA NIM, with availability planned for partners such as DeepInfra and Fireworks AI.
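
The pricing and size figures above lend themselves to quick back-of-envelope arithmetic. The sketch below is illustrative only: the function names are hypothetical, the cost estimate assumes cached tokens are a subset of the input tokens and are billed at the cache-hit rate, and the memory estimate counts model weights only (no KV cache, activations, or runtime overhead). The bit widths are generic examples, not quantization formats confirmed for this model.

```python
# Prices in USD per million tokens, taken from the summary above.
PRICE_INPUT_CACHE_MISS = 0.20
PRICE_INPUT_CACHE_HIT = 0.04
PRICE_OUTPUT = 1.15

# Total parameter count from the summary above (198B).
TOTAL_PARAMS = 198e9


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_input_tokens: int = 0) -> float:
    """Estimate the cost of one request in USD.

    Assumption: cached_input_tokens is the part of input_tokens that is
    billed at the cache-hit rate; the rest is billed at the cache-miss rate.
    """
    if not 0 <= cached_input_tokens <= input_tokens:
        raise ValueError("cached_input_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_input_tokens
    total = (uncached * PRICE_INPUT_CACHE_MISS
             + cached_input_tokens * PRICE_INPUT_CACHE_HIT
             + output_tokens * PRICE_OUTPUT)
    return total / 1_000_000


def weight_memory_gb(bits_per_weight: int, total_params: float = TOTAL_PARAMS) -> float:
    """Approximate memory for the weights alone, in decimal gigabytes."""
    return total_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    # A long-context request: 200k input tokens (150k served from cache), 4k output.
    cost = estimate_cost_usd(200_000, 4_000, cached_input_tokens=150_000)
    print(f"Estimated cost: ${cost:.4f}")

    # Weights-only footprint at a few generic precisions.
    for bits in (16, 8, 4):
        print(f"{bits}-bit weights: ~{weight_memory_gb(bits):.0f} GB")
```

Under these assumptions, the weights alone occupy roughly 396 GB at 16 bits per weight and about 99 GB at 4 bits, which is consistent with local inference being limited to high-memory devices. Because only about 11B parameters are active per token, compute per token is much lower than the total size suggests, but all 198B parameters still need to be resident or streamable.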