Step 3.7 Flash – Open-source multimodal model for speed and agents
- #Enterprise AI
- #AI Agents
- #Multimodal AI
- Step 3.7 Flash is an AI model built for agent efficiency, with native multimodal understanding and action across images, documents, and natural scenes.
- It enhances web and visual search with deeper retrieval capabilities and improved recognition of long-tail entities and emerging concepts.
- The model offers reliable tool use and orchestration and works with mainstream agent harnesses such as Claude Code and KiloCode, which lowers integration costs.
- In agentic coding, Step 3.7 Flash improves on Step 3.5 Flash on benchmarks such as SWE-Bench Pro and Terminal-Bench 2.1, and it supports an Advisor Mode for cost-effective performance.
- Optimized for enterprise tasks, it excels in autonomous execution and domain-specific knowledge, validated on benchmarks such as Toolathlon and ClawEval-1.1.
- Step 3.7 Flash demonstrates strong visual capabilities, including visual search and reasoning with Python tools, achieving performance comparable to larger models on tasks like V* and HR-Bench.
- It can operate graphical user interfaces (GUIs) for tasks such as app interactions, with improved stability and long-horizon task completion on the Android Daily benchmark.
- Benchmark results highlight its competitiveness in reasoning, coding, and agentic capabilities against models like DeepSeek V4, Gemini, and Claude Opus.
- The model is available through StepFun Open Platform and partners, with deployment options for cloud, data center, and local environments on high-memory devices.