Step 3.7 Flash – Open-source multimodal model for speed and agents

13 hours ago

#Enterprise AI
#AI Agents
#Multimodal AI

Step 3.7 Flash is an open-source model built for efficient agent workloads, with native multimodal understanding and action across images, documents, and natural scenes.
It enhances web and visual search with deeper retrieval capabilities and improved recognition of long-tail entities and emerging concepts.
The model offers reliable tool use and orchestration, and it integrates with mainstream agent harnesses like Claude Code and KiloCode, which lowers integration costs.
In agentic coding, Step 3.7 Flash improves on Step 3.5 Flash on benchmarks like SWE-Bench Pro and Terminal-Bench 2.1, and it supports Advisor Mode for cost-effective performance.
Optimized for enterprise tasks, it excels in autonomous execution and domain-specific knowledge, validated on benchmarks such as Toolathlon and ClawEval-1.1.
Step 3.7 Flash demonstrates strong visual capabilities, including visual search and reasoning with Python tools, achieving performance comparable to larger models on tasks like V* and HR-Bench.
It operates graphical user interfaces (GUIs) for tasks like app interactions, showing improved stability and long-horizon task completion on the Android Daily benchmark.
Benchmark results highlight its competitiveness in reasoning, coding, and agentic capabilities against models like DeepSeek V4, Gemini, and Claude Opus.
The model is available through StepFun Open Platform and partners, with deployment options for cloud, data center, and local environments on high-memory devices.
