Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act
6 days ago
- #AI
- #Machine Learning
- #Foundation Models
- Step 3.5 Flash is an open-source foundation model with 196B total parameters that activates only 11B per token, enabling efficient reasoning and agentic capabilities.
- Features deep reasoning at speed with 100–300 tok/s generation throughput, powered by Multi-Token Prediction (MTP-3).
- Excels in coding and agentic tasks, scoring 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0.
- Supports a cost-efficient 256K context window using a 3:1 Sliding Window Attention (SWA) ratio.
- Optimized for local deployment on high-end consumer hardware like Mac Studio M4 Max and NVIDIA DGX Spark.
- Demonstrates strong tool-use capabilities, orchestrating complex multi-step tasks such as stock-investment scenarios through MCP integration.
- Achieves high scores on elite logic and mathematics benchmarks, including AIME 2025 (99.8) and HMMT 2025 Nov. (98.0).
- Supports agentic coding, decomposing complex requirements into actionable steps within a codebase.
- Performs well in deep research tasks, scoring 65.27% on Scale AI Research Rubrics.
- Features a multi-agent orchestration framework for complex task handling.
- Enables edge-cloud collaboration, enhancing performance in complex scenarios like AndroidDaily Hard tasks.
- Interacts reliably, proactively clarifying user intent and providing professional advisory guidance.
- Built on a sparse Mixture of Experts (MoE) architecture with decoding optimizations for high inference speed.
- Scalable RL framework (MIS-PO) ensures stable, long-horizon optimization for continuous self-improvement.
- Benchmarked against top open-source models, showing strong performance in reasoning, coding, and agentic capabilities.
- Known limitations include a tendency toward longer generation trajectories and reduced stability in specialized domains.
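The "196B total, 11B active" figure comes from sparse MoE routing: a router scores all experts per token but only the top-k are actually executed. The sketch below is a toy NumPy illustration of that mechanism, not the Step 3.5 Flash implementation; the dimensions, expert count, and `k` are made-up values for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gates."""
    idx = np.argsort(logits)[-k:]
    gates = np.exp(logits[idx] - logits[idx].max())
    return idx, gates / gates.sum()

def moe_layer(x, experts, router_w, k=2):
    """Sparse MoE forward pass: each token runs through only k experts."""
    out = np.zeros_like(x)
    for t, tok in enumerate(x):
        idx, gates = topk_route(router_w @ tok, k)
        for e, g in zip(idx, gates):
            out[t] += g * (experts[e] @ tok)  # gated sum of selected experts
    return out

# Toy sizes (illustrative only): 16 experts, 2 active per token.
d, n_experts, k = 8, 16, 2
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal((4, d))
y = moe_layer(x, experts, router_w, k)
# Per token, only k/n_experts of the expert weights are touched (2/16 here),
# analogous to ~11B of 196B parameters being active per token.
```

The compute cost scales with the active parameters, while model capacity scales with the total, which is why a 196B MoE can decode at small-model speed.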
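The cost efficiency of the 256K context rests on mixing sliding-window attention layers with full-attention layers at a 3:1 ratio: SWA layers only attend to (and cache) a fixed-size window, so most of the KV cache stays small. A minimal mask-construction sketch, assuming a simple "3 SWA layers per 1 full layer" interleaving (the actual layer schedule and window size of Step 3.5 Flash are not specified here):

```python
import numpy as np

def causal_mask(n):
    """Full causal attention: each position sees all earlier positions."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, w):
    """Causal attention restricted to the most recent w positions."""
    m = causal_mask(n)
    for i in range(n):
        m[i, : max(0, i - w + 1)] = False  # drop everything beyond the window
    return m

# Assumed 3:1 interleave: layers 0-2 use SWA, layer 3 is full, repeating.
n, w = 8, 3  # toy sequence length and window size
masks = [causal_mask(n) if layer % 4 == 3 else sliding_window_mask(n, w)
         for layer in range(8)]
```

Under this scheme an SWA layer's KV cache is bounded by `w` regardless of context length, so only every fourth layer pays the full 256K cache cost.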
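The 100–300 tok/s throughput figure is attributed to Multi-Token Prediction (MTP-3), where extra heads draft several tokens ahead and the model verifies them, keeping the longest agreeing prefix. The toy loop below sketches that draft-and-verify idea with placeholder callables (`verify_next`, `draft_heads` are hypothetical stand-ins, not the model's API):

```python
def mtp_decode(verify_next, draft_heads, prefix, max_new=12, k=3):
    """Toy draft-and-verify decoding: draft k tokens per step via MTP heads,
    accept the longest prefix the verifier model agrees with."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        draft = [draft_heads(out, i) for i in range(k)]
        accepted = 0
        for i, tok in enumerate(draft):
            if verify_next(out + draft[:i]) == tok:
                accepted += 1          # verifier agrees; keep drafting forward
            else:
                break                  # first disagreement ends acceptance
        if accepted == 0:
            out.append(verify_next(out))  # fall back to one verified token
        else:
            out.extend(draft[:accepted])
    return out[len(prefix):len(prefix) + max_new]

# Toy "models" over integer tokens: the verifier continues the sequence by +1,
# and the i-th draft head happens to predict i+1 steps ahead correctly.
tokens = mtp_decode(lambda s: s[-1] + 1,
                    lambda s, i: s[-1] + 1 + i,
                    prefix=[0])
```

When drafts are usually accepted, each verification step emits up to k tokens instead of one, which is where the throughput multiplier comes from; a wrong draft costs only the rejected suffix.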