Step 3.5 Flash: Fast Enough to Think. Reliable Enough to Act
6 days ago
- #AI
- #Machine Learning
- #Foundation Models
- Step 3.5 Flash is an open-source foundation model with 196B total parameters that activates only 11B per token, enabling efficient reasoning and agentic capabilities.
- Features deep reasoning at speed with 100–300 tok/s generation throughput, powered by Multi-Token Prediction (MTP-3).
- Excels in coding and agentic tasks, scoring 74.4% on SWE-bench Verified and 51.0% on Terminal-Bench 2.0.
- Supports a cost-efficient 256K context window using a 3:1 Sliding Window Attention (SWA) ratio.
- Optimized for local deployment on high-end consumer hardware like Mac Studio M4 Max and NVIDIA DGX Spark.
- Demonstrates strong tool-use capabilities, orchestrating complex multi-step tasks such as stock-investment scenarios through MCP integration.
- Achieves high scores on elite logic and mathematics benchmarks, including AIME 2025 (99.8) and HMMT 2025 Nov. (98.0).
- Supports agentic coding, decomposing complex requirements into actionable steps within a codebase.
- Performs well in deep research tasks, scoring 65.27% on Scale AI Research Rubrics.
- Features a multi-agent orchestration framework for complex task handling.
- Enables edge-cloud collaboration, enhancing performance in complex scenarios like AndroidDaily Hard tasks.
- Interacts reliably, proactively clarifying user intent and providing professional advisory guidance.
- Built on a sparse Mixture of Experts (MoE) architecture with decoding optimizations for high inference speed.
- Scalable RL framework (MIS-PO) ensures stable, long-horizon optimization for continuous self-improvement.
- Benchmarked against top open-source models, showing strong performance in reasoning, coding, and agentic capabilities.
- Known limitations include a tendency toward longer generation trajectories and reduced stability in specialized domains.
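The "196B total, 11B active" figure comes from sparse MoE routing: a router scores all experts per token but only the top-k are actually executed. The sketch below is a toy NumPy illustration of that mechanism, not the Step 3.5 Flash implementation; the dimensions, expert count, and `k` are made-up values for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def topk_route(logits, k):
    """Pick the k highest-scoring experts and softmax-normalize their gates."""
    idx = np.argsort(logits)[-k:]
    gates = np.exp(logits[idx] - logits[idx].max())
    return idx, gates / gates.sum()

def moe_layer(x, experts, router_w, k=2):
    """Sparse MoE forward pass: each token runs through only k experts."""
    out = np.zeros_like(x)
    for t, tok in enumerate(x):
        idx, gates = topk_route(router_w @ tok, k)
        for e, g in zip(idx, gates):
            out[t] += g * (experts[e] @ tok)  # gated sum of selected experts
    return out

# Toy sizes (illustrative only): 16 experts, 2 active per token.
d, n_experts, k = 8, 16, 2
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
router_w = rng.standard_normal((n_experts, d))
x = rng.standard_normal((4, d))
y = moe_layer(x, experts, router_w, k)
# Per token, only k/n_experts of the expert weights are touched (2/16 here),
# analogous to ~11B of 196B parameters being active per token.
```

The compute cost scales with the active parameters, while model capacity scales with the total, which is why a 196B MoE can decode at small-model speed.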
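The cost efficiency of the 256K context rests on mixing sliding-window attention layers with full-attention layers at a 3:1 ratio: SWA layers only attend to (and cache) a fixed-size window, so most of the KV cache stays small. A minimal mask-construction sketch, assuming a simple "3 SWA layers per 1 full layer" interleaving (the actual layer schedule and window size of Step 3.5 Flash are not specified here):

```python
import numpy as np

def causal_mask(n):
    """Full causal attention: each position sees all earlier positions."""
    return np.tril(np.ones((n, n), dtype=bool))

def sliding_window_mask(n, w):
    """Causal attention restricted to the most recent w positions."""
    m = causal_mask(n)
    for i in range(n):
        m[i, : max(0, i - w + 1)] = False  # drop everything beyond the window
    return m

# Assumed 3:1 interleave: layers 0-2 use SWA, layer 3 is full, repeating.
n, w = 8, 3  # toy sequence length and window size
masks = [causal_mask(n) if layer % 4 == 3 else sliding_window_mask(n, w)
         for layer in range(8)]
```

Under this scheme an SWA layer's KV cache is bounded by `w` regardless of context length, so only every fourth layer pays the full 256K cache cost.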
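The 100–300 tok/s throughput figure is attributed to Multi-Token Prediction (MTP-3), where extra heads draft several tokens ahead and the model verifies them, keeping the longest agreeing prefix. The toy loop below sketches that draft-and-verify idea with placeholder callables (`verify_next`, `draft_heads` are hypothetical stand-ins, not the model's API):

```python
def mtp_decode(verify_next, draft_heads, prefix, max_new=12, k=3):
    """Toy draft-and-verify decoding: draft k tokens per step via MTP heads,
    accept the longest prefix the verifier model agrees with."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        draft = [draft_heads(out, i) for i in range(k)]
        accepted = 0
        for i, tok in enumerate(draft):
            if verify_next(out + draft[:i]) == tok:
                accepted += 1          # verifier agrees; keep drafting forward
            else:
                break                  # first disagreement ends acceptance
        if accepted == 0:
            out.append(verify_next(out))  # fall back to one verified token
        else:
            out.extend(draft[:accepted])
    return out[len(prefix):len(prefix) + max_new]

# Toy "models" over integer tokens: the verifier continues the sequence by +1,
# and the i-th draft head happens to predict i+1 steps ahead correctly.
tokens = mtp_decode(lambda s: s[-1] + 1,
                    lambda s, i: s[-1] + 1 + i,
                    prefix=[0])
```

When drafts are usually accepted, each verification step emits up to k tokens instead of one, which is where the throughput multiplier comes from; a wrong draft costs only the rejected suffix.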