DeepSeek-V4-Flash means LLM steering is interesting again
2 hours ago
- #Model Interpretability
- #LLM Steering
- #DeepSeek-V4-Flash
- DeepSeek-V4-Flash makes LLM steering practical for local models, letting engineers experiment with guiding outputs by manipulating activations.
- Steering means extracting a concept such as 'respond tersely' from a model's activations and boosting it during inference. Methods range from simple vector subtraction to more advanced techniques such as sparse autoencoders.
- DwarfStar 4 incorporates steering, and its recent release may spur community efforts to extract and share boostable features from open models.
- Steering sees little use for three reasons: big labs would rather train the behavior into the model directly, API users have no access to the weights and activations steering requires, and prompting often achieves similar results more cheaply.
- Potential applications include steering for traits that prompting cannot reach, such as intelligence, or compressing large amounts of knowledge into vectors, though both face challenges comparable to full model training.
- The future of steering in open-source models is uncertain, and whether it is practical should become clearer over the coming months, but it remains a fascinating area to explore.
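To make the "simple vector subtraction" approach concrete, here is a minimal sketch in plain Python. It assumes you have already captured activations from one layer for a set of prompts that show the concept (for example terse replies) and a set that do not. The steering vector is the difference of their means, and it is added back, scaled by a strength `alpha`, to a hidden state during inference. The function names and the toy 4-dimensional vectors are hypothetical, chosen for illustration; a real model would have thousands of dimensions and you would apply the addition inside a forward hook on the chosen layer.

```python
from statistics import fmean


def mean_vector(acts):
    """Element-wise mean of a list of equal-length activation vectors."""
    return [fmean(column) for column in zip(*acts)]


def steering_vector(pos_acts, neg_acts):
    """Difference of means: mean(concept present) - mean(concept absent)."""
    pos_mean = mean_vector(pos_acts)
    neg_mean = mean_vector(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]


def apply_steering(hidden, vector, alpha):
    """Boost the concept by adding alpha * vector to one hidden state."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Hypothetical toy activations standing in for one layer's residual stream.
terse = [[1.0, 0.0, 2.0, 0.5], [3.0, 0.0, 2.0, 1.5]]
verbose = [[0.0, 1.0, 2.0, 0.5], [0.0, 3.0, 2.0, 0.5]]

v = steering_vector(terse, verbose)
print(v)                                        # [2.0, -2.0, 0.0, 0.5]
print(apply_steering([0.0] * 4, v, alpha=1.5))  # [3.0, -3.0, 0.0, 0.75]
```

Dimensions shared by both prompt sets (the third one here) cancel out, so the vector keeps only the direction that separates the two behaviors; `alpha` controls how hard that direction is pushed, and setting it to zero leaves the hidden state unchanged.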
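For the sparse-autoencoder route, a common pattern (assumed here, not taken from the post) is to train an SAE on a layer's activations, find the learned feature that corresponds to the concept, and steer by adding a multiple of that feature's decoder direction to the hidden state. The sketch below uses a hypothetical, hand-written 2-feature SAE over 3-dimensional activations purely to show the mechanics; a trained SAE would have learned weights and many more features.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def encode(h, w_enc, b_enc):
    """SAE encoder: feature i = relu(w_enc[i] . h + b_enc[i])."""
    return [
        relu(sum(w * x for w, x in zip(row, h)) + b)
        for row, b in zip(w_enc, b_enc)
    ]


def boost_feature(h, w_dec, k, alpha):
    """Steer by adding alpha times feature k's decoder direction to h."""
    return [x + alpha * d for x, d in zip(h, w_dec[k])]


# Hypothetical toy SAE: rows of w_enc and w_dec are per-feature directions.
w_enc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
b_enc = [0.0, -0.5]
w_dec = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]

h = [0.2, 1.0, 0.0]
print(encode(h, w_enc, b_enc))               # [0.2, 0.5]
print(boost_feature(h, w_dec, k=1, alpha=2.0))  # [0.2, 1.0, 2.0]
```

The encoder is only needed to identify which feature fires on examples of the concept; the steering step itself is just vector addition along a decoder row, which is why shared, "boostable" features could be distributed as small lists of directions rather than model weights.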