Startup's new mechanistic interpretability tool lets you debug LLMs
- #AI Debugging
- #Mechanistic Interpretability
- #LLM Development
- Goodfire released Silico, a mechanistic interpretability tool for debugging and adjusting LLM parameters during training.
- The tool aims to make AI model development more like engineering and science, providing fine-grained control over behavior.
- Silico allows users to inspect and tweak specific neurons or groups of neurons within a model, adjusting behaviors such as reducing hallucinations or flipping ethical decisions.
- It automates interpretability work using agents, making it accessible to smaller firms and research teams beyond top labs.
- Goodfire's approach includes mapping neurons, tracing pathways, and filtering training data to steer model behavior and address flaws.
- The tool is available for a fee on a case-by-case basis, with potential applications in safety-critical fields like healthcare and finance.
- While some researchers view it as a step towards precision, others argue it merely adds precision to alchemy rather than delivering fully principled engineering.
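
The kind of neuron-level intervention described above can be illustrated with a toy sketch. This is not Silico's API (which is not public in the source); it is a minimal, hypothetical NumPy example of scaling or ablating individual hidden units in a small network to steer its output, the basic idea behind tweaking specific neurons:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network standing in for a model component.
W1 = rng.normal(size=(4, 8))   # input -> hidden weights
W2 = rng.normal(size=(8, 3))   # hidden -> output weights

def forward(x, neuron_scales=None):
    """Run the toy model, optionally scaling individual hidden units.

    neuron_scales: dict mapping hidden-unit index to a multiplier,
    e.g. {2: 0.0} ablates unit 2, {2: 3.0} amplifies it.
    """
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden activations
    if neuron_scales:
        h = h.copy()
        for idx, scale in neuron_scales.items():
            h[..., idx] *= scale
    return h @ W2

x = rng.normal(size=(1, 4))
baseline = forward(x)
ablated = forward(x, neuron_scales={2: 0.0})   # "turn off" hidden unit 2
amplified = forward(x, neuron_scales={2: 3.0}) # push the same unit harder
```

Comparing `baseline`, `ablated`, and `amplified` outputs shows how much a single unit contributes to the model's behavior; in a real LLM the same idea is applied via hooks on transformer activations rather than a toy matrix multiply.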