Startup's new mechanistic interpretability tool lets you debug LLMs
- #AI Debugging
- #Mechanistic Interpretability
- #LLM Development
- Goodfire released Silico, a mechanistic interpretability tool for debugging and adjusting LLM parameters during training.
- The tool aims to make AI model development more like engineering and science, providing fine-grained control over behavior.
- Silico allows users to inspect and tweak specific neurons or groups of neurons within a model, adjusting behaviors such as reducing hallucinations or flipping ethical decisions.
- It automates interpretability work using agents, making it accessible to smaller firms and research teams beyond top labs.
- Goodfire's approach includes mapping neurons, tracing pathways, and filtering training data to steer model behavior and address flaws.
- The tool is available for a fee on a case-by-case basis, with potential applications in safety-critical fields like healthcare and finance.
- While some researchers view it as a step towards precision, others argue it merely adds precision to alchemy rather than delivering fully principled engineering.
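
The kind of neuron-level intervention described above can be illustrated with a toy sketch. This is not Silico's API (which is not public in the source); it is a minimal, hypothetical NumPy example of scaling or ablating individual hidden units in a small network to steer its output, the basic idea behind tweaking specific neurons:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network standing in for a model component.
W1 = rng.normal(size=(4, 8))   # input -> hidden weights
W2 = rng.normal(size=(8, 3))   # hidden -> output weights

def forward(x, neuron_scales=None):
    """Run the toy model, optionally scaling individual hidden units.

    neuron_scales: dict mapping hidden-unit index to a multiplier,
    e.g. {2: 0.0} ablates unit 2, {2: 3.0} amplifies it.
    """
    h = np.maximum(x @ W1, 0.0)          # ReLU hidden activations
    if neuron_scales:
        h = h.copy()
        for idx, scale in neuron_scales.items():
            h[..., idx] *= scale
    return h @ W2

x = rng.normal(size=(1, 4))
baseline = forward(x)
ablated = forward(x, neuron_scales={2: 0.0})   # "turn off" hidden unit 2
amplified = forward(x, neuron_scales={2: 3.0}) # push the same unit harder
```

Comparing `baseline`, `ablated`, and `amplified` outputs shows how much a single unit contributes to the model's behavior; in a real LLM the same idea is applied via hooks on transformer activations rather than a toy matrix multiply.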