Don't let the LLM speak, just probe it

6 hours ago

LLMs can settle a classification decision before generating any tokens, so the decision can be read directly from their hidden states.
Use a small MLP or linear probe on the hidden state at the last prompt token to create a fast, zero-shot classifier.
Optionally train a LoRA to write verdicts; this reshapes the hidden-state geometry so the decision is easier to extract, still without generating text.
Cache the KV state for the content so that multiple criteria can be scored against it efficiently, though this may hinder interaction for complex cases.
The technique powers applications like Predicate in safety stacks, offering low-cost, high-speed structural question answering.
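
The linear-probe idea above can be illustrated with a minimal sketch. It assumes the hidden states have already been extracted; here, synthetic vectors stand in for the hidden state at the last prompt token, and the helper names (`make_synthetic_states`, `train_linear_probe`, `probe_score`) are hypothetical, not taken from the original article or from any library. The probe is plain logistic regression trained by gradient descent.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_states(n, dim, seed=0):
    # ASSUMPTION: stand-in for real hidden states. Each vector is a shared
    # "decision direction" (flipped by class) plus Gaussian noise.
    rng = random.Random(seed)
    direction = [rng.gauss(0, 1) for _ in range(dim)]
    states, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        sign = 1.0 if y == 1 else -1.0
        states.append([sign * d + rng.gauss(0, 0.5) for d in direction])
        labels.append(y)
    return states, labels


def probe_score(weights, bias, state):
    # Probability of the positive class from a single hidden-state vector.
    z = sum(w * x for w, x in zip(weights, state)) + bias
    return sigmoid(z)


def train_linear_probe(states, labels, epochs=50, lr=0.1):
    # Logistic regression fitted with per-example gradient descent.
    weights = [0.0] * len(states[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(states, labels):
            err = probe_score(weights, bias, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


# Train on 200 synthetic states, evaluate on 100 held-out ones.
states, labels = make_synthetic_states(300, dim=16)
weights, bias = train_linear_probe(states[:200], labels[:200])
correct = sum(
    (probe_score(weights, bias, x) >= 0.5) == (y == 1)
    for x, y in zip(states[200:], labels[200:])
)
print(f"held-out accuracy: {correct / 100:.2f}")
```

In a real setup, the vectors would come from a forward pass over the prompt with no decoding step, which is where the speed advantage would come from; the probe itself adds only a single dot product per criterion.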
