My trick for getting consistent classification from LLMs
9 days ago
- #classification
- #LLM
- #vectorization
- LLMs can be inconsistent in generating class labels, but semantically consistent.
- A technique using vectorization and clustering can achieve deterministic labeling from stochastic LLMs.
- Vectorization leads to label convergence, reducing the number of unique labels significantly.
- Initial costs and latency are higher with vectorization, but they decrease over time as cache hits increase.
- By the 10,000th tweet, vectorization is 10x cheaper and more efficient than pure LLM classification.
- A Golang implementation is provided, including embedding generation, cache checks, and clustering logic.
- Benchmarking shows the effectiveness of the technique in terms of cost, latency, and scalability.