My trick for getting consistent classification from LLMs

9 days ago


LLMs can be inconsistent in the exact class labels they generate, even when those labels are semantically consistent.
A technique using vectorization and clustering can achieve deterministic labeling from stochastic LLMs.
Vectorization leads to label convergence, reducing the number of unique labels significantly.
Initial costs and latency are higher with vectorization, but they decrease over time as cache hits increase.
By the 10,000th tweet, vectorization is 10x cheaper and more efficient than pure LLM classification.
A Golang implementation is provided, including embedding generation, cache checks, and clustering logic.
Benchmarking shows the effectiveness of the technique in terms of cost, latency, and scalability.
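
The cache-check step described above can be sketched roughly as follows. This is a minimal illustration, not the article's actual Go code: the names LabelCache, Embedder, Labeler, and letterCounts are hypothetical, the 0.95 similarity threshold is an arbitrary assumption, and the toy letter-count embedder stands in for a real embedding model. It also assumes the input text is what gets embedded; the original implementation may instead embed the generated labels.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// Embedder and Labeler are hypothetical stand-ins for an embedding model
// and an LLM classification call.
type Embedder func(text string) []float64
type Labeler func(text string) string

type entry struct {
	vec   []float64
	label string
}

// LabelCache reuses the label of the most similar cached vector and only
// falls back to the LLM when nothing is similar enough.
type LabelCache struct {
	entries   []entry
	embed     Embedder
	label     Labeler
	threshold float64 // minimum cosine similarity for a cache hit (assumed value)
	LLMCalls  int     // number of cache misses that reached the LLM
}

func NewLabelCache(e Embedder, l Labeler, threshold float64) *LabelCache {
	return &LabelCache{embed: e, label: l, threshold: threshold}
}

// cosine returns the cosine similarity of two equal-length vectors.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// Classify embeds the text, scans the cache for the nearest neighbor at or
// above the threshold, and calls the LLM only on a miss.
func (c *LabelCache) Classify(text string) string {
	v := c.embed(text)
	best, bestSim := -1, c.threshold
	for i, e := range c.entries {
		if s := cosine(v, e.vec); s >= bestSim {
			best, bestSim = i, s
		}
	}
	if best >= 0 {
		return c.entries[best].label // cache hit: no LLM call
	}
	lbl := c.label(text) // cache miss: ask the LLM and remember the answer
	c.LLMCalls++
	c.entries = append(c.entries, entry{vec: v, label: lbl})
	return lbl
}

// letterCounts is a toy embedder (letter frequencies), used only so the
// sketch runs without an external embedding service.
func letterCounts(text string) []float64 {
	v := make([]float64, 26)
	for _, r := range strings.ToLower(text) {
		if r >= 'a' && r <= 'z' {
			v[r-'a']++
		}
	}
	return v
}

func main() {
	n := 0
	// A deliberately unstable labeler: it returns a new label on every call.
	unstable := func(string) string { n++; return fmt.Sprintf("label-%d", n) }
	c := NewLabelCache(letterCounts, unstable, 0.95)
	fmt.Println(c.Classify("battery drains fast"))
	fmt.Println(c.Classify("Battery drains fast!"))
	fmt.Println("LLM calls:", c.LLMCalls)
}
```

A linear scan is used here for brevity; at the scale the brief mentions (10,000 tweets), a vector index would likely replace it. The point of the sketch is only that repeated or near-duplicate inputs resolve to the first label stored, so the output becomes stable even when the underlying labeler is not.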

Hasty Briefs (beta)