93% of GPT-4 performance at 1/4 cost: LLM routing with weak bandit feedback

9 days ago


LLM routing dynamically selects the most suitable LLM for each query/task.
Previous approaches treat LLM routing as a supervised learning problem with assumed optimal query-LLM pairings.
Real-world scenarios lack comprehensive mappings and face evolving user queries.
The paper proposes studying LLM routing as a contextual bandit problem, which enables adaptive decision-making.
It develops a shared embedding space for queries and LLMs that reflects their affinity.
It introduces PILOT (Preference-prior Informed Linucb fOr adaptive rouTing), an extension of LinUCB.
It addresses diverse user budgets with an online cost policy modeled as a multi-choice knapsack problem.
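
To make the contextual bandit framing concrete, here is a minimal sketch of plain disjoint LinUCB used as a router. It is the standard base algorithm, not PILOT itself: the preference prior and the knapsack cost policy are not shown. The arm names (`small-llm`, `large-llm`), the one-hot "query embeddings", and the reward values are hypothetical, chosen only for illustration.

```python
import math


class LinUCBRouter:
    """Disjoint LinUCB: one ridge-regression model per arm (here, per LLM)."""

    def __init__(self, arms, dim, alpha=1.0):
        self.arms = list(arms)
        self.dim = dim
        self.alpha = alpha  # exploration strength
        # Store A^{-1} directly; A starts as the identity (ridge term).
        self.A_inv = {
            a: [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for a in self.arms
        }
        self.b = {a: [0.0] * dim for a in self.arms}

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    @staticmethod
    def _dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    def ucb(self, arm, x):
        """Estimated reward plus an exploration bonus for context x."""
        A_inv = self.A_inv[arm]
        theta = self._matvec(A_inv, self.b[arm])
        width = self._dot(x, self._matvec(A_inv, x))
        return self._dot(theta, x) + self.alpha * math.sqrt(max(width, 0.0))

    def select(self, x):
        """Pick the arm with the highest upper confidence bound."""
        return max(self.arms, key=lambda a: self.ucb(a, x))

    def update(self, arm, x, reward):
        """Rank-one (Sherman-Morrison) update of A^{-1}, then b += r * x."""
        A_inv = self.A_inv[arm]
        Ax = self._matvec(A_inv, x)  # A_inv is symmetric
        denom = 1.0 + self._dot(x, Ax)
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= Ax[i] * Ax[j] / denom
            self.b[arm][i] += reward * x[i]


# Hypothetical setup: two query types encoded as one-hot contexts.
EASY, HARD = [1.0, 0.0], [0.0, 1.0]

# Hypothetical rewards: the small model suffices for easy queries,
# while only the large model answers hard queries well.
REWARD = {
    ("small-llm", "easy"): 1.0, ("large-llm", "easy"): 0.5,
    ("small-llm", "hard"): 0.0, ("large-llm", "hard"): 1.0,
}

router = LinUCBRouter(["small-llm", "large-llm"], dim=2, alpha=1.0)
for t in range(50):
    kind, x = ("easy", EASY) if t % 2 == 0 else ("hard", HARD)
    arm = router.select(x)
    # Bandit feedback: only the chosen arm's reward is observed.
    router.update(arm, x, REWARD[(arm, kind)])

print(router.select(EASY), router.select(HARD))
```

The router only ever sees the reward of the model it picked, which is the bandit-feedback setting the summary describes. A real system would replace the one-hot contexts with learned query embeddings and would add a budget constraint on top of the selection step.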
