93% of GPT-4 performance at 1/4 cost: LLM routing with weak bandit feedback
9 days ago
- #LLM Routing
- #Machine Learning
- #Contextual Bandit
- LLM routing dynamically selects the most suitable LLM for each query/task.
- Previous approaches treat LLM routing as a supervised learning problem, which assumes the optimal LLM for every query is already known.
- In real-world deployments, such comprehensive query-to-LLM mappings are unavailable and user queries keep evolving.
- Proposes studying LLM routing as a contextual bandit problem for adaptive decision-making.
- Develops a shared embedding space for queries and LLMs to reflect their affinity.
- Introduces PILOT (Preference-prior Informed Linucb fOr adaptive rouTing), an extension of LinUCB.
- Addresses diverse user budgets with an online cost policy modeled as a multi-choice knapsack problem.
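
To make the contextual-bandit framing concrete, here is a minimal sketch of plain LinUCB, the algorithm that PILOT extends. It is not the PILOT method itself: the preference prior, the learned shared embedding space, and the knapsack cost policy are all omitted. The names `LinUCBRouter`, `dot`, and `demo`, the one-hot "query embeddings", and the 0/1 reward are assumptions made up for illustration. Each arm is a candidate LLM, and the router only sees feedback for the LLM it actually picked.

```python
import math


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


class LinUCBRouter:
    """Disjoint LinUCB: one ridge-regression model per candidate LLM (arm)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength
        self.dim = dim
        # A_inv[a] is the inverse of (I + sum of x x^T seen by arm a).
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_arms)
        ]
        # b[a] accumulates reward-weighted contexts for arm a.
        self.b = [[0.0] * dim for _ in range(n_arms)]

    def scores(self, x):
        """Upper confidence bound of the expected reward for each arm."""
        result = []
        for A_inv, b in zip(self.A_inv, self.b):
            A_inv_x = [dot(row, x) for row in A_inv]
            theta = [dot(row, b) for row in A_inv]  # ridge estimate
            bonus = self.alpha * math.sqrt(max(dot(x, A_inv_x), 0.0))
            result.append(dot(theta, x) + bonus)
        return result

    def select(self, x):
        """Pick the arm with the highest score (ties go to the lowest index)."""
        s = self.scores(x)
        return max(range(len(s)), key=lambda a: s[a])

    def update(self, arm, x, reward):
        """Bandit feedback: only the chosen arm's model is updated."""
        A_inv = self.A_inv[arm]
        A_inv_x = [dot(row, x) for row in A_inv]
        denom = 1.0 + dot(x, A_inv_x)
        # Sherman-Morrison rank-one update of the inverse (A_inv is symmetric).
        for i in range(self.dim):
            for j in range(self.dim):
                A_inv[i][j] -= A_inv_x[i] * A_inv_x[j] / denom
        for i in range(self.dim):
            self.b[arm][i] += reward * x[i]


def demo(rounds=200):
    """Toy setup (hypothetical): LLM 0 answers type-0 queries well, LLM 1 type-1."""
    router = LinUCBRouter(n_arms=2, dim=2)
    contexts = [[1.0, 0.0], [0.0, 1.0]]  # stand-ins for query embeddings
    for t in range(rounds):
        kind = t % 2
        x = contexts[kind]
        arm = router.select(x)
        reward = 1.0 if arm == kind else 0.0  # feedback for the chosen LLM only
        router.update(arm, x, reward)
    return router
```

Calling `demo()` and then `router.select(...)` on each toy context shows the router settling on the matching LLM per query type. A budget-aware variant would additionally weigh each arm's score against its cost, which is the part the paper handles with its online cost policy.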