Hasty Briefs (beta)

93% of GPT-4 performance at 1/4 cost: LLM routing with weak bandit feedback

9 days ago
  • #LLM Routing
  • #Machine Learning
  • #Contextual Bandit
  • LLM routing dynamically selects the most suitable LLM for each query/task.
  • Previous approaches treat LLM routing as a supervised learning problem, which assumes the optimal LLM for each query is already known.
  • Real-world deployments lack such comprehensive query-LLM mappings, and user queries evolve over time.
  • Proposes studying LLM routing as a contextual bandit problem, so the router adapts online from bandit feedback: it observes the response quality of only the LLM it selected for each query.
  • Develops a shared embedding space for queries and LLMs to reflect their affinity.
  • Introduces PILOT (Preference-prior Informed Linucb fOr adaptive rouTing), an extension of LinUCB.
  • Addresses diverse user budgets with an online cost policy modeled as a multi-choice knapsack problem.
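
To make the contextual bandit framing concrete, here is a minimal sketch of plain disjoint LinUCB used as a router. It is not PILOT itself: the preference prior, the shared query-LLM embedding space, and the cost policy are all omitted. The class name `LinUCBRouter`, its methods, and the default `alpha` are hypothetical names chosen for illustration. The sketch assumes each query arrives as a fixed-length embedding vector and that a scalar reward is observed only for the model that was selected.

```python
import math


def matvec(M, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class LinUCBRouter:
    """Disjoint LinUCB: one ridge-regression model per candidate LLM (arm)."""

    def __init__(self, n_models, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (illustrative default)
        # Per arm, keep the inverse of A = I + sum(x x^T) and b = sum(reward * x).
        self.A_inv = [
            [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for _ in range(n_models)
        ]
        self.b = [[0.0] * dim for _ in range(n_models)]

    def score(self, arm, x):
        """Upper confidence bound on the expected reward of `arm` for query x."""
        theta = matvec(self.A_inv[arm], self.b[arm])  # ridge estimate
        width = math.sqrt(dot(x, matvec(self.A_inv[arm], x)))
        return dot(theta, x) + self.alpha * width

    def select(self, x):
        """Pick the arm with the highest UCB score (lowest index wins ties)."""
        scores = [self.score(arm, x) for arm in range(len(self.b))]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, x, reward):
        """Bandit feedback: only the chosen arm's statistics are updated."""
        # Sherman-Morrison rank-one update of A_inv after A += x x^T.
        u = matvec(self.A_inv[arm], x)
        denom = 1.0 + dot(x, u)
        M = self.A_inv[arm]
        for i in range(len(x)):
            for j in range(len(x)):
                M[i][j] -= u[i] * u[j] / denom
        self.b[arm] = [b_i + reward * x_i for b_i, x_i in zip(self.b[arm], x)]
```

In use, the caller embeds each incoming query, calls `select` to choose a model, obtains a quality signal for that single response, and passes it to `update`. Arms that have been tried less often for similar queries keep a wider confidence term, so they continue to be explored.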