Hasty Briefs

An LLM Query Understanding Service

a year ago
  • #LLM
  • #Kubernetes
  • #Search
  • Using LLMs to enhance search capabilities by structuring queries into dimensions like color, material, and category.
  • Deploying a FastAPI app with an open-source LLM (Qwen2-7B) for query understanding without relying on external APIs.
  • Creating a Docker image for the service and deploying it on Google Kubernetes Engine (GKE) in autopilot mode.
  • Setting up a Kubernetes deployment with GPU resources and persistent storage for model data.
  • Implementing a cache using Valkey to avoid repeated LLM calls for the same queries.
  • Refactoring the service to parse search queries into structured JSON responses.
  • Monitoring and optimizing the deployment, including load testing and prompt tuning.
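Requesting a GPU and persistent model storage on GKE Autopilot roughly takes the shape below. This is a hedged sketch, not the article's manifest: the names, the accelerator type, and the PVC are illustrative, though the `cloud.google.com/gke-accelerator` node selector and the `nvidia.com/gpu` resource limit are the standard Autopilot mechanism for GPU workloads.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-understanding        # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: query-understanding
  template:
    metadata:
      labels:
        app: query-understanding
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4   # assumed GPU type
      containers:
        - name: app
          image: us-docker.pkg.dev/PROJECT/repo/query-understanding:latest
          resources:
            limits:
              nvidia.com/gpu: "1"
          volumeMounts:
            - name: model-cache
              mountPath: /models    # keeps downloaded weights across restarts
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache-pvc   # hypothetical PVC
```

Mounting a PersistentVolumeClaim for the model directory avoids re-downloading multi-gigabyte weights every time a pod is rescheduled.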
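The core idea in the summary — having the LLM turn a free-text query into structured dimensions such as color, material, and category — can be sketched as below. This is a minimal illustration, not the article's actual code: the prompt wording, the field names, and the stubbed `call_llm` function are assumptions; a deployed service would invoke Qwen2-7B (e.g. behind FastAPI) where the stub sits.

```python
import json
import re

# Hypothetical prompt template; the keys mirror the dimensions named in
# the summary (color, material, category).
PROMPT = (
    "Extract structured attributes from this product search query. "
    'Respond with JSON only, using keys "color", "material", and '
    '"category" (null when absent).\nQuery: {query}'
)

def call_llm(prompt: str) -> str:
    # Stand-in for a real Qwen2-7B call; returns a canned answer so the
    # sketch is self-contained.
    return '{"color": "red", "material": "leather", "category": "couch"}'

def understand_query(query: str) -> dict:
    """Turn a free-text search query into structured search dimensions."""
    raw = call_llm(PROMPT.format(query=query))
    # Models sometimes wrap JSON in prose or code fences; grab the first
    # {...} span before parsing.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        return {"color": None, "material": None, "category": None}
    return json.loads(match.group(0))

print(understand_query("red leather couch"))
```

The defensive regex matters in practice: instruction-tuned models do not always emit bare JSON, so extracting the first brace-delimited span before `json.loads` avoids spurious parse failures.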
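The Valkey cache from the summary keys cached parses by the query so identical queries skip the LLM entirely. A minimal sketch, assuming a Redis-protocol client (Valkey is wire-compatible with Redis, so `redis.Redis` works against it); the `QueryCache` class, key scheme, and TTL are assumptions, and a dict-backed stub stands in for a live server:

```python
import hashlib
import json

class QueryCache:
    """Cache structured parses keyed by a hash of the normalized query.

    `client` is anything with get/set in the Redis style — e.g. a
    redis.Redis instance pointed at Valkey, or the stub below.
    """

    def __init__(self, client, ttl_seconds=86400):
        self.client = client
        self.ttl = ttl_seconds

    def _key(self, query: str) -> str:
        # Normalize case/whitespace so "Red Couch " and "red couch" share a slot.
        return "q:" + hashlib.sha256(query.strip().lower().encode()).hexdigest()

    def get_or_compute(self, query: str, compute):
        key = self._key(query)
        cached = self.client.get(key)
        if cached is not None:
            return json.loads(cached)
        result = compute(query)          # the expensive LLM call
        self.client.set(key, json.dumps(result), ex=self.ttl)
        return result

class DictClient:
    # In-memory stand-in for a Valkey connection (ignores TTL).
    def __init__(self):
        self.store = {}
    def get(self, key):
        return self.store.get(key)
    def set(self, key, value, ex=None):
        self.store[key] = value

calls = []
def expensive_parse(q):
    calls.append(q)
    return {"category": "couch"}

cache = QueryCache(DictClient())
cache.get_or_compute("red couch", expensive_parse)
cache.get_or_compute("Red Couch ", expensive_parse)
print(len(calls))
```

Because search traffic is heavily repetitive, even this simple exact-match cache removes a large share of LLM calls; the TTL bounds staleness if the prompt or model is later retuned.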