An LLM Query Understanding Service
a year ago
- #LLM
- #Kubernetes
- #Search
- Using LLMs to enhance search capabilities by structuring queries into dimensions like color, material, and category.
- Deploying a FastAPI app with an open-source LLM (Qwen2-7B) for query understanding without relying on external APIs.
- Creating a Docker image for the service and deploying it on Google Kubernetes Engine (GKE) in Autopilot mode.
- Setting up a Kubernetes deployment with GPU resources and persistent storage for model data.
- Implementing a cache using Valkey to avoid repeated LLM calls for the same queries.
- Refactoring the service to parse search queries into structured JSON responses.
- Monitoring and optimizing the deployment, including load testing and prompt tuning.
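The GPU and persistent-storage setup can be sketched as a deployment fragment. Everything here is an illustrative assumption rather than the article's manifest: the resource names, the `nvidia-l4` accelerator selector, the placeholder image path, and the PVC name.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-understanding
spec:
  replicas: 1
  selector:
    matchLabels:
      app: query-understanding
  template:
    metadata:
      labels:
        app: query-understanding
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # assumption: L4 GPU class
      containers:
        - name: app
          image: us-docker.pkg.dev/PROJECT/repo/query-understanding:latest  # placeholder
          resources:
            limits:
              nvidia.com/gpu: "1"  # request one GPU from the Autopilot node
          volumeMounts:
            - name: model-cache
              mountPath: /models  # model weights cached here across pod restarts
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache-pvc  # assumption: pre-created PVC
```

On Autopilot, the `nvidia.com/gpu` limit and accelerator selector are what trigger provisioning of a GPU node; the PVC keeps the multi-gigabyte model download from repeating on every restart.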
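The query-parsing step above can be sketched as follows. This is a minimal illustration, not the article's actual code: the schema fields (`item_type`, `color`, `material`), the prompt template, and the `extract_json` helper are all assumptions; `generate` stands in for whatever callable wraps the Qwen2-7B model.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedQuery:
    """Structured interpretation of a raw search query (illustrative schema)."""
    item_type: Optional[str] = None
    color: Optional[str] = None
    material: Optional[str] = None

# Assumed prompt shape; the real service would tune this iteratively.
PROMPT_TEMPLATE = (
    "Parse the search query into JSON with keys item_type, color, material. "
    "Use null for missing values. Query: {query}\nJSON:"
)

def extract_json(llm_output: str) -> dict:
    """Pull the first JSON object out of the model's raw text output."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if match is None:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

def parse_query(query: str, generate) -> ParsedQuery:
    """Run the LLM (any callable str -> str) and coerce the result to the schema."""
    raw = generate(PROMPT_TEMPLATE.format(query=query))
    data = extract_json(raw)
    return ParsedQuery(**{k: data.get(k) for k in ("item_type", "color", "material")})
```

In the deployed service, `parse_query` would sit behind a FastAPI endpoint and `generate` would call the locally hosted model; the JSON-extraction guard matters because open models often wrap their answer in extra prose.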
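The Valkey cache can be sketched as a thin read-through layer. Valkey speaks the Redis protocol, so a redis-py client works unchanged; the key prefix, TTL, and normalization below are assumptions for illustration, and `client` is any object with redis-style `get`/`set`.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 24 * 3600  # assumption: entries expire after a day

def cache_key(query: str) -> str:
    """Normalize and hash the query so equivalent queries share one cache entry."""
    normalized = " ".join(query.lower().split())
    return "qu:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def cached_parse(query: str, client, parse_fn) -> dict:
    """Return a cached parse if present; otherwise call the LLM and store the result.

    `client` is a redis-style client (e.g. redis-py pointed at Valkey);
    `parse_fn` performs the actual LLM call and returns a JSON-serializable dict.
    """
    key = cache_key(query)
    hit = client.get(key)
    if hit is not None:
        return json.loads(hit)
    result = parse_fn(query)
    client.set(key, json.dumps(result), ex=CACHE_TTL_SECONDS)
    return result
```

Passing the client in rather than constructing it inside the function keeps the cache logic testable without a running server, and the hash-based key keeps arbitrary user input out of the keyspace.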