An LLM Query Understanding Service
a year ago
- #LLM
- #Kubernetes
- #Search
- Using LLMs to enhance search capabilities by structuring queries into dimensions like color, material, and category.
- Deploying a FastAPI app with an open-source LLM (Qwen2-7B) for query understanding without relying on external APIs.
- Creating a Docker image for the service and deploying it on Google Kubernetes Engine (GKE) in Autopilot mode.
- Setting up a Kubernetes deployment with GPU resources and persistent storage for model data.
- Implementing a cache using Valkey to avoid repeated LLM calls for the same queries.
- Refactoring the service to parse search queries into structured JSON responses.
- Monitoring and optimizing the deployment, including load testing and prompt tuning.
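The GPU and persistent-storage setup can be sketched as a deployment fragment. Everything here is an illustrative assumption rather than the article's manifest: the resource names, the `nvidia-l4` accelerator selector, the placeholder image path, and the PVC name.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: query-understanding
spec:
  replicas: 1
  selector:
    matchLabels:
      app: query-understanding
  template:
    metadata:
      labels:
        app: query-understanding
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4  # assumption: L4 GPU class
      containers:
        - name: app
          image: us-docker.pkg.dev/PROJECT/repo/query-understanding:latest  # placeholder
          resources:
            limits:
              nvidia.com/gpu: "1"  # request one GPU from the Autopilot node
          volumeMounts:
            - name: model-cache
              mountPath: /models  # model weights cached here across pod restarts
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: model-cache-pvc  # assumption: pre-created PVC
```

On Autopilot, the `nvidia.com/gpu` limit and accelerator selector are what trigger provisioning of a GPU node; the PVC keeps the multi-gigabyte model download from repeating on every restart.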
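The query-parsing step above can be sketched as follows. This is a minimal illustration, not the article's actual code: the schema fields (`item_type`, `color`, `material`), the prompt template, and the `extract_json` helper are all assumptions; `generate` stands in for whatever callable wraps the Qwen2-7B model.

```python
import json
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class ParsedQuery:
    """Structured interpretation of a raw search query (illustrative schema)."""
    item_type: Optional[str] = None
    color: Optional[str] = None
    material: Optional[str] = None

# Assumed prompt shape; the real service would tune this iteratively.
PROMPT_TEMPLATE = (
    "Parse the search query into JSON with keys item_type, color, material. "
    "Use null for missing values. Query: {query}\nJSON:"
)

def extract_json(llm_output: str) -> dict:
    """Pull the first JSON object out of the model's raw text output."""
    match = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if match is None:
        return {}
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return {}

def parse_query(query: str, generate) -> ParsedQuery:
    """Run the LLM (any callable str -> str) and coerce the result to the schema."""
    raw = generate(PROMPT_TEMPLATE.format(query=query))
    data = extract_json(raw)
    return ParsedQuery(**{k: data.get(k) for k in ("item_type", "color", "material")})
```

In the deployed service, `parse_query` would sit behind a FastAPI endpoint and `generate` would call the locally hosted model; the JSON-extraction guard matters because open models often wrap their answer in extra prose.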
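The Valkey cache can be sketched as a thin read-through layer. Valkey speaks the Redis protocol, so a redis-py client works unchanged; the key prefix, TTL, and normalization below are assumptions for illustration, and `client` is any object with redis-style `get`/`set`.

```python
import hashlib
import json

CACHE_TTL_SECONDS = 24 * 3600  # assumption: entries expire after a day

def cache_key(query: str) -> str:
    """Normalize and hash the query so equivalent queries share one cache entry."""
    normalized = " ".join(query.lower().split())
    return "qu:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def cached_parse(query: str, client, parse_fn) -> dict:
    """Return a cached parse if present; otherwise call the LLM and store the result.

    `client` is a redis-style client (e.g. redis-py pointed at Valkey);
    `parse_fn` performs the actual LLM call and returns a JSON-serializable dict.
    """
    key = cache_key(query)
    hit = client.get(key)
    if hit is not None:
        return json.loads(hit)
    result = parse_fn(query)
    client.set(key, json.dumps(result), ex=CACHE_TTL_SECONDS)
    return result
```

Passing the client in rather than constructing it inside the function keeps the cache logic testable without a running server, and the hash-based key keeps arbitrary user input out of the keyspace.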