Show HN: OSS implementation of Test-Time Diffusion that runs on a 24GB GPU

15 days ago

TTD-RAG is a deep research agent submitted for the MMU-RAG Competition.
The system implements the 'Deep Researcher with Test-Time Diffusion (TTD-DR)' framework.
Report generation is modeled as an iterative 'denoising' process: an initial draft is revised repeatedly using retrieved information.
Features include Test-Time Diffusion Framework, Report-Level Denoising with Retrieval, Component-wise Self-Evolution, and High-Performance Serving.
The agent operates in three stages: Planning & Initial Drafting, Iterative Search & Denoising, and Final Report Generation.
Technologies used include FastAPI, vLLM, Qwen/Qwen3-4B-Instruct-2507, tomaarsen/Qwen3-Reranker-0.6B-seq-cls, FineWeb Search API, and Docker.
Setup requires Docker, an NVIDIA GPU with 24GB+ VRAM, and two API keys, supplied as FINEWEB_API_KEY and OPENROUTER_API_KEY.
The API includes endpoints for Health Check, Dynamic Evaluation (/run), and Static Evaluation (/evaluate).
AWS CLI commands are provided for pushing the Docker image to the competition's ECR repository.
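
The three stages listed above (Planning & Initial Drafting, Iterative Search & Denoising, Final Report Generation) can be sketched as a simple loop. This is a minimal illustration of the control flow only, not the project's actual code: `ResearchState`, `run_ttd_sketch`, the prompt strings, and the `generate` and `search` callables are all hypothetical stand-ins for the LLM call and the search API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ResearchState:
    """Hypothetical container for the evolving report and its evidence."""
    question: str
    plan: List[str] = field(default_factory=list)
    draft: str = ""
    evidence: List[str] = field(default_factory=list)


def run_ttd_sketch(
    question: str,
    generate: Callable[[str], str],      # assumed stand-in for an LLM call
    search: Callable[[str], List[str]],  # assumed stand-in for a search API
    steps: int = 3,
) -> str:
    """Run the three-stage draft/denoise/finalize loop and return the report."""
    state = ResearchState(question=question)

    # Stage 1: planning and initial drafting.
    plan_text = generate(f"Plan: {question}")
    state.plan = [line for line in plan_text.splitlines() if line.strip()]
    state.draft = generate(f"Draft: {question}")

    # Stage 2: iterative search and denoising. Each pass asks for a query
    # targeting gaps in the current draft, retrieves documents, and revises.
    for _ in range(steps):
        query = generate(f"Query for gaps in: {state.draft}")
        docs = search(query)
        state.evidence.extend(docs)
        state.draft = generate(
            f"Revise: {state.draft}\nEvidence: {' | '.join(docs)}"
        )

    # Stage 3: final report generation from the last revised draft.
    return generate(f"Final: {state.draft}")
```

Passing the model and search functions in as arguments keeps the loop testable with stubs; in a real deployment they would wrap the served model and the search endpoint.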
