Launch HN: IonRouter (YC W26) – High-throughput, low-cost inference

a day ago

High throughput, low cost inference powered by IonAttention.
Custom inference stack multiplexes models on a single GPU with ms swaps and real-time traffic adaptation.
Supports finetunes, custom LoRAs, or any open-source model with dedicated GPU streams and per-second billing.
Used for robotics perception, surveillance, game asset generation, and AI video pipelines.
Five vision-language models on a single GPU with 2,700 video clips and <1s cold starts.
Compatible with OpenAI client via a one-line change.
Pay-per-million-tokens pricing with no idle costs.
Features flagship models from ZhiPu AI, MoonShot AI, MiniMax, Cumulus, and others.
Includes a 14B text-to-video model generating clips in under 10 seconds.
Fast image generation in sub-4-seconds for real-time applications.
No GPU expertise required; start in under a minute.

Hasty Briefsbeta