Launch HN: IonRouter (YC W26) – High-throughput, low-cost inference
a day ago
- #AI Inference
- #GPU Optimization
- #Machine Learning
- High throughput, low cost inference powered by IonAttention.
- Custom inference stack multiplexes models on a single GPU with ms swaps and real-time traffic adaptation.
- Supports finetunes, custom LoRAs, or any open-source model with dedicated GPU streams and per-second billing.
- Used for robotics perception, surveillance, game asset generation, and AI video pipelines.
- Five vision-language models on a single GPU with 2,700 video clips and <1s cold starts.
- Compatible with OpenAI client via a one-line change.
- Pay-per-million-tokens pricing with no idle costs.
- Features flagship models from ZhiPu AI, MoonShot AI, MiniMax, Cumulus, and others.
- Includes a 14B text-to-video model generating clips in under 10 seconds.
- Fast image generation in sub-4-seconds for real-time applications.
- No GPU expertise required; start in under a minute.