Hasty Briefsbeta

Bilingual

Launch HN: IonRouter (YC W26) – High-throughput, low-cost inference

a day ago
  • #AI Inference
  • #GPU Optimization
  • #Machine Learning
  • High throughput, low cost inference powered by IonAttention.
  • Custom inference stack multiplexes models on a single GPU with ms swaps and real-time traffic adaptation.
  • Supports finetunes, custom LoRAs, or any open-source model with dedicated GPU streams and per-second billing.
  • Used for robotics perception, surveillance, game asset generation, and AI video pipelines.
  • Five vision-language models on a single GPU with 2,700 video clips and <1s cold starts.
  • Compatible with OpenAI client via a one-line change.
  • Pay-per-million-tokens pricing with no idle costs.
  • Features flagship models from ZhiPu AI, MoonShot AI, MiniMax, Cumulus, and others.
  • Includes a 14B text-to-video model generating clips in under 10 seconds.
  • Fast image generation in sub-4-seconds for real-time applications.
  • No GPU expertise required; start in under a minute.