Cedana (YC S23) Is Hiring
6 hours ago
- #GPU Migration
- #HPC Optimization
- #AI Infrastructure
- Cedana offers GPU checkpointing infrastructure that improves the utilization and reliability of AI and HPC clusters by transparently migrating workloads across instances without data loss.
- Their system operates at the kernel/OS level, requires no code changes, and integrates with Kubernetes, SLURM, and NVIDIA Dynamo, targeting deployment in inference platforms, neoclouds, and enterprise clusters.
- The founding team has extensive AI experience, with research published at NeurIPS and CVPR, and a background in building reliable systems such as warehouse automation and healthcare AI.
- The Forward Deployed Engineer role involves end-to-end technical engagement: deploying Cedana in diverse environments (e.g., SLURM, Kubernetes) and solving customer pain points with full ownership from the OS to observability.
- Cedana aims to revolutionize compute resource allocation by building a global, real-time system for workloads such as HPC and ML, taking a deep-tech approach at the Linux kernel and hardware levels.