Cedana (YC S23) Is Hiring

6 hours ago

Cedana offers GPU checkpointing infrastructure to enhance AI and HPC cluster utilization and reliability, enabling transparent migration of workloads across instances without data loss.
Their system operates at the kernel/OS level, requires no code changes, and integrates with Kubernetes, SLURM, and NVIDIA Dynamo, targeting deployment in inference platforms, neoclouds, and enterprise clusters.
The founding team has extensive experience in AI, with research published at NeurIPS and CVPR, and a background in building reliable systems such as warehouse automation and healthcare AI.
The Forward Deployed Engineer role involves end-to-end technical engagement: deploying Cedana in diverse environments (e.g., SLURM, Kubernetes) and solving customer pain points, with full ownership from the OS to observability.
Cedana aims to change how compute resources are allocated by building a global, real-time system for workloads such as HPC and ML, working at the Linux kernel and hardware levels.
