Hasty Briefs (beta)

Show HN: Minimal DL library in C – 24 NAIVE CUDA/CPU ops, autodiff, Python API

2 days ago
  • #ML systems
  • #Deep Learning
  • #GPU programming
  • An ML systems and GPU programming exercise: building a small DL stack end to end.
  • Blackwell-optimized CUDA kernels under active development.
  • PyTorch internals explainer with notes/diagrams on core pieces.
  • Book planned for longer-form writeup of design and lessons learned.
  • Minimal DL library in C with 24 naive CUDA/CPU ops and an autodiff (backprop) engine.
  • Tensor abstraction with strides and views, plus complex indexing as in NumPy.
  • Python API bindings for ops, layers, and models.
  • Training components: optimizers, weight initializers, saving/loading params.
  • Tooling includes a computation-graph visualizer and autogenerated tests.
  • Automatic cleanup of intermediate tensors for memory management.
  • Built as an ML systems learning project, without AI assistance.
  • Commands provided to define and train a Conv-Net and an MLP on GPU or CPU.
  • Commands for visualizing the model graph and running the generated test code.
  • Environment setup instructions for running the generated test code.
  • Data download instructions for CIFAR-10 dataset.
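
The "strides/views" bullet above can be made concrete with a short sketch. The C code below is not taken from the project. It is a minimal illustration, assuming a row-major float buffer, of how a tensor view can transpose or slice data by changing only shape, stride, and offset metadata. All names here (TensorView, view_contiguous, view_at, view_transpose, view_slice, MAX_DIMS) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_DIMS 4 /* hypothetical limit chosen for this sketch */

/* Hypothetical tensor view: it describes a buffer but does not own it. */
typedef struct {
    float *data;              /* shared buffer, not owned by the view */
    size_t offset;            /* index of the view's first element in data */
    int ndim;                 /* number of dimensions in use */
    size_t shape[MAX_DIMS];   /* extent of each dimension */
    size_t strides[MAX_DIMS]; /* step, in elements, per dimension */
} TensorView;

/* Build a contiguous row-major view over an existing buffer. */
TensorView view_contiguous(float *data, int ndim, const size_t *shape) {
    assert(ndim > 0 && ndim <= MAX_DIMS);
    TensorView t = { .data = data, .offset = 0, .ndim = ndim };
    size_t step = 1;
    for (int d = ndim - 1; d >= 0; --d) {
        t.shape[d] = shape[d];
        t.strides[d] = step; /* last dimension is the fastest-moving one */
        step *= shape[d];
    }
    return t;
}

/* Return a pointer to the element at the multi-index idx. */
float *view_at(const TensorView *t, const size_t *idx) {
    size_t pos = t->offset;
    for (int d = 0; d < t->ndim; ++d) {
        assert(idx[d] < t->shape[d]);
        pos += idx[d] * t->strides[d];
    }
    return &t->data[pos];
}

/* Swap two dimensions without copying: only the metadata changes. */
TensorView view_transpose(TensorView t, int a, int b) {
    assert(a >= 0 && a < t.ndim && b >= 0 && b < t.ndim);
    size_t tmp = t.shape[a];
    t.shape[a] = t.shape[b];
    t.shape[b] = tmp;
    tmp = t.strides[a];
    t.strides[a] = t.strides[b];
    t.strides[b] = tmp;
    return t;
}

/* Take elements start, start+step, ... below stop along one dimension,
   similar to x[start:stop:step] in NumPy with a positive step. */
TensorView view_slice(TensorView t, int dim,
                      size_t start, size_t stop, size_t step) {
    assert(dim >= 0 && dim < t.ndim);
    assert(step > 0 && start <= stop && stop <= t.shape[dim]);
    t.offset += start * t.strides[dim];
    t.shape[dim] = (stop - start + step - 1) / step; /* ceiling division */
    t.strides[dim] *= step;
    return t;
}
```

Because views share the underlying buffer, writing through a transposed or sliced view also changes the original tensor. A real library would additionally need ownership tracking (for example reference counting) to know when the buffer can be freed, which this sketch leaves out.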