Hasty Briefs (beta)


GitHub - deepseek-ai/DeepGEMM: DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling

  • #Kernel Library
  • #CUDA
  • #LLM Optimization
  • DeepGEMM is a unified, high-performance CUDA kernel library for modern LLMs, featuring GEMMs (FP8, FP4, BF16), fused MoE with overlapped communication (Mega MoE), MQA scoring, and HyperConnection.
  • Kernels are JIT-compiled at runtime by a lightweight module, so installation requires no CUDA compilation, and performance matches or exceeds that of expert-tuned libraries.
  • Supports SM90/SM100 architectures, is configurable via environment variables (e.g., DG_JIT_USE_NVRTC), and includes utility functions for alignment, scaling-factor transformation, and memory management.
  • Provides specialized APIs for grouped GEMMs (M-axis and K-axis) for MoE models, masked GEMMs for inference decoding, and MQA logits kernels for attention mechanisms.
  • Mega MoE fuses multiple operations into a single kernel, overlapping NVLink communication and computation, and requires symmetric memory allocation.
  • Released under the MIT License, with ongoing updates and performance comparisons documented via GitHub issues.
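The "fine-grained scaling" in the repo's title refers to quantizing small blocks (e.g., 1x128 elements) with their own scale factor rather than one scale per tensor. A NumPy sketch of that block-scaling arithmetic, under the assumption of 1x128 activation blocks and the FP8 E4M3 dynamic range; the function names here are hypothetical, not DeepGEMM's actual API:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3
BLOCK = 128           # assumed block width for per-block scales

def quantize_1x128(x: np.ndarray):
    """Scale each 1x128 block of a 2-D matrix into FP8 range.

    Returns the scaled values (which a real kernel would cast to FP8)
    and one float scale per block.
    """
    m, k = x.shape
    assert k % BLOCK == 0
    blocks = x.reshape(m, k // BLOCK, BLOCK)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    scales = np.maximum(amax, 1e-12) / FP8_E4M3_MAX  # avoid divide-by-zero
    q = blocks / scales                              # now within [-448, 448]
    return q.reshape(m, k), scales.squeeze(-1)

def dequantize_1x128(q: np.ndarray, scales: np.ndarray):
    """Invert quantize_1x128 by multiplying each block by its scale."""
    m, k = q.shape
    blocks = q.reshape(m, k // BLOCK, BLOCK) * scales[..., None]
    return blocks.reshape(m, k)

x = np.random.randn(4, 256).astype(np.float32)
q, s = quantize_1x128(x)          # s has one scale per 1x128 block
x_rec = dequantize_1x128(q, s)    # near-exact here, since we skip the FP8 cast
```

On device the scaled values would additionally be rounded to FP8, which is where the actual precision loss occurs; keeping scales per small block bounds that loss locally instead of letting one outlier inflate the scale for the whole tensor.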
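The alignment utilities mentioned above exist because FP8 tile sizes and scale layouts impose multiple-of-N shape constraints. A minimal sketch of the arithmetic such helpers perform (names here are illustrative, not the library's exports):

```python
def ceil_div(x: int, y: int) -> int:
    """Round up x / y — e.g., how many 128-wide scale blocks cover k columns."""
    return (x + y - 1) // y

def align(x: int, y: int) -> int:
    """Round x up to the nearest multiple of y, as needed for padded buffers."""
    return ceil_div(x, y) * y
```

For example, a 300-column matrix needs `ceil_div(300, 128)` = 3 scale blocks and would be padded to `align(300, 128)` = 384 columns.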
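The masked GEMMs for inference decoding can be understood from their reference semantics: each expert owns a fixed-size padded input buffer, and a per-expert count marks how many leading rows are valid, so the kernel skips the padding. A NumPy reference sketch (the real kernel does this in a single launch on FP8 tensors; the function name and argument layout here are assumptions, not DeepGEMM's API):

```python
import numpy as np

def masked_grouped_gemm_ref(lhs, rhs, masked_m):
    """Reference masked grouped GEMM.

    lhs: [groups, max_m, k] padded per-expert inputs
    rhs: [groups, k, n]     per-expert weights
    masked_m: [groups]      number of valid leading rows per expert
    Rows at or beyond masked_m[g] are padding and stay zero in the output.
    """
    groups, max_m, k = lhs.shape
    out = np.zeros((groups, max_m, rhs.shape[-1]), dtype=lhs.dtype)
    for g in range(groups):
        m = int(masked_m[g])
        out[g, :m] = lhs[g, :m] @ rhs[g]  # compute only the valid rows
    return out

lhs = np.random.randn(3, 8, 16).astype(np.float32)  # 3 experts, max 8 tokens
rhs = np.random.randn(3, 16, 4).astype(np.float32)
masked_m = np.array([8, 2, 5])                      # valid tokens per expert
out = masked_grouped_gemm_ref(lhs, rhs, masked_m)
```

This shape is what makes the kernel useful with CUDA graphs during decoding: the buffer sizes are static while the per-step token counts vary only through `masked_m`.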