Hasty Briefs (beta)


Open Weights Isn't Open Training

4 days ago
  • #machine-learning
  • #model-training
  • #open-source
  • Open-source ML infrastructure often has hidden bugs and inefficiencies, especially for large models.
  • Attempting to post-train a 1T+ parameter model (Kimi-K2-Thinking) surfaced multiple issues in existing tools such as the Hugging Face libraries and LLaMA-Factory.
  • Key problems included slow compression, uneven memory distribution across GPUs, and LoRA training being incompatible with the model's quantized weights.
  • Solutions involved manual fixes like skipping unnecessary compression, adjusting GPU memory allocation, and modifying forward passes to handle dequantization.
  • Even once training ran, performance remained suboptimal, illustrating how fragile open-source ML infrastructure is at this scale.
  • The experience underscores the need for more reliable tooling in the open-source ML ecosystem.
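The uneven GPU memory problem above is commonly worked around by handing the loader an explicit per-device budget rather than letting it split the model evenly (GPU 0 tends to fill first with embeddings, cache, and activations). A minimal sketch of building such a budget; the helper name and numbers are assumptions, not the post's code, though the resulting dict matches the shape of the `max_memory` argument that Hugging Face's `from_pretrained` accepts:

```python
def max_memory_budget(num_gpus: int, gpu_gib: int, gpu0_headroom_gib: int) -> dict:
    """Build a {device_index: "NGiB"} budget that reserves extra headroom on GPU 0.

    Leaving GPU 0 under-filled at load time keeps room for activations and
    caches that frameworks tend to place on the first device.
    """
    budget = {i: f"{gpu_gib}GiB" for i in range(num_gpus)}
    budget[0] = f"{gpu_gib - gpu0_headroom_gib}GiB"
    return budget


# Example: 8 x 80 GiB GPUs, keeping 20 GiB free on GPU 0.
budget = max_memory_budget(num_gpus=8, gpu_gib=80, gpu0_headroom_gib=20)
```

A dict like this can then be passed as `max_memory=budget` when sharding a checkpoint, instead of relying on the loader's even split.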
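The LoRA incompatibility comes down to a type mismatch: LoRA adds a low-rank float update to a base weight, but a quantized checkpoint stores that weight as integers plus scales. A common fix, which the post describes as modifying the forward pass, is to dequantize the frozen base weight on the fly and apply the adapter on top. A minimal NumPy sketch of that idea (all shapes and names here are illustrative assumptions, not the author's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank = 8, 4, 2

# Frozen base weight stored quantized: int8 values + per-output-channel scale.
w_int8 = rng.integers(-127, 128, size=(d_out, d_in), dtype=np.int8)
scale = rng.uniform(0.01, 0.02, size=(d_out, 1)).astype(np.float32)

# Trainable LoRA factors; B starts at zero so the adapter is initially a no-op.
lora_a = rng.normal(0, 0.01, size=(rank, d_in)).astype(np.float32)
lora_b = np.zeros((d_out, rank), dtype=np.float32)

def forward(x: np.ndarray) -> np.ndarray:
    # Dequantize the frozen base weight in float, then add the LoRA update.
    w = w_int8.astype(np.float32) * scale        # (d_out, d_in)
    w_eff = w + lora_b @ lora_a                  # low-rank adjustment
    return x @ w_eff.T

x = rng.normal(size=(3, d_in)).astype(np.float32)
y = forward(x)
```

Because `lora_b` is zero-initialized, the first forward pass reproduces the plain dequantized layer exactly; only the small `lora_a`/`lora_b` matrices receive gradients during training, while the quantized base stays frozen.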