Open Weights Isn't Open Training
4 days ago
- #machine-learning
- #model-training
- #open-source
- Open-source ML infrastructure often has hidden bugs and inefficiencies, especially for large models.
- Attempting to post-train a 1T+ parameter model (Kimi-K2-Thinking) surfaced multiple issues in existing tooling such as the Hugging Face stack and LLaMA-Factory.
- Key problems included slow weight compression, uneven GPU memory distribution across devices, and quantized weights that broke LoRA training.
- The workarounds were manual: skipping unnecessary compression, rebalancing GPU memory allocation, and patching forward passes to dequantize weights on the fly.
- Even once the model was training, performance remained suboptimal, highlighting how fragile open-source ML infrastructure still is for large-scale models.
- The experience underscored the need for better, more reliable tools in the open-source ML ecosystem.
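The LoRA-on-quantized-weights problem from the bullets above can be sketched in miniature. This is a hedged toy illustration, not the author's actual patch: it assumes symmetric per-channel int8 quantization and a standard LoRA update, and shows why the forward pass must dequantize the frozen base weight before the matmul while the low-rank adapters stay in full precision.

```python
import numpy as np

rng = np.random.default_rng(0)

def quantize_per_channel(w, bits=8):
    """Symmetric per-output-channel quantization: int weights plus fp32 scales."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax   # one scale per output row
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def lora_forward(x, w_q, scale, A, B, alpha=16.0):
    """Dequantize the frozen base weight on the fly, then add the LoRA delta.

    Mirrors the idea of patching the forward pass so full-precision LoRA
    adapters can train on top of a quantized base model.
    """
    w = w_q.astype(np.float32) * scale            # dequantize for the matmul
    base = x @ w.T                                # frozen base projection
    r = A.shape[0]
    delta = (x @ A.T) @ B.T * (alpha / r)         # low-rank trainable update
    return base + delta

# Toy shapes (hypothetical, for illustration): 8-dim input, 4-dim output, rank-2 adapter.
w = rng.normal(size=(4, 8)).astype(np.float32)
x = rng.normal(size=(3, 8)).astype(np.float32)
A = rng.normal(scale=0.01, size=(2, 8)).astype(np.float32)  # LoRA down-projection
B = np.zeros((4, 2), dtype=np.float32)                      # LoRA up-projection, zero-init

w_q, scale = quantize_per_channel(w)
y = lora_forward(x, w_q, scale, A, B)

# With B zero-initialized (the usual LoRA convention) the delta is zero, so the
# output matches the full-precision base projection up to int8 rounding error.
print(np.abs(y - x @ w.T).max())
```

The design point is that the quantized weights are never trained directly; they are expanded to float only transiently inside the forward pass, which is what makes LoRA compatible with a weight format that gradient updates cannot touch.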