Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs
4 days ago
- #LLM
- #Transformer
- #Neuroanatomy
- The author topped the HuggingFace Open LLM Leaderboard by duplicating a block of seven middle layers in a 72-billion parameter model without changing any weights.
- A key observation was that the model can read and write Base64-encoded text, suggesting that early layers translate input into an internal representation and late layers translate it back out.
- The Goliath-120b model's unconventional layer arrangement demonstrated that transformer layers are more interchangeable than previously thought.
- The author developed a 'brain scanner' to test hypotheses by duplicating different layer blocks and measuring performance on math and emotional intelligence probes (a sweep sketch follows this list).
- Optimal performance came from duplicating layers 45 to 52 of Qwen2-72B, yielding a 78-billion parameter model named RYS-XLarge (the duplication itself is sketched after this list).
- The method improved performance on multiple benchmarks, including a 17.72% boost on MuSR and 8.16% on MATH, without fine-tuning.
- Heatmaps from the layer sweeps indicated that transformer models contain functional circuits in their middle layers that perform complete cognitive operations.
- The author suggests that fine-tuning the junction between the duplicated layers could further improve performance without additional VRAM usage (a freezing sketch closes this section).
- The method was validated by subsequent fine-tuned models that went on to dominate the leaderboard.
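
The core trick is small enough to sketch in a few lines of plain `transformers` code. This is a minimal illustration, not the author's published tooling: the model name and layer indices come from the post, the half-open reading of "45 to 52" is an assumption (it matches the seven-layer count), and in practice a mergekit passthrough config is the usual way to build such self-merges.

```python
from copy import deepcopy

import torch
from transformers import AutoModelForCausalLM

# Load the base model named in the post. device_map="auto" shards it
# across available GPUs; bfloat16 halves the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Duplicate the middle block. "Layers 45 to 52" is read here as the
# half-open range [45, 52), i.e. seven layers; no weights are modified,
# they are only repeated.
START, END = 45, 52
layers = model.model.layers  # an nn.ModuleList of decoder layers
copies = [deepcopy(layers[i]) for i in range(START, END)]
for offset, layer in enumerate(copies):
    layers.insert(END + offset, layer)

# Re-index so KV-cache bookkeeping still matches layer positions
# (attribute name per the current transformers Qwen2 implementation).
for idx, layer in enumerate(layers):
    layer.self_attn.layer_idx = idx

model.config.num_hidden_layers = len(layers)
```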
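
The 'brain scanner' can likewise be sketched as a sweep: build a variant with each candidate block duplicated, score it on a small probe set, and keep the best block. The probe questions and scoring rule below are placeholders; the post's actual harness is not included in this summary. `model` and `START`/`END` conventions continue from the previous sketch, and deep-copying a 72B model per candidate is impractical; this is the shape of the loop, not a production run.

```python
from copy import deepcopy

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B-Instruct")

def duplicate_block(base_model, start: int, end: int):
    """Copy of base_model with layers [start, end) repeated (as above)."""
    variant = deepcopy(base_model)
    layers = variant.model.layers
    for offset, i in enumerate(range(start, end)):
        layers.insert(end + offset, deepcopy(layers[i]))
    for idx, layer in enumerate(layers):
        layer.self_attn.layer_idx = idx
    variant.config.num_hidden_layers = len(layers)
    return variant

@torch.no_grad()
def score_probe(candidate, probes) -> float:
    """Fraction of (question, expected_substring) probes answered correctly."""
    hits = 0
    for question, expected in probes:
        inputs = tokenizer(question, return_tensors="pt").to(candidate.device)
        out = candidate.generate(**inputs, max_new_tokens=32)
        hits += expected in tokenizer.decode(out[0], skip_special_tokens=True)
    return hits / len(probes)

# Placeholder probes; the post used math and emotional-intelligence sets.
math_probes = [("What is 17 * 23?", "391")]

results = {}
for start in range(30, 60, 5):   # candidate start layers
    for width in (4, 7, 10):     # candidate block widths
        candidate = duplicate_block(model, start, start + width)
        results[(start, width)] = score_probe(candidate, math_probes)

best_start, best_width = max(results, key=results.get)
print(f"best block: [{best_start}, {best_start + best_width}) "
      f"score={results[(best_start, best_width)]:.2f}")
```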
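
The suggested follow-up, fine-tuning only the junction, amounts to freezing everything except the duplicated block. A hedged sketch under the same indices as above; exactly which layers to unfreeze is an assumption, since the post only proposes the idea:

```python
# Freeze all weights, then unfreeze only the original block and its copy,
# so training touches the seams where original and duplicated layers meet.
for param in model.parameters():
    param.requires_grad = False

BLOCK = range(START, END + (END - START))  # layers 45..58 after insertion
for i in BLOCK:
    for param in model.model.layers[i].parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e9:.1f}B")
```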