Hasty Briefs (beta)

Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs

4 days ago
  • #LLM
  • #Transformer
  • #Neuroanatomy
  • The author topped the HuggingFace Open LLM Leaderboard by duplicating a block of seven middle layers in a 72-billion-parameter model without changing any weights.
  • Key observations included the model's ability to read and write Base64 effectively, suggesting early layers translate input into an internal representation and late layers translate it back out.
  • The Goliath-120b model's unconventional layer arrangement demonstrated that transformer layers are more interchangeable than previously thought.
  • The author developed a 'brain scanner' to test hypotheses by duplicating layers and measuring performance on math and emotional intelligence probes.
  • Optimal performance came from duplicating layers 45 to 52 of Qwen2-72B, yielding a 78-billion-parameter model named RYS-XLarge.
  • The method improved performance on multiple benchmarks, including a 17.72% boost on MuSR and 8.16% on MATH, without fine-tuning.
  • Heatmaps revealed that transformer models have functional circuits in their middle layers, which perform complete cognitive operations.
  • The author suggests that fine-tuning the junction between duplicated layers could further improve performance without additional VRAM usage.
  • The method was validated when subsequent fine-tuned models built on it dominated the leaderboard.
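The duplication step described above is mechanically simple. Below is a minimal sketch of the idea, not the author's actual script: in a real model the same operation would be applied to the decoder stack (e.g. the `model.model.layers` ModuleList of a transformers checkpoint) or expressed as a merge-tool config. The half-open range [45, 52) covering the seven duplicated layers, and the toy 80-layer stack standing in for Qwen2-72B's decoder layers, are illustrative assumptions.

```python
import copy

def duplicate_block(layers, start, end):
    """Insert a weight-identical copy of layers[start:end] directly after
    the original block, leaving every layer's weights untouched."""
    return layers[:end] + [copy.deepcopy(l) for l in layers[start:end]] + layers[end:]

# Toy stand-in for an 80-layer decoder stack; each "layer" is just its index,
# so we can see exactly where the duplicated block lands.
stack = list(range(80))

# Duplicate the seven layers 45..51; the stack grows from 80 to 87 layers.
expanded = duplicate_block(stack, 45, 52)

print(len(expanded))        # 87
print(expanded[52:59])      # [45, 46, 47, 48, 49, 50, 51] -- the copied block
```

Because the copies are weight-identical, no retraining is needed: the expanded model runs as-is, which matches the summary's claim that no weights were changed.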