Show HN: How I Topped the HuggingFace Open LLM Leaderboard on Two Gaming GPUs
4 days ago
- #LLM
- #Transformer
- #Neuroanatomy
- The author topped the HuggingFace Open LLM Leaderboard by duplicating a block of seven middle layers in a 72-billion parameter model without changing any weights.
- A key observation was that the model can read and write Base64-encoded text, suggesting that early layers translate input into an internal representation and late layers translate it back out.
- The Goliath-120b model's unconventional layer arrangement demonstrated that transformer layers are more interchangeable than previously thought.
- The author developed a 'brain scanner' to test hypotheses by duplicating different layer blocks and measuring performance on math and emotional intelligence probes (a sweep sketch follows this list).
- Optimal performance came from duplicating layers 45 to 52 of Qwen2-72B, yielding a 78-billion parameter model named RYS-XLarge (the duplication itself is sketched after this list).
- The method improved performance on multiple benchmarks, including a 17.72% boost on MuSR and 8.16% on MATH, without fine-tuning.
- Heatmaps from the layer sweeps indicated that transformer models contain functional circuits in their middle layers that perform complete cognitive operations.
- The author suggests that fine-tuning the junction between the duplicated layers could further improve performance without additional VRAM usage (a freezing sketch closes this section).
- The method was validated by subsequent fine-tuned models that went on to dominate the leaderboard.
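
The core trick is small enough to sketch in a few lines of plain `transformers` code. This is a minimal illustration, not the author's published tooling: the model name and layer indices come from the post, the half-open reading of "45 to 52" is an assumption (it matches the seven-layer count), and in practice a mergekit passthrough config is the usual way to build such self-merges.

```python
from copy import deepcopy

import torch
from transformers import AutoModelForCausalLM

# Load the base model named in the post. device_map="auto" shards it
# across available GPUs; bfloat16 halves the memory footprint.
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-72B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Duplicate the middle block. "Layers 45 to 52" is read here as the
# half-open range [45, 52), i.e. seven layers; no weights are modified,
# they are only repeated.
START, END = 45, 52
layers = model.model.layers  # an nn.ModuleList of decoder layers
copies = [deepcopy(layers[i]) for i in range(START, END)]
for offset, layer in enumerate(copies):
    layers.insert(END + offset, layer)

# Re-index so KV-cache bookkeeping still matches layer positions
# (attribute name per the current transformers Qwen2 implementation).
for idx, layer in enumerate(layers):
    layer.self_attn.layer_idx = idx

model.config.num_hidden_layers = len(layers)
```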
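
The 'brain scanner' can likewise be sketched as a sweep: build a variant with each candidate block duplicated, score it on a small probe set, and keep the best block. The probe questions and scoring rule below are placeholders; the post's actual harness is not included in this summary. `model` and `START`/`END` conventions continue from the previous sketch, and deep-copying a 72B model per candidate is impractical; this is the shape of the loop, not a production run.

```python
from copy import deepcopy

import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B-Instruct")

def duplicate_block(base_model, start: int, end: int):
    """Copy of base_model with layers [start, end) repeated (as above)."""
    variant = deepcopy(base_model)
    layers = variant.model.layers
    for offset, i in enumerate(range(start, end)):
        layers.insert(end + offset, deepcopy(layers[i]))
    for idx, layer in enumerate(layers):
        layer.self_attn.layer_idx = idx
    variant.config.num_hidden_layers = len(layers)
    return variant

@torch.no_grad()
def score_probe(candidate, probes) -> float:
    """Fraction of (question, expected_substring) probes answered correctly."""
    hits = 0
    for question, expected in probes:
        inputs = tokenizer(question, return_tensors="pt").to(candidate.device)
        out = candidate.generate(**inputs, max_new_tokens=32)
        hits += expected in tokenizer.decode(out[0], skip_special_tokens=True)
    return hits / len(probes)

# Placeholder probes; the post used math and emotional-intelligence sets.
math_probes = [("What is 17 * 23?", "391")]

results = {}
for start in range(30, 60, 5):   # candidate start layers
    for width in (4, 7, 10):     # candidate block widths
        candidate = duplicate_block(model, start, start + width)
        results[(start, width)] = score_probe(candidate, math_probes)

best_start, best_width = max(results, key=results.get)
print(f"best block: [{best_start}, {best_start + best_width}) "
      f"score={results[(best_start, best_width)]:.2f}")
```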
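
The suggested follow-up, fine-tuning only the junction, amounts to freezing everything except the duplicated block. A hedged sketch under the same indices as above; exactly which layers to unfreeze is an assumption, since the post only proposes the idea:

```python
# Freeze all weights, then unfreeze only the original block and its copy,
# so training touches the seams where original and duplicated layers meet.
for param in model.parameters():
    param.requires_grad = False

BLOCK = range(START, END + (END - START))  # layers 45..58 after insertion
for i in BLOCK:
    for param in model.model.layers[i].parameters():
        param.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable / 1e9:.1f}B")
```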