Hasty Briefs (beta)


Improving Assembly Code Performance with LLMs via Reinforcement Learning

a year ago
  • #LLMs
  • #Code Optimization
  • #Reinforcement Learning
  • Large language models (LLMs) trained with reinforcement learning can improve the execution performance of assembly code.
  • A reinforcement learning framework trains LLMs with Proximal Policy Optimization (PPO) for code optimization.
  • The reward function combines functional correctness (passing test cases) with execution performance measured relative to gcc -O3.
  • The study introduces a benchmark of 8,072 real-world programs.
  • The resulting model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate and an average 1.47x speedup over gcc -O3.
  • The model outperforms 20 other evaluated models, including Claude-3.7-sonnet.
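
The brief says the reward combines correctness and speed relative to gcc -O3 but does not give the exact formula. The sketch below is one plausible shape, assuming that code which fails to compile is penalized, code that fails any test earns nothing, and fully correct code is rewarded by its speedup (baseline runtime divided by candidate runtime). The names `Evaluation`, `speedup`, and `reward`, and the specific values -1.0 and 0.0, are hypothetical illustrations, not the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    """Hypothetical record of how one generated assembly program performed."""
    compiled: bool          # did the generated assembly assemble and link?
    tests_passed: int       # test cases whose output matched the reference
    tests_total: int        # total number of test cases for this program
    baseline_time: float    # runtime of the gcc -O3 binary, in seconds
    candidate_time: float   # runtime of the LLM-generated binary, in seconds


def speedup(baseline_time: float, candidate_time: float) -> float:
    """Speedup over the gcc -O3 baseline; values above 1.0 mean faster."""
    if candidate_time <= 0:
        raise ValueError("candidate_time must be positive")
    return baseline_time / candidate_time


def reward(ev: Evaluation) -> float:
    """Assumed reward: correctness gates the speedup term."""
    if not ev.compiled:
        return -1.0  # assumption: build failures get the lowest reward
    if ev.tests_total <= 0:
        raise ValueError("at least one test case is required")
    if ev.tests_passed < ev.tests_total:
        return 0.0  # assumption: any failing test earns no speed credit
    return speedup(ev.baseline_time, ev.candidate_time)


if __name__ == "__main__":
    # A correct program that runs in half the baseline time earns 2.0.
    print(reward(Evaluation(True, 10, 10, baseline_time=2.0, candidate_time=1.0)))
```

Gating the speed term on full correctness keeps the policy from being rewarded for fast but wrong assembly; a correct program that is slower than gcc -O3 still earns a positive reward below 1.0, which is weaker than the reward for beating the baseline.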