Improving Assembly Code Performance with LLMs via Reinforcement Learning
a year ago
- #LLMs
- #Code Optimization
- #Reinforcement Learning
- Large language models (LLMs) can optimize assembly code performance using reinforcement learning.
- A reinforcement learning framework trains LLMs with Proximal Policy Optimization (PPO) for code optimization.
- The reward function scores both functional correctness and execution performance, with performance measured relative to the gcc -O3 baseline.
- A benchmark of 8,072 real-world programs was introduced for the study.
- The model Qwen2.5-Coder-7B-PPO achieves a 96.0% test pass rate and a 1.47x speedup over gcc -O3.
- The model outperforms 20 other evaluated models, including Claude-3.7-sonnet.
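
To make the reward idea in the summary concrete, here is a minimal sketch of one way a reward could combine correctness and speed. It is an illustration under stated assumptions, not the study's actual formula: the `EvalResult` type, the `reward` function, and the rule "zero reward unless every test passes, otherwise the speedup ratio over gcc -O3" are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Hypothetical container for one evaluated candidate program."""
    tests_passed: int
    tests_total: int
    candidate_seconds: float  # runtime of the LLM-generated assembly
    baseline_seconds: float   # runtime of the gcc -O3 build of the same program


def reward(result: EvalResult) -> float:
    """Assumed reward shape: correctness gates the performance term.

    Returns 0.0 if any test fails; otherwise returns the speedup over the
    gcc -O3 baseline (values above 1.0 mean the candidate is faster).
    """
    if result.tests_total <= 0:
        raise ValueError("at least one test case is required")
    if result.candidate_seconds <= 0 or result.baseline_seconds <= 0:
        raise ValueError("runtimes must be positive")

    # An incorrect program earns nothing, no matter how fast it runs.
    if result.tests_passed < result.tests_total:
        return 0.0

    # Speedup ratio: baseline time divided by candidate time.
    return result.baseline_seconds / result.candidate_seconds
```

Gating the speed term on correctness keeps a policy from being rewarded for fast but wrong code. Other designs, such as partial credit for partially passing programs, are equally plausible, and the summary does not say which one the framework uses.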