Improving Assembly Code Performance with LLMs via Reinforcement Learning

a year ago

Large language models (LLMs) trained with reinforcement learning can optimize the performance of assembly code.
A reinforcement learning framework trains LLMs with Proximal Policy Optimization (PPO) for code optimization.
The reward function scores both functional correctness, checked with test cases, and execution performance relative to gcc -O3.
A benchmark of 8,072 real-world programs was introduced for the study.
The resulting model, Qwen2.5-Coder-7B-PPO, achieves a 96.0% test pass rate and an average speedup of 1.47x over the gcc -O3 baseline.
It outperforms all 20 other models evaluated, including Claude-3.7-sonnet.
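
To make the reward idea concrete, here is a minimal Python sketch of one way a reward could combine correctness and speed. It is an illustration only: the summary does not give the exact reward formula, so the all-tests-must-pass gate, the use of raw speedup as the reward value, and the names EvalResult, speedup, and reward are all assumptions rather than the study's implementation.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    """Outcome of evaluating one candidate program (hypothetical structure)."""
    tests_passed: int
    tests_total: int
    candidate_seconds: float  # measured runtime of the LLM-generated assembly
    baseline_seconds: float   # measured runtime of the gcc -O3 build


def speedup(result: EvalResult) -> float:
    """Speedup over the baseline: values above 1.0 mean the candidate is faster."""
    if result.candidate_seconds <= 0:
        raise ValueError("candidate_seconds must be positive")
    return result.baseline_seconds / result.candidate_seconds


def reward(result: EvalResult) -> float:
    """Assumed reward shape: zero unless every test passes, else the speedup."""
    if result.tests_total == 0 or result.tests_passed < result.tests_total:
        return 0.0  # incorrect (or untested) code earns nothing, however fast
    return speedup(result)


if __name__ == "__main__":
    correct_and_faster = EvalResult(10, 10, candidate_seconds=1.0, baseline_seconds=1.5)
    fast_but_wrong = EvalResult(9, 10, candidate_seconds=0.5, baseline_seconds=1.5)
    print(reward(correct_and_faster))
    print(reward(fast_but_wrong))
```

Gating on correctness first means a fast but wrong program can never score above a correct one. Other shapes, such as partial credit per passing test, are equally plausible under the description above.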
