How Can Reinforcement Learning Achieve Expert-Level [Chip] Placement?

9 hours ago

Chip placement is a crucial step in physical design, but RL-based methods that focus on wirelength optimization often fail to produce expert-quality layouts.
Reward design is identified as the main cause of this performance gap with experts.
Instead of formalizing the complex expert design process by hand, the approach derives a reward model by learning directly from expert layouts.
It infers step-by-step expert trajectories from final expert layouts.
These trajectories serve as demonstrations or preferences for training a model that captures the implicit rewards latent in expert results.
Experiments show the framework can learn efficiently from even a single design and generalize well to unseen cases.
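
To make the pipeline above concrete, here is a minimal Python sketch of the two steps the summary describes: turning a final layout into a step-by-step trajectory, and fitting a reward model from preferences. The summary does not say how the paper orders placements, what features it uses, or which loss it trains with, so everything here is an assumption for illustration: the largest-macro-first ordering, the two hand-picked features, the linear reward, the Bradley-Terry preference loss, the randomly perturbed "rejected" placements, and all names (`infer_trajectory`, `features`, `build_pairs`, `expert_layout`, the die size) are hypothetical.

```python
import math
import random

DIE_W, DIE_H = 100.0, 100.0  # hypothetical die size, not from the source

def infer_trajectory(layout):
    """Turn a final layout into (partial_state, action) steps.
    Assumed ordering: largest macro area first."""
    ordered = sorted(layout.items(), key=lambda kv: -(kv[1][2] * kv[1][3]))
    placed, steps = {}, []
    for name, (x, y, w, h) in ordered:
        steps.append((dict(placed), (name, x, y, w, h)))
        placed[name] = (x, y, w, h)
    return steps

def features(state, action):
    """Hypothetical features in [0, 1]: distance to the nearest die edge and
    distance to the nearest already placed macro (0 if none is placed)."""
    _, x, y, w, h = action
    cx, cy = x + w / 2, y + h / 2
    edge = min(cx, DIE_W - cx, cy, DIE_H - cy) / (min(DIE_W, DIE_H) / 2)
    near = 0.0
    if state:
        near = min(math.hypot(cx - (px + pw / 2), cy - (py + ph / 2))
                   for px, py, pw, ph in state.values())
        near /= math.hypot(DIE_W, DIE_H)
    return [edge, near]

def reward(weights, feats):
    """Linear reward model (an assumption; any regressor could be used)."""
    return sum(wi * fi for wi, fi in zip(weights, feats))

def bt_loss(weights, pairs):
    """Mean Bradley-Terry loss: -log sigmoid(r(preferred) - r(rejected))."""
    total = 0.0
    for good, bad in pairs:
        diff = reward(weights, good) - reward(weights, bad)
        total += math.log1p(math.exp(-diff))
    return total / len(pairs)

def train(pairs, lr=0.5, epochs=200):
    """Full-batch gradient descent on the Bradley-Terry loss."""
    weights = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for good, bad in pairs:
            diff = reward(weights, good) - reward(weights, bad)
            scale = -1.0 / (1.0 + math.exp(diff))  # d(loss) / d(diff)
            for i in range(len(weights)):
                grad[i] += scale * (good[i] - bad[i]) / len(pairs)
        weights = [wi - lr * gi for wi, gi in zip(weights, grad)]
    return weights

def build_pairs(layout, rng, samples_per_step=8):
    """Prefer each expert action over random placements in the same state."""
    pairs = []
    for state, (name, x, y, w, h) in infer_trajectory(layout):
        good = features(state, (name, x, y, w, h))
        for _ in range(samples_per_step):
            rx = rng.uniform(0, DIE_W - w)
            ry = rng.uniform(0, DIE_H - h)
            pairs.append((good, features(state, (name, rx, ry, w, h))))
    return pairs

# Toy "expert" layout: macro name -> (x, y, width, height). Made-up data.
expert_layout = {
    "sram0": (0.0, 0.0, 30.0, 20.0),
    "sram1": (0.0, 80.0, 30.0, 20.0),
    "pll": (90.0, 0.0, 10.0, 10.0),
    "rom": (70.0, 85.0, 30.0, 15.0),
}
pairs = build_pairs(expert_layout, random.Random(0))
weights = train(pairs)
print("loss before:", bt_loss([0.0, 0.0], pairs), "after:", bt_loss(weights, pairs))
```

In an RL setup, the learned `reward` would then score each placement step in place of, or alongside, a wirelength term. A single layout yields many preference pairs here, which is one plausible reading of how learning from a single design could be data-efficient, though the summary does not confirm the mechanism.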
