Unsuccessfully training AI to play my favorite niche childhood game
- #Game AI
- #DDNet
- #Reinforcement Learning
- Trained 1,500 parallel bots to play DDNet, a cooperative 2D platformer with complex mechanics.
- Initially ran neural-network inference in C++, but training suffered from policy collapse.
- Switched to Python for better debugging and moved bots into the game server process to reduce latency.
- Used shared memory for communication between C++ and Python, avoiding serialization overhead.
- Observation space includes tile grid, player state, and checkpoint info; action space includes movement and aiming.
- First reward function based on checkpoints failed due to map design inconsistencies.
- Moved to waypoint-based rewards derived from human replays, which handle teleporters and other map quirks implicitly.
- Implemented adaptive checkpoint respawning to optimize training by focusing on challenging sections.
- Future goals include curriculum learning across maps, scaling compute, and exploring different game modes.
- Learned the importance of infrastructure, logging, and debugging in reinforcement learning projects.
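The shared-memory bridge between the C++ server and Python can be sketched as below. This is a minimal, self-contained sketch, not the project's actual layout: the segment names, bot count mapping, observation size (`OBS_DIM`), and action size (`ACT_DIM`) are all assumptions, and the segments are created from Python here so the example runs standalone (in the real setup the C++ server would create and name them).

```python
# Hypothetical sketch of the C++/Python shared-memory bridge.
# Sizes and layout are assumptions, not DDNet's actual format.
import numpy as np
from multiprocessing import shared_memory

NUM_BOTS = 1500   # matches the bot count from the post
OBS_DIM = 512     # hypothetical per-bot observation size
ACT_DIM = 6       # hypothetical per-bot action size

# In the real system the C++ server would create these segments under an
# agreed name; we create anonymous ones so the sketch is runnable alone.
obs_shm = shared_memory.SharedMemory(create=True, size=NUM_BOTS * OBS_DIM * 4)
act_shm = shared_memory.SharedMemory(create=True, size=NUM_BOTS * ACT_DIM * 4)

# NumPy views over the raw buffers: reads and writes go straight to shared
# memory, so neither side serializes or copies anything.
obs = np.ndarray((NUM_BOTS, OBS_DIM), dtype=np.float32, buffer=obs_shm.buf)
act = np.ndarray((NUM_BOTS, ACT_DIM), dtype=np.float32, buffer=act_shm.buf)

obs[:] = 0.0                         # the server would write observations here
act[:] = np.tanh(obs[:, :ACT_DIM])  # the policy writes actions back in place

obs_shape = obs.shape
action_sum = float(np.abs(act).sum())  # tanh(0) == 0, so this is 0.0

act_shm.close(); act_shm.unlink()
obs_shm.close(); obs_shm.unlink()
```

The key property is that both sides index the same buffer, so per-step cost is a memory write rather than a serialize/deserialize round trip.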
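One way the observation vector could be assembled from the tile grid, player state, and checkpoint info is sketched below. All field names, the 15x15 window size, and the relative-checkpoint encoding are illustrative assumptions, not DDNet's actual observation format.

```python
# Hypothetical observation builder: local tile window + player state +
# relative vector to the next checkpoint. Sizes are assumptions.
import numpy as np

GRID = 15  # assumed: 15x15 tile window centred on the player

def build_observation(tiles, player_pos, player_vel, next_checkpoint):
    """tiles: 2D int array of tile ids; positions/velocities are (x, y)."""
    px, py = player_pos
    half = GRID // 2
    # Crop a window of tiles around the player, zero-padding at map edges.
    padded = np.pad(tiles, half, constant_values=0)
    window = padded[py:py + GRID, px:px + GRID].astype(np.float32)
    player_state = np.array([*player_pos, *player_vel], dtype=np.float32)
    # Relative checkpoint vector keeps the policy translation-invariant.
    cp_delta = np.array([next_checkpoint[0] - px, next_checkpoint[1] - py],
                        dtype=np.float32)
    return np.concatenate([window.ravel(), player_state, cp_delta])

obs = build_observation(
    tiles=np.zeros((100, 100), dtype=np.int32),
    player_pos=(50, 50), player_vel=(0.0, 0.0), next_checkpoint=(60, 48),
)
```

The action side would pair this with movement keys plus a continuous aim direction, but the exact discretization is a design choice the post does not specify.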
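The replay-derived waypoint reward can be sketched as follows. The replay format, waypoint spacing, and reward scale are assumptions; the point is that because waypoints are subsampled from a real human run, a path through a teleporter is rewarded the same as any other progress, sidestepping the map-design inconsistencies that broke checkpoint rewards.

```python
# Hypothetical waypoint reward derived from a human replay trajectory.
import numpy as np

def make_waypoints(replay_positions, spacing=50.0):
    """Subsample a replay trajectory into waypoints roughly `spacing` apart."""
    waypoints = [replay_positions[0]]
    for p in replay_positions[1:]:
        if np.linalg.norm(np.asarray(p) - np.asarray(waypoints[-1])) >= spacing:
            waypoints.append(p)
    return np.asarray(waypoints, dtype=np.float32)

def progress_reward(pos, waypoints, last_index):
    """Reward = number of new waypoints passed since the last step.
    Teleporters are handled implicitly: the recorded path goes through them,
    so a jump in position maps to a jump in waypoint index."""
    dists = np.linalg.norm(waypoints - np.asarray(pos, dtype=np.float32), axis=1)
    nearest = int(np.argmin(dists))
    reward = max(0, nearest - last_index)  # no reward for moving backwards
    return reward, max(nearest, last_index)

replay = [(x, 0.0) for x in range(0, 500, 10)]  # fake straight-line replay
wps = make_waypoints(replay)
r, idx = progress_reward((120.0, 0.0), wps, last_index=0)
```

Tracking `last_index` and only rewarding forward movement prevents the policy from farming reward by oscillating around a single waypoint.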
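Adaptive checkpoint respawning can be sketched as a weighted sampler over spawn points, where weight tracks how often bots fail from each checkpoint. The class name, the Laplace-smoothed failure rate, and the counts below are illustrative assumptions, not the post's actual implementation.

```python
# Hypothetical adaptive respawner: spawn bots more often at checkpoints
# they fail from, so training time concentrates on hard sections.
import numpy as np

rng = np.random.default_rng(0)

class AdaptiveSpawner:
    """Tracks per-checkpoint attempt/failure counts and samples spawn
    points proportionally to a smoothed failure rate."""

    def __init__(self, num_checkpoints):
        self.fails = np.zeros(num_checkpoints)
        self.tries = np.zeros(num_checkpoints)

    def sample(self):
        # Laplace-smoothed failure rate: unseen checkpoints get weight 0.5,
        # frequently failed ones approach 1.0.
        weights = (self.fails + 1) / (self.tries + 2)
        return int(rng.choice(len(weights), p=weights / weights.sum()))

    def report(self, checkpoint, succeeded):
        self.tries[checkpoint] += 1
        if not succeeded:
            self.fails[checkpoint] += 1

spawner = AdaptiveSpawner(num_checkpoints=8)
for _ in range(20):
    spawner.report(3, succeeded=False)  # checkpoint 3 keeps failing
spawner.report(0, succeeded=True)       # checkpoint 0 is easy
cp = spawner.sample()                   # now biased toward checkpoint 3
```

The smoothing prior keeps never-visited checkpoints sampled often enough to be explored, while mastered sections gradually stop consuming training time.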