Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models
5 hours ago
- #Computer Vision
- #YOLO
- #Real-time Detection
- Ultralytics YOLO26 is a unified family of real-time vision models that targets four limitations of earlier YOLO detectors: reliance on non-maximum suppression (NMS) at inference, heavy detection heads caused by Distribution Focal Loss (DFL), long training schedules, and label assignment that can leave small objects without positive samples.
- YOLO26 uses a dual-head design for NMS-free end-to-end inference and removes DFL, which yields a lighter head whose box regression range is unconstrained. Its training pipeline combines MuSGD (a hybrid Muon-SGD optimizer), Progressive Loss, and STAL, a label-assignment scheme that ensures small objects are covered.
- A single pipeline supports detection, instance segmentation, pose estimation, classification, and oriented detection across five scales (n/s/m/l/x). YOLOE-26 adds an open-vocabulary extension with text-prompted, visual-prompted, and prompt-free inference.
- On COCO, YOLO26 achieves 40.9-57.5 mAP at 1.7-11.8 ms T4 TensorRT latency, improving the accuracy-latency Pareto front, while YOLOE-26x reaches 40.6 AP on LVIS minival under text prompting.
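
To make the NMS-free claim concrete, the sketch below contrasts the two post-processing styles in plain Python. It is an illustration under stated assumptions, not YOLO26's actual code: the function names (`greedy_nms`, `end_to_end_filter`), the thresholds, and the toy boxes and scores are all hypothetical. It also assumes that an end-to-end head already emits at most one confident prediction per object, so a confidence threshold and a top-k cut are enough.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes: Sequence[Box], scores: Sequence[float],
               iou_thr: float = 0.5, conf_thr: float = 0.25) -> List[int]:
    """Classical post-processing: visit boxes by descending score and keep
    a box only if it does not overlap an already kept box too strongly."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thr),
                   key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep


def end_to_end_filter(scores: Sequence[float], conf_thr: float = 0.25,
                      max_det: int = 300) -> List[int]:
    """NMS-free post-processing: take the top-k scores and apply a
    confidence threshold. No pairwise IoU comparison is needed."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in order[:max_det] if scores[i] >= conf_thr]


# Toy data: boxes 0 and 1 describe the same object, box 2 a different one.
boxes: List[Box] = [(10, 10, 50, 50), (12, 11, 51, 49), (100, 100, 140, 150)]

# A conventional head scores both duplicates highly, so NMS must remove one.
dense_scores = [0.9, 0.8, 0.7]
# Assumed end-to-end behaviour: the duplicate already has a low score.
e2e_scores = [0.9, 0.05, 0.7]

print(greedy_nms(boxes, dense_scores))
print(end_to_end_filter(e2e_scores))
```

Both calls select the same two boxes, but the second path has no sequential, data-dependent suppression loop. That is the property that makes an exported end-to-end model simpler to deploy.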