Optimal Classification Cutoffs
4 months ago
- #classification
- #machine-learning
- #threshold-optimization
- Optimizing classification thresholds improves decision quality when classes are imbalanced or misclassification costs are asymmetric.
- The default 0.5 threshold is often suboptimal in scenarios such as medical diagnosis, fraud detection, spam filtering, and other imbalanced-data problems.
- The library offers efficient algorithms (sort_scan, minimize, gradient, auto) for threshold optimization, with sort_scan being the fastest for large datasets.
- Version 2.0.0 features a clean API, automatic method selection, O(n log n) optimization, cost-matrix-based decisions, and requires modern Python (3.10+).
- Standard gradient-based optimizers fail on piecewise-constant metrics like F1 score: the gradient is zero almost everywhere, so they stall on flat regions instead of finding the best cutoff.
- A quick example demonstrates how to find and apply optimal thresholds using the library, showing improved F1 scores over default thresholds.
- The library is cited for academic use, with a reference provided for proper attribution.
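The library's own API is not shown in these notes, but the sort-and-scan idea it describes can be sketched in plain NumPy: sort samples by score once, then every candidate threshold corresponds to a prefix of that sorted order, so F1 at all cutoffs can be computed with cumulative sums in O(n log n). A minimal sketch (function names here are illustrative, not the library's API; assumes distinct scores and binary 0/1 labels):

```python
import numpy as np

def f1_at_threshold(y_true, scores, t):
    """F1 when predicting positive for scores >= t."""
    pred = scores >= t
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    fn = np.sum(~pred & (y_true == 1))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def sort_scan_f1(y_true, scores):
    """Exact F1-optimal threshold via one sorted sweep: O(n log n)."""
    order = np.argsort(-scores)        # sort descending by score
    y = y_true[order]
    s = scores[order]
    tp = np.cumsum(y)                  # true positives if we cut after position i
    fp = np.cumsum(1 - y)              # false positives at the same cut
    fn = y.sum() - tp                  # remaining positives below the cut
    f1 = 2 * tp / (2 * tp + fp + fn)
    best = int(np.argmax(f1))
    return float(s[best]), float(f1[best])

# Toy imbalanced data where every positive scores below 0.5,
# so the default threshold predicts nothing positive at all.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 1, 1, 1])
scores = np.array([0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.40, 0.45, 0.48])

t_opt, f1_opt = sort_scan_f1(y_true, scores)
f1_default = f1_at_threshold(y_true, scores, 0.5)
# F1 is piecewise constant between consecutive scores (zero gradient),
# which is why gradient-based optimizers stall here:
same = f1_at_threshold(y_true, scores, 0.37) == f1_at_threshold(y_true, scores, 0.39)
```

On this toy data the default 0.5 cutoff yields F1 = 0 while the scanned optimum (t = 0.40) yields F1 = 1, illustrating why tuning the cutoff matters on imbalanced data.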