Paying attention to feature distribution alignment (pun intended)
16 days ago
- #feature distribution
- #polynomial features
- #machine learning
- The post discusses the importance of feature distribution alignment in machine learning, particularly focusing on polynomial features.
- It introduces orthogonality of polynomial bases and relates it to informativeness: an orthogonal basis yields uncorrelated features.
- The Legendre polynomial basis is highlighted for its orthogonality, which only pays off when the features are uniformly distributed; real data usually isn't (see the first sketch below).
- Two practical methods are proposed for handling non-uniformly distributed data: the mapping trick and the multiplication-by-one trick.
- The mapping trick transforms each feature through its CDF so that it matches the uniform distribution the Legendre basis expects, making the basis orthogonal w.r.t. the data distribution.
- A simulation demonstrates that the CDF-based mapping yields nearly uncorrelated features and improves model performance (see the mapping-trick sketch below).
- The post compares models built on min-max-scaled features against models built on CDF-based orthogonal features, with the latter performing better (a comparison sketch follows below).
- The multiplication-by-one trick is a more general method that supports custom weight functions, but its added complexity makes it less practical (see the final sketch below).
- The post concludes by emphasizing the importance of aligning feature distributions with orthogonal bases for better model performance.
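
A minimal sketch (not from the post itself) of why the Legendre basis gives uncorrelated features on uniform data but not on skewed data; the degree, sample size, and the exponential example distribution are illustrative choices:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
degree = 4

def legendre_features(x, degree):
    """Evaluate P_1..P_degree at x (assumed to lie in [-1, 1]); P_0 is constant, so drop it."""
    return legendre.legvander(x, degree)[:, 1:]

# Uniform data on [-1, 1]: the Legendre basis is orthogonal w.r.t. this distribution,
# so the resulting features come out (nearly) uncorrelated.
x_uniform = rng.uniform(-1.0, 1.0, size=10_000)
corr_uniform = np.corrcoef(legendre_features(x_uniform, degree), rowvar=False)

# Skewed data squeezed into [-1, 1] by min-max scaling: orthogonality no longer holds,
# so sizeable off-diagonal correlations appear.
x_skewed = rng.exponential(scale=1.0, size=10_000)
x_scaled = 2.0 * (x_skewed - x_skewed.min()) / (x_skewed.max() - x_skewed.min()) - 1.0
corr_skewed = np.corrcoef(legendre_features(x_scaled, degree), rowvar=False)

print("max |off-diagonal corr|, uniform:", np.abs(corr_uniform - np.eye(degree)).max())
print("max |off-diagonal corr|, skewed :", np.abs(corr_skewed - np.eye(degree)).max())
```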
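
A sketch of the mapping trick as summarized above, assuming an empirical (rank-based) CDF estimate; the post may use a different CDF estimator:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

def empirical_cdf(x):
    """Map each value to its empirical CDF in (0, 1) via a rank transform."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def cdf_legendre_features(x, degree):
    """Mapping trick: push x through its CDF, rescale to [-1, 1], then evaluate
    Legendre polynomials P_1..P_degree. The CDF-transformed values are approximately
    uniform, so the basis stays orthogonal w.r.t. the data distribution."""
    z = 2.0 * empirical_cdf(x) - 1.0
    return legendre.legvander(z, degree)[:, 1:]

# Strongly non-uniform raw feature.
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
feats = cdf_legendre_features(x, degree=4)

corr = np.corrcoef(feats, rowvar=False)
print("max |off-diagonal correlation|:", np.abs(corr - np.eye(corr.shape[0])).max())
```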
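
A rough version of the min-max vs. CDF comparison, assuming a ridge regression on a synthetic target (the target, degree, and regularization strength are made up for illustration; the post's actual experiment may differ). For simplicity the empirical CDF is estimated on the full data set; a real pipeline should fit it on the training folds only:

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
degree = 5

# Synthetic data: a skewed feature and a nonlinear target (illustrative only).
x = rng.lognormal(size=5_000)
y = np.sin(np.log1p(x)) + 0.1 * rng.normal(size=x.size)

# Baseline: min-max scale x into [-1, 1], then Legendre features.
x_mm = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
X_minmax = legendre.legvander(x_mm, degree)[:, 1:]

# CDF-based: rank-transform x to approximate its CDF, rescale, then Legendre features.
u = (np.argsort(np.argsort(x)) + 0.5) / x.size
X_cdf = legendre.legvander(2.0 * u - 1.0, degree)[:, 1:]

for name, X in [("min-max + Legendre", X_minmax), ("CDF + Legendre", X_cdf)]:
    scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```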
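
Finally, one plausible reading of the multiplication-by-one trick (an assumption on my part, since the summary doesn't spell it out) is to insert a factor of $p(x)/p(x)$ into the orthogonality integral, so that a custom weight function $w$ can be expressed as an expectation under the data density $p$:

$$
\int f_i(x)\, f_j(x)\, w(x)\, dx
= \int f_i(x)\, f_j(x)\, \frac{w(x)}{p(x)}\, p(x)\, dx
= \mathbb{E}_{x \sim p}\!\left[ f_i(x)\, f_j(x)\, \frac{w(x)}{p(x)} \right].
$$

The price is an estimate of the density $p(x)$ (or of the ratio $w/p$), which is presumably the complexity the post refers to.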