Paying attention to feature distribution alignment (pun intended)
16 days ago
- #feature distribution
- #polynomial features
- #machine learning
- The post discusses the importance of feature distribution alignment in machine learning, particularly focusing on polynomial features.
- It introduces orthogonality of polynomial bases and relates it to informativeness: an orthogonal basis yields uncorrelated features.
- The Legendre polynomial basis is highlighted for its orthogonality, which only pays off when the features are uniformly distributed; real data usually isn't (see the first sketch below).
- Two practical methods are proposed for handling non-uniformly distributed data: the mapping trick and the multiplication-by-one trick.
- The mapping trick transforms each feature through its CDF so that it matches the uniform distribution the Legendre basis expects, making the basis orthogonal w.r.t. the data distribution.
- A simulation demonstrates that the CDF-based mapping yields nearly uncorrelated features and improves model performance (see the mapping-trick sketch below).
- The post compares models built on min-max-scaled features against models built on CDF-based orthogonal features, with the latter performing better (a comparison sketch follows below).
- The multiplication-by-one trick is a more general method that supports custom weight functions, but its added complexity makes it less practical (see the final sketch below).
- The post concludes by emphasizing the importance of aligning feature distributions with orthogonal bases for better model performance.
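
A minimal sketch (not from the post itself) of why the Legendre basis gives uncorrelated features on uniform data but not on skewed data; the degree, sample size, and the exponential example distribution are illustrative choices:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)
degree = 4

def legendre_features(x, degree):
    """Evaluate P_1..P_degree at x (assumed to lie in [-1, 1]); P_0 is constant, so drop it."""
    return legendre.legvander(x, degree)[:, 1:]

# Uniform data on [-1, 1]: the Legendre basis is orthogonal w.r.t. this distribution,
# so the resulting features come out (nearly) uncorrelated.
x_uniform = rng.uniform(-1.0, 1.0, size=10_000)
corr_uniform = np.corrcoef(legendre_features(x_uniform, degree), rowvar=False)

# Skewed data squeezed into [-1, 1] by min-max scaling: orthogonality no longer holds,
# so sizeable off-diagonal correlations appear.
x_skewed = rng.exponential(scale=1.0, size=10_000)
x_scaled = 2.0 * (x_skewed - x_skewed.min()) / (x_skewed.max() - x_skewed.min()) - 1.0
corr_skewed = np.corrcoef(legendre_features(x_scaled, degree), rowvar=False)

print("max |off-diagonal corr|, uniform:", np.abs(corr_uniform - np.eye(degree)).max())
print("max |off-diagonal corr|, skewed :", np.abs(corr_skewed - np.eye(degree)).max())
```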
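
A sketch of the mapping trick as summarized above, assuming an empirical (rank-based) CDF estimate; the post may use a different CDF estimator:

```python
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(0)

def empirical_cdf(x):
    """Map each value to its empirical CDF in (0, 1) via a rank transform."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def cdf_legendre_features(x, degree):
    """Mapping trick: push x through its CDF, rescale to [-1, 1], then evaluate
    Legendre polynomials P_1..P_degree. The CDF-transformed values are approximately
    uniform, so the basis stays orthogonal w.r.t. the data distribution."""
    z = 2.0 * empirical_cdf(x) - 1.0
    return legendre.legvander(z, degree)[:, 1:]

# Strongly non-uniform raw feature.
x = rng.lognormal(mean=0.0, sigma=1.0, size=10_000)
feats = cdf_legendre_features(x, degree=4)

corr = np.corrcoef(feats, rowvar=False)
print("max |off-diagonal correlation|:", np.abs(corr - np.eye(corr.shape[0])).max())
```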
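
A rough version of the min-max vs. CDF comparison, assuming a ridge regression on a synthetic target (the target, degree, and regularization strength are made up for illustration; the post's actual experiment may differ). For simplicity the empirical CDF is estimated on the full data set; a real pipeline should fit it on the training folds only:

```python
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
degree = 5

# Synthetic data: a skewed feature and a nonlinear target (illustrative only).
x = rng.lognormal(size=5_000)
y = np.sin(np.log1p(x)) + 0.1 * rng.normal(size=x.size)

# Baseline: min-max scale x into [-1, 1], then Legendre features.
x_mm = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
X_minmax = legendre.legvander(x_mm, degree)[:, 1:]

# CDF-based: rank-transform x to approximate its CDF, rescale, then Legendre features.
u = (np.argsort(np.argsort(x)) + 0.5) / x.size
X_cdf = legendre.legvander(2.0 * u - 1.0, degree)[:, 1:]

for name, X in [("min-max + Legendre", X_minmax), ("CDF + Legendre", X_cdf)]:
    scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=5, scoring="r2")
    print(f"{name}: mean R^2 = {scores.mean():.3f}")
```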
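
Finally, one plausible reading of the multiplication-by-one trick (an assumption on my part, since the summary doesn't spell it out) is to insert a factor of $p(x)/p(x)$ into the orthogonality integral, so that a custom weight function $w$ can be expressed as an expectation under the data density $p$:

$$
\int f_i(x)\, f_j(x)\, w(x)\, dx
= \int f_i(x)\, f_j(x)\, \frac{w(x)}{p(x)}\, p(x)\, dx
= \mathbb{E}_{x \sim p}\!\left[ f_i(x)\, f_j(x)\, \frac{w(x)}{p(x)} \right].
$$

The price is an estimate of the density $p(x)$ (or of the ratio $w/p$), which is presumably the complexity the post refers to.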