Hasty Briefs (beta)

Paying attention to feature distribution alignment (pun intended)

16 days ago
  • #feature distribution
  • #polynomial features
  • #machine learning
  • The post discusses the importance of feature distribution alignment in machine learning, focusing on polynomial features.
  • It introduces orthogonality of polynomial bases and relates it to informativeness: a basis that is orthogonal with respect to the data distribution produces uncorrelated features.
  • The Legendre polynomial basis is highlighted because it is orthogonal with respect to the uniform distribution on [-1, 1]. This helps when features are uniformly distributed, but real data often is not.
  • Two practical methods are proposed for non-uniformly distributed data: the mapping trick and the multiplication-by-one trick.
  • The mapping trick transforms each feature through its CDF, which makes the transformed feature uniformly distributed and thus aligned with the Legendre basis, so the basis becomes orthogonal w.r.t. the data distribution.
  • A simulation demonstrates that using the CDF-based mapping results in nearly uncorrelated features, improving model performance.
  • The post compares the performance of models using min-max scaling versus CDF-based orthogonal features, showing better results with the latter.
  • The multiplication-by-one trick is more involved: it allows custom weight functions, but its complexity makes it less practical.
  • The post concludes by emphasizing that aligning feature distributions with the distribution under which the basis is orthogonal leads to better model performance.
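The reasoning behind the orthogonality, Legendre, and mapping-trick bullets can be written out in a few lines. This is a standard derivation (the probability integral transform), not reproduced from the post, and it assumes the feature X has a continuous CDF F:

```latex
% Legendre orthogonality w.r.t. the uniform distribution on [-1, 1]
\mathbb{E}_{U \sim \mathcal{U}[-1,1]}\big[P_i(U)\,P_j(U)\big]
  = \frac{1}{2}\int_{-1}^{1} P_i(u)\,P_j(u)\,du
  = \frac{\delta_{ij}}{2i+1}.

% Since P_0 \equiv 1, every higher-degree feature has zero mean:
\mathbb{E}[P_j(U)] = \mathbb{E}[P_j(U)\,P_0(U)] = 0 \quad (j \ge 1),

% hence distinct features are uncorrelated:
\operatorname{Cov}\big(P_i(U), P_j(U)\big) = \mathbb{E}[P_i(U)\,P_j(U)] = 0
  \quad (i \ne j,\ i, j \ge 1).

% Mapping trick: F(X) \sim \mathcal{U}[0,1], so rescaling to [-1, 1] gives
U = 2F(X) - 1 \sim \mathcal{U}[-1,1]
  \;\Longrightarrow\;
\operatorname{Cov}\big(P_i(2F(X)-1),\, P_j(2F(X)-1)\big) = 0 \quad (i \ne j).
```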
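A minimal NumPy sketch of the mapping trick and the correlation check summarized above. Assumptions not taken from the post: the feature is drawn from an exponential distribution, its CDF is estimated empirically from ranks, and the CDF values are rescaled to [-1, 1], the interval on which Legendre polynomials are orthogonal. All helper names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre


def minmax_to_unit_interval(x):
    """Min-max scale a 1-D array to [-1, 1] (the Legendre domain)."""
    lo, hi = x.min(), x.max()
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def empirical_cdf_to_unit_interval(x):
    """Map x through its empirical CDF (computed from ranks), then to [-1, 1].

    Ranks are shifted by 0.5, giving (r + 0.5) / n, which lies strictly
    inside (0, 1) and is approximately Uniform[0, 1].
    """
    ranks = np.argsort(np.argsort(x))  # 0 .. n-1; ties broken arbitrarily
    u = (ranks + 0.5) / len(x)         # empirical CDF values in (0, 1)
    return 2.0 * u - 1.0


def legendre_features(z, degree):
    """Legendre basis P_1..P_degree evaluated at z (constant P_0 dropped)."""
    return legendre.legvander(z, degree)[:, 1:]


def max_abs_offdiag_corr(features):
    """Largest absolute correlation between two distinct feature columns."""
    corr = np.corrcoef(features, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.abs(off_diag).max()


rng = np.random.default_rng(0)
x = rng.exponential(scale=1.0, size=10_000)  # a skewed, non-uniform feature
degree = 5

naive = legendre_features(minmax_to_unit_interval(x), degree)
mapped = legendre_features(empirical_cdf_to_unit_interval(x), degree)

print("min-max scaling, max |corr|:", max_abs_offdiag_corr(naive))
print("CDF mapping,     max |corr|:", max_abs_offdiag_corr(mapped))
```

On skewed data like this, min-max scaling piles most points near -1, where low-degree Legendre polynomials move together, so those features should come out strongly correlated. The CDF-mapped features should be close to uncorrelated. The rank-based CDF only covers the points it was computed on; new points need a lookup into the sorted training values.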
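The min-max versus CDF-based comparison can be sketched as a simple holdout experiment. Everything here is hypothetical: the synthetic target sin(2x), the ridge penalty, the polynomial degree, and the helper names are illustration choices rather than the post's setup, and no particular outcome is implied. The CDF is estimated on the training split only and applied to the test split with `np.searchsorted`.

```python
import numpy as np
from numpy.polynomial import legendre


def fit_minmax(x_train):
    """Return a transform that min-max scales to [-1, 1] using training stats."""
    lo, hi = x_train.min(), x_train.max()
    # Clip so test points outside the training range stay in [-1, 1].
    return lambda x: np.clip(2.0 * (x - lo) / (hi - lo) - 1.0, -1.0, 1.0)


def fit_ecdf(x_train):
    """Return a transform through the training empirical CDF, rescaled to [-1, 1]."""
    xs = np.sort(x_train)
    n = len(xs)

    def transform(x):
        # Count of training points <= x, shifted to stay strictly inside (0, 1).
        u = (np.searchsorted(xs, x, side="right") + 0.5) / (n + 1)
        return 2.0 * u - 1.0

    return transform


def ridge_fit_predict(F_train, y_train, F_test, alpha):
    """Closed-form ridge regression; the intercept is left unpenalized via centering."""
    mu_F, mu_y = F_train.mean(axis=0), y_train.mean()
    A = F_train - mu_F
    w = np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ (y_train - mu_y))
    return (F_test - mu_F) @ w + mu_y


def holdout_mse(transform_factory, x_tr, y_tr, x_te, y_te, degree=8, alpha=1.0):
    """Fit the transform on train data, build Legendre features, return test MSE."""
    t = transform_factory(x_tr)
    F_tr = legendre.legvander(t(x_tr), degree)[:, 1:]
    F_te = legendre.legvander(t(x_te), degree)[:, 1:]
    pred = ridge_fit_predict(F_tr, y_tr, F_te, alpha)
    return np.mean((pred - y_te) ** 2)


rng = np.random.default_rng(1)
x = rng.exponential(size=4_000)
y = np.sin(2 * x) + 0.1 * rng.standard_normal(x.size)  # hypothetical target
x_tr, x_te, y_tr, y_te = x[:3000], x[3000:], y[:3000], y[3000:]

for name, factory in [("min-max", fit_minmax), ("CDF", fit_ecdf)]:
    print(name, holdout_mse(factory, x_tr, y_tr, x_te, y_te))
```

Fitting each transform on the training split alone keeps test-set information out of the features, which matters for a fair comparison.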