
Feature Extraction with KNN

16 days ago
  • #KNN
  • #Feature Extraction
  • #Machine Learning
  • The fastknn package provides a function for feature extraction with KNN. It generates k * c new features, where c is the number of class labels, from the distances between each observation and its k nearest neighbors within each class.
  • Feature extraction uses n-fold cross-validation (CV) to avoid overfitting and supports parallelization via the nthread parameter.
  • The technique is inspired by the winning solution of the Otto Group Product Classification Challenge on Kaggle.
  • An example shows that KNN features capture non-linear information that a linear model such as a GLM cannot, raising accuracy from 83.81% to 95.24%.
  • Additional examples with chess and spirals datasets show how KNN features can transform the original space to make classes linearly separable.
  • The knnExtract() function is showcased in a Kaggle Kernel for large datasets, highlighting its practical application.
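
The points above can be illustrated with a minimal Python sketch. It is not the fastknn implementation (fastknn is an R package), and it rests on stated assumptions: feature j for a class is taken to be the sum of distances to the j nearest neighbors in that class, the fold assignment is a simple shuffled split, and the names knn_features and knn_extract_cv are hypothetical. The toy data is a chess-like layout of four clusters where opposite corners share a label.

```python
import math
import random


def knn_features(train_x, train_y, query_x, k, classes):
    """Return k * len(classes) features for every query point.

    Assumption: feature j for a class is the sum of distances from the
    query to its j nearest training neighbors carrying that class label.
    """
    features = []
    for q in query_x:
        row = []
        for c in classes:
            dists = sorted(
                math.dist(q, x) for x, label in zip(train_x, train_y) if label == c
            )
            if len(dists) < k:
                raise ValueError(f"class {c!r} has fewer than k training points")
            running = 0.0
            for d in dists[:k]:
                running += d
                row.append(running)
        features.append(row)
    return features


def knn_extract_cv(x, y, k=1, folds=5, seed=0):
    """Build features out-of-fold, so no point is its own neighbor."""
    classes = sorted(set(y))
    order = list(range(len(x)))
    random.Random(seed).shuffle(order)
    out = [None] * len(x)
    for f in range(folds):
        held = order[f::folds]          # indices held out in this fold
        held_set = set(held)
        rest = [i for i in order if i not in held_set]
        feats = knn_features(
            [x[i] for i in rest], [y[i] for i in rest],
            [x[i] for i in held], k, classes,
        )
        for i, row in zip(held, feats):
            out[i] = row
    return out


# Chess-like toy data: opposite corners share a label (not linearly separable).
offsets = [(0.0, 0.0), (0.05, 0.0), (0.0, 0.05), (0.05, 0.05), (0.02, 0.03)]
corners = [((0, 0), 0), ((1, 1), 0), ((0, 1), 1), ((1, 0), 1)]
x = [(cx + dx, cy + dy) for (cx, cy), _ in corners for dx, dy in offsets]
y = [label for _, label in corners for _ in offsets]

# k = 2 neighbors and c = 2 classes give 2 * 2 = 4 features per point.
feats = knn_extract_cv(x, y, k=2, folds=5)
```

In the new feature space, each row is ordered as [class 0, j=1], [class 0, j=2], [class 1, j=1], [class 1, j=2]. For this toy data, comparing the first and third columns is enough to recover the label, which is the sense in which the KNN features make the classes linearly separable. Parallelization (the nthread parameter in fastknn) is not modeled here.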