Feature Extraction with KNN

16 days ago


The fastknn package provides a function, knnExtract(), for KNN-based feature extraction. It generates k * c new features, where c is the number of classes, from the distances between each observation and its k nearest neighbors within each class.
To avoid overfitting, the features are extracted with n-fold cross-validation, so each observation's features come from neighbors outside its own fold. The computation can be parallelized via the nthread parameter.
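The idea can be sketched in a few lines of plain Python. This is an illustrative sketch, not the fastknn implementation: the function name knn_extract, its parameters, and the toy data are hypothetical, and it assumes each feature is the cumulative sum of distances to the 1st through k-th nearest neighbors of a class, with neighbors drawn only from other folds.

```python
import math
import random

def knn_extract(X, y, k=2, folds=3, seed=0):
    """Return k * c out-of-fold KNN distance features for every row of X.

    Hypothetical helper for illustration; not part of fastknn.
    """
    classes = sorted(set(y))
    n = len(X)

    # Assign every observation to one of `folds` folds at random.
    order = list(range(n))
    random.Random(seed).shuffle(order)
    fold_of = {i: pos % folds for pos, i in enumerate(order)}

    features = [[0.0] * (k * len(classes)) for _ in range(n)]
    for i in range(n):
        for ci, c in enumerate(classes):
            # Candidate neighbors: class-c rows that are NOT in i's fold.
            dists = sorted(
                math.dist(X[i], X[j])
                for j in range(n)
                if y[j] == c and fold_of[j] != fold_of[i]
            )
            if len(dists) < k:
                raise ValueError("not enough out-of-fold neighbors in class %r" % (c,))
            # Assumption: feature r is the summed distance to the r + 1 nearest neighbors.
            running = 0.0
            for r in range(k):
                running += dists[r]
                features[i][ci * k + r] = running
    return features

# Toy usage: two well-separated clusters, 2 classes, k = 2 -> 4 features per row.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8)]
X += [(a + 10, b + 10) for a, b in X]
y = [0] * 6 + [1] * 6
features = knn_extract(X, y, k=2, folds=3)
```

Each row of features holds k values for class 0 followed by k values for class 1. Restricting neighbors to other folds is what keeps an observation from being described by itself or by its own fold.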
The technique is inspired by the winning solution of the Otto Group Product Classification Challenge on Kaggle.
An example shows that KNN features capture non-linear information that a linear model such as a GLM cannot learn from the original features: accuracy rises from 83.81% to 95.24%.
Additional examples with chess and spirals datasets show how KNN features can transform the original space to make classes linearly separable.
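A toy version of the chessboard case makes the transformation concrete. The sketch below uses hypothetical data (a 2x2 board, not the article's chess dataset), sets k = 1, and uses leave-one-out neighbors in place of n-fold CV to stay short. No single straight line separates the two classes in the original (x, y) space, but in the space of nearest-neighbor distances the line d0 = d1 does.

```python
import math

# Toy 2x2 chessboard (hypothetical data): the class is the parity of the cell.
offsets = [0.2, 0.4, 0.6, 0.8]
points, labels = [], []
for cx in (0, 1):
    for cy in (0, 1):
        for ox in offsets:
            for oy in offsets:
                points.append((cx + ox, cy + oy))
                labels.append((cx + cy) % 2)

def nearest_per_class(i):
    """Distance from point i to its nearest OTHER point in class 0 and in class 1."""
    best = [math.inf, math.inf]
    for j, (p, c) in enumerate(zip(points, labels)):
        if j != i:
            best[c] = min(best[c], math.dist(points[i], p))
    return best

# New 2-D feature space: (d0, d1) = nearest-neighbor distance per class.
knn_space = [nearest_per_class(i) for i in range(len(points))]

# A linear rule in the new space: predict class 0 when d0 < d1.
predicted = [0 if d0 < d1 else 1 for d0, d1 in knn_space]
accuracy = sum(p == t for p, t in zip(predicted, labels)) / len(labels)
print(f"accuracy of the linear rule in KNN space: {accuracy:.2%}")
```

The points are spaced so that every point has a same-class neighbor closer than any opposite-class point, which is why the rule works cleanly here. Real data will be noisier.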
A Kaggle Kernel demonstrates knnExtract() on large datasets, showing that the approach is practical at scale.
