In statistics and machine learning, feature selection (also called attribute selection, variable selection, or variable subset selection) is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are widely applied for three reasons: simpler models that are easier for users and researchers to interpret, shorter training times, and better generalization through reduced overfitting. The central premise of feature selection is that the data contain many features that are either irrelevant or redundant and can therefore be removed without much loss of information. Irrelevance and redundancy are distinct notions: a relevant feature may be redundant in the presence of another relevant feature with which it is strongly correlated.
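
The distinction between irrelevant and redundant features can be illustrated with a small sketch. The data below are invented for illustration, the names `feature_a`, `feature_b`, `feature_c`, and `pearson` are hypothetical, and Pearson correlation is used only as a crude stand-in for relevance and redundancy; it captures linear relationships and nothing more.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Made-up toy data (an assumption for illustration only).
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
feature_a = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]  # tracks the target closely
feature_b = [v / 2.54 for v in feature_a]     # feature_a in different units
feature_c = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]  # no linear relation to target

# Relevance: how strongly each feature correlates with the target.
relevance = {
    "feature_a": pearson(feature_a, target),
    "feature_b": pearson(feature_b, target),
    "feature_c": pearson(feature_c, target),
}

# Redundancy: how strongly two features correlate with each other.
redundancy_ab = pearson(feature_a, feature_b)

print(relevance)
print(redundancy_ab)
```

Under this proxy, `feature_a` and `feature_b` are both relevant, yet each is redundant given the other because one is an exact rescaling of the other. `feature_c` is irrelevant on its own, with no redundancy involved.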

A feature selection algorithm can be viewed as the combination of a search technique for proposing new feature subsets and an evaluation measure that scores each candidate subset. The simplest algorithm tests every possible subset of features and keeps the one that minimizes the error rate. This is an exhaustive search of the space: with n features there are 2^n possible subsets, so it is computationally intractable for all but the smallest feature sets. The choice of evaluation metric heavily influences the algorithm, and it is these evaluation metrics that distinguish the three main categories of feature selection algorithms:

