Overview of Feature Selection Techniques





Correctly selecting features for data analysis improves the quality of the resulting model. In particular, assessing the importance of features is necessary for interpreting the model's results.



We will look at existing feature selection methods for supervised and unsupervised learning problems. Each method is illustrated with an open-source Python implementation so that you can quickly try the proposed algorithms. This is not an exhaustive list: many algorithms have been created over the past 20 years, and here you will find only the most basic of them. For a deeper study, check out this review.



Supervised and unsupervised models



There are supervised selection algorithms that determine the appropriate features for the best performance on supervised learning tasks (for example, classification and regression problems). These algorithms need access to labeled data. For unlabeled data, there are also a number of feature selection methods that score all features by various criteria: variance, entropy, the ability to preserve local similarity, and so on. Relevant features found with unsupervised heuristics can also be used in supervised models, because they may reveal patterns beyond the correlation of features with the target variable.



Feature selection methods are usually divided into four categories: filters, wrappers, embedded methods, and hybrid methods.



Wrappers



With this approach, we evaluate the effectiveness of a subset of features by the final performance of the learning algorithm applied to it (for example, the gain in accuracy on a classification problem). Any learning algorithm can be used in this combination of a search strategy and a model.







Common search strategies include:

- sequential forward selection: start with an empty set and greedily add the feature that most improves the model's score;
- sequential backward selection: start with all features and remove the least useful one at each step;
- floating variants: allow previously added (or removed) features to be reconsidered during the search;
- exhaustive search: evaluate every possible subset; optimal, but exponential in the number of features.





Implementation: these algorithms are implemented in the mlxtend package; see its documentation for usage examples.
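As an alternative sketch of the same idea, scikit-learn also provides a sequential feature selector. Here is a minimal, illustrative example of greedy forward selection on a toy dataset (the dataset and estimator are chosen for illustration only, not taken from the original article):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Greedy forward selection: starting from an empty set, add one feature
# at a time, keeping the subset that maximizes cross-validated accuracy.
sfs = SequentialFeatureSelector(
    knn, n_features_to_select=2, direction="forward", cv=5
)
sfs.fit(X, y)

selected = sfs.get_support(indices=True)
print(selected)
```

Note that the wrapper re-trains the classifier for every candidate subset, which is exactly why wrappers become expensive on high-dimensional data.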





Embedded methods



This group includes algorithms that train the model and select features at the same time. This is usually implemented with an l1 regularizer (which induces sparsity) or a constraint that zeroes out the coefficients of some features.





Other examples of regularization algorithms: Lasso (l1 regularization), ridge regression (l2 regularization), and Elastic Net (a combination of l1 and l2 regularization). Plotted geometrically, the Lasso constraint region is a diamond (a square rotated 45°), ridge regression's region is a circle, and Elastic Net's lies in between.
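A minimal sketch of embedded selection with Lasso, on synthetic data (the dataset parameters here are illustrative assumptions, not from the article):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso

# Synthetic regression data where only 5 of 20 features are informative.
X, y = make_regression(
    n_samples=200, n_features=20, n_informative=5, noise=1.0, random_state=0
)

# l1 regularization drives most irrelevant coefficients exactly to zero;
# the features with nonzero coefficients are the ones "selected" by the model.
lasso = Lasso(alpha=1.0).fit(X, y)
selected = np.flatnonzero(lasso.coef_)
print(selected)
```

Training and selection happen in a single fit, which is the defining property of embedded methods.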





Illustration of the penalty regions: https://scikit-learn.org/stable/auto_examples/linear_model/plot_sgd_penalties.html



A comprehensive description of these algorithms is provided here.



Filters



With this approach, we score features only by their intrinsic characteristics, without involving a learning algorithm. Filter methods are faster and require fewer computational resources than wrapper methods. However, if there is not enough data to model the statistical correlation between features, filters can give worse results than wrappers. Unlike wrappers, such methods are less prone to overfitting. They are widely used for high-dimensional data, where wrapper methods require too much computing power.



Supervised methods
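A minimal sketch of a supervised filter: score each feature by its mutual information with the class label and keep the best ones, with no learning algorithm involved (the dataset and scoring function are illustrative choices, not from the article):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Rank every feature by mutual information with the target,
# then keep the 2 highest-scoring features.
selector = SelectKBest(mutual_info_classif, k=2).fit(X, y)
kept = selector.get_support(indices=True)
print(kept)
```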





Unsupervised methods
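The simplest unsupervised filter scores features by variance alone, with no labels needed. A sketch using a tiny hand-made matrix (the data here is purely illustrative):

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

X = np.array([[0, 1, 0.1],
              [0, 2, 0.2],
              [0, 3, 0.1],
              [0, 4, 0.3]])

# Remove features whose variance does not exceed the threshold;
# the first column is constant (zero variance) and is dropped.
vt = VarianceThreshold(threshold=0.0).fit(X)
X_reduced = vt.transform(X)
print(X_reduced.shape)
```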





Hybrid methods



Another way to implement feature selection is a hybrid of filters and wrappers combined in a two-phase process: first the features are filtered by their statistical properties, and then wrapper methods are applied to the survivors.
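The two-phase idea can be sketched as a pipeline: a cheap statistical filter prunes the feature set, and a wrapper then searches only among the survivors (the dataset, estimator, and parameter values here are illustrative assumptions):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import (SelectKBest, f_classif,
                                       SequentialFeatureSelector)
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

pipe = Pipeline([
    ("scale", StandardScaler()),
    # Phase 1 (filter): keep the 10 features with the highest ANOVA F-score.
    ("filter", SelectKBest(f_classif, k=10)),
    # Phase 2 (wrapper): greedily pick 3 of those 10 by cross-validated score.
    ("wrapper", SequentialFeatureSelector(
        LogisticRegression(max_iter=1000), n_features_to_select=3, cv=3)),
])
pipe.fit(X, y)

# Indices are relative to the 10 features that survived the filter phase.
final = pipe.named_steps["wrapper"].get_support(indices=True)
print(final)
```

The filter phase keeps the wrapper's search space small, which is the main practical motivation for hybrids.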



Other sources



A large body of literature considers the feature selection problem, and here we have only lightly touched on the full array of research work.



Many other feature selection algorithms not mentioned here are implemented in the scikit-feature package.



Relevant features can also be identified using PLS (partial least squares), as described in this article, or using linear dimensionality reduction methods, as shown here.



Translated by Jet Infosystems.


