In some use cases you may be interested in figuring out which attributes are important for predicting a given label. This attribute importance can be a result in itself, because it can tell you what makes someone or something behave in a particular way. In this article we will discuss common techniques to find these feature weights.
One of the most common ways to find attribute importance is to use a statistical measure to define it. Frequently used measures are Correlation, Gini Index and Information Gain. In RapidMiner you can calculate these values using the Weight by operators (e.g. Weight by Correlation, Weight by Gini Index or Weight by Information Gain).
The resulting object is a weight vector, which is the central object for all feature weighting operations. It is essentially a table that lists each attribute together with its weight.
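To illustrate the idea outside of RapidMiner, here is a minimal Python sketch (assuming scikit-learn and pandas, with the bundled breast cancer data as a stand-in example set). The exact values RapidMiner computes will differ, but the resulting attribute-to-weight table is analogous:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Analogue of Weight by Correlation: absolute Pearson correlation with the label
corr_weights = X.corrwith(y).abs()

# Analogue of Weight by Information Gain: mutual information with the label
# (mutual information is closely related to information gain)
info_weights = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

weights = pd.DataFrame({"correlation": corr_weights, "information": info_weights})
print(weights.sort_values("correlation", ascending=False).head())
```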
There are two operators that are important when working with weight objects. Weights to Data converts this table into an example set, which can then be exported to Excel or a database. Select by Weights allows you to select attributes using these weights; for example, you can keep only attributes with weights higher than 0.1, or take the top k.
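As a sketch of what Select by Weights does, again in Python and with a hypothetical threshold of 0.1 and k = 5:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
weights = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

# Keep only attributes whose weight is above the threshold ...
X_above = X.loc[:, weights > 0.1]

# ... or keep the k attributes with the highest weights
X_top_k = X[weights.nlargest(5).index]
print(X_above.shape, list(X_top_k.columns))
```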
Including Non-Linear Attributes and Combinations
The filter methods above have the problem that they do not incorporate non-linearities. A technique to overcome this is to generate non-linear transformations and combinations of the existing attributes. The operator Generate Function Set can be used to generate terms like pow(Age,2) or sqrt(Age), as well as combinations of these. This operator is usually combined with Rename by Construction to get readable names.
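In plain Python the equivalent would simply be to compute the new columns yourself; this sketch mimics what Generate Function Set and Rename by Construction produce together:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [18, 25, 40, 63]})  # toy example set

# Non-linear transformations with readable, construction-based names
df["pow(Age,2)"] = df["Age"] ** 2
df["sqrt(Age)"] = np.sqrt(df["Age"])
df["pow(Age,2)*sqrt(Age)"] = df["pow(Age,2)"] * df["sqrt(Age)"]
print(df)
```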
Another known issue with the filter methods is dependency between the variables. If your data set contains Age, 2xAge, 3xAge and 4xAge, all of them might get a high feature weight even though they carry the same information. A technique which overcomes this issue is MRMR (Minimum Redundancy Maximum Relevance). MRMR is included in RapidMiner's Feature Selection Extension.
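The extension implements MRMR natively; purely to illustrate the principle, here is a greedy Python sketch of the common "relevance minus redundancy" scheme, assuming scikit-learn and a hypothetical target of five attributes:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Relevance: mutual information of each attribute with the label
relevance = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)

selected = [relevance.idxmax()]  # start with the most relevant attribute
while len(selected) < 5:
    scores = {}
    for c in (c for c in X.columns if c not in selected):
        # Redundancy: average mutual information with already selected attributes
        redundancy = sum(
            mutual_info_regression(X[[s]], X[c], random_state=0)[0] for s in selected
        ) / len(selected)
        scores[c] = relevance[c] - redundancy
    selected.append(max(scores, key=scores.get))

print(selected)  # redundant copies of the same information are penalized
```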
Model Based Feature Weights
Another way to get feature weights is to use a model. Some models are able to provide a weight vector by themselves. These values tell you how important an attribute was for the learner itself; the concrete calculation scheme differs between learners. Weight vectors are provided by these operators:
Generalized Linear Model
Gradient Boosted Tree
Support Vector Machine (only with linear kernel)
Logistic Regression (SVM)
It is generally advisable to tune the parameters (and the choice) of these operators for maximal prediction accuracy before taking the weights.
A special case is the Random Forest operator. A Random Forest model can be fed into the Weight by Tree Importance operator to get feature weights.
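The scikit-learn learners below are stand-ins for the operators above; the concrete numbers differ per learner, but each fitted model exposes its own notion of attribute importance:

```python
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Linear SVM: absolute coefficients act as feature weights
# (scale the attributes first so the coefficients are comparable)
svm = LinearSVC(dual=False).fit(StandardScaler().fit_transform(X), y)
svm_weights = pd.Series(abs(svm.coef_[0]), index=X.columns)

# Random Forest: tree-based importances, analogous to Weight by Tree Importance
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
forest_weights = pd.Series(forest.feature_importances_, index=X.columns)

print(svm_weights.nlargest(3))
print(forest_weights.nlargest(3))
```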
Feature Selection Methods
Besides feature weighting you can also use feature selection techniques. The difference is that the resulting weight vector contains a 1 if an attribute is in the set of chosen attributes and a 0 if it is not. The most common techniques for this are wrapper methods, namely Forward Selection and Backward Elimination.
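Here is a Python sketch of Forward Selection, using scikit-learn's SequentialFeatureSelector as a stand-in for the RapidMiner operator, with a hypothetical target of five attributes (direction="backward" would give Backward Elimination instead):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Wrapper method: add one attribute at a time, keeping whichever
# addition improves the cross-validated accuracy the most
selector = SequentialFeatureSelector(
    make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    n_features_to_select=5,
    direction="forward",
    cv=5,
)
selector.fit(X, y)

# The 0/1 "weight vector": True for chosen attributes, False otherwise
print(dict(zip(X.columns, selector.get_support())))
```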
Polynominal Classification Problems and Clustering
In polynominal classification problems it is often useful to compute feature weights in a one-vs-all fashion. This answers the question: "What makes group A different from all the others?" A variation of this method is to apply it to cluster labels to get cluster descriptions.
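A small Python sketch of the one-vs-all idea, using the information-based weights from above on the bundled iris data; the same loop works unchanged on cluster labels from, say, k-Means:

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import mutual_info_classif

X, y = load_iris(return_X_y=True, as_frame=True)

# One weight vector per class: what separates this group from the rest?
for cls in sorted(y.unique()):
    one_vs_all = (y == cls).astype(int)  # binary label: this class vs. all others
    weights = pd.Series(
        mutual_info_classif(X, one_vs_all, random_state=0), index=X.columns
    )
    print(f"class {cls}: most discriminating attribute is {weights.idxmax()}")
```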
Evolutionary Feature Generation
Another sophisticated approach to incorporate non-linearities and also interaction terms (e.g. finding a dependency like sqrt(age-weight)) is to use an evolutionary feature generation approach. The operators in RapidMiner are YAGGA and YAGGA2; please have a look at the operator info for more details.
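YAGGA maintains a whole population of attribute sets and applies crossover and mutation; the following is only a drastically simplified hill-climbing sketch in Python, showing the core loop of generating candidate features and keeping those that improve the cross-validated accuracy:

```python
import random
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True, as_frame=True)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

def fitness(frame):
    return cross_val_score(model, frame, y, cv=3).mean()

rng = random.Random(0)
best, best_score = X.copy(), fitness(X)

for generation in range(5):  # hypothetical, tiny budget
    a, b = rng.sample(list(X.columns), 2)
    name, values = rng.choice([
        (f"sqrt(abs({a}-{b}))", np.sqrt(abs(X[a] - X[b]))),  # interaction term
        (f"({a})*({b})", X[a] * X[b]),
        (f"pow({a},2)", X[a] ** 2),
    ])
    candidate = best.copy()
    candidate[name] = values
    score = fitness(candidate)
    if score > best_score:  # keep only mutations that help
        best, best_score = candidate, score
        print(f"generation {generation}: kept {name} (accuracy {score:.3f})")
```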
Instead of generating a general feature weight you can also find individual dependencies for a single example. In this case you vary the variables of one example and check the influence on your model's prediction. A common use case is to check whether it is worth calling a customer by comparing their individual scoring result with and without a call.
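As a toy Python sketch of this what-if analysis (with entirely hypothetical data and column names), scoring one customer with and without a call:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical training data: did the customer buy, given age and whether we called?
train = pd.DataFrame({
    "age":    [25, 34, 45, 52, 23, 41, 38, 60],
    "called": [ 0,  1,  0,  1,  1,  0,  1,  0],
    "bought": [ 0,  1,  0,  1,  0,  1,  1,  0],
})
model = RandomForestClassifier(random_state=0)
model.fit(train[["age", "called"]], train["bought"])

# Score a single customer with and without a call and compare
customer = pd.DataFrame({"age": [45], "called": [0]})
p_no_call = model.predict_proba(customer)[0, 1]
p_call = model.predict_proba(customer.assign(called=1))[0, 1]
print(f"estimated uplift from calling: {p_call - p_no_call:+.2f}")
```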