Feature Selection with Mandatory Columns
RapidMiner provides various feature selection techniques, such as forward selection, backward elimination, weight-guided selection, and evolutionary optimization.
On rare occasions you need a certain set of features (columns/attributes) to always be included while the selection tries out various combinations. This article demonstrates one way to always keep a certain set of columns as part of feature selection.
Suppose you have columns like this and you want to ensure that columns a1 and a2 are always used during your optimization steps.
To make a RapidMiner process do so, we can use the Set Role operator so that the optimization step ignores these columns at first, and then bring them back in for model building.
We will introduce a Set Role operator just outside the optimization step, as shown below.
Then, in the parameter section, we select a1 as the attribute name and type any arbitrary string as the target role (Ignoreme in the screenshot).
If you have additional columns that you always want to use, you can specify them in the set additional roles dialog.
Please note that each target role must be a different string, so you will need to come up with a unique string for each column. A simple solution is to use ignoreme1, ignoreme2, ignoreme3, and so on.
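If the list of mandatory columns is long, the unique role names can be generated rather than typed by hand. The following is a minimal Python sketch of that naming scheme only. It does not call any RapidMiner API, and the helper name make_placeholder_roles is hypothetical.

```python
def make_placeholder_roles(mandatory, prefix="ignoreme"):
    """Map each mandatory column to its own placeholder role name."""
    # Numbering starts at 1 so the names match ignoreme1, ignoreme2, ...
    return {name: f"{prefix}{i}" for i, name in enumerate(mandatory, start=1)}


roles = make_placeholder_roles(["a1", "a2"])
print(roles)  # {'a1': 'ignoreme1', 'a2': 'ignoreme2'}
```

The resulting pairs are what you would enter as attribute name and target role in the Set Role parameters.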
With this metadata in place, the optimization step always ignores these columns. However, the model operators will ignore them as well.
To counter this effect, we need to add an additional step inside the "Optimize" operator.
We will add another Set Role operator inside the optimization step.
There we change the role back to regular for the two attributes that we gave a special role earlier.
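The effect of the two Set Role steps can be illustrated with a small model of attribute roles. This is a plain Python sketch, not RapidMiner code: the functions set_role and regular_attributes are hypothetical, and the columns a3 to a5 are assumed here only for illustration.

```python
REGULAR = "regular"


def set_role(roles, name, target_role):
    """Return a copy of the role table with one attribute's role changed."""
    updated = dict(roles)
    updated[name] = target_role
    return updated


def regular_attributes(roles):
    """Only attributes with the regular role are visible to search and learner."""
    return [name for name, role in roles.items() if role == REGULAR]


# Assumed example set: a1 and a2 are mandatory, a3..a5 are hypothetical extras.
roles = {name: REGULAR for name in ["a1", "a2", "a3", "a4", "a5"]}

# Outer Set Role: give a1 and a2 arbitrary, unique special roles.
hidden = set_role(set_role(roles, "a1", "ignoreme1"), "a2", "ignoreme2")
seen_by_search = regular_attributes(hidden)
print(seen_by_search)  # ['a3', 'a4', 'a5']

# Inner Set Role: change both attributes back to regular before modeling.
restored = set_role(set_role(hidden, "a1", REGULAR), "a2", REGULAR)
seen_by_learner = regular_attributes(restored)
print(seen_by_learner)  # ['a1', 'a2', 'a3', 'a4', 'a5']
```

The search only ever toggles a3 to a5, while the learner sees a1 and a2 again once their roles are reset.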
As the data moves on to the validation step, these attributes are included in both model building and validation.
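For readers who want to see the overall behavior outside RapidMiner, here is a rough Python sketch of a greedy forward selection that always keeps the special-role columns. It is an illustration under stated assumptions, not the algorithm RapidMiner's operators implement: the function name, the columns a3 to a5, the usefulness numbers, and the toy scoring function (which stands in for a real validation step) are all made up.

```python
from typing import Callable, Dict, List, Tuple


def forward_selection_with_mandatory(
    columns: List[str],
    special_roles: Dict[str, str],
    score: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Greedy forward selection that always keeps the special-role columns."""
    # Outer Set Role: columns with a special role are hidden from the search.
    candidates = [c for c in columns if c not in special_roles]
    # Inner Set Role: the same columns are regular again when a model is scored.
    mandatory = [c for c in columns if c in special_roles]

    selected: List[str] = []
    best = score(mandatory)  # baseline: mandatory columns only
    while candidates:
        trials = {c: score(mandatory + selected + [c]) for c in candidates}
        winner = max(trials, key=trials.get)
        if trials[winner] <= best:
            break  # no remaining column improves the score
        best = trials[winner]
        selected.append(winner)
        candidates.remove(winner)
    return mandatory + selected, best


# Made-up usefulness per column; a real process would run a validation here.
USEFULNESS = {"a1": 0.30, "a2": 0.01, "a3": 0.25, "a4": 0.02, "a5": 0.10}
PENALTY_PER_COLUMN = 0.03


def toy_score(features: List[str]) -> float:
    """Hypothetical score: summed usefulness minus a per-column penalty."""
    return sum(USEFULNESS[f] for f in features) - PENALTY_PER_COLUMN * len(features)


selected, best_score = forward_selection_with_mandatory(
    ["a1", "a2", "a3", "a4", "a5"],
    {"a1": "ignoreme1", "a2": "ignoreme2"},
    toy_score,
)
print(selected)  # ['a1', 'a2', 'a3', 'a5']
```

Note that a2 stays in the result even though its made-up usefulness is below the penalty, which is exactly what the mandatory-column setup is meant to guarantee.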
Please find an example process attached as well.
Hopefully you find this article helpful. Feel free to post comments or questions on the community regarding this or other topics.