RM 9.1 feedback: Let's talk about the new Automatic Feature Engineering (AFE)
First, I wish all readers a Merry Christmas.
Now sit comfortably because this is going to be a long post...!
1/ (Minor) inconsistency between the chart and the table of Feature Sets in Auto Model:
To reproduce this error:
- Click on Auto Model
- Select the Titanic dataset
- Enable the Decision Tree model (disable all the other models) and, of course, enable AFS (with the accurate option)
- Run the model
This looks like a simple coding error that causes a "shift" between the chart and the table of feature sets.
2/ Feedback RM 9.1: My suggestions about AFE as implemented in Auto Model
2.1 Plot the optimal trade-offs around the optimum:
"... a small diagram is better than a long speech..."
If I understood correctly, the selected feature set minimizes the "distance" between the selected point and the origin of the chart (0,0).
So, to show that the selected feature set really is the optimal one, it would be a good idea to keep plotting
the optimal trade-offs around it. If I'm wrong, please correct me.
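To make my understanding concrete, here is a minimal Python sketch of that selection rule. It is only my reading of the chart, not the actual Auto Model implementation. The (number of features, error) pairs are made-up numbers, not the Titanic results, and scaling each axis by its maximum before measuring the distance is my own assumption so that the two axes are comparable.

```python
import math


def pareto_front(points):
    """Keep the points that no other point beats on both axes (lower is better)."""
    front = [
        p for p in points
        if not any(q != p and q[0] <= p[0] and q[1] <= p[1] for q in points)
    ]
    return sorted(front)


def closest_to_origin(points):
    """Return the point with the smallest distance to (0, 0).

    Assumption: each axis is divided by its maximum so that the feature
    count and the error are on a comparable scale.
    """
    max_features = max(p[0] for p in points) or 1
    max_error = max(p[1] for p in points) or 1.0
    return min(
        points,
        key=lambda p: math.hypot(p[0] / max_features, p[1] / max_error),
    )


# Hypothetical (number of features, error) pairs, for illustration only.
points = [(1, 0.22), (2, 0.15), (3, 0.06), (4, 0.05),
          (5, 0.05), (6, 0.045), (3, 0.18)]

front = pareto_front(points)      # the optimal trade-offs
best = closest_to_origin(front)   # the feature set I expect to be selected

print("Optimal trade-offs:", front)
print("Selected point:", best)
```

Plotting every point of `front` next to `best` is exactly the diagram I am asking for: it lets the user see at a glance that the selected point is the one nearest to the origin.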
2.2 Clarify the definition of "fitness".
If I understand correctly, the fitness (delivered as an IOObjectCollection at the output of the AFE operator) is equivalent to the error. It would be a good idea
to state the definition of this notion explicitly. Once again, if I'm wrong, please correct me.
2.3 Set "apply pruning" and/or "apply prepruning" to FALSE for the DT model (and similar models such as RF) when AFS and/or Optimize Parameters are enabled:
Let me describe my reasoning. From my point of view, when we deliberately choose to search for the optimal feature set (via AFE) and/or the best combination of parameters (for DT, the best k), we expect the best possible model at the cost of some extra time (I understand that RM users are not very patient...). The goal is therefore to obtain a model that actually uses the optimal feature set found and/or the optimal combination of parameters.
Once again, take the example of Titanic with DT (Automatically Optimize enabled) and AFE enabled.
After execution, RM concludes that:
- Feature set = 4
- k (max depth) = 4
With "apply pruning" and/or "apply prepruning" set to FALSE, we obtain a model that actually uses 4 features and k = 4, with an accuracy of 96.53%.
By default, in Auto Model, "apply pruning" and/or "apply prepruning" are set to TRUE, so in practice we obtain a simple model with k = 1 that uses a single feature
and has a (relatively) poor accuracy of 77.87%. So, in the end, with the default settings we lose all the benefit of having performed feature selection and/or parameter optimization.
So my conclusion is to set "apply pruning" and/or "apply prepruning" to FALSE for DT (and similar models) when the user chooses to enable "Automatic Feature Selection".
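The rule I am suggesting can be sketched in a few lines of Python. This is only an illustration of the proposal, not RapidMiner code: the function name and its arguments are hypothetical, and only "apply pruning" and "apply prepruning" correspond to the parameters discussed above.

```python
def decision_tree_defaults(feature_selection_enabled, optimize_parameters_enabled):
    """Return the proposed default pruning flags for DT-like models.

    Hypothetical helper: if the user asked for a feature set search and/or
    a parameter search, pruning is switched off so that the final model
    really uses what the search found.
    """
    search_requested = feature_selection_enabled or optimize_parameters_enabled
    return {
        "apply_pruning": not search_requested,
        "apply_prepruning": not search_requested,
    }


# AFS enabled: pruning and prepruning are disabled by default.
with_search = decision_tree_defaults(True, False)

# Nothing enabled: the current defaults (TRUE) are kept.
without_search = decision_tree_defaults(False, False)

print(with_search)
print(without_search)
```

The user could of course still switch pruning back on manually; the point is only to change the default when a search has been requested.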
What do you think about all these points?
Thank you for your patience and attention, and good luck to the RM staff with the next release of RapidMiner...