Automatic feature engineering: results interpretation
I have been experimenting with the Automatic Feature Engineering operator and would like to clear up a few points, specifically about how to interpret the results.
I used 'balance for accuracy' = 1 and the 'feature selection' option (no generation) on a dataset of 14,000 examples / 142 regular features, split 80/20 for train and test. Inside the feature selection operator I used a GLM learner on a numeric label (so this is linear regression) with RMSE as the optimization criterion.
This is the output I got in progress dialog:
In total, 5 feature sets were generated, with the following fitness / complexity values, respectively:
0.408 / 142
0.408 / 62
0.410 / 59
0.458 / 55
0.466 / 50
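To make the numbers concrete, here is a small Python sketch of how I would have expected the operator to pick among these sets: minimize RMSE first, and break ties by taking the lower complexity. (This is my assumption about the selection logic, not necessarily what RapidMiner actually does.)

```python
# Candidate feature sets reported by the operator, as (rmse, complexity) pairs
candidates = [(0.408, 142), (0.408, 62), (0.410, 59), (0.458, 55), (0.466, 50)]

# Assumed tie-breaking: minimize RMSE first, then minimize complexity.
# Tuple comparison does exactly that: compare rmse, then complexity.
best = min(candidates, key=lambda c: (c[0], c[1]))
print(best)  # (0.408, 62) -- the smaller of the two equal-fitness sets
```

By this logic the 62-feature set would win, which is why the operator's choice of the full 142-feature set surprised me.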
In terms of RMSE minimization, the first two sets are optimal (the leftmost points on the right graph). However, the first one is also identical to the original set, meaning ALL features were used.
- Why did the optimization operator still choose the bigger set (142) rather than the smaller one (62), given that the fitness is equal for both?
- Is there a way to make the optimizer choose the set that is both most accurate AND smallest in a situation like the one above?
- If the most accurate feature set includes all the features, does that mean they are all contributing to the predictions, so that no feature can be removed without increasing the error?
- How do I interpret the left graph? I understand it shows the error-vs-complexity trade-off for different feature sets, but how exactly do I read it? Why does the upper (blue) line show complexity 142 and error close to 0.490 (logically the highest error, so the 'worst' option), while the lower (orange) line sits around 0.408 (the lowest value) but with complexity around 20? In other words, I cannot see the correspondence between the left and right charts.
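For what it's worth, here is how I currently understand the trade-off view: a feature set should appear on the Pareto front only if no other set is at least as good in both RMSE and complexity and strictly better in one. A quick Python check of that definition against the five reported sets (my own sketch, to confirm whether my reading is right):

```python
# The five reported feature sets as (rmse, complexity) pairs
candidates = [(0.408, 142), (0.408, 62), (0.410, 59), (0.458, 55), (0.466, 50)]

def dominates(a, b):
    # a dominates b if a is no worse in both objectives and differs somewhere
    return a[0] <= b[0] and a[1] <= b[1] and a != b

# Keep only non-dominated (Pareto-optimal) sets
pareto = [c for c in candidates if not any(dominates(o, c) for o in candidates)]
print(pareto)  # [(0.408, 62), (0.410, 59), (0.458, 55), (0.466, 50)]
```

Notably, by this definition the full 142-feature set is dominated by the 62-feature set (same RMSE, fewer features), which is exactly why its selection puzzles me.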