I have noticed one small thing that I would consider a candidate for improvement in the auto-generated Auto Model process.
Before training the model, a filtering operator is applied, described as follows:
"Model on cases with label value, apply the model on cases with a missing value for the target column."
Later on, 'Explain predictions' is applied to the 'unm' output of this filter:
The thing is, the data does not necessarily come in this exact format (with missing labels for the examples that are meant to be predicted). On the contrary, I much more often see the use case where a fully labeled dataset is used for auto-modelling and evaluation through an 80/20 split.
So, every time I save a process from Auto Model, I rewire the operators like this:
This makes much more sense to me: I train the model on 80% of the data and apply it to the remaining 20%, which is then also used for explaining the predictions. If I don't rewire the operators, by default I get an empty result from 'Explain predictions', because this filter operator always sends 100% of the data to the 'exa' port and 0% to the 'unm' port.
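To illustrate the difference, here is a small plain-Python sketch (a hypothetical toy example, not RapidMiner's actual implementation; the column layout and values are made up): on a fully labeled dataset, splitting by "missing label" routes everything to one side, whereas a random 80/20 split always yields a non-empty hold-out set.

```python
import random

# Toy fully-labeled dataset: (feature, label) pairs. Every row has a
# label, matching the common use case described above.
rows = [(i, i % 2) for i in range(10)]

# What the generated filter effectively does: route rows that have a
# label to 'exa' and rows with a missing label (None here) to 'unm'.
exa = [r for r in rows if r[1] is not None]
unm = [r for r in rows if r[1] is None]
print(len(exa), len(unm))  # -> 10 0: everything goes to 'exa', nothing to 'unm'

# The rewiring described above corresponds to a random 80/20 split
# instead, so the "explain" step always receives some examples.
random.seed(42)
shuffled = rows[:]
random.shuffle(shuffled)
cut = int(len(shuffled) * 0.8)
train, test = shuffled[:cut], shuffled[cut:]
print(len(train), len(test))  # -> 8 2
```

With the missing-label split, 'unm' is empty whenever the dataset is fully labeled, which is exactly why 'Explain predictions' returns an empty result without rewiring.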
Do you think this could be improved so that the process is generated more flexibly, depending on the format of the initial dataset?