
feature weightage vs domain inputs.

Thiru Member Posts: 100 Guru
hi all,

When I use 'Explain Predictions', it reports various feature weightages, and these vary with the choice of algorithm as well.

For example, with kNN it picks feature A, feature B, and feature C as the top 3, while feature D ranks lower.

1.  However, my domain knowledge says feature D is the most important one. In that case, will selecting kNN (for which feature D is not important) do the job, even if it gives good accuracy during training and testing?

2.  Or, in the above scenario, should I go for a model such as SVM, which naturally considers feature D the most important attribute? However, SVM's performance on the given data set is lower than kNN's during training and testing.

Can I have some clarity on how to approach this, particularly when the order of preference by weightage suggested by the Explain Predictions operator conflicts with domain inputs? Thanks.
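To make the situation concrete, here is a minimal Python sketch (hypothetical data and models, not a RapidMiner process) showing that permutation-based feature importance can rank the same features differently depending on the learner, which is exactly the kind of disagreement described above:

```python
# Hypothetical illustration: the same data, two learners, and
# permutation importance can yield different feature rankings.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the poster's data set (4 features).
X, y = make_classification(n_samples=400, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for model in (KNeighborsClassifier(), SVC()):
    model.fit(X_tr, y_tr)
    # Importance = accuracy drop when a feature's values are shuffled.
    imp = permutation_importance(model, X_te, y_te, n_repeats=10,
                                 random_state=0)
    ranking = np.argsort(imp.importances_mean)[::-1]
    print(type(model).__name__, "feature ranking:", ranking)
```

The rankings may or may not agree between the two models; the point is that feature importance is a property of the model plus the data, not of the data alone.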



    BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Explain Predictions and feature weighting are diagnostic tools, and your models are tools to achieve your goal, too. Don't overestimate the precision of Explain Predictions and feature weights; a complex model will have complex interactions between attributes.

    Is it easy or hard to get all the features at the same time without missing values? Are you interested in accuracy or in an explainable model? Might your attributes have some potential for discriminating against people? And so on.

    Sometimes our domain knowledge betrays us, or it is just too simplistic. That's why we use machine learning. A, B, and C probably contain additional knowledge, and they help improve the model beyond just looking at D.

    All this said: Use the model (after proper validation) that solves your problem best, however the problem is defined.
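The "proper validation" advice can be sketched in a few lines (again a hypothetical Python example with synthetic data, not the poster's actual setup): compare candidate models by cross-validated performance and let that, not the importance ranking, drive the choice.

```python
# Sketch: choose between kNN and SVM by cross-validated accuracy
# on the same data, instead of by which features each model weights.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in for the real data set.
X, y = make_classification(n_samples=400, n_features=4, random_state=0)

scores = {type(m).__name__: cross_val_score(m, X, y, cv=5).mean()
          for m in (KNeighborsClassifier(), SVC())}
best = max(scores, key=scores.get)
print(scores, "-> choose", best)
```

If the better-validated model happens to downweight feature D, that is itself useful information: either D's signal is already captured by A, B, and C, or the domain intuition about D deserves a second look.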

    MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Also keep in mind that Explain Predictions explains the prediction, not the label. So it helps you understand what the model 'thinks about the world'. If the model is a bad approximation of the world in the first place, it does not help.

    Also, one should really think about what the result of Explain Predictions actually means.

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany