RapidMiner

Determining which attributes contribute to value of a label

Contributor II

Re: Determining which attributes contribute to value of a label

For 15 attributes:

Accuracy: 88.07% +/- 1.80% (mikro: 88.07%)

AUC: 0.941 +/- 0.005 (mikro: 0.941) (positive class: Tak)

 

For 6 top attributes:

Accuracy: 90.56% +/- 1.53% (mikro: 90.56%)

AUC: 0.934 +/- 0.007 (mikro: 0.934) (positive class: Tak)

 

However, the increase in accuracy came at the cost of reduced "Tak" class recall, so I went back to the wider attribute set.
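A minimal sketch of how accuracy can rise while recall of the positive class falls. The confusion-matrix counts are hypothetical and chosen only for illustration; they are not the results reported above.

```python
def accuracy(tp, fn, fp, tn):
    """Share of all examples that were classified correctly."""
    return (tp + tn) / (tp + fn + fp + tn)


def recall(tp, fn):
    """Share of true positives ("Tak") that the model actually found."""
    return tp / (tp + fn)


# Hypothetical counts, with "Tak" as the positive class.
wide = {"tp": 80, "fn": 20, "fp": 100, "tn": 800}    # wider attribute set
narrow = {"tp": 60, "fn": 40, "fp": 40, "tn": 860}   # reduced attribute set

for name, m in (("wide", wide), ("narrow", narrow)):
    print(name, accuracy(**m), recall(m["tp"], m["fn"]))
```

With these made-up counts the reduced set scores higher on accuracy but finds fewer of the "Tak" cases, which is the trade-off described above.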

 

 

OK, so now the model is built and I know the attribute importance, but one question remains:

How can I find out which attribute values make the model predict "Tak" or "Nie"?

 

In my example the top 2 attributes are COUNTRY_OF_RESIDENCE and PROFESSION. What are the actual countries/professions that give me a "Tak"?

 

Elite II

Re: Determining which attributes contribute to value of a label

If this is from the Random Forest learner, you would have to inspect the individual trees to determine that relationship.
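As a rough illustration of what inspecting a tree means, the sketch below walks one tree and lists the attribute values on every path that ends in a "Tak" leaf. The nested-dict layout, the function name and the values "A", "B", "X", "Y" are assumptions made for this example; this is not RapidMiner's model format or API.

```python
def paths_to_label(node, target, conditions=()):
    """Collect the (attribute, value) conditions on each path to a target leaf."""
    if "label" in node:  # leaf node: keep the path only if it predicts target
        return [conditions] if node["label"] == target else []
    found = []
    for value, child in node["children"].items():
        step = (node["attribute"], value)
        found.extend(paths_to_label(child, target, conditions + (step,)))
    return found


# Hypothetical tree with placeholder attribute values.
tree = {
    "attribute": "COUNTRY_OF_RESIDENCE",
    "children": {
        "A": {"label": "Tak"},
        "B": {
            "attribute": "PROFESSION",
            "children": {"X": {"label": "Tak"}, "Y": {"label": "Nie"}},
        },
    },
}

tak_paths = paths_to_label(tree, "Tak")
for path in tak_paths:
    print(" AND ".join(f"{attr} = {val}" for attr, val in path))
```

A forest contains many such trees, so the same walk would have to be repeated per tree and the resulting paths compared or counted.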

Alternatively, you can run a Naive Bayes model on your reduced dataset with the top 16 attributes (or however many you want to see). The overall model might not be that accurate, but its output provides a set of views that show the relationship between your attribute values (both numerical and nominal) and your label.
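For a nominal attribute, the kind of relationship such a view shows can be approximated by hand as the share of each attribute value within each label class. The sketch below is plain Python, not RapidMiner output; the rows, the function name and the placeholder values "X" and "Y" are hypothetical, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def value_given_label(rows, attribute, label="label"):
    """Estimate P(attribute value | label) from raw counts (no smoothing)."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[label]][row[attribute]] += 1
    table = {}
    for lab, value_counts in counts.items():
        total = sum(value_counts.values())
        table[lab] = {v: n / total for v, n in value_counts.items()}
    return table


# Hypothetical example rows with placeholder profession values.
rows = [
    {"PROFESSION": "X", "label": "Tak"},
    {"PROFESSION": "X", "label": "Tak"},
    {"PROFESSION": "Y", "label": "Tak"},
    {"PROFESSION": "Y", "label": "Nie"},
    {"PROFESSION": "Y", "label": "Nie"},
    {"PROFESSION": "Y", "label": "Nie"},
]

table = value_given_label(rows, "PROFESSION")
for lab, dist in table.items():
    print(lab, dist)
```

Values that are much more frequent under "Tak" than under "Nie" are the ones pushing a Naive Bayes prediction toward "Tak".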

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
RMStaff

Re: Determining which attributes contribute to value of a label

"If this is from the Random Forest learner, you would have to inspect the individual trees to determine that relationship."

Or use the Weight by Tree Importance operator :)

 

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner