RapidMiner

Determining which attributes contribute to value of a label

Contributor II

Re: Determining which attributes contribute to value of a label

For 15 attributes:

Accuracy: 88.07% +/- 1.80% (mikro: 88.07%)

AUC: 0.941 +/- 0.005 (mikro: 0.941) (positive class: Tak)

 

For 6 top attributes:

Accuracy: 90.56% +/- 1.53% (mikro: 90.56%)

AUC: 0.934 +/- 0.007 (mikro: 0.934) (positive class: Tak)

 

However, the increase in accuracy came at the cost of reduced "Tak" class recall, so I went back to the wider attribute set.
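
A minimal sketch of how accuracy can rise while "Tak" recall falls. The confusion-matrix counts below are invented purely for illustration and are not the results reported above; the helper names `accuracy` and `recall` are also just illustrative.

```python
# Hypothetical confusion-matrix counts, invented for illustration only.
# "Tak" is treated as the positive class.

def accuracy(tp, fp, fn, tn):
    """Share of all examples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)

def recall(tp, fn):
    """Share of actual positives ("Tak") that the model found."""
    return tp / (tp + fn)

wide = dict(tp=80, fp=100, fn=20, tn=800)    # wider attribute set (made up)
narrow = dict(tp=65, fp=55, fn=35, tn=845)   # reduced attribute set (made up)

for name, m in (("wide", wide), ("narrow", narrow)):
    acc = accuracy(**m)
    rec = recall(m["tp"], m["fn"])
    print(f"{name:>6}: accuracy={acc:.3f}  Tak recall={rec:.3f}")
```

When "Tak" is the minority class, fewer false alarms on "Nie" can outweigh the missed "Tak" cases in the accuracy figure, which is why class recall is worth checking separately.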

 

 

OK, so now the model is built and I know the attribute importance, but one question remains:

How can I find out which attribute values make the model predict "Tak" or "Nie"?

 

In my example the top 2 attributes are COUNTRY_OF_RESIDENCE and PROFESSION. What are the actual countries/professions that give me a "Tak"?

 

Elite III

Re: Determining which attributes contribute to value of a label

If this is from the Random Forest learner, you would have to inspect the individual trees to determine that relationship.

Alternatively, you can run a Naive Bayes model on your reduced dataset with the top 16 attributes (or however many you want to see). While the overall model might not be that accurate, its output provides a set of views that show the relationship between your attribute values (both numerical and nominal) and your label.
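
To give a rough idea of what such a view expresses for a nominal attribute, here is a minimal sketch of the per-class conditional probabilities that a Naive Bayes model is built on. It is plain Python rather than RapidMiner, the rows are hypothetical toy data (not from this thread), and `conditional_table` and its `alpha` smoothing parameter are names made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy rows of (COUNTRY_OF_RESIDENCE value, label); invented data.
rows = [
    ("PL", "Nie"), ("PL", "Nie"), ("PL", "Tak"),
    ("DE", "Nie"), ("DE", "Nie"),
    ("XX", "Tak"), ("XX", "Tak"), ("XX", "Nie"),
]

def conditional_table(rows, alpha=1.0):
    """Estimate P(value | label) with Laplace smoothing `alpha`."""
    values = sorted({value for value, _ in rows})
    label_counts = Counter(label for _, label in rows)
    pair_counts = Counter(rows)  # counts of each (value, label) pair
    table = defaultdict(dict)
    for label, n in label_counts.items():
        for value in values:
            table[label][value] = (pair_counts[(value, label)] + alpha) / (
                n + alpha * len(values)
            )
    return table

table = conditional_table(rows)
for label in sorted(table):
    for value, p in table[label].items():
        print(f"P(COUNTRY={value} | {label}) = {p:.3f}")
```

A value whose probability is much higher under "Tak" than under "Nie" pushes the prediction towards "Tak", which is the kind of per-value relationship being asked about.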

 

Brian T., Lindon Ventures - www.lindonventures.com
Analytics Consulting by Certified RapidMiner Analysts
Moderator

Re: Determining which attributes contribute to value of a label

"If this is from the Random Forest learner, you would have to inspect the individual trees to determine that relationship."

Or use the Weight by Tree Importance operator :)

 

~Martin

--------------------------------------------------------------------------
Head of Data Science Services at RapidMiner
Contributor

Re: Determining which attributes contribute to value of a label

Hi, good day. I am starting to learn how to use RapidMiner, and I would like to ask which operator you used to get this output.

 

Variable Importances:
            Variable Relative Importance Scaled Importance Percentage
COUNTRY_OF_RESIDENCE          659.554016          1.000000   0.734397
          PROFESSION          107.786240          0.163423   0.120017
       NO_OF_HR_TXNS           65.660095          0.099552   0.073111
      F2F_IDENTIFIED           16.260277          0.024653   0.018105
    COUNTRY_OF_BIRTH           12.987129          0.019691   0.014461
         NATIONALITY           10.082042          0.015286   0.011226
       ANNUAL_INCOME            9.467501          0.014354   0.010542
      OLDEST_ACCOUNT            7.385976          0.011198   0.008224
      NO_OF_ACCOUNTS            6.033924          0.009148   0.006719
           CITY_SIZE            2.696423          0.004088   0.003002
           CUST_TYPE            0.175396          0.000266   0.000195
      MARITAL_STATUS            0.000000          0.000000   0.000000
                 SEX            0.000000          0.000000   0.000000
         HR_CASHFLOW            0.000000          0.000000   0.000000
                 AGE            0.000000          0.000000   0.000000

Thank you!



Elite III

Re: Determining which attributes contribute to value of a label

It comes from the model output of the Gradient Boosted Trees operator.
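
For reading that table: judging from the numbers themselves (this is inferred from the figures quoted above, not taken from the RapidMiner documentation), "Scaled Importance" looks like each relative importance divided by the largest one, and "Percentage" like each relative importance divided by the column total. A small sketch that recomputes both columns from the quoted values:

```python
# Relative importances copied from the table quoted earlier in the thread.
relative = {
    "COUNTRY_OF_RESIDENCE": 659.554016,
    "PROFESSION": 107.786240,
    "NO_OF_HR_TXNS": 65.660095,
    "F2F_IDENTIFIED": 16.260277,
    "COUNTRY_OF_BIRTH": 12.987129,
    "NATIONALITY": 10.082042,
    "ANNUAL_INCOME": 9.467501,
    "OLDEST_ACCOUNT": 7.385976,
    "NO_OF_ACCOUNTS": 6.033924,
    "CITY_SIZE": 2.696423,
    "CUST_TYPE": 0.175396,
    "MARITAL_STATUS": 0.0,
    "SEX": 0.0,
    "HR_CASHFLOW": 0.0,
    "AGE": 0.0,
}

top = max(relative.values())    # largest relative importance
total = sum(relative.values())  # sum over all attributes

# Assumed relationships (inferred from the table, not from documentation):
scaled = {name: value / top for name, value in relative.items()}
share = {name: value / total for name, value in relative.items()}

for name, value in relative.items():
    print(f"{name:>20} {value:12.6f} {scaled[name]:9.6f} {share[name]:9.6f}")
```

The recomputed values agree with the quoted table to the printed precision, so the "Percentage" column can be read as each attribute's share of the total importance.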
