Building CHAID tree

prakash_sridhar Member Posts: 8 Contributor II
edited November 2018 in Help
Hi,

I'm very new to RapidMiner, so please excuse me if this question is really basic. I was trying to develop a CHAID tree to score bank customers based on a set of demographic parameters. The label variable takes the values "fault" and "fine". Here is the process I followed:

1. I created a process tree with an Excel data source operator followed by the CHAID operator.
2. I specified the file name, label and id columns in the Excel source operator.
3. Then I hit the Run button to start the execution. I didn't find any field that lets you select the variables you want in the model, so I assumed the CHAID operator would select the variables by default.
4. The program executes.

Now, in the output, I didn't find any variable entering the CHAID model. I just had 2 leaves in the output tree, and no other splitting variable entered the model. I remember that when I did the same in SPSS, at least a couple of other variables entered the model.

What am I doing wrong here? How would I allow other variables to enter the model?

Your guidance will be extremely useful.

Thanks

Prakash

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,643  RM Founder
    Hi,

    "I didn't find any variable entering the CHAID model. I just had 2 leaves in the output tree. No other splitting variable entered the model. I remember, when I did the same in SPSS at least a couple of other variables entered the model."
    At least one variable should be part of the tree because otherwise you would not get two leaves but only one. Did you try to tune some of the CHAID parameters? You could for example...
    • reduce the minimal size for split
    • reduce the minimal leaf size
    • change the confidence value for pruning
    • or even deactivate pruning totally
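
    As a rough illustration of why the two size parameters can stop a tree from growing, here is a minimal Python sketch. It is not RapidMiner code: the function name, the parameter names, and the numbers are hypothetical, and it assumes that a node is split only if it holds at least "minimal size for split" examples and that every resulting leaf must hold at least "minimal leaf size" examples.

```python
def split_allowed(parent_size, child_sizes, min_size_for_split, min_leaf_size):
    """Return True if a candidate split passes both size checks.

    Assumed semantics (not taken from RapidMiner's source code):
    - the node being split must hold at least min_size_for_split examples
    - every child produced by the split must hold at least min_leaf_size examples
    """
    if parent_size < min_size_for_split:
        return False
    return all(size >= min_leaf_size for size in child_sizes)


# Hypothetical node with 30 examples that a split would divide into 27 and 3.
strict = split_allowed(30, [27, 3], min_size_for_split=40, min_leaf_size=5)
relaxed = split_allowed(30, [27, 3], min_size_for_split=4, min_leaf_size=2)
print(strict, relaxed)
```

    With the strict settings the split is rejected and the node stays a leaf; with the relaxed settings the same split is accepted. Pruning is a separate step that can still remove the split afterwards.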

    I am of course assuming that data loading went well. You could check that by setting a breakpoint after the ExcelExampleSource operator and checking whether the data looks fine to you.
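
    The same kind of sanity check can be sketched outside RapidMiner. The Python snippet below is only an illustration under assumptions: the column names and rows are made up, and the two things it reports (how many distinct values a column has, and whether every value looks like a number) are just examples of what one might inspect at the breakpoint.

```python
def summarize_columns(rows):
    """Map each column name to its distinct-value count and a numeric-looking flag."""

    def looks_numeric(value):
        try:
            float(value)
            return True
        except (TypeError, ValueError):
            return False

    summary = {}
    for name in rows[0]:
        # Ignore missing cells so they do not count as a separate value.
        values = [row[name] for row in rows if row[name] not in (None, "")]
        summary[name] = {
            "distinct": len(set(values)),
            "all_numeric": bool(values) and all(looks_numeric(v) for v in values),
        }
    return summary


# Hypothetical rows standing in for the loaded Excel sheet.
rows = [
    {"id": "1", "region": "north", "income_band": "2", "label": "fault"},
    {"id": "2", "region": "south", "income_band": "1", "label": "fine"},
    {"id": "3", "region": "north", "income_band": "3", "label": "fine"},
]
report = summarize_columns(rows)
for name, info in report.items():
    print(name, info)
```

    A label column with the expected two distinct values, and category codes that show up as numeric-looking (such as the made-up income_band here), are the sort of details worth comparing against how the attributes were typed on import.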

    And a final remark: I know that a lot of statisticians prefer CHAID, but I generally cannot recommend it. It can be shown that the chi-squared test it uses is easily fooled by certain data sets, and it does not properly capture the notion of "information" that is desired in decision tree learning. So I would generally recommend using the "DecisionTreeLearner" operator instead, but that's just my opinion.
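
    To make the two split criteria concrete, here is a minimal Python sketch that computes Pearson's chi-squared statistic and the entropy-based information gain for one candidate split. It does not reproduce RapidMiner's CHAID or DecisionTreeLearner operators; the contingency table is made up, and the helper names are hypothetical.

```python
import math


def chi_squared(table):
    """Pearson chi-squared statistic; rows = attribute values, columns = label classes."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if attribute and label were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def entropy(counts):
    """Shannon entropy in bits of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def information_gain(table):
    """Label entropy before the split minus the weighted entropy after it."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    remainder = sum(sum(row) / n * entropy(row) for row in table)
    return entropy(col_totals) - remainder


# Made-up counts: rows are two attribute values, columns are "fault" and "fine".
table = [[30, 10], [10, 30]]
print(chi_squared(table))       # 20.0
print(information_gain(table))  # roughly 0.19 bits
```

    Both measures are zero when the attribute and the label are independent, for example for the table [[20, 20], [20, 20]], but they scale differently: the chi-squared statistic grows with the number of examples, while information gain depends only on the class proportions.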

    Cheers,
    Ingo
  • prakash_sridhar Member Posts: 8 Contributor II
    Thanks, Ingo. There were a few issues with the way I had coded the nominal variables. I fixed them, and now the Decision Tree and CHAID algorithms are working fine.

    I agree with the points you have made about CHAID. I'm building multiple models with the same dataset to get familiar with RapidMiner.

    Thanks 