Building CHAID tree
prakash_sridhar
Member Posts: 8 Contributor II
Hi,
I'm very new to RapidMiner, so please excuse me if this question is really basic. I was trying to develop a CHAID tree to score bank customers based on a number of demographic parameters. The label variable takes the values "fault" and "fine". Here is the process I followed:
1. I created a process with an Excel data source operator followed by the CHAID operator.
2. I specified the file name, label and id columns in the Excel source operator.
3. Then I hit the run button to start the execution. I didn't find any field that lets you select the variables you want in the model, so I assumed the CHAID operator would select the variables for the model by default.
4. The program executes.
Now, in the output, I didn't find any variable entering the CHAID model. The output tree had just 2 leaves, and no other splitting variable entered the model. I remember that when I did the same in SPSS, at least a couple of other variables entered the model.
What am I doing wrong here? How would I allow other variables to enter the model?
Your guidance will be extremely useful.
Thanks
Prakash
Answers
I am of course assuming that data loading went well. You could check that by setting a breakpoint after the ExcelExampleSource operator and checking whether the data looks fine to you.
And a final remark: I know that a lot of statisticians prefer CHAID, but I generally cannot recommend it. It can be shown that the chi-squared test it uses can easily be fooled by certain data sets, and that it does not properly capture the notion of "information" that is desired in decision tree learning. So I would generally recommend using the operator "DecisionTreeLearner" instead, but that's just my opinion.
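To make the comparison between the two split criteria more concrete, here is a minimal Python sketch (outside RapidMiner) that computes a Pearson chi-squared statistic and an entropy-based information gain for one candidate split. The counts in `table` are hypothetical and not taken from the dataset in this thread, and the function names are made up for illustration; this is not how the CHAID or DecisionTreeLearner operators are implemented internally.

```python
import math

def chi_squared(table):
    """Pearson chi-squared statistic for a contingency table.

    Rows are values of one attribute, columns are label values.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if attribute and label were independent
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)

def information_gain(table):
    """Label entropy before the split minus weighted entropy after it."""
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(col_totals)
    after = sum(sum(row) / n * entropy(row) for row in table if sum(row) > 0)
    return entropy(col_totals) - after

# Hypothetical counts: rows = two values of one attribute,
# columns = label counts for ("fault", "fine").
table = [[30, 10], [10, 30]]
print("chi-squared:", chi_squared(table))
print("information gain:", information_gain(table))
```

One thing to try with this sketch: multiplying every count in `table` by 10 multiplies the chi-squared statistic by 10 but leaves the information gain unchanged, so the two criteria do not always rank candidate splits the same way.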
Cheers,
Ingo
I agree with the points you have made about CHAID. I'm building multiple models with the same dataset to get familiar with RapidMiner.
Thanks