Decision tree using Auto model
Arupriya_Sen
I am doing a fraud detection analysis with a Decision Tree in Auto Model (RapidMiner version 9.3.0). The dataset and screenshots are attached below. Instead of getting a proper tree, I get just a single leaf labeled "good". This means that Auto Model predicts every example of the label attribute to be good, and the tree does not show any decision splits at all. How do I get a proper tree? Can anyone help me with this?
Best Answers
varunm1
I checked the data, and your tree is heavily pruned. Please see the screenshots below for a comparison. To get to the model in Auto Model (see the screenshot below), select the model and click Open Process. Then play with the pruning parameters and compare the resulting decision tree models, as shown below.
With pruning (the default in Auto Model): performance AUC 0.5
Without pruning (you need to disable it manually): performance AUC 0.59, but the tree is big.
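The AUC of 0.5 for the pruned tree is what you would expect from a single-leaf model: every example gets the same confidence, so the model cannot rank the bad cases above the good ones. A minimal pure-Python sketch (the `auc` helper and the toy labels and scores are hypothetical, not Auto Model's implementation) computes AUC as the fraction of (positive, negative) pairs that are ranked correctly, counting ties as one half:

```python
def auc(labels, scores, positive="bad"):
    """Rank-based AUC: fraction of (positive, negative) pairs ordered correctly.

    Ties count as half a correct pair, so a constant score gives exactly 0.5.
    """
    pos = [s for l, s in zip(labels, scores) if l == positive]
    neg = [s for l, s in zip(labels, scores) if l != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical toy labels: 3 "bad" (fraud) cases and 7 "good" ones.
labels = ["bad", "good", "good", "bad", "good", "good", "good", "bad", "good", "good"]

# A single-leaf tree gives every example the same confidence for "bad".
single_leaf = [0.3] * len(labels)
print(auc(labels, single_leaf))  # 0.5

# A tree with real splits can give bad cases higher confidences than good ones.
with_splits = [0.8, 0.2, 0.1, 0.6, 0.3, 0.2, 0.4, 0.5, 0.1, 0.7]
print(auc(labels, with_splits))
```

With the made-up split scores, 19 of the 21 pairs are ordered correctly, so the second call returns 19/21. Any tree that actually separates some bad cases from good ones can score above 0.5, even if it is larger.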
Hope this helps.
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
varunm1
Hello @Arupriya_Sen
Are you talking about maximal depth? If so, to change it manually you need to deselect "Automatically optimize" in Auto Model before clicking Run. This option is what sets the tree depth to 25.
Regards,
Varun
varunm1
Auto Model splits the data in a 60:40 ratio, using 60 percent for training and 40 percent for testing. This is why you don't see all the data in the confusion matrix: only the 40% hold-out set is tested.
Regards,
Varun
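As a rough illustration of why the confusion matrix covers only part of the 999 examples, here is a minimal Python sketch of a 60:40 hold-out split. The function name and the seed are hypothetical, and Auto Model's actual validation (for example, whether it stratifies by class or evaluates the hold-out set in several parts) may differ in detail, so the counts in its confusion matrix need not add up exactly to this hold-out size:

```python
import random


def holdout_split(n_examples, train_fraction=0.6, seed=42):
    """Shuffle example indices and split them into training and hold-out sets.

    This mirrors the idea of a 60:40 split; it is not Auto Model's own code.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)  # fixed seed for a repeatable split
    n_train = round(n_examples * train_fraction)
    return indices[:n_train], indices[n_train:]


train, test = holdout_split(999)
print(len(train), len(test))  # 599 400
```

Only the examples in the second list would ever be scored, which is why the predictions you see cover roughly 40% of the data rather than all 999 rows.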
IngoRM
"The Auto Model is not even taking into consideration all the 999 examples. Isn't that wrong? The true bad are 23 and 65 and the true good are 17 and 180."
The predictions you are seeing (in the Predictions tab as well as in the confusion matrix under Performance) are for the 40% hold-out set. Please search the community for more details on this; it has actually been discussed multiple times before ;-)
"Can you also help me with the predictions of the decision tree under the Predictions tab? There are various colours used which I can't interpret."
Green means that this value supports the prediction of this row; red means that the value contradicts the prediction of the row. The darker the colour, the stronger the support or contradiction.
Hope this helps,
Ingo
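A minimal sketch of the colour rule described above, assuming each cell carries a signed weight (positive supports the row's prediction, negative contradicts it). The function, the scaling by the largest weight in the row, and the example attribute names and weights are all hypothetical; how RapidMiner computes these weights is not shown here:

```python
def colour_for(weight, max_abs_weight):
    """Map a signed support weight to a colour name and a 0..1 intensity.

    Positive weights support the row's prediction (green), negative weights
    contradict it (red); a larger magnitude means a darker shade.
    """
    if max_abs_weight <= 0:
        return "neutral", 0.0
    intensity = min(abs(weight) / max_abs_weight, 1.0)
    if weight > 0:
        return "green", intensity
    if weight < 0:
        return "red", intensity
    return "neutral", 0.0


# Hypothetical weights for one row of the Predictions tab.
row = {"duration": 0.42, "credit_amount": -0.10, "age": 0.05}
max_abs = max(abs(w) for w in row.values())
for attribute, weight in row.items():
    print(attribute, colour_for(weight, max_abs))
```

Here "duration" would be the darkest green, "credit_amount" a light red and "age" a very light green.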
varunm1
Hello @Arupriya_Sen
One thing to be careful about when analyzing the Predictions tab is that "supporting" and "contradicting" are relative to the prediction column only. This means that a strongly supporting predictor for a wrong prediction is actually bad, and vice versa.
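To make this concrete, here is a small hypothetical Python helper (the name `evidence_quality` and the signed-weight convention are assumptions, not part of RapidMiner) that combines the colour direction with whether the prediction was actually right:

```python
def evidence_quality(weight, predicted, actual):
    """Judge a supporting/contradicting value against the true label.

    The colour only says whether a value pushes toward the *predicted* class.
    Whether that is good depends on the prediction being right: support for a
    wrong prediction is misleading evidence.
    """
    if weight == 0:
        return "neutral"
    supports_prediction = weight > 0
    prediction_correct = predicted == actual
    # Support for a correct prediction, or contradiction of a wrong one, is helpful.
    return "helpful" if supports_prediction == prediction_correct else "misleading"


print(evidence_quality(0.4, "good", "good"))   # helpful
print(evidence_quality(0.4, "good", "bad"))    # misleading
print(evidence_quality(-0.4, "good", "bad"))   # helpful
```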
Hope this helps.
Regards,
Varun
varunm1
Hello @Arupriya_Sen
The date and time error in the first screenshot occurs because the input attribute going into the Extract day of month operator is not in date-time format. You can check this by double-clicking your data and then checking in Statistics whether the attribute is of type Date. If it is of numerical or nominal type, you can use the "Numerical to Date" or "Nominal to Date" operator to convert it to a date type.
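The same idea in plain Python: a value stored as text or as a number has to be turned into a real date before the day of month can be extracted. The date format `%d.%m.%Y` and the milliseconds-since-1970 (UTC) interpretation of numeric values are assumptions for illustration, not a description of what RapidMiner's operators do internally; check what your column actually stores:

```python
from datetime import datetime, timezone


def day_of_month(value, date_format="%d.%m.%Y"):
    """Return the day of month for a date stored as text (a 'nominal' value).

    The text must be parsed into a datetime first; the format is an assumption.
    """
    parsed = datetime.strptime(value, date_format)
    return parsed.day


def day_of_month_from_millis(millis):
    """Return the day of month for a number assumed to be ms since 1970-01-01 UTC."""
    parsed = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return parsed.day


print(day_of_month("28.10.2019"))               # 28
print(day_of_month_from_millis(1572220800000))  # 28
```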
The second screenshot is just a warning, nothing more. Here is a relevant discussion about it:
https://community.rapidminer.com/discussion/12465/parameter-repository-entry-accesses-a-repository-by-name-wesseldoc-data1
I didn't understand the third screenshot, as I can't see an error. If you just circle warning symbols, it's really tough for us to understand the issue. Run the process and check the log; you can open the log from VIEW --> SHOW PANEL --> LOG.
Regards,
Varun
Answers
Due to your status level on the community, we are unable to see the screenshots. @sgenzer can help you post screenshots.
Coming to the Auto Model question: did you check the process to see whether the pruning parameter of the decision tree is selected? If so, that might be why your tree is pruned heavily, sometimes all the way down to a single leaf. To open the process, select a model under Decision Tree in the left bar and then click Open Process at the top.
Also, please do not post duplicate questions on the community; they might get tagged as duplicates. You can modify your question using the edit option (click the gear wheel icon and choose Edit).
https://community.rapidminer.com/discussion/55732/decision-tree-using-auto-model#latest
Thanks
Varun
The 1st and 4th ones are for the same operator. The other two are for different operators.
For example, regarding the path locations of your datasets: if they are not set as relative paths, the links will not work properly when you move or share the process.
Similarly, the error about the attribute name may occur because the dataset has no attribute with that name, or because the metadata for that attribute has not propagated yet.
In general, if your process runs, completes, and gives you the expected output, these types of warnings can be disregarded. Of course, if your process doesn't complete, then you have a serious error that needs to be resolved.
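Regarding the relative-path warning, here is an analogy in plain Python (not RapidMiner's repository mechanism; the helper, file names and folders are hypothetical): a path stored relative to the process's own location is re-resolved wherever the project ends up, whereas a hard-coded absolute path would keep pointing at the old folder.

```python
from pathlib import PurePosixPath


def resolve_data_path(process_file, relative_data_path):
    """Resolve a dataset path relative to the folder that holds the process.

    Because the location is computed from the process file, the link keeps
    working when the whole project folder is moved or shared.
    """
    return PurePosixPath(process_file).parent / relative_data_path


old = resolve_data_path("/home/alice/project/fraud.rmp", "data/credit.csv")
new = resolve_data_path("/home/bob/shared/project/fraud.rmp", "data/credit.csv")
print(old)  # /home/alice/project/data/credit.csv
print(new)  # /home/bob/shared/project/data/credit.csv
```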
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts