decision tree vs k-means
I have run a decision tree and k-means in RapidMiner, but the results from the two appear to conflict with each other. I have checked, and my methods appear to be correct.
Is there any possible reason for these contradicting results? I would just like to understand the possible reasoning, so that I can better understand how RapidMiner works.
Answers
Scott
Sorry if I'm being unclear. I'm new to RapidMiner and am just trying to understand why my results from these two methods contradict each other.
If possible, can you post your XML and sample data so we can check how the results contradict each other? From my understanding, k-means clusters the data, and a decision tree can help interpret the clustering. As an unsupervised algorithm, k-means uses only the numerical attributes to measure distances and divide the data into clusters. Supervised algorithms like a decision tree, on the other hand, work mainly from the label rather than from all of the data at once: they are trained to fit their output labels. One big difference is that k-means considers all attributes, whereas a decision tree leaves out attributes that are not useful for fitting the output (and pruning can remove weak splits). You can get similar output from both if a single attribute is highly related to the label. But as @sgenzer said, a comparison between these two is not really suitable.
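To make the difference concrete, here is a minimal sketch in plain Python (not RapidMiner, and not your process). The toy data, the starting centroids, and the helper names `kmeans` and `best_split` are all made up for illustration. The label depends only on `y`, while the two obvious groups in the data are separated along `x`, so the clusters and the tree split end up looking "contradictory" even though both are doing their job.

```python
from collections import Counter

# Toy data, invented for illustration: (x, y, label).
# The label depends only on y; the two visible groups are separated along x.
rows = [
    (1.0, 0.1, "no"), (1.2, 0.9, "yes"), (0.8, 0.2, "no"), (1.1, 0.8, "yes"),
    (9.0, 0.1, "no"), (9.2, 0.9, "yes"), (8.8, 0.2, "no"), (9.1, 0.8, "yes"),
]
points = [(x, y) for x, y, _ in rows]
labels = [label for _, _, label in rows]


def sq_dist(p, q):
    """Squared Euclidean distance over all attributes."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd's algorithm. The label is never looked at."""
    centroids = list(centroids)
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster goes empty
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment


def gini(group):
    """Gini impurity of a list of labels."""
    counts = Counter(group)
    return 1.0 - sum((n / len(group)) ** 2 for n in counts.values())


def best_split(points, labels):
    """One-level decision tree (a stump): the split with the lowest weighted Gini."""
    best = None  # (impurity, attribute index, threshold)
    for attr in range(len(points[0])):
        values = sorted({p[attr] for p in points})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [l for p, l in zip(points, labels) if p[attr] <= threshold]
            right = [l for p, l in zip(points, labels) if p[attr] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or impurity < best[0]:
                best = (impurity, attr, threshold)
    return best


# Starting centroids are picked by hand here so the run is deterministic.
clusters = kmeans(points, [points[0], points[4]])
impurity, attr, threshold = best_split(points, labels)
crosstab = Counter(zip(clusters, labels))

print("cluster per row:", clusters)
print("stump splits on attribute index", attr, "at", threshold)
print("cluster vs. label counts:", dict(crosstab))
```

In this toy case the clusters follow `x` (attribute 0) and the stump splits on `y` (attribute 1), so each cluster contains an even mix of both label values. Neither result is wrong; they answer different questions.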
Thanks
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
- In this case, the DT algorithm will look at your label and then generate a set of splits from all your other attributes that best separates the different values of the label.
- The k-means algorithm will simply look at all your data and try to find the number of groups that you specify, such that the examples within each group are most similar (based on the similarity metric you select) across all the dimensions together. (And if you have numerical data and don't normalize it, the result can easily get skewed, but that is another story.)
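On the normalization point, here is a small sketch in plain Python of how an attribute on a large scale can dominate a Euclidean similarity measure. The data, the attribute meanings (an income-like value and a 0 to 1 score), and the helper names are assumptions for illustration only; the min-max scaling stands in for whatever normalization step you use in your own process.

```python
# Toy rows, invented for illustration: (income-like value, score between 0 and 1).
rows = [
    (30000.0, 0.10),
    (30500.0, 0.95),
    (31000.0, 0.12),
    (40000.0, 0.50),
]


def sq_dist(p, q):
    """Squared Euclidean distance over all attributes."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def min_max_normalize(rows):
    """Rescale every attribute to the range 0..1."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        # "or 1.0" avoids dividing by zero for a constant attribute
        tuple((v - lo) / ((hi - lo) or 1.0) for v, lo, hi in zip(row, lows, highs))
        for row in rows
    ]


def nearest_other(rows, i):
    """Index of the row most similar to row i (excluding i itself)."""
    others = (j for j in range(len(rows)) if j != i)
    return min(others, key=lambda j: sq_dist(rows[i], rows[j]))


nearest_raw = nearest_other(rows, 0)
nearest_scaled = nearest_other(min_max_normalize(rows), 0)

print("nearest to row 0 on raw data:       row", nearest_raw)
print("nearest to row 0 after normalizing: row", nearest_scaled)
```

On the raw data, row 0 is closest to row 1 because only the large-scale attribute matters in practice; after scaling, it is closest to row 2, which has a nearly identical score. The same effect changes which cluster a row lands in.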
I hope this helps clarify.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts