Question regarding generating a decision tree using the RapidMiner tool
Hello everyone,
I have a question about how to properly generate a decision tree using RapidMiner. It concerns picking the right label attribute, and how to actually generate a tree that makes sense. I've got a specific data set which I load with the "Read Excel" operator, pick the label attribute I want, and then connect to the "Decision Tree" operator in RapidMiner. This is how it looks in the end:
But the resulting decision tree is either too small, too big, or doesn't show what I wanted it to represent at all... Is there any way I can "force" the algorithm to branch each time on a specific column I tell it to? Something like this:
If the outlook is overcast, person X will play golf. If the outlook is rain and it's windy, person X won't play golf; otherwise person X will play golf.
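As a rough sketch, that rule corresponds to the nested conditions below. The function name `will_play_golf` and the string values are placeholders chosen for illustration, not anything RapidMiner produces, and since the rule says nothing about other outlooks (e.g. sunny), the sketch returns `None` for them:

```python
from typing import Optional


def will_play_golf(outlook: str, windy: bool) -> Optional[bool]:
    """Hand-written version of the rule above (placeholder names, not tool output)."""
    if outlook == "overcast":
        return True            # overcast: always plays
    if outlook == "rain":
        return not windy       # rain: plays only when it is not windy
    return None                # the rule does not cover any other outlook


for outlook, windy in [("overcast", True), ("rain", True), ("rain", False)]:
    print(outlook, windy, will_play_golf(outlook, windy))
```

A learned tree is this kind of structure, except that the algorithm chooses the attributes and the order of the checks from the data.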
I'm quite new to data mining, and I'd really appreciate any explanation of how I can generate a proper decision tree that is actually readable...
Thanks a lot!
Answers
I think you need to get more into the data mining way of thinking. As a data scientist you are not necessarily interested in what the tree looks like (sometimes you are, but only at a rather high level).
So what you would do is run a validation, calculate a performance measure, and optimize the parameters of the tree to get the best results.
I would recommend our getting started tutorials to you: http://docs.rapidminer.com/studio/getting-started/
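Outside RapidMiner, the validate-then-pick-the-best idea can be sketched in plain Python. Everything here is an assumption made for illustration: the rows are the classic 14-row weather data standing in for the Excel sheet, the "model" is only a one-level tree (one rule per attribute value), and the "parameter" being tuned is simply which attribute to split on, as a stand-in for real tree parameters.

```python
from collections import Counter

# Toy data: the classic 14-row "weather" set, used only as a stand-in
# for the poster's Excel sheet (an assumption, not their data).
_RAW = [
    ("sunny", False, "no"), ("sunny", True, "no"), ("overcast", False, "yes"),
    ("rainy", False, "yes"), ("rainy", False, "yes"), ("rainy", True, "no"),
    ("overcast", True, "yes"), ("sunny", False, "no"), ("sunny", False, "yes"),
    ("rainy", False, "yes"), ("sunny", True, "yes"), ("overcast", True, "yes"),
    ("overcast", False, "yes"), ("rainy", True, "no"),
]
ROWS = [{"outlook": o, "windy": w, "play": p} for o, w, p in _RAW]


def fit_one_attribute_rule(train, attr):
    """Learn 'attribute value -> majority label' from the training rows."""
    counts = {}
    for row in train:
        counts.setdefault(row[attr], Counter())[row["play"]] += 1
    rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    default = Counter(r["play"] for r in train).most_common(1)[0][0]
    return rule, default


def accuracy(model, test, attr):
    """Fraction of held-out rows the rule labels correctly."""
    rule, default = model
    hits = sum(rule.get(r[attr], default) == r["play"] for r in test)
    return hits / len(test)


def cross_val_accuracy(rows, attr, k=7):
    """Average accuracy over k folds; each row is held out exactly once."""
    scores = []
    for fold in range(k):
        test = rows[fold::k]
        train = [r for i, r in enumerate(rows) if i % k != fold]
        scores.append(accuracy(fit_one_attribute_rule(train, attr), test, attr))
    return sum(scores) / k


# "Optimize the parameter": keep the setting with the best validated accuracy.
scores = {attr: cross_val_accuracy(ROWS, attr) for attr in ("outlook", "windy")}
best = max(scores, key=scores.get)
print(scores, "-> best:", best)
```

The point of the sketch is that the score is computed on rows the rule was not trained on, so it estimates how well the model predicts rather than how nicely it describes the training data.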
~Martin
Dortmund, Germany
Best regards
P.S. Does the "Decision Tree" operator include algorithms within itself? I'd like to use the CART algorithm to generate the tree, but I'm not exactly sure how to do that...
About the look of the tree: Well, as I said, as a data scientist you are interested in performance, not necessarily in understandability. Most advanced algorithms (SVM, Neural Net, Random Forest) are hard to represent visually at all. Your approach is a rather explorative one, which is fine, but it is a different way of thinking. The explorative phase might be something you do before you start the actual modelling.
On the types: The types are called roles. Every column can have one role. An id is used e.g. in joining; a cluster is the result of a clustering algorithm. By the way, you can type any word into that field. The result is a special attribute. Those are useful because all special attributes are ignored by operators unless you either specifically tell them to use them (use special attributes) or they need them to do their job (label for learners).
On CART: The standard RapidMiner Decision Tree is its own implementation. I think it is close to CART if you only use numerical values or something. If you want a "real" CART, you need to use the Weka package. There is W-SimpleCart as well as W-J4.8, which is the C4.5 implementation.
~Martin
- attribute
- label
- id
- weight
- batch
- cluster
- prediction
- outlier
- cost
- base_value
I understand that the label column is the way of telling RapidMiner which column I want to build the model for and understand. But how do I tell RapidMiner which attribute I want to start branching from? This is what confuses me most...
This isn't a tree, it's a rectangle with a label "M" inside it lol...
~Martin
As a data scientist I would measure how accurate my model is. But keep in mind: there is a big difference between describing data well and predicting future data.
~Martin
There's the criterion: information gain, Gini index, gain ratio, accuracy? I guess I'd need to pick accuracy in my case then?
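For context, the criterion is what a tree learner uses to decide which attribute to split on at each node, so it also determines where the branching starts. Below is a minimal sketch of the textbook definitions of information gain, gain ratio and Gini decrease. The data is the classic 14-row weather set, assumed here as a stand-in for the real Excel sheet, and nothing in this thread confirms that RapidMiner computes the criteria in exactly this way.

```python
import math
from collections import Counter

# Columns: (outlook, windy, play). Classic weather data, used as an assumption.
ROWS = [
    ("sunny", False, "no"), ("sunny", True, "no"), ("overcast", False, "yes"),
    ("rainy", False, "yes"), ("rainy", False, "yes"), ("rainy", True, "no"),
    ("overcast", True, "yes"), ("sunny", False, "no"), ("sunny", False, "yes"),
    ("rainy", False, "yes"), ("sunny", True, "yes"), ("overcast", True, "yes"),
    ("overcast", False, "yes"), ("rainy", True, "no"),
]
COLUMNS = {"outlook": 0, "windy": 1}


def entropy(values):
    """Shannon entropy (in bits) of a list of values."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gini(values):
    """Gini impurity of a list of values."""
    n = len(values)
    return 1.0 - sum((c / n) ** 2 for c in Counter(values).values())


def impurity_decrease(rows, col, impurity):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    weighted = sum(len(g) / len(rows) * impurity(g) for g in groups.values())
    return impurity([row[-1] for row in rows]) - weighted


def information_gain(rows, col):
    return impurity_decrease(rows, col, entropy)


def gini_decrease(rows, col):
    return impurity_decrease(rows, col, gini)


def gain_ratio(rows, col):
    """Information gain divided by the entropy of the split itself."""
    split_info = entropy([row[col] for row in rows])
    return information_gain(rows, col) / split_info if split_info else 0.0


for name, col in COLUMNS.items():
    print(name,
          round(information_gain(ROWS, col), 3),
          round(gain_ratio(ROWS, col), 3),
          round(gini_decrease(ROWS, col), 3))
```

On this toy data all three criteria score outlook higher than windy, which is why a learner using any of them would put outlook at the root without being told to.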
PS: Just one more question guys... What algorithms does the regular "Decision Tree" operator in RapidMiner use? I've read somewhere that it's a combination of the CHAID and ID3 algorithms?
Thanks a lot once again!