"Regression Trees in RapidMiner 5 Community Edition"
christian1983
Member Posts: 11 Contributor II
Hello everybody,
I am working on my master's thesis, which deals with data mining, so I have started learning RapidMiner 5.0.
I am facing quite a few problems, so I hope to get some help in this forum.
I want to use decision trees to predict quantitative values, so I need trees that can handle numerical labels, i.e. regression trees.
Although RapidMiner 5.0 provides several types of decision trees that are described as regression trees, none of them can handle numerical labels, which confuses me a little.
Here is an excerpt of the data to be analyzed:
input 1   input 2   input 3   input 4   input 5   input 6   label
0,0050    0,0413    0,0610    0,01      0,01      0,01      0,120
0,0050    0,0413    0,0610    0,01      0,01      0,01      0,121
0,0050    0,0413    0,0610    0,01      0,01      0,01      0,127
0,0037    0,0467    0,0913    0,01      0,01      0,01      0,099
0,0037    0,0467    0,0913    0,01      0,01      0,01      0,094
0,0037    0,0467    0,0913    0,01      0,01      0,01      0,127
0,0030    0,0363    0,0600    0,01      0,01      0,01      0,097
0,0030    0,0363    0,0600    0,01      0,01      0,01      0,101
0,0030    0,0363    0,0600    0,01      0,01      0,01      0,087
0,0030    0,0370    0,0593    0,01      0,01      0,01      0,038
0,0030    0,0370    0,0593    0,01      0,01      0,01      0,058
0,0030    0,0370    0,0593    0,01      0,01      0,01      0,038
0,0197    0,3550    0,8407    0,03      0,14      0,056     0,100
0,0197    0,3550    0,8407    0,03      0,14      0,056     0,096
Sorry for the bad layout.
The description of the decision tree operator I want to use says the following:
"This operator learns decision trees from both nominal and numerical data. Decision trees are powerful classification methods which often can also easily be understood. This decision tree learner works similar to Quinlan's C4.5 or CART.
The actual type of the tree is determined by the criterion, e.g. using gain_ratio or Gini for CART / C4.5."
So this decision tree works similar to CART (Classification And Regression Trees), yet it cannot handle a numerical label.
I hope you can help me.
Thank you.
Answers
Interesting! Just to make sure we are all singing from the same song book, here's some background from Wikipedia on Predictive analytics. Anything that can generate a rule about a number (<, =, >) could be your learner; it could even be a group of learners. What matters is the testing arrangement that applies the rule and checks the result. You will see that RM has a sensible array of operators to do this: bin makers, learners, validators, and genetic parameter optimisers. So you could build a template layout where all you have to do is add the learners to test as parameters, but...
The 'but' is about overtraining. At what stage do you decide that enough is enough? How do you decide that? Just how much data remains unseen, what data, and why? It shouldn't be too difficult to construct a general-purpose testing rig which would expose the underlying issue... When, exactly, is a pattern really a pattern?
You should have a lot of fun with this, hope so!
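Not RapidMiner, but here is a minimal scikit-learn sketch of that testing-rig idea, using synthetic stand-in data with six numeric inputs like the excerpt above: bin the numeric label so a classification tree can learn it, then cross-validate over increasing tree depth and watch where the held-out performance stops improving. The bin count and depths are arbitrary choices for illustration.

```python
# Sketch only: synthetic data, a "bin maker", a tree learner, and a validator.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.random((300, 6))                                        # six numeric inputs
y = 0.1 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.01, 300)   # numeric label

# "Bin maker": turn the continuous label into 5 nominal classes.
edges = np.quantile(y, [0.2, 0.4, 0.6, 0.8])
y_class = np.digitize(y, edges)

# "Validator": 5-fold cross-validation over increasing tree depth.
for depth in (1, 2, 3, 5, 10, None):
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    acc = cross_val_score(tree, X, y_class, cv=5).mean()
    print(f"max_depth={depth}: held-out accuracy = {acc:.3f}")
```

In RapidMiner terms this would correspond, roughly, to a discretization operator feeding a tree learner inside a cross-validation, with the depth swept by a parameter optimizer.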
RapidMiner itself does not support a RegressionTree. You could use the one from Weka, assuming Weka has one. Or you could write a RegressionTree yourself and contribute it, which is what I would prefer, as you might imagine.
Greetings,
Sebastian
First of all, thank you for your quick reply.
Maybe I did not describe my problem well. My question refers to the fact that the decision trees provided in RM 5.0 are not able to handle a numerical label, although they belong to the group Modeling.ClassificationandRegression.Tree.
They would have to deal with numerical labels in order to make predictions based on the CART regression tree algorithm.
I hope my problem is clear now.
Thank you.
I'm afraid your problem is clear. But so is the answer: RapidMiner has NO learner for regression trees. All tree learners in RapidMiner are classification trees. They can handle numeric attributes, but not a numeric label.
By the way, the group Modeling.ClassificationandRegression contains all learners that are suitable for classification and/or regression.
Concerning the CART algorithm you are right, it also computes regression trees (as far as I know). But as your quote states, the RapidMiner operator only works *similar* to CART. If your master's thesis is not directly about regression trees, you could use some of RapidMiner's regression learners.
Best regards and good luck with your thesis,
chero
Chero (greets, Chero) is right in my view. The trick here is that you can use classifiers to predict numeric values, but not if you treat them as a continuous range. Effectively you break the range into steps that cover the spectrum, so you are approximating that range. Perhaps the point of my earlier post emerges?
You don't have to take my word for it...
http://www.dtreg.com/classregress.htm
http://www.resample.com/xlminer/help/rtree/rtree_intro.htm
http://www.cscu.cornell.edu/news/statnews/stnews62.pdf
and 000's more...
To summarise: no classification algorithm handles a continuous label directly, but that does not mean the relationship between continuous variables cannot be investigated with classification algorithms.
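To make the contrast concrete, here is a small scikit-learn sketch (again on synthetic stand-in data, not the poster's table). The classification route bins the numeric label, trains a classification tree, and maps its predictions back to bin midpoints, giving a stepwise approximation of the range; a true regression tree, the operator missing from RapidMiner 5, predicts the mean label of each leaf directly.

```python
# Sketch only: stepwise (binned classification) vs. true regression tree.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.random((300, 6))
y = 0.1 * X[:, 0] + 0.05 * X[:, 1] + rng.normal(0, 0.01, 300)

# Classification route: label -> 5 bins -> classifier -> bin midpoints back.
edges = np.quantile(y, np.linspace(0, 1, 6))
y_bin = np.clip(np.digitize(y, edges[1:-1]), 0, 4)
mids = np.array([(edges[i] + edges[i + 1]) / 2 for i in range(5)])
clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y_bin)
y_pred_steps = mids[clf.predict(X)]          # predictions take only 5 distinct values

# Regression route: a regression tree on the numeric label itself.
reg = DecisionTreeRegressor(max_depth=4, random_state=0).fit(X, y)
y_pred_reg = reg.predict(X)

print("stepwise (binned) MSE:", ((y_pred_steps - y) ** 2).mean())
print("regression tree   MSE:", ((y_pred_reg - y) ** 2).mean())
```

The binned predictions can only ever hit the five midpoints, which is exactly the "breaking the range into steps" idea from the post above.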
And yes, someday someone will add the missing "AR" to the CART. In fact we once had a regression tree, but it died at a young age and I had to bury one of my first creations with RapidMiner... The only sign left of this little class is the group name...
Greetings,
Sebastian
Christian - if you need an open source implementation, check out rpart in R. Otherwise, commercial products such as Clementine have a true regression tree (as part of CART or CHAID).
By the way: I'm currently working on an R integration for RapidMiner. With this extension, another regression tree would be available from within RapidMiner.
And a last note on commercial products: before buying a commercial product like Clementine because one or two operators are missing, I would suggest contacting us. I'm quite sure we can include those operators for less than half the money Clementine would cost you...
Greetings,
Sebastian