Regression with Random Forest?
Hi RapidMiner,
I'm doing regression with 480 input features. I tried the Deep Learning operator, but the training root mean square error (RMSE) is still quite high. Now I'm trying Random Forest because of its random subspace approach, but I found that the Random Forest operator cannot handle a numerical label. How can I deal with this?
Thank you very much for your support.
Best Regards,
phivu
Best Answers
earmijo Member Posts: 271 Unicorn
You cannot do it in RapidMiner unless you are willing to use R scripts. However, the latest version of RapidMiner has a new operator, Gradient Boosted Trees, which is competitive with Random Forest and can handle both numerical and polynominal labels. It is worth exploring.
earmijo Member Posts: 271 Unicorn
Install the R Scripting extension, verify that R and the randomForest package are installed on your computer, and run the process below. I adapted the sample process that ships with the extension so that it runs Random Forest on a regression problem.
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" breakpoints="after" class="retrieve" compatibility="7.3.001" expanded="true" height="68" name="Retrieve Polynomial" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Polynomial"/>
<description align="center" color="blue" colored="true" width="126">Fetch example data</description>
</operator>
<operator activated="true" class="split_data" compatibility="7.3.001" expanded="true" height="103" name="Split Data" width="90" x="179" y="34">
<enumeration key="partitions">
<parameter key="ratio" value="0.5"/>
<parameter key="ratio" value="0.5"/>
</enumeration>
<description align="center" color="purple" colored="true" width="126">Split the data in a training and a test set</description>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="82" name="Learn Model" width="90" x="380" y="34">
<parameter key="script" value="# Train a random forest on the training data and return the learned model&#10;rm_main = function(data)&#10;{&#10;    library(randomForest)&#10;    # With a numeric label, randomForest fits a regression forest&#10;    Model.rf &lt;- randomForest(label ~ ., data = data, mtry = 3, importance = FALSE, na.action = na.omit)&#10;    return(Model.rf)&#10;}&#10;"/>
<description align="center" color="red" colored="true" width="126">Train a RandomForest model in R and return it as an R object</description>
</operator>
<operator activated="true" class="r_scripting:execute_r" compatibility="7.2.000" expanded="true" height="103" name="Apply R Model" width="90" x="514" y="238">
<parameter key="script" value="# Load the trained model and apply it to the test data&#10;rm_main = function(model, data)&#10;{&#10;    library(randomForest)&#10;    # Apply the model and build a prediction&#10;    result &lt;- predict(model, data)&#10;    # Add the prediction to the example set&#10;    data$prediction &lt;- result&#10;    # Update the meta data so RapidMiner treats the new column as the prediction&#10;    metaData$data$prediction &lt;&lt;- list(type=&quot;real&quot;, role=&quot;prediction&quot;)&#10;    return(data)&#10;}&#10;"/>
<description align="center" color="red" colored="true" width="126">Apply the trained model on the test data</description>
</operator>
<connect from_op="Retrieve Polynomial" from_port="output" to_op="Split Data" to_port="example set"/>
<connect from_op="Split Data" from_port="partition 1" to_op="Learn Model" to_port="input 1"/>
<connect from_op="Split Data" from_port="partition 2" to_op="Apply R Model" to_port="input 2"/>
<connect from_op="Learn Model" from_port="output 1" to_op="Apply R Model" to_port="input 1"/>
<connect from_op="Apply R Model" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
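For readers who want to see what the `mtry` argument in the R call controls, here is a minimal sketch of random forest regression in plain Python. It is an illustration only, not the implementation used by the randomForest package or by RapidMiner. The function names (`build_tree`, `fit_forest`, `predict_forest`, `demo`), the synthetic data, and the default settings are all assumptions made for this example. Each tree is grown on a bootstrap sample, each split considers only `mtry` randomly chosen features, and the forest prediction is the average of the tree predictions.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    # Sum of squared errors around the mean: the regression split criterion.
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(rows, targets, mtry, rng, min_leaf=3, max_depth=6, depth=0):
    """Grow one regression tree; a leaf is a float, an inner node is a tuple."""
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return _mean(targets)
    best = None  # (score, feature index, threshold)
    # Random subspace step: only `mtry` random features are tried at this node.
    for f in rng.sample(range(len(rows[0])), mtry):
        values = sorted({r[f] for r in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = _sse(left) + _sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:  # no admissible split found
        return _mean(targets)
    _, f, thr = best
    li = [i for i, r in enumerate(rows) if r[f] <= thr]
    ri = [i for i, r in enumerate(rows) if r[f] > thr]
    return (
        f,
        thr,
        build_tree([rows[i] for i in li], [targets[i] for i in li],
                   mtry, rng, min_leaf, max_depth, depth + 1),
        build_tree([rows[i] for i in ri], [targets[i] for i in ri],
                   mtry, rng, min_leaf, max_depth, depth + 1),
    )


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node


def fit_forest(rows, targets, n_trees=20, mtry=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], mtry, rng))
    return forest


def predict_forest(forest, row):
    # Regression forests average the numeric predictions of their trees.
    return _mean([predict_tree(tree, row) for tree in forest])


def demo(seed=42):
    """Compare the forest with a predict-the-mean baseline on synthetic data."""
    rng = random.Random(seed)

    def make(n):
        rows = [[rng.random() for _ in range(4)] for _ in range(n)]
        # Only the first two features matter; the other two are noise.
        targets = [3 * r[0] - 2 * r[1] + rng.gauss(0, 0.1) for r in rows]
        return rows, targets

    train_x, train_y = make(150)
    test_x, test_y = make(50)
    forest = fit_forest(train_x, train_y, n_trees=20, mtry=2, seed=seed)

    def rmse(preds):
        return _mean([(p - y) ** 2 for p, y in zip(preds, test_y)]) ** 0.5

    forest_rmse = rmse([predict_forest(forest, r) for r in test_x])
    baseline_rmse = rmse([_mean(train_y)] * len(test_y))
    return forest_rmse, baseline_rmse


if __name__ == "__main__":
    forest_rmse, baseline_rmse = demo()
    print(f"forest RMSE: {forest_rmse:.3f}  baseline RMSE: {baseline_rmse:.3f}")
```

With many input features, as in the original question, `mtry` is the main setting to tune. Small values decorrelate the trees, while larger values let each split see more of the informative features.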
Answers
Thank you Earmijo, could you elaborate more on how to use RapidMiner with R to do regression with Random Forest?
That's great, thanks!
UPDATE: As of version 8.0, Decision Tree and Random Forest can now handle numerical labels and solve regression problems.
https://docs.rapidminer.com/latest/studio/releases/changes-8.0.0.html?_ga=2.83072976.793993492.1515416834-774805979.1445867999