Decision tree, random forest and classification of data set
Hi All,
I am new to RapidMiner. Could someone please help me create a decision tree and a random forest? I have 1 target attribute and 12 parameters influencing it. I also need to classify the data (with regression) based on the output. The main objective is to check whether a single parameter, or a combination of 2, 4, or 5 parameters, significantly or moderately influences the main target attribute. The data is attached for your reference. I tried selecting attributes and setting roles, but got errors such as missing labels and missing parameters.
Thanks,
Gopal
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi Gopal,
It seems there is a problem with your XML code: it cannot be loaded. Can you verify it?
Meanwhile, here is an example process that includes a decision tree model built on your data:
<?xml version="1.0" encoding="UTF-8"?>
<process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Decision_tree_basic\GP.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="1.true.real.attribute"/>
<parameter key="1" value="2.true.real.attribute"/>
<parameter key="2" value="3.true.real.attribute"/>
<parameter key="3" value="4.true.integer.attribute"/>
<parameter key="4" value="5.true.integer.attribute"/>
<parameter key="5" value="6.true.integer.attribute"/>
<parameter key="6" value="7.true.integer.attribute"/>
<parameter key="7" value="8.true.integer.attribute"/>
<parameter key="8" value="9.true.real.attribute"/>
<parameter key="9" value="10.true.real.attribute"/>
<parameter key="10" value="11.true.real.attribute"/>
<parameter key="11" value="12.true.real.attribute"/>
<parameter key="12" value="Main attribute.true.real.attribute"/>
<parameter key="13" value="13.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="set_role" compatibility="8.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="34">
<parameter key="attribute_name" value="Main attribute"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="8.1.000" expanded="true" height="145" name="Cross Validation" width="90" x="514" y="34">
<process expanded="true">
<operator activated="true" class="concurrency:parallel_decision_tree" compatibility="8.1.000" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="34">
<parameter key="criterion" value="least_square"/>
</operator>
<connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
<connect from_op="Decision Tree" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_regression" compatibility="8.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34">
<parameter key="correlation" value="true"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_port="result 2"/>
<connect from_op="Cross Validation" from_port="example set" to_port="result 1"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
I hope it helps,
Regards,
Lionel
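Regarding the random forest part of the question, here is a minimal sketch rather than a tested process. It assumes that the Random Forest operator in this RapidMiner version is registered as `concurrency:parallel_random_forest`, that it exposes the same `training set` and `model` ports as Decision Tree, and that it accepts the `least_square` criterion for a numerical label. The `number_of_trees` key and its value of 100 are illustrative as well, so verify all of these in the Operators panel. The fragment would take the place of the Decision Tree operator and its two connections inside the training subprocess of Cross Validation:

```xml
<!-- Assumed operator class and parameter keys; check them against your RapidMiner version -->
<operator activated="true" class="concurrency:parallel_random_forest" compatibility="8.1.000" expanded="true" height="103" name="Random Forest" width="90" x="179" y="34">
<!-- Illustrative ensemble size, not a tuned value -->
<parameter key="number_of_trees" value="100"/>
<!-- Same regression criterion as the Decision Tree in the process above -->
<parameter key="criterion" value="least_square"/>
</operator>
<connect from_port="training set" to_op="Random Forest" to_port="training set"/>
<connect from_op="Random Forest" from_port="model" to_port="model"/>
```

The rest of the process (Read CSV, Set Role, Apply Model, Performance) would stay as it is.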
Answers
@g_pawar please post your XML code too using the </> button. See the Read Before Posting instructions to your right.
Hi Thomas,
Thanks for the reply. Please find the code.
Cheers
Gopal
Thanks, Lionel. Now it's working.
Regards,
Gopal