Polynomial regression: Excel vs. Rapid Miner
Hi All,
I tried to understand the polynomial Regression within Rapid Miner.
I know this from Excel and there it works like this:
For example I have a Polynom
y = 1*x^3+2*x^2+3*x+20
xValues:1,2,3,4,5,6,7,8,9,10
yValues:26,42,74,128,210,326,482,684,938,1250
This is the Training Set.
The test Set with the same x_Values
xValues_New:1,2,3,4,5,6,7,8,9,10
should return the same prediction for Y.
In Excel the Function
=TREND(yValues;xValues^{1.2.3};xValues_New^{1.2.3})
returns the expected values 26, 42, 74, 128,.........
The function =RGP(yValues;xValues^{1.2.3})
returns the coefficents of the Polynom
1,2,3,20
Is it possible to rebuild this in RapidMiner with getting the same results?
I setup this process with completly different results. Can someone explain me, what's wrong?
By the way, the same stuff with a linear function or quadratic function works. (Is a function third degree to much for Rapid Miner?)
<?xml version="1.0" encoding="UTF-8"?><process version="7.3.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.3.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="generate_data" compatibility="7.3.001" expanded="true" height="68" name="Generate Data" width="90" x="45" y="85">
<parameter key="target_function" value="random"/>
<parameter key="number_examples" value="10"/>
<parameter key="number_of_attributes" value="1"/>
<parameter key="attributes_lower_bound" value="-10.0"/>
<parameter key="attributes_upper_bound" value="10.0"/>
<parameter key="gaussian_standard_deviation" value="10.0"/>
<parameter key="largest_radius" value="10.0"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="generate_id" compatibility="7.3.001" expanded="true" height="82" name="Generate ID" width="90" x="45" y="187">
<parameter key="create_nominal_ids" value="false"/>
<parameter key="offset" value="0"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.3.001" expanded="true" height="82" name="Generate Attributes" width="90" x="45" y="289">
<list key="function_descriptions">
<parameter key="X" value="id"/>
<parameter key="Y" value="id^3+2*id^2+3*id+20"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.3.001" expanded="true" height="82" name="Set Role" width="90" x="45" y="391">
<parameter key="attribute_name" value="Y"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.3.001" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="493">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="X|Y"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="polynomial_regression" compatibility="7.3.001" expanded="true" height="82" name="Polynomial Regression" width="90" x="313" y="391">
<parameter key="max_iterations" value="500000"/>
<parameter key="replication_factor" value="3"/>
<parameter key="max_degree" value="3"/>
<parameter key="min_coefficient" value="-100.0"/>
<parameter key="max_coefficient" value="100.0"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.3.001" expanded="true" height="82" name="Apply Model" width="90" x="581" y="391">
<list key="application_parameters"/>
<parameter key="create_view" value="false"/>
</operator>
<connect from_op="Generate Data" from_port="output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Polynomial Regression" to_port="training set"/>
<connect from_op="Polynomial Regression" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Polynomial Regression" from_port="exampleSet" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Best Answer
-
Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn
I think the operator you're looking for the is the Local Polynominal Regression operator.
0
Answers
Thanks a lot.
Thats exactly what I needed!
:-)
Kind regards
Jens