different values for regressionPerformance for the same data
Legacy User
Hello,
I have the problem that I get different values from RegressionPerformance for the same attribute.
I have used model 1 (with FeatureSelection) and model 2 (without FeatureSelection, only with an AttributeFilter).
Attribute: att3
Model 1: root_mean_squared_error 0.334, squared_correlation 10.651
Model 2: root_mean_squared_error 0.326, squared_correlation 11.189
???
The same attribute (e.g. att3) has different values for RegressionPerformance in the two models. Can anyone tell me why?
Model 1
<operator name="Root" class="Process" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator" breakpoints="after">
<parameter key="target_function" value="sum"/>
</operator>
<operator name="FS" class="FeatureSelection" expanded="yes">
<parameter key="user_result_individual_selection" value="true"/>
<parameter key="keep_best" value="64"/>
<parameter key="maximum_number_of_generations" value="1"/>
<operator name="BootstrappingValidation" class="BootstrappingValidation" expanded="yes">
<parameter key="keep_example_set" value="true"/>
<parameter key="create_complete_model" value="true"/>
<operator name="LinearRegression" class="LinearRegression">
<parameter key="feature_selection" value="none"/>
</operator>
<operator name="ApplierChain" class="OperatorChain" expanded="yes">
<operator name="Applier" class="ModelApplier">
<parameter key="keep_model" value="true"/>
<list key="application_parameters">
</list>
</operator>
<operator name="RegressionPerformance" class="RegressionPerformance">
<parameter key="main_criterion" value="squared_correlation"/>
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="squared_correlation" value="true"/>
</operator>
</operator>
</operator>
</operator>
</operator>
Model 2
<operator name="Root" class="Process" expanded="yes">
<operator name="Daten laden und vorbereiten" class="OperatorChain" expanded="yes">
<operator name="ExampleSetGenerator" class="ExampleSetGenerator">
<parameter key="target_function" value="sum"/>
</operator>
</operator>
<operator name="Attribute identifizieren, Ranking, Correalation" class="OperatorChain" expanded="yes">
<operator name="AttributeFilter" class="AttributeFilter">
<parameter key="condition_class" value="attribute_name_filter"/>
<parameter key="parameter_string" value="att3"/>
</operator>
<operator name="BootstrappingValidation" class="BootstrappingValidation" expanded="yes">
<parameter key="keep_example_set" value="true"/>
<parameter key="create_complete_model" value="true"/>
<operator name="LinearRegression" class="LinearRegression">
<parameter key="feature_selection" value="none"/>
</operator>
<operator name="OperatorChain" class="OperatorChain" expanded="yes">
<operator name="ModelApplier (2)" class="ModelApplier">
<list key="application_parameters">
</list>
</operator>
<operator name="RegressionPerformance" class="RegressionPerformance">
<parameter key="root_mean_squared_error" value="true"/>
<parameter key="absolute_error" value="true"/>
<parameter key="relative_error" value="true"/>
<parameter key="correlation" value="true"/>
<parameter key="squared_correlation" value="true"/>
<parameter key="skip_undefined_labels" value="false"/>
<parameter key="use_example_weights" value="false"/>
</operator>
</operator>
</operator>
</operator>
<operator name="ModelApplier" class="ModelApplier">
<parameter key="keep_model" value="true"/>
<list key="application_parameters">
</list>
</operator>
</operator>
Answers
The quick answer is: because you have two different processes. Even a different order of using random numbers can affect the performance. You could use local_random_seeds to avoid this.
Greetings,
Sebastian
I don't know what you mean by local_random_seeds.
I have only integrated the FeatureSelection in model 1. I think that is a possibility to test all attributes by themselves and in combination, in order to find the best fit
with a linear model. But this is not a random process itself.
I want to find out which attributes are best for predicting the label, and for this I use the performance criteria such as the squared correlation and the root mean squared error.
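For reference, by these criteria I mean the usual definitions (assuming RapidMiner computes them in the standard way), with $y_i$ the label values, $\hat{y}_i$ the predictions, and $\bar{y}$, $\bar{\hat{y}}$ their means:

\[
\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2},
\qquad
\text{squared correlation} = \left(\frac{\sum_{i=1}^{n}(y_i-\bar{y})(\hat{y}_i-\bar{\hat{y}})}{\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}\;\sqrt{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2}}\right)^{2}
\]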
best regards
Angela
That is also a feature of RapidMiner, which makes it so special.
But please read my entire question ::)
Even a random process should not alter the quality (the parameters of the RegressionPerformance) of each value.
I therefore assume that I cannot compare the parameters of the RegressionPerformance for specific attributes in two different models.
best regards
Of course a random sampling of examples affects the measured quality, and a random sampling is done by the BootstrappingValidation. Without the same random number sequence, it is not guaranteed that the same examples are selected. For example, if one example which can be matched perfectly is not selected, but an outlier is selected twice, this will affect the performance heavily.
I would recommend using a local random seed on your BootstrappingValidation operators; this should do the trick.
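As a minimal sketch, the only change would be on the BootstrappingValidation operator in both processes, something like this (the seed value 1992 is just an arbitrary example; as far as I remember, the default -1 means the global random seed is used):

<operator name="BootstrappingValidation" class="BootstrappingValidation" expanded="yes">
  <parameter key="keep_example_set" value="true"/>
  <parameter key="create_complete_model" value="true"/>
  <!-- fixed local seed so that both processes draw the same bootstrap samples -->
  <parameter key="local_random_seed" value="1992"/>
  <!-- inner operators (LinearRegression, ModelApplier, RegressionPerformance) stay as they are -->
</operator>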
Greetings,
Sebastian
Many thanks for this answer. I have changed the local_random_seed from -1 to other values (1, 10, 100), but I get the same values for
squared_correlation for the attributes.
But I found another way to get the correct squared_correlation from the importance values.
Many thanks for your help.
Angela