RM 9.1 feedback : Let's talk of the new Automatic Feature Engineering (FS) - Part 2
lionelderkrikor
RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi,
This topic of feature selection definitely inspires me:
1/ Optimize Selection (Evolutionary) operator vs. AFE operator:
If I understand correctly, the AFE operator uses an evolutionary algorithm, so a priori we should get the same results with the two operators.
This is not the case. For example, here are the results with the Titanic dataset and a Decision Tree (DT) model:
- with OS (Evol) ==> acc = 81.20 % / feature set = 5 features
- with AFE (with "balance for accuracy" = 1) ==> acc = 79.07 % / feature set = 1 feature
Why did AFE not arrive at the same feature set and, in the end, the same performance?
2/ Unexpected results with the "balance for accuracy" parameter of the AFE operator:
Again with the Titanic dataset / DT model:
When we set "balance for accuracy" = 0 (so we expect the simplest feature set), we obtain the... original dataset!
And when we set "balance for accuracy" = 1, we obtain:
Why is this last feature set not obtained with "balance for accuracy" = 0? From my point of view, the resulting feature sets are not
consistent with the value of the "balance for accuracy" parameter...
3/ The tutorial associated with the AFE operator is broken: there are missing links between some operators...
4/ Performance output port of AFE:
There is a performance output port inside the AFE operator
but there is no performance output port outside the operator:
Is there a reason for that? Maybe, in practice, the AFE operator itself needs to be cross-validated?
In conclusion, could you clarify these points?
Thank you for listening,
Regards,
Lionel
NB: The process:
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="85"> <parameter key="repository_entry" value="//Samples/data/Titanic"/> </operator> <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="246" y="85"> <parameter key="attribute_filter_type" value="subset"/> <parameter key="attribute" value=""/> <parameter key="attributes" value="Ticket Number|Name|Cabin"/> <parameter key="use_except_expression" value="false"/> <parameter key="value_type" value="attribute_value"/> <parameter key="use_value_type_exception" value="false"/> <parameter key="except_value_type" value="time"/> <parameter key="block_type" value="attribute_block"/> <parameter key="use_block_type_exception" value="false"/> <parameter key="except_block_type" value="value_matrix_row_start"/> <parameter key="invert_selection" value="true"/> <parameter key="include_special_attributes" value="false"/> </operator> <operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" width="90" x="380" y="85"> <parameter key="attribute_name" value="Survived"/> <parameter key="target_role" value="label"/> <list key="set_additional_roles"/> </operator> <operator activated="true" class="multiply" compatibility="9.1.000" expanded="true" height="103" name="Multiply" width="90" x="514" y="85"/> 
<operator activated="true" class="optimize_selection_evolutionary" compatibility="9.1.000" expanded="true" height="103" name="Optimize Selection (Evolutionary)" width="90" x="648" y="85"> <parameter key="use_exact_number_of_attributes" value="false"/> <parameter key="restrict_maximum" value="false"/> <parameter key="min_number_of_attributes" value="1"/> <parameter key="max_number_of_attributes" value="1"/> <parameter key="exact_number_of_attributes" value="1"/> <parameter key="initialize_with_input_weights" value="false"/> <parameter key="population_size" value="5"/> <parameter key="maximum_number_of_generations" value="30"/> <parameter key="use_early_stopping" value="false"/> <parameter key="generations_without_improval" value="2"/> <parameter key="normalize_weights" value="true"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="user_result_individual_selection" value="false"/> <parameter key="show_population_plotter" value="false"/> <parameter key="plot_generations" value="10"/> <parameter key="constraint_draw_range" value="false"/> <parameter key="draw_dominated_points" value="true"/> <parameter key="maximal_fitness" value="Infinity"/> <parameter key="selection_scheme" value="tournament"/> <parameter key="tournament_size" value="0.25"/> <parameter key="start_temperature" value="1.0"/> <parameter key="dynamic_selection_pressure" value="true"/> <parameter key="keep_best_individual" value="false"/> <parameter key="save_intermediate_weights" value="false"/> <parameter key="intermediate_weights_generations" value="10"/> <parameter key="p_initialize" value="0.5"/> <parameter key="p_mutation" value="-1.0"/> <parameter key="p_crossover" value="0.5"/> <parameter key="crossover_type" value="uniform"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="34"> 
<parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="10"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000" expanded="true" height="103" name="Decision Tree" width="90" x="179" y="85"> <parameter key="criterion" value="gain_ratio"/> <parameter key="maximal_depth" value="10"/> <parameter key="apply_pruning" value="true"/> <parameter key="confidence" value="0.1"/> <parameter key="apply_prepruning" value="true"/> <parameter key="minimal_gain" value="0.01"/> <parameter key="minimal_leaf_size" value="2"/> <parameter key="minimal_size_for_split" value="4"/> <parameter key="number_of_prepruning_alternatives" value="3"/> </operator> <connect from_port="training set" to_op="Decision Tree" to_port="training set"/> <connect from_op="Decision Tree" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34"> <parameter key="main_criterion" value="first"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="true"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter 
key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/> <connect from_op="Performance" from_port="performance" to_port="performance 1"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <connect from_port="example set" to_op="Cross Validation" to_port="example set"/> <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/> <portSpacing port="source_example set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_performance" spacing="0"/> </process> 
</operator> <operator activated="true" class="model_simulator:automatic_feature_engineering" compatibility="9.1.000" expanded="true" height="103" name="Automatic Feature Engineering" width="90" x="648" y="289"> <parameter key="mode" value="feature selection"/> <parameter key="balance for accuracy" value="1.0"/> <parameter key="show progress dialog" value="false"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="use optimization heuristics" value="true"/> <parameter key="maximum generations" value="30"/> <parameter key="population size" value="10"/> <parameter key="use multi-starts" value="true"/> <parameter key="number of multi-starts" value="5"/> <parameter key="generations until multi-start" value="10"/> <parameter key="use time limit" value="false"/> <parameter key="time limit in seconds" value="60"/> <parameter key="use subset for generation" value="false"/> <parameter key="maximum function complexity" value="10"/> <parameter key="use_plus" value="false"/> <parameter key="use_diff" value="false"/> <parameter key="use_mult" value="true"/> <parameter key="use_div" value="true"/> <parameter key="reciprocal_value" value="true"/> <parameter key="use_square_roots" value="false"/> <parameter key="use_exp" value="false"/> <parameter key="use_log" value="false"/> <parameter key="use_absolute_values" value="false"/> <parameter key="use_sgn" value="false"/> <parameter key="use_min" value="false"/> <parameter key="use_max" value="false"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="145" name="Cross Validation (2)" width="90" x="313" y="85"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="10"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter 
key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000" expanded="true" height="103" name="Decision Tree (2)" width="90" x="179" y="85"> <parameter key="criterion" value="gain_ratio"/> <parameter key="maximal_depth" value="10"/> <parameter key="apply_pruning" value="true"/> <parameter key="confidence" value="0.1"/> <parameter key="apply_prepruning" value="true"/> <parameter key="minimal_gain" value="0.01"/> <parameter key="minimal_leaf_size" value="2"/> <parameter key="minimal_size_for_split" value="4"/> <parameter key="number_of_prepruning_alternatives" value="3"/> </operator> <connect from_port="training set" to_op="Decision Tree (2)" to_port="training set"/> <connect from_op="Decision Tree (2)" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="112" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34"> <parameter key="main_criterion" value="first"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="true"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter 
key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <connect from_port="model" to_op="Apply Model (2)" to_port="model"/> <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/> <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/> <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <operator activated="true" class="remember" compatibility="9.1.000" expanded="true" height="68" name="Remember" width="90" x="447" y="136"> <parameter key="name" value="performance"/> <parameter key="io_object" value="PerformanceVector"/> <parameter key="store_which" value="1"/> <parameter key="remove_from_process" value="true"/> </operator> <connect from_port="example set source" to_op="Cross Validation (2)" to_port="example set"/> <connect from_op="Cross Validation (2)" from_port="performance 1" to_op="Remember" to_port="store"/> <connect from_op="Remember" 
from_port="stored" to_port="performance sink"/> <portSpacing port="source_example set source" spacing="0"/> <portSpacing port="sink_performance sink" spacing="0"/> </process> </operator> <operator activated="true" class="recall" compatibility="9.1.000" expanded="true" height="68" name="Recall" width="90" x="849" y="340"> <parameter key="name" value="performance"/> <parameter key="io_object" value="PerformanceVector"/> <parameter key="remove_from_store" value="true"/> </operator> <connect from_op="Retrieve Titanic" from_port="output" to_op="Select Attributes" to_port="example set input"/> <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/> <connect from_op="Set Role" from_port="example set output" to_op="Multiply" to_port="input"/> <connect from_op="Multiply" from_port="output 1" to_op="Optimize Selection (Evolutionary)" to_port="example set in"/> <connect from_op="Multiply" from_port="output 2" to_op="Automatic Feature Engineering" to_port="example set in"/> <connect from_op="Optimize Selection (Evolutionary)" from_port="weights" to_port="result 2"/> <connect from_op="Optimize Selection (Evolutionary)" from_port="performance" to_port="result 1"/> <connect from_op="Automatic Feature Engineering" from_port="feature set" to_port="result 3"/> <connect from_op="Recall" from_port="result" to_port="result 4"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> <portSpacing port="sink_result 3" spacing="0"/> <portSpacing port="sink_result 4" spacing="0"/> <portSpacing port="sink_result 5" spacing="0"/> </process> </operator> </process>
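As a side note on item 2 above, the expectation behind the "balance for accuracy" parameter can be sketched in a few lines of Python. This only illustrates the expected behavior (0 should return the simplest feature set on the Pareto front, 1 the most accurate one); it is not how the AFE operator is implemented. The functions `pareto_front` and `pick_by_balance`, the linear index rule, and the candidate values are all hypothetical.

```python
from typing import List, Tuple

# Each candidate is (number_of_features, classification_error); both are minimized.
Candidate = Tuple[int, float]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the non-dominated candidates, ordered from simplest to most complex."""
    front = []
    best_error = float("inf")
    # Walking in order of complexity (ties broken by error), a candidate is
    # non-dominated only if it strictly improves on the best error seen so far.
    for size, error in sorted(set(candidates)):
        if error < best_error:
            front.append((size, error))
            best_error = error
    return front


def pick_by_balance(front: List[Candidate], balance: float) -> Candidate:
    """Hypothetical rule: balance 0 -> simplest point, balance 1 -> most accurate point."""
    if not front:
        raise ValueError("empty Pareto front")
    if not 0.0 <= balance <= 1.0:
        raise ValueError("balance must be in [0, 1]")
    index = round(balance * (len(front) - 1))
    return front[index]


# Made-up candidate feature sets, for illustration only (not measured results).
candidates = [(1, 0.22), (2, 0.21), (3, 0.23), (5, 0.19), (8, 0.19)]
front = pareto_front(candidates)
print("front:", front)
print("balance 0.0:", pick_by_balance(front, 0.0))
print("balance 1.0:", pick_by_balance(front, 1.0))
```

Under this reading, getting the full original dataset back for a balance of 0 is what looks inconsistent: the simplest point on the front should have the fewest features.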
Best Answers
IngoRM
Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi @lionelderkrikor,
OK, now to part 2 of the comments. Thanks again, BTW.
1) "Optimize Selection (Evolutionary) operator vs AFE operator - If I understand correctly, the AFE operator uses an evolutionary algorithm, so a priori we should get the same results with the two operators"
No, they are actually not the same. The new operator uses the same basic concepts but different techniques for selection, mutation, and generation. It also uses some improved heuristics for stopping criteria and adds multi-starts, which should lead to better results faster in most cases. "Most cases" because these are still randomized heuristics, so there are no guarantees. It worked very well on the 20+ test data sets we have been analyzing and comparing: it never showed statistically significantly poorer performance (and sometimes performed significantly better).
In addition, there seems to be a bug in the final model selection which does not always occur but does in your test case (see below and also the other thread on the "shift" issue).
2) "Unexpected results with the "balance for accuracy" parameter of the AFE operator"
I am 99% sure that this is the result of the "shifting" bug which sometimes occurs during the model selection. You can see the same problem in the visualization of the Pareto front in AM, as you have pointed out before.
3) "The tutorial associated with the AFE operator is broken: there are missing links between some operators..."
Yes, thanks. This has already been fixed in the recent development build and will be part of the next release.
4) "Is there a reason for that? Maybe, in practice, the AFE operator itself needs to be cross-validated?"
Exactly. Well, not necessarily cross-validated, but at least validated on a test set. The inner performance is the "training error" of the feature engineering. As you know, I am a strong believer that looking at training errors is a sure recipe for disaster, which is why we do not deliver it outside the operator, to avoid problems with it in the first place. If you absolutely want to see it, you can use the third port, which provides all the logged results, or use the logging mechanism of RapidMiner. So we do not hide it, we just make it a bit harder to misuse ;-)
Hope this helps, and we will certainly look into the shifting bug (point 2 above) asap.
Thanks,
Ingo
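The point about the inner performance being a "training error" of the feature selection can be illustrated with a small, self-contained Python sketch. It is not RapidMiner code, and everything in it is an assumption made for illustration: the data is pure noise, the "model" is a one-feature rule, and the selection score is plain accuracy on the selection data rather than a cross-validated estimate. The helper names `make_data`, `accuracy`, and `select_feature` are hypothetical.

```python
import random


def make_data(rng, n_rows, n_features):
    """Random binary features and random binary labels: no feature carries signal."""
    rows = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_rows)]
    labels = [rng.randint(0, 1) for _ in range(n_rows)]
    return rows, labels


def accuracy(rows, labels, feature, flip):
    """Accuracy of the rule 'predict the feature value' (or its negation if flip is 1)."""
    hits = sum((row[feature] ^ flip) == y for row, y in zip(rows, labels))
    return hits / len(labels)


def select_feature(rows, labels):
    """Pick the single feature and polarity with the best accuracy on the given data."""
    n_features = len(rows[0])
    candidates = [(accuracy(rows, labels, f, flip), f, flip)
                  for f in range(n_features) for flip in (0, 1)]
    return max(candidates)  # (inner accuracy, feature index, flip)


rng = random.Random(2001)
select_rows, select_labels = make_data(rng, 200, 50)   # data used for the selection
test_rows, test_labels = make_data(rng, 1000, 50)      # data never seen by the selection

inner_acc, feature, flip = select_feature(select_rows, select_labels)
holdout_acc = accuracy(test_rows, test_labels, feature, flip)

print(f"inner (selection) accuracy: {inner_acc:.3f}")
print(f"held-out accuracy:          {holdout_acc:.3f}")
```

Because the winner is chosen as the best of 100 candidate rules, its inner accuracy is typically noticeably above 0.5 even though the data contains no signal, while the held-out accuracy stays close to 0.5. That gap is the reason for validating the selected feature set on data that the optimization never saw.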
IngoRM
Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
BTW, here is a somewhat simplified process based on yours which uses classification error instead of accuracy. However, without the shifting bug fix this can still lead to weird behaviors in certain situations.
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve Titanic" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="34">
        <parameter key="attribute_filter_type" value="subset"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value="Ticket Number|Name|Cabin"/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
        <parameter key="attribute_name" value="Survived"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="model_simulator:automatic_feature_engineering" compatibility="9.1.001-SNAPSHOT" expanded="true" height="103" name="Automatic Feature Engineering" width="90" x="447" y="34">
        <parameter key="mode" value="feature selection"/>
        <parameter key="balance for accuracy" value="1.0"/>
        <parameter key="show progress dialog" value="false"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="use optimization heuristics" value="true"/>
        <parameter key="maximum generations" value="30"/>
        <parameter key="population size" value="10"/>
        <parameter key="use multi-starts" value="true"/>
        <parameter key="number of multi-starts" value="5"/>
        <parameter key="generations until multi-start" value="10"/>
        <parameter key="use time limit" value="false"/>
        <parameter key="time limit in seconds" value="60"/>
        <parameter key="use subset for generation" value="false"/>
        <parameter key="maximum function complexity" value="10"/>
        <parameter key="use_plus" value="false"/>
        <parameter key="use_diff" value="false"/>
        <parameter key="use_mult" value="true"/>
        <parameter key="use_div" value="true"/>
        <parameter key="reciprocal_value" value="true"/>
        <parameter key="use_square_roots" value="false"/>
        <parameter key="use_exp" value="false"/>
        <parameter key="use_log" value="false"/>
        <parameter key="use_absolute_values" value="false"/>
        <parameter key="use_sgn" value="false"/>
        <parameter key="use_min" value="false"/>
        <parameter key="use_max" value="false"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:cross_validation" compatibility="9.1.000" expanded="true" height="145" name="Cross Validation (2)" width="90" x="45" y="34">
            <parameter key="split_on_batch_attribute" value="false"/>
            <parameter key="leave_one_out" value="false"/>
            <parameter key="number_of_folds" value="10"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="false"/>
            <parameter key="local_random_seed" value="1992"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000" expanded="true" height="103" name="Decision Tree (2)" width="90" x="179" y="85">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training set" to_op="Decision Tree (2)" to_port="training set"/>
              <connect from_op="Decision Tree (2)" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.1.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="112" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="9.1.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="false"/>
                <parameter key="classification_error" value="true"/>
                <parameter key="kappa" value="false"/>
                <parameter key="weighted_mean_recall" value="false"/>
                <parameter key="weighted_mean_precision" value="false"/>
                <parameter key="spearman_rho" value="false"/>
                <parameter key="kendall_tau" value="false"/>
                <parameter key="absolute_error" value="false"/>
                <parameter key="relative_error" value="false"/>
                <parameter key="relative_error_lenient" value="false"/>
                <parameter key="relative_error_strict" value="false"/>
                <parameter key="normalized_absolute_error" value="false"/>
                <parameter key="root_mean_squared_error" value="false"/>
                <parameter key="root_relative_squared_error" value="false"/>
                <parameter key="squared_error" value="false"/>
                <parameter key="correlation" value="false"/>
                <parameter key="squared_correlation" value="false"/>
                <parameter key="cross-entropy" value="false"/>
                <parameter key="margin" value="false"/>
                <parameter key="soft_margin_loss" value="false"/>
                <parameter key="logistic_loss" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set source" to_op="Cross Validation (2)" to_port="example set"/>
          <connect from_op="Cross Validation (2)" from_port="performance 1" to_port="performance sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_performance sink" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Titanic" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Automatic Feature Engineering" to_port="example set in"/>
      <connect from_op="Automatic Feature Engineering" from_port="feature set" to_port="result 1"/>
      <connect from_op="Automatic Feature Engineering" from_port="population" to_port="result 2"/>
      <connect from_op="Automatic Feature Engineering" from_port="optimization log" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
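For readers comparing this process with the accuracy figures quoted in the question: classification error is simply the complement of accuracy, so the two criteria rank feature sets identically, but the error is minimized rather than maximized. A minimal Python sketch of the conversion, using the two accuracies from the original post (the helper name `classification_error` is only illustrative):

```python
def classification_error(accuracy_percent: float) -> float:
    """Classification error in percent, given accuracy in percent."""
    return round(100.0 - accuracy_percent, 2)


# Accuracies quoted in the original question (Titanic data, Decision Tree).
quoted = [
    ("Optimize Selection (Evolutionary)", 81.20),
    ("Automatic Feature Engineering", 79.07),
]
for name, acc in quoted:
    print(f"{name}: accuracy {acc:.2f} % -> classification error {classification_error(acc):.2f} %")
```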
Answers
Thank you for your time and your answers.
Regards,
Lionel
Ingo