Feature selection - maximize recall performance
Hello,
I'm a bit lost on this one. How can I maximize the recall performance metric during feature selection with the Automate Feature Selection operator? Normally I use this operator to minimize the classification error metric, and that doesn't generate any errors. When I try the recall performance metric instead, it throws an error. So I guess I'm not using the correct operator to maximize recall in the feature optimization phase. If someone could point me in the right direction, that would be nice.
Thanks.
Andy
Best Answers
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
I think you need to create your own metric with Performance to Data, Generate Attributes and Extract Performance, where you generate 1-Recall. By definition, the operator wants to minimize its performance metric.
Right, @IngoRM?
Cheers,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
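To illustrate Martin's point, here is a minimal Python sketch (not RapidMiner code) of why 1-Recall works as a stand-in metric: the candidate that minimizes 1-Recall is the same one that maximizes recall. The function names, candidate names and confusion counts are made up for illustration.

```python
def recall(true_positives: int, false_negatives: int) -> float:
    """Recall = TP / (TP + FN); returns 0.0 when there are no positive examples."""
    positives = true_positives + false_negatives
    return true_positives / positives if positives else 0.0


def recall_loss(true_positives: int, false_negatives: int) -> float:
    """1 - recall: lower is better, so a minimizing optimizer can use it."""
    return 1.0 - recall(true_positives, false_negatives)


# Hypothetical (TP, FN) counts for three candidate feature sets.
candidates = {
    "set_a": (30, 20),
    "set_b": (45, 5),
    "set_c": (40, 10),
}

# Maximizing recall and minimizing 1 - recall select the same candidate.
best_by_recall = max(candidates, key=lambda name: recall(*candidates[name]))
best_by_loss = min(candidates, key=lambda name: recall_loss(*candidates[name]))

print(best_by_recall, best_by_loss)
```

Any transformation that reverses the ordering would do the same job; 1-Recall just has the convenience of staying between 0 and 1, like an error rate.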
IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
Hi Andy, Martin,
Yes, that's right. The operator minimizes the performance criterion, which works universally well for error rates across classification and regression problems but leads to this behavior for criteria such as recall, where higher is better. As Martin has suggested, you can use a workaround to make this work with other measurements, too. I have added a small example process below.
Hope this helps,
Ingo
<?xml version="1.0" encoding="UTF-8"?><process version="9.7.000-SNAPSHOT">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.7.000-SNAPSHOT" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.7.000-SNAPSHOT" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
      </operator>
      <operator activated="true" class="model_simulator:automatic_feature_engineering" compatibility="9.7.000-SNAPSHOT" expanded="true" height="103" name="Automatic Feature Engineering" width="90" x="179" y="34">
        <parameter key="mode" value="feature selection"/>
        <parameter key="balance for accuracy" value="1.0"/>
        <parameter key="show progress dialog" value="false"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="use optimization heuristics" value="true"/>
        <parameter key="maximum generations" value="30"/>
        <parameter key="population size" value="10"/>
        <parameter key="use multi-starts" value="true"/>
        <parameter key="number of multi-starts" value="5"/>
        <parameter key="generations until multi-start" value="10"/>
        <parameter key="use time limit" value="false"/>
        <parameter key="time limit in seconds" value="60"/>
        <parameter key="use subset for generation" value="false"/>
        <parameter key="maximum function complexity" value="10"/>
        <parameter key="use_plus" value="false"/>
        <parameter key="use_diff" value="false"/>
        <parameter key="use_mult" value="true"/>
        <parameter key="use_div" value="true"/>
        <parameter key="reciprocal_value" value="true"/>
        <parameter key="use_square_roots" value="false"/>
        <parameter key="use_exp" value="false"/>
        <parameter key="use_log" value="false"/>
        <parameter key="use_absolute_values" value="false"/>
        <parameter key="use_sgn" value="false"/>
        <parameter key="use_min" value="false"/>
        <parameter key="use_max" value="false"/>
        <process expanded="true">
          <operator activated="true" class="split_validation" compatibility="9.7.000-SNAPSHOT" expanded="true" height="124" name="Validation" width="90" x="45" y="34">
            <parameter key="create_complete_model" value="false"/>
            <parameter key="split" value="relative"/>
            <parameter key="split_ratio" value="0.7"/>
            <parameter key="training_set_size" value="100"/>
            <parameter key="test_set_size" value="-1"/>
            <parameter key="sampling_type" value="automatic"/>
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="1992"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Naive Bayes" width="90" x="45" y="34">
                <parameter key="laplace_correction" value="true"/>
              </operator>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
                <list key="application_parameters"/>
                <parameter key="create_view" value="false"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
                <parameter key="manually_set_positive_class" value="false"/>
                <parameter key="main_criterion" value="first"/>
                <parameter key="accuracy" value="false"/>
                <parameter key="classification_error" value="false"/>
                <parameter key="kappa" value="false"/>
                <parameter key="AUC (optimistic)" value="false"/>
                <parameter key="AUC" value="false"/>
                <parameter key="AUC (pessimistic)" value="false"/>
                <parameter key="precision" value="false"/>
                <parameter key="recall" value="true"/>
                <parameter key="lift" value="false"/>
                <parameter key="fallout" value="false"/>
                <parameter key="f_measure" value="false"/>
                <parameter key="false_positive" value="false"/>
                <parameter key="false_negative" value="false"/>
                <parameter key="true_positive" value="false"/>
                <parameter key="true_negative" value="false"/>
                <parameter key="sensitivity" value="false"/>
                <parameter key="specificity" value="false"/>
                <parameter key="youden" value="false"/>
                <parameter key="positive_predictive_value" value="false"/>
                <parameter key="negative_predictive_value" value="false"/>
                <parameter key="psep" value="false"/>
                <parameter key="skip_undefined_labels" value="true"/>
                <parameter key="use_example_weights" value="true"/>
              </operator>
              <operator activated="true" class="performance_to_data" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Performance to Data" width="90" x="45" y="238"/>
              <operator activated="true" class="generate_attributes" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="238">
                <list key="function_descriptions">
                  <parameter key="Value" value="-1*[Value]"/>
                </list>
                <parameter key="keep_all" value="true"/>
              </operator>
              <operator activated="true" class="extract_performance" compatibility="9.7.000-SNAPSHOT" expanded="true" height="82" name="Performance (2)" width="90" x="313" y="238">
                <parameter key="performance_type" value="data_value"/>
                <parameter key="statistics" value="average"/>
                <parameter key="attribute_name" value="Value"/>
                <parameter key="example_index" value="1"/>
                <parameter key="optimization_direction" value="minimize"/>
              </operator>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_op="Performance to Data" to_port="performance vector"/>
              <connect from_op="Performance to Data" from_port="example set" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Performance (2)" to_port="example set"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set source" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="averagable 1" to_port="performance sink"/>
          <portSpacing port="source_example set source" spacing="0"/>
          <portSpacing port="sink_performance sink" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Automatic Feature Engineering" to_port="example set in"/>
      <connect from_op="Automatic Feature Engineering" from_port="feature set" to_port="result 1"/>
      <connect from_op="Automatic Feature Engineering" from_port="population" to_port="result 2"/>
      <connect from_op="Automatic Feature Engineering" from_port="optimization log" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>
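For readers who prefer code to process XML, the idea behind the process above can be roughly sketched in Python (this is not RapidMiner code). Recall is computed, negated as in the Generate Attributes expression -1*[Value], and handed to an optimizer that only minimizes. The toy rows, the "any selected feature is 1" rule and the exhaustive subset search are assumptions for illustration only; the actual process uses Naive Bayes inside a split validation, and the operator runs its own search.

```python
from itertools import combinations

# Hypothetical toy data: (binary feature values, label), where label 1 is the positive class.
ROWS = [
    ({"f1": 1, "f2": 0, "f3": 0}, 1),
    ({"f1": 0, "f2": 1, "f3": 0}, 1),
    ({"f1": 1, "f2": 1, "f3": 0}, 1),
    ({"f1": 0, "f2": 0, "f3": 1}, 0),
    ({"f1": 0, "f2": 0, "f3": 0}, 0),
]
FEATURES = ["f1", "f2", "f3"]


def recall_for(subset):
    """Recall of a toy rule: predict positive if any selected feature equals 1."""
    tp = fn = 0
    for values, label in ROWS:
        if label != 1:
            continue  # recall only looks at the actual positives
        if any(values[f] == 1 for f in subset):
            tp += 1
        else:
            fn += 1
    return tp / (tp + fn) if (tp + fn) else 0.0


def negated_recall(subset):
    """Mirrors the -1*[Value] step: higher recall becomes a lower value."""
    return -1.0 * recall_for(subset)


def minimize_over_subsets(features, criterion):
    """Stand-in for a minimize-only optimizer: exhaustive search, smallest subsets first."""
    subsets = [
        subset
        for size in range(1, len(features) + 1)
        for subset in combinations(features, size)
    ]
    return min(subsets, key=criterion)


best = minimize_over_subsets(FEATURES, negated_recall)
print(best, recall_for(best))
```

Because the optimizer only ever sees the negated value, the reported performance is negative; multiply it by -1 again to read it as recall.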
Answers