how to select
Hi community,
I created a process with prediction models based on three different algorithms (k-NN, Naive Bayes and Decision Tree).
The models are trained in a single run using the 'Loop Parameters' and 'Select Subprocess' operators. The model with the highest performance should then be applied to the chosen data set. So my questions are:
1) How can I measure the performance of all three models in the same run and find the best one?
2) How do I then use that particular model?
Alternatively, each algorithm could be run in its own process and its performance measured individually, but I would like to have all models in one process.
Thank you for the help!
Stay safe!
Best Answer
hbajpai Member Posts: 102 Unicorn
Hey @LeMarc,
You can achieve that by wrapping a Select Subprocess operator in an outer Optimize Parameters (Grid) operator that iterates over the select_which parameter. I am sharing an example below that tries to find the best model out of Gradient Boosted Trees, Random Forest and Deep Learning. Each branch tunes its own learner with cross-validation, and the outer operator compares the branches on accuracy and reports the best-performing one. The example is set up for a classification problem.
Let me know if this works for you.
<?xml version="1.0" encoding="UTF-8"?><process version="9.6.000"> <context> <input/> <output/> <macros/> </context> <operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process"> <parameter key="logverbosity" value="init"/> <parameter key="random_seed" value="2001"/> <parameter key="send_mail" value="never"/> <parameter key="notification_email" value=""/> <parameter key="process_duration_for_mail" value="30"/> <parameter key="encoding" value="SYSTEM"/> <process expanded="true"> <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.6.000" expanded="true" height="124" name="Optimize Parameters (Grid)" width="90" x="514" y="34"> <list key="parameters"> <parameter key="Select Subprocess.select_which" value="[1.0;3;3;linear]"/> </list> <parameter key="error_handling" value="fail on error"/> <parameter key="log_performance" value="true"/> <parameter key="log_all_criteria" value="false"/> <parameter key="synchronize" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="select_subprocess" compatibility="9.6.000" expanded="true" height="82" name="Select Subprocess" width="90" x="514" y="34"> <parameter key="select_which" value="2"/> <process expanded="true"> <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.6.000" expanded="true" height="124" name="Optimize Parameters RF" width="90" x="112" y="34"> <list key="parameters"> <parameter key="Random Forest.number_of_trees" value="[10;1000;10;linear]"/> </list> <parameter key="error_handling" value="fail on error"/> <parameter key="log_performance" value="true"/> <parameter key="log_all_criteria" value="false"/> <parameter key="synchronize" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation"
compatibility="9.6.000" expanded="true" height="145" name="Cross Validation (2)" width="90" x="246" y="34"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="10"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:parallel_random_forest" compatibility="9.4.000" expanded="true" height="103" name="Random Forest" width="90" x="179" y="34"> <parameter key="number_of_trees" value="10"/> <parameter key="criterion" value="accuracy"/> <parameter key="maximal_depth" value="10"/> <parameter key="apply_pruning" value="false"/> <parameter key="confidence" value="0.1"/> <parameter key="apply_prepruning" value="false"/> <parameter key="minimal_gain" value="0.01"/> <parameter key="minimal_leaf_size" value="2"/> <parameter key="minimal_size_for_split" value="4"/> <parameter key="number_of_prepruning_alternatives" value="3"/> <parameter key="random_splits" value="false"/> <parameter key="guess_subset_ratio" value="true"/> <parameter key="subset_ratio" value="0.2"/> <parameter key="voting_strategy" value="confidence vote"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> </operator> <connect from_port="training set" to_op="Random Forest" to_port="training set"/> <connect from_op="Random Forest" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model (2)" width="90" 
x="112" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.6.000" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34"> <parameter key="main_criterion" value="accuracy"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="true"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <connect from_port="model" to_op="Apply Model (2)" to_port="model"/> <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/> <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/> <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/> <connect from_op="Performance (2)" from_port="example set" to_port="test set results"/> <portSpacing port="source_model" 
spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Cross Validation (2)" to_port="example set"/> <connect from_op="Cross Validation (2)" from_port="performance 1" to_port="performance"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_performance" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Optimize Parameters RF" to_port="input 1"/> <connect from_op="Optimize Parameters RF" from_port="performance" to_port="output 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.6.000" expanded="true" height="124" name="Optimize Parameters xgboost" width="90" x="112" y="34"> <list key="parameters"> <parameter key="Gradient Boosted Trees.number_of_trees" value="[100;1000;5;linear]"/> <parameter key="Gradient Boosted Trees.maximal_depth" value="2,4,7"/> </list> <parameter key="error_handling" value="fail on error"/> <parameter key="log_performance" value="true"/> <parameter key="log_all_criteria" value="false"/> <parameter key="synchronize" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation" compatibility="9.6.000" expanded="true" height="145" name="Cross Validation" width="90" x="246" y="34"> <parameter 
key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="10"/> <parameter key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="h2o:gradient_boosted_trees" compatibility="9.3.001" expanded="true" height="103" name="Gradient Boosted Trees" width="90" x="112" y="34"> <parameter key="number_of_trees" value="100"/> <parameter key="reproducible" value="false"/> <parameter key="maximum_number_of_threads" value="4"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="maximal_depth" value="4"/> <parameter key="min_rows" value="10.0"/> <parameter key="min_split_improvement" value="0.0"/> <parameter key="number_of_bins" value="20"/> <parameter key="learning_rate" value="0.1"/> <parameter key="sample_rate" value="1.0"/> <parameter key="distribution" value="AUTO"/> <parameter key="early_stopping" value="false"/> <parameter key="stopping_rounds" value="1"/> <parameter key="stopping_metric" value="AUTO"/> <parameter key="stopping_tolerance" value="0.001"/> <parameter key="max_runtime_seconds" value="0"/> <list key="expert_parameters"/> </operator> <connect from_port="training set" to_op="Gradient Boosted Trees" to_port="training set"/> <connect from_op="Gradient Boosted Trees" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model" width="90" x="112" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> 
<operator activated="true" class="performance_classification" compatibility="9.6.000" expanded="true" height="82" name="Performance" width="90" x="246" y="34"> <parameter key="main_criterion" value="accuracy"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="true"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <connect from_port="model" to_op="Apply Model" to_port="model"/> <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/> <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/> <connect from_op="Performance" from_port="performance" to_port="performance 1"/> <connect from_op="Performance" from_port="example set" to_port="test set results"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set 
results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Cross Validation" to_port="example set"/> <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_performance" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Optimize Parameters xgboost" to_port="input 1"/> <connect from_op="Optimize Parameters xgboost" from_port="performance" to_port="output 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.6.000" expanded="true" height="124" name="Optimize Parameters deep learning" width="90" x="112" y="34"> <list key="parameters"> <parameter key="Deep Learning.epochs" value="[0.0;1.7976931348623157E308;10;linear]"/> <parameter key="Deep Learning.learning_rate" value="[0.0;1.0;10;linear]"/> </list> <parameter key="error_handling" value="fail on error"/> <parameter key="log_performance" value="true"/> <parameter key="log_all_criteria" value="false"/> <parameter key="synchronize" value="false"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="concurrency:cross_validation" compatibility="9.6.000" expanded="true" height="145" name="Cross Validation (3)" width="90" x="246" y="34"> <parameter key="split_on_batch_attribute" value="false"/> <parameter key="leave_one_out" value="false"/> <parameter key="number_of_folds" value="10"/> <parameter 
key="sampling_type" value="automatic"/> <parameter key="use_local_random_seed" value="false"/> <parameter key="local_random_seed" value="1992"/> <parameter key="enable_parallel_execution" value="true"/> <process expanded="true"> <operator activated="true" class="h2o:deep_learning" compatibility="9.3.001" expanded="true" height="82" name="Deep Learning" width="90" x="112" y="34"> <parameter key="activation" value="Rectifier"/> <enumeration key="hidden_layer_sizes"> <parameter key="hidden_layer_sizes" value="50"/> <parameter key="hidden_layer_sizes" value="50"/> </enumeration> <enumeration key="hidden_dropout_ratios"/> <parameter key="reproducible_(uses_1_thread)" value="true"/> <parameter key="use_local_random_seed" value="true"/> <parameter key="local_random_seed" value="1992"/> <parameter key="epochs" value="5.393079404586947E307"/> <parameter key="compute_variable_importances" value="false"/> <parameter key="train_samples_per_iteration" value="-2"/> <parameter key="adaptive_rate" value="true"/> <parameter key="epsilon" value="1.0E-8"/> <parameter key="rho" value="0.99"/> <parameter key="learning_rate" value="0.6"/> <parameter key="learning_rate_annealing" value="1.0E-6"/> <parameter key="learning_rate_decay" value="1.0"/> <parameter key="momentum_start" value="0.0"/> <parameter key="momentum_ramp" value="1000000.0"/> <parameter key="momentum_stable" value="0.0"/> <parameter key="nesterov_accelerated_gradient" value="true"/> <parameter key="standardize" value="true"/> <parameter key="L1" value="1.0E-5"/> <parameter key="L2" value="0.0"/> <parameter key="max_w2" value="10.0"/> <parameter key="loss_function" value="Automatic"/> <parameter key="distribution_function" value="AUTO"/> <parameter key="early_stopping" value="false"/> <parameter key="stopping_rounds" value="1"/> <parameter key="stopping_metric" value="AUTO"/> <parameter key="stopping_tolerance" value="0.001"/> <parameter key="missing_values_handling" value="MeanImputation"/> <parameter 
key="max_runtime_seconds" value="0"/> <list key="expert_parameters"/> <list key="expert_parameters_"/> </operator> <connect from_port="training set" to_op="Deep Learning" to_port="training set"/> <connect from_op="Deep Learning" from_port="model" to_port="model"/> <portSpacing port="source_training set" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_through 1" spacing="0"/> </process> <process expanded="true"> <operator activated="true" class="apply_model" compatibility="9.6.000" expanded="true" height="82" name="Apply Model (3)" width="90" x="112" y="34"> <list key="application_parameters"/> <parameter key="create_view" value="false"/> </operator> <operator activated="true" class="performance_classification" compatibility="9.6.000" expanded="true" height="82" name="Performance (3)" width="90" x="246" y="34"> <parameter key="main_criterion" value="accuracy"/> <parameter key="accuracy" value="true"/> <parameter key="classification_error" value="true"/> <parameter key="kappa" value="false"/> <parameter key="weighted_mean_recall" value="false"/> <parameter key="weighted_mean_precision" value="false"/> <parameter key="spearman_rho" value="false"/> <parameter key="kendall_tau" value="false"/> <parameter key="absolute_error" value="false"/> <parameter key="relative_error" value="false"/> <parameter key="relative_error_lenient" value="false"/> <parameter key="relative_error_strict" value="false"/> <parameter key="normalized_absolute_error" value="false"/> <parameter key="root_mean_squared_error" value="false"/> <parameter key="root_relative_squared_error" value="false"/> <parameter key="squared_error" value="false"/> <parameter key="correlation" value="false"/> <parameter key="squared_correlation" value="false"/> <parameter key="cross-entropy" value="false"/> <parameter key="margin" value="false"/> <parameter key="soft_margin_loss" value="false"/> <parameter key="logistic_loss" value="false"/> <parameter key="skip_undefined_labels" 
value="true"/> <parameter key="use_example_weights" value="true"/> <list key="class_weights"/> </operator> <connect from_port="model" to_op="Apply Model (3)" to_port="model"/> <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/> <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Performance (3)" to_port="labelled data"/> <connect from_op="Performance (3)" from_port="performance" to_port="performance 1"/> <connect from_op="Performance (3)" from_port="example set" to_port="test set results"/> <portSpacing port="source_model" spacing="0"/> <portSpacing port="source_test set" spacing="0"/> <portSpacing port="source_through 1" spacing="0"/> <portSpacing port="sink_test set results" spacing="0"/> <portSpacing port="sink_performance 1" spacing="0"/> <portSpacing port="sink_performance 2" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Cross Validation (3)" to_port="example set"/> <connect from_op="Cross Validation (3)" from_port="performance 1" to_port="performance"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_performance" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Optimize Parameters deep learning" to_port="input 1"/> <connect from_op="Optimize Parameters deep learning" from_port="performance" to_port="output 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> <portSpacing port="sink_output 2" spacing="0"/> </process> </operator> <connect from_port="input 1" to_op="Select Subprocess" to_port="input 1"/> <connect from_op="Select Subprocess" from_port="output 1" to_port="performance"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing 
port="sink_performance" spacing="0"/> <portSpacing port="sink_model" spacing="0"/> <portSpacing port="sink_output 1" spacing="0"/> </process> <description align="center" color="transparent" colored="false" width="126">Comparing RF, xgboost and deep learning algorithms</description> </operator> <connect from_port="input 1" to_op="Optimize Parameters (Grid)" to_port="input 1"/> <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/> <portSpacing port="source_input 1" spacing="0"/> <portSpacing port="source_input 2" spacing="0"/> <portSpacing port="sink_result 1" spacing="0"/> <portSpacing port="sink_result 2" spacing="0"/> </process> </operator> </process>
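For readers who want to see the selection logic outside of RapidMiner, here is a minimal plain-Python sketch of the same idea: score every candidate learner with k-fold cross-validation, keep the one with the highest mean accuracy, then retrain it on all the data so it can be applied to new examples. Everything in it is an assumption made for illustration: the three toy learners (`fit_majority`, `fit_one_nn`, `fit_centroid`), the helper names and the toy data are hypothetical stand-ins, not RapidMiner operators or anything taken from the process above.

```python
import random
from collections import Counter
from statistics import mean


def sq_dist(a, b):
    """Squared Euclidean distance between two numeric tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fit_majority(rows, labels):
    """Toy baseline: always predict the most frequent training label."""
    top = Counter(labels).most_common(1)[0][0]
    return lambda row: top


def fit_one_nn(rows, labels):
    """Toy 1-nearest-neighbour learner."""
    def predict(row):
        nearest = min(range(len(rows)), key=lambda i: sq_dist(row, rows[i]))
        return labels[nearest]
    return predict


def fit_centroid(rows, labels):
    """Toy nearest-centroid learner: one mean point per class."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(label, []).append(row)
    centroids = {lab: tuple(mean(col) for col in zip(*members))
                 for lab, members in groups.items()}
    return lambda row: min(centroids, key=lambda lab: sq_dist(row, centroids[lab]))


def cross_validated_accuracy(fit, rows, labels, folds=10, seed=1992):
    """Mean accuracy over `folds` shuffled folds (needs len(rows) >= folds)."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    fold_scores = []
    for k in range(folds):
        test_idx = order[k::folds]          # every folds-th example is held out
        held_out = set(test_idx)
        train_idx = [i for i in order if i not in held_out]
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(model(rows[i]) == labels[i] for i in test_idx)
        fold_scores.append(hits / len(test_idx))
    return mean(fold_scores)


def select_best(candidates, rows, labels):
    """Score each candidate, pick the best, and refit it on all the data."""
    scores = {name: cross_validated_accuracy(fit, rows, labels)
              for name, fit in candidates.items()}
    best_name = max(scores, key=scores.get)  # ties go to the first candidate
    final_model = candidates[best_name](rows, labels)
    return best_name, scores, final_model


# Hypothetical toy data: two well-separated classes with 20 examples each.
rows = [(0.1 * i, 0.0) for i in range(20)] + [(5.0 + 0.1 * i, 1.0) for i in range(20)]
labels = ["a"] * 20 + ["b"] * 20

candidates = {"majority": fit_majority, "one_nn": fit_one_nn, "centroid": fit_centroid}
best_name, scores, model = select_best(candidates, rows, labels)
print(best_name, scores)
print(model((0.3, 0.0)), model((6.0, 1.0)))
```

Roughly speaking, `cross_validated_accuracy` stands in for a Cross Validation operator with Apply Model and Performance inside it, and the refit in `select_best` corresponds to training the winning learner on the full data set before applying it to the data you want scored.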
Best,
Harshit
Answers