Stacking with bagging as meta learner
djafarsidik
Member Posts: 7 Learner I
Hi,
I am a newbie.
I would like to ask about the Stacking method in RapidMiner.
I want to build a stacking ensemble that uses a decision tree and Naive Bayes as base learners and, as the meta learner, Bagging with a decision tree in its inner process.
For validation I use 10-fold cross-validation, but I want to get the performance result for each fold in addition to the overall result.
The scheme is more or less like this:
This is my design:
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.4.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.4.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.4.001" expanded="true" height="68" name="Retrieve bandung_L2" width="90" x="112" y="238">
        <parameter key="repository_entry" value="//Thesis/data/bandung_L2"/>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="9.4.001" expanded="true" height="145" name="Cross Validation" width="90" x="380" y="187">
        <parameter key="split_on_batch_attribute" value="false"/>
        <parameter key="leave_one_out" value="false"/>
        <parameter key="number_of_folds" value="10"/>
        <parameter key="sampling_type" value="stratified sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="true" class="stacking" compatibility="9.4.001" expanded="true" height="68" name="Stacking" width="90" x="112" y="34">
            <parameter key="keep_all_attributes" value="true"/>
            <parameter key="keep_confidences" value="false"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="9.4.001" expanded="true" height="82" name="Naive Bayes" width="90" x="112" y="85">
                <parameter key="laplace_correction" value="true"/>
              </operator>
              <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.001" expanded="true" height="103" name="Decision Tree" width="90" x="112" y="187">
                <parameter key="criterion" value="gain_ratio"/>
                <parameter key="maximal_depth" value="10"/>
                <parameter key="apply_pruning" value="true"/>
                <parameter key="confidence" value="0.1"/>
                <parameter key="apply_prepruning" value="true"/>
                <parameter key="minimal_gain" value="0.01"/>
                <parameter key="minimal_leaf_size" value="2"/>
                <parameter key="minimal_size_for_split" value="4"/>
                <parameter key="number_of_prepruning_alternatives" value="3"/>
              </operator>
              <connect from_port="training set 1" to_op="Naive Bayes" to_port="training set"/>
              <connect from_port="training set 2" to_op="Decision Tree" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="base model 1"/>
              <connect from_op="Decision Tree" from_port="model" to_port="base model 2"/>
              <portSpacing port="source_training set 1" spacing="0"/>
              <portSpacing port="source_training set 2" spacing="0"/>
              <portSpacing port="source_training set 3" spacing="0"/>
              <portSpacing port="sink_base model 1" spacing="0"/>
              <portSpacing port="sink_base model 2" spacing="0"/>
              <portSpacing port="sink_base model 3" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="bagging" compatibility="9.4.001" expanded="true" height="82" name="Bagging" width="90" x="112" y="34">
                <parameter key="sample_ratio" value="0.9"/>
                <parameter key="iterations" value="10"/>
                <parameter key="average_confidences" value="true"/>
                <parameter key="use_local_random_seed" value="false"/>
                <parameter key="local_random_seed" value="1992"/>
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.4.001" expanded="true" height="103" name="Decision Tree (2)" width="90" x="313" y="85">
                    <parameter key="criterion" value="gain_ratio"/>
                    <parameter key="maximal_depth" value="10"/>
                    <parameter key="apply_pruning" value="true"/>
                    <parameter key="confidence" value="0.1"/>
                    <parameter key="apply_prepruning" value="true"/>
                    <parameter key="minimal_gain" value="0.01"/>
                    <parameter key="minimal_leaf_size" value="2"/>
                    <parameter key="minimal_size_for_split" value="4"/>
                    <parameter key="number_of_prepruning_alternatives" value="3"/>
                  </operator>
                  <connect from_port="training set" to_op="Decision Tree (2)" to_port="training set"/>
                  <connect from_op="Decision Tree (2)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <connect from_port="stacking examples" to_op="Bagging" to_port="training set"/>
              <connect from_op="Bagging" from_port="model" to_port="stacking model"/>
              <portSpacing port="source_stacking examples" spacing="0"/>
              <portSpacing port="sink_stacking model" spacing="0"/>
            </process>
          </operator>
          <connect from_port="training set" to_op="Stacking" to_port="training set"/>
          <connect from_op="Stacking" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.4.001" expanded="true" height="82" name="Apply Model" width="90" x="112" y="85">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance_binominal_classification" compatibility="9.4.001" expanded="true" height="82" name="Performance" width="90" x="246" y="136">
            <parameter key="manually_set_positive_class" value="false"/>
            <parameter key="main_criterion" value="first"/>
            <parameter key="accuracy" value="true"/>
            <parameter key="classification_error" value="false"/>
            <parameter key="kappa" value="false"/>
            <parameter key="AUC (optimistic)" value="false"/>
            <parameter key="AUC" value="false"/>
            <parameter key="AUC (pessimistic)" value="false"/>
            <parameter key="precision" value="true"/>
            <parameter key="recall" value="true"/>
            <parameter key="lift" value="false"/>
            <parameter key="fallout" value="false"/>
            <parameter key="f_measure" value="false"/>
            <parameter key="false_positive" value="false"/>
            <parameter key="false_negative" value="false"/>
            <parameter key="true_positive" value="false"/>
            <parameter key="true_negative" value="false"/>
            <parameter key="sensitivity" value="false"/>
            <parameter key="specificity" value="false"/>
            <parameter key="youden" value="false"/>
            <parameter key="positive_predictive_value" value="false"/>
            <parameter key="negative_predictive_value" value="false"/>
            <parameter key="psep" value="false"/>
            <parameter key="skip_undefined_labels" value="true"/>
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Retrieve bandung_L2" from_port="output" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="model" to_port="result 1"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Could you please advise whether my design is correct for the above purpose (design and data attached)? Is it also possible to display or store the test examples for each fold? Any comments would be highly appreciated.
Thank you in advance.
Best Answers
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @djafarsidik,
1/ Extract the performance and example set of each fold of the cross-validation:
It's very easy. Put two Store operators in the Testing part of your Cross Validation operator and
use the macro %{execution_count} to name the different files.
See process_1.rmp in the attached file.
2/ Meta-learner(s)
It's difficult for me to check the set-up of your process because I don't understand which meta-learner technique you want to use.
In your process you have a mix of Stacking and Bagging, but from my point of view the schema you shared shows a Voting meta-learner...
If that is the case, you have to use the Vote operator and put inside it:
- a Decision Tree model and
- a Naive Bayes model
See Process_2.rmp in the attached file for the implementation.
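For readers without the attachment, the voting idea itself can be sketched in a few lines of plain Python. This is only an illustration, not RapidMiner's Vote operator: the function name `majority_vote`, the third model, the example predictions, and the tie-breaking rule are all assumptions made for the sketch.

```python
from collections import Counter

def majority_vote(predictions_per_model):
    """Combine per-model label predictions, example by example, by majority vote.

    predictions_per_model: list of equally long lists, one list per model.
    Ties go to the model listed first (an arbitrary rule chosen for this sketch).
    """
    combined = []
    for row in zip(*predictions_per_model):
        counts = Counter(row)
        top = max(counts.values())
        # Take the first label, in model order, that reaches the top count.
        combined.append(next(label for label in row if counts[label] == top))
    return combined

# Hypothetical predictions of three base models on four examples.
tree_preds = ["yes", "no", "yes", "no"]
bayes_preds = ["yes", "yes", "no", "no"]
third_preds = ["no", "yes", "yes", "no"]

print(majority_vote([tree_preds, bayes_preds, third_preds]))
```

Note that with only two base models every disagreement is a tie, so the result then depends entirely on the tie-breaking rule.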
I hope this helps,
Regards,
Lionel
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
@djafarsidik,
I don't see anything incorrect in your design.
However, I would follow the data science "methodology":
I would build all the envisaged models (simple Voting, simple Bagging, simple Stacking, and your set-up) and keep only the best one (the one with the highest performance).
Hope this helps,
Regards,
Lionel
Answers
Stacking and bagging are two different approaches to ensemble modeling. Bagging builds multiple independent models with the same base learner, each on a different sample of the training data (think of a random forest built from underlying decision trees). Stacking uses a master model, trained on the predictions of different underlying models, to decide between them depending on where each one performs well. Vote is simpler than both: it trains multiple models on the same data and then just combines their predictions. Your picture appears to depict the Vote approach.
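To make the difference concrete, here is a minimal sketch in plain Python. It is not how RapidMiner implements either operator. All function names and the toy learners are hypothetical, and the meta-learner is trained on predictions for the same rows the base models saw, which is a simplification. The defaults for `iterations`, `sample_ratio` and `seed` simply mirror the values in the posted process.

```python
import random
from collections import Counter

def fit_stump(rows):
    """Toy base learner: threshold on x that best separates labels 0/1."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in rows}):
        acc = sum((x >= t) == bool(y) for x, y in rows) / len(rows)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return lambda x, t=best_t: int(x >= t)

def fit_majority(rows):
    """Toy base learner: always predict the most common training label."""
    label = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda x: label

def fit_table(rows):
    """Toy meta-learner: majority label for each distinct feature value."""
    votes = {}
    for features, y in rows:
        votes.setdefault(features, Counter())[y] += 1
    default = Counter(y for _, y in rows).most_common(1)[0][0]
    return lambda f: votes[f].most_common(1)[0][0] if f in votes else default

def bagging(rows, fit, iterations=10, sample_ratio=0.9, seed=1992):
    """Bagging: one learner, many bootstrap samples, predictions voted."""
    rng = random.Random(seed)
    size = max(1, int(len(rows) * sample_ratio))
    models = [fit([rng.choice(rows) for _ in range(size)]) for _ in range(iterations)]
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

def stacking(rows, base_fits, meta_fit):
    """Stacking: different base learners, meta-learner trained on their outputs."""
    base_models = [fit(rows) for fit in base_fits]
    # Meta-level training set: base predictions become the features.
    meta_rows = [(tuple(m(x) for m in base_models), y) for x, y in rows]
    meta_model = meta_fit(meta_rows)
    return lambda x: meta_model(tuple(m(x) for m in base_models))

rows = [(x, int(x >= 5)) for x in range(10)]  # toy data: label is 1 when x >= 5

bagged_stump = bagging(rows, fit_stump)
# Stacking with a bagged meta-learner, analogous in shape to the posted design.
stacked = stacking(
    rows,
    base_fits=[fit_stump, fit_majority],
    meta_fit=lambda meta_rows: bagging(meta_rows, fit_table),
)
print(bagged_stump(0), bagged_stump(9))
print(stacked(0), stacked(9))
```

In the stacked variant the bagged meta-learner only ever sees the base models' predictions, never the original attribute `x`.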
Also, Naive Bayes is a very "generalized" model with no tuning parameters. It is not likely to vary significantly from subset to subset, so it is not usually used in any kind of bagging approach.
Finally, why do you want the performance for each fold separately? Is your dataset very small, so that you anticipate large deviations in performance from fold to fold? Looking at fold-specific performance is not going to help you compare or evaluate the individual learners you are using in the ensemble.
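If the fold-to-fold spread is what matters, it can be summarised with a mean and standard deviation over the per-fold scores. The sketch below is plain Python under stated assumptions: the labels and predictions are made up, the function names are hypothetical, and the folds are contiguous blocks rather than the stratified samples used in the posted process.

```python
from statistics import mean, stdev

def k_fold_indices(n_examples, k):
    """Split range(n_examples) into k contiguous folds of near-equal size."""
    sizes = [n_examples // k + (1 if i < n_examples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def per_fold_accuracy(labels, predictions, k):
    """Return one accuracy value per fold instead of a single pooled number."""
    return [
        sum(labels[i] == predictions[i] for i in fold) / len(fold)
        for fold in k_fold_indices(len(labels), k)
    ]

# Hypothetical true labels and cross-validated predictions for 10 examples.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
predictions = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

scores = per_fold_accuracy(labels, predictions, k=5)
print(scores)
print(mean(scores), stdev(scores))
```

A large standard deviation relative to the mean suggests the overall estimate is unstable, which is mostly a concern on small datasets.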
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts