Implement this algorithm in RapidMiner

Melody · July 2017

Hi, I want to implement an algorithm like this in RapidMiner, but I do not know how. Please guide me.

Thomas_Ott · July 2017

Based on your graph, you will need a Read operator to load your data, a Set Role operator to set your label, then a Sample operator, a Cross Validation (CV) operator, and a Stacking operator on the training side of the CV operator. You embed the different machine learners in the Stacking operator.

Melody · July 2017

Hi, thank you for your reply.
For the Sample operator, should I use the bootstrap operator or bagging?

This error occurred for the operator I used. What is this error? What should I do?

Thomas_Ott · July 2017

Well, that depends on what you want to do with sampling as you balance your classes. Is it better to bootstrap (aka upsample) or downsample? Have you considered weighting them using Generate Weight (Stratification)?
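
To make the weighting idea concrete, here is a minimal sketch in plain Python rather than RapidMiner. It assumes that stratified weighting means every class ends up with the same total weight, so examples of the rarer class each get a larger weight. The function name `stratified_weights`, the `total_weight` argument, and the sample labels are hypothetical and chosen only for illustration; this is not RapidMiner's implementation.

```python
from collections import Counter


def stratified_weights(labels, total_weight=1.0):
    """Assign one weight per example so that each class sums to the same share.

    Assumption: total_weight is split evenly across the classes, and each
    class's share is then split evenly across that class's examples.
    """
    counts = Counter(labels)                 # examples per class
    per_class = total_weight / len(counts)   # equal share for every class
    return [per_class / counts[label] for label in labels]


# Hypothetical unbalanced data: 8 negative examples, 2 positive examples.
labels = ["negative"] * 8 + ["positive"] * 2
weights = stratified_weights(labels, total_weight=10.0)

# Each class now carries the same total weight, so a single positive
# (minority) example weighs more than a single negative one.
negative_total = sum(w for w, l in zip(weights, labels) if l == "negative")
positive_total = sum(w for w, l in zip(weights, labels) if l == "positive")
```

With weights like these, a learner that honors example weights treats a mistake on a minority example as more costly than a mistake on a majority example, which is the effect the weighting approach is after.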

Your other error means that you can't deliver an example set (EXA) from that operator; rather, you need an operator that delivers a model (MOD), something like a Naive Bayes or Decision Tree, etc.

Melody · July 2017

I want to build an optimal model that achieves higher ranking accuracy on unbalanced data: an ensemble algorithm that combines two ensemble methods, bagging and boosting, and uses a genetic programming model as the learning algorithm for classifying the unbalanced data. My idea is to use bagging only for sampling and pass the sampled data to boosting for training, so that boosting builds a better model through weighting.

I want to use genetic programming to improve this model. How do you think I can build this model? Is this idea feasible?

Thomas_Ott · July 2017

Yup, you can do that in RapidMiner. Post your process when you're ready and we can troubleshoot.

Melody · July 2017

Thank you,

Should I post my process here or email it to you?

Thomas_Ott · July 2017

Please post it to the thread, thanks.

Melody · July 2017

Hi, Mr. Ott.

Is my process correct?
Do you think it matches the model I explained?
Is the sampling done in the same way?
How can I give the minority (positive) class more weight so that it counts more in the prediction?

Thomas_Ott · July 2017

I'm guessing that the positive class is the minority class. I would handle it by overweighting the minority class and underweighting the majority class. Something like this.

Then I would use a Cross Validation (not Split Validation) inside the Optimize Weights operator.

<?xml version="1.0" encoding="UTF-8"?><process version="7.5.003">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.5.003" expanded="true" name="Process">
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="false" class="read_excel" compatibility="7.5.003" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
        <parameter key="excel_file" value="D:\thesis96\dataset\Main.DataSet\Glass2.xlsx"/>
        <parameter key="imported_cell_range" value="A1:J215"/>
        <parameter key="encoding" value="SYSTEM"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value=" RI.true.real.attribute"/>
          <parameter key="1" value=" Na.true.real.attribute"/>
          <parameter key="2" value=" Mg.true.real.attribute"/>
          <parameter key="3" value=" Al.true.real.attribute"/>
          <parameter key="4" value=" Si.true.real.attribute"/>
          <parameter key="5" value=" K.true.real.attribute"/>
          <parameter key="6" value=" Ca.true.real.attribute"/>
          <parameter key="7" value=" Ba.true.real.attribute"/>
          <parameter key="8" value=" Fe.true.real.attribute"/>
          <parameter key="9" value="class.true.nominal.label"/>
        </list>
      </operator>
      <operator activated="false" class="bagging" compatibility="7.5.003" expanded="true" height="82" name="Bagging" width="90" x="179" y="34">
        <process expanded="true">
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.5.003" expanded="true" height="82" name="Decision Tree" width="90" x="246" y="34"/>
          <connect from_port="training set" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="false" class="sample_model_based" compatibility="7.5.003" expanded="true" height="82" name="Sample (Model-Based)" width="90" x="313" y="34"/>
      <operator activated="true" class="retrieve" compatibility="7.5.003" expanded="true" height="68" name="Retrieve Golf" width="90" x="112" y="187">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" breakpoints="after" class="generate_weight_stratification" compatibility="7.5.003" expanded="true" height="82" name="Generate Weight (Stratification)" width="90" x="246" y="187"/>
      <operator activated="true" class="optimize_weights_evolutionary" compatibility="7.5.003" expanded="true" height="103" name="Optimize Weights (2)" width="90" x="514" y="34">
        <parameter key="population_size" value="100"/>
        <parameter key="maximum_number_of_generations" value="40"/>
        <parameter key="use_early_stopping" value="true"/>
        <parameter key="show_population_plotter" value="true"/>
        <parameter key="selection_scheme" value="roulette wheel"/>
        <parameter key="p_crossover" value="0.2"/>
        <parameter key="crossover_type" value="shuffle"/>
        <process expanded="true">
          <operator activated="true" class="concurrency:cross_validation" compatibility="7.5.003" expanded="true" height="145" name="Cross Validation" width="90" x="447" y="34">
            <parameter key="number_of_folds" value="3"/>
            <process expanded="true">
              <operator activated="true" class="adaboost" compatibility="7.5.003" expanded="true" height="82" name="AdaBoost (2)" width="90" x="112" y="34">
                <process expanded="true">
                  <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="7.5.003" expanded="true" height="82" name="Decision Tree (3)" width="90" x="246" y="34"/>
                  <connect from_port="training set" to_op="Decision Tree (3)" to_port="training set"/>
                  <connect from_op="Decision Tree (3)" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <connect from_port="training set" to_op="AdaBoost (2)" to_port="training set"/>
              <connect from_op="AdaBoost (2)" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="7.5.003" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_classification" compatibility="7.5.003" expanded="true" height="82" name="Performance (2)" width="90" x="246" y="34">
                <parameter key="classification_error" value="true"/>
                <parameter key="weighted_mean_recall" value="true"/>
                <parameter key="weighted_mean_precision" value="true"/>
                <parameter key="root_mean_squared_error" value="true"/>
                <parameter key="root_relative_squared_error" value="true"/>
                <list key="class_weights"/>
              </operator>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_test set results" spacing="0"/>
              <portSpacing port="sink_performance 1" spacing="0"/>
              <portSpacing port="sink_performance 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set" to_op="Cross Validation" to_port="example set"/>
          <connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_performance" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Bagging" to_port="training set"/>
      <connect from_op="Bagging" from_port="model" to_op="Sample (Model-Based)" to_port="model"/>
      <connect from_op="Bagging" from_port="example set" to_op="Sample (Model-Based)" to_port="example set input"/>
      <connect from_op="Retrieve Golf" from_port="output" to_op="Generate Weight (Stratification)" to_port="example set input"/>
      <connect from_op="Generate Weight (Stratification)" from_port="example set output" to_op="Optimize Weights (2)" to_port="example set in"/>
      <connect from_op="Optimize Weights (2)" from_port="example set out" to_port="result 1"/>
      <connect from_op="Optimize Weights (2)" from_port="weights" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

Melody · July 2017

I used cross validation, but why does the number of wrong predictions in the confusion matrix not match the number of errors displayed by Optimize Weights, or the wrong negative and positive predictions? Or am I wrong?

The visualization is not consistent with the confusion matrix. How can this be corrected?

How can I get the tree out of this process's output?


Altair RapidMiner Community
