"Weka-style Experimenter"

lmsasu · April 2011

Hello,

Is there a way to run experiments the way Weka's Experimenter interface allows? I would like to specify multiple input data sources and a set of models, and then have RM build the models and automatically compare their performance.

Thanks,
Lucian

land · April 2011

Hi Lucian,

there's no dedicated tool for that, but you can easily build a process that does it. This reflects the fact that, most of the time, it isn't appropriate to compare the results of a learner alone: you have to take its preprocessing into account as well. Some learners need normalized data, others can't cope with missing values or nominal attributes. That's why we have the Select Subprocess operator, which lets you define a complete subprocess (preprocessing plus learner) for each alternative. You can use Loop Parameters to iterate over all subprocesses by varying the select_which parameter, and of course you can use Loop Collection to iterate over any collection of objects, including data sets.
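The nested structure described above can be sketched in plain Python to make the control flow explicit: an outer loop over data sets (Loop Collection), an inner loop over a 1-based subprocess index (Loop Parameters varying select_which), a lookup that picks a complete preprocessing-plus-learner chain (Select Subprocess), and a cross-validation around it. This is a minimal illustration under stated assumptions, not RapidMiner code: the toy data sets, the three learners, and all function names are hypothetical, and a hand-written 5-fold cross-validation stands in for the Validation operator.

```python
import random
from statistics import mean


# Hypothetical toy data sets standing in for the repository entries; each is a
# list of (feature_vector, label) pairs. Feature 2 has a much larger scale than feature 1.
def make_dataset(seed, n=40):
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-10, 10)]
        label = "yes" if x[0] + 0.05 * x[1] > 0 else "no"
        rows.append((x, label))
    return rows


def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


# Each hypothetical "subprocess" takes a training set and returns a predict function.
# Any preprocessing it needs is fitted inside, on the training data only.
def majority_learner(train):
    labels = [y for _, y in train]
    top = max(sorted(set(labels)), key=labels.count)
    return lambda x: top


def nearest_centroid_learner(train):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x)
    centroids = {y: [mean(col) for col in zip(*xs)] for y, xs in groups.items()}
    return lambda x: min(centroids, key=lambda y: sq_dist(x, centroids[y]))


def normalized_1nn_learner(train):
    # Min-max normalization learned from the training fold, then 1-nearest-neighbour.
    cols = list(zip(*(x for x, _ in train)))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]

    def scale(x):
        return [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(x, lo, hi)]

    scaled = [(scale(x), y) for x, y in train]

    def predict(x):
        sx = scale(x)
        return min(scaled, key=lambda r: sq_dist(sx, r[0]))[1]

    return predict


def cross_validate(learner, rows, k=5, seed=0):
    # Shuffle indices once, split into k folds, train on k-1 folds, test on the rest.
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train = [rows[i] for i in idx if i not in held_out]
        test = [rows[i] for i in fold]
        model = learner(train)
        accuracies.append(mean(1.0 if model(x) == y else 0.0 for x, y in test))
    return mean(accuracies)


DATASETS = {"toy_a": make_dataset(1), "toy_b": make_dataset(2)}
SUBPROCESSES = [majority_learner, nearest_centroid_learner, normalized_1nn_learner]


def run_experiment(datasets, subprocesses, k=5):
    results = []
    for name, rows in datasets.items():                    # like Loop Collection
        for which in range(1, len(subprocesses) + 1):     # like Loop Parameters over select_which
            learner = subprocesses[which - 1]              # like Select Subprocess (1-based)
            results.append((name, learner.__name__, cross_validate(learner, rows, k)))
    return results


for row in run_experiment(DATASETS, SUBPROCESSES):
    print("%-6s %-26s %.3f" % row)
```

Because the normalization lives inside `normalized_1nn_learner`, it is re-fitted on every training fold. That mirrors the reason for bundling preprocessing into each subprocess instead of applying it once up front.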

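Here is a small example process that shows what you can do. It retrieves the Golf and Sonar sample data sets, collects them, and then, for each data set, cross-validates a Decision Tree, Naive Bayes, and Linear Regression (the latter after nominal-to-numerical preprocessing). If you combine this with, for example, macros or logging operators, it becomes really powerful.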

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.008" expanded="true" name="Process">
    <process expanded="true" height="507" width="806">
      <operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="retrieve" compatibility="5.1.008" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="120">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="collect" compatibility="5.1.008" expanded="true" height="94" name="Collect" width="90" x="246" y="30"/>
      <operator activated="true" class="loop_collection" compatibility="5.1.008" expanded="true" height="76" name="Loop Collection" width="90" x="380" y="30">
        <process expanded="true" height="525" width="806">
          <operator activated="true" class="loop_parameters" compatibility="5.1.008" expanded="true" height="76" name="Loop Parameters" width="90" x="112" y="30">
            <list key="parameters">
              <parameter key="Select Subprocess.select_which" value="[1.0;3;3;linear]"/>
            </list>
            <process expanded="true" height="525" width="806">
              <operator activated="true" class="select_subprocess" compatibility="5.1.008" expanded="true" height="76" name="Select Subprocess" width="90" x="45" y="30">
                <parameter key="select_which" value="3"/>
                <process expanded="true" height="525" width="378">
                  <operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="112" name="Validation" width="90" x="45" y="30">
                    <description>A cross-validation evaluating a decision tree model.</description>
                    <process expanded="true" height="507" width="369">
                      <operator activated="true" class="decision_tree" compatibility="5.1.008" expanded="true" height="76" name="Decision Tree" width="90" x="144" y="30"/>
                      <connect from_port="training" to_op="Decision Tree" to_port="training set"/>
                      <connect from_op="Decision Tree" from_port="model" to_port="model"/>
                      <portSpacing port="source_training" spacing="0"/>
                      <portSpacing port="sink_model" spacing="0"/>
                      <portSpacing port="sink_through 1" spacing="0"/>
                    </process>
                    <process expanded="true" height="507" width="369">
                      <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model" width="90" x="45" y="30">
                        <list key="application_parameters"/>
                      </operator>
                      <operator activated="true" class="performance" compatibility="5.1.008" expanded="true" height="76" name="Perf(Decision Tree)" width="90" x="211" y="30"/>
                      <connect from_port="model" to_op="Apply Model" to_port="model"/>
                      <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
                      <connect from_op="Apply Model" from_port="labelled data" to_op="Perf(Decision Tree)" to_port="labelled data"/>
                      <connect from_op="Perf(Decision Tree)" from_port="performance" to_port="averagable 1"/>
                      <portSpacing port="source_model" spacing="0"/>
                      <portSpacing port="source_test set" spacing="0"/>
                      <portSpacing port="source_through 1" spacing="0"/>
                      <portSpacing port="sink_averagable 1" spacing="0"/>
                      <portSpacing port="sink_averagable 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="input 1" to_op="Validation" to_port="training"/>
                  <connect from_op="Validation" from_port="averagable 1" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <process expanded="true" height="525" width="378">
                  <operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="112" name="Validation (2)" width="90" x="45" y="30">
                    <description>A cross-validation evaluating a Naive Bayes model.</description>
                    <process expanded="true" height="507" width="369">
                      <operator activated="true" class="naive_bayes" compatibility="5.1.008" expanded="true" height="76" name="Naive Bayes" width="90" x="45" y="30"/>
                      <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
                      <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
                      <portSpacing port="source_training" spacing="0"/>
                      <portSpacing port="sink_model" spacing="0"/>
                      <portSpacing port="sink_through 1" spacing="0"/>
                    </process>
                    <process expanded="true" height="507" width="369">
                      <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="30">
                        <list key="application_parameters"/>
                      </operator>
                      <operator activated="true" class="performance" compatibility="5.1.008" expanded="true" height="76" name="Perf (NaiveBayes)" width="90" x="211" y="30"/>
                      <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
                      <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
                      <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Perf (NaiveBayes)" to_port="labelled data"/>
                      <connect from_op="Perf (NaiveBayes)" from_port="performance" to_port="averagable 1"/>
                      <portSpacing port="source_model" spacing="0"/>
                      <portSpacing port="source_test set" spacing="0"/>
                      <portSpacing port="source_through 1" spacing="0"/>
                      <portSpacing port="sink_averagable 1" spacing="0"/>
                      <portSpacing port="sink_averagable 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="input 1" to_op="Validation (2)" to_port="training"/>
                  <connect from_op="Validation (2)" from_port="averagable 1" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
                <process expanded="true" height="525" width="235">
                  <operator activated="true" class="x_validation" compatibility="5.1.008" expanded="true" height="112" name="Validation (3)" width="90" x="45" y="30">
                    <description>A cross-validation evaluating a linear regression model with nominal-to-numerical preprocessing.</description>
                    <process expanded="true" height="507" width="453">
                      <operator activated="true" class="nominal_to_binominal" compatibility="5.1.008" expanded="true" height="94" name="Nominal to Binominal" width="90" x="45" y="30"/>
                      <operator activated="true" class="nominal_to_numerical" compatibility="5.1.008" expanded="true" height="94" name="Nominal to Numerical" width="90" x="179" y="30"/>
                      <operator activated="true" class="linear_regression" compatibility="5.1.008" expanded="true" height="94" name="Linear Regression" width="90" x="313" y="30"/>
                      <operator activated="true" class="group_models" compatibility="5.1.008" expanded="true" height="112" name="Group Models" width="90" x="313" y="165"/>
                      <connect from_port="training" to_op="Nominal to Binominal" to_port="example set input"/>
                      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
                      <connect from_op="Nominal to Binominal" from_port="preprocessing model" to_op="Group Models" to_port="models in 1"/>
                      <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
                      <connect from_op="Nominal to Numerical" from_port="preprocessing model" to_op="Group Models" to_port="models in 2"/>
                      <connect from_op="Linear Regression" from_port="model" to_op="Group Models" to_port="models in 3"/>
                      <connect from_op="Linear Regression" from_port="weights" to_port="through 1"/>
                      <connect from_op="Group Models" from_port="model out" to_port="model"/>
                      <portSpacing port="source_training" spacing="0"/>
                      <portSpacing port="sink_model" spacing="0"/>
                      <portSpacing port="sink_through 1" spacing="0"/>
                      <portSpacing port="sink_through 2" spacing="0"/>
                    </process>
                    <process expanded="true" height="507" width="369">
                      <operator activated="true" class="apply_model" compatibility="5.1.008" expanded="true" height="76" name="Apply Model (3)" width="90" x="45" y="30">
                        <list key="application_parameters"/>
                      </operator>
                      <operator activated="true" class="performance" compatibility="5.1.008" expanded="true" height="76" name="Perf (LinearRegression)" width="90" x="211" y="30"/>
                      <connect from_port="model" to_op="Apply Model (3)" to_port="model"/>
                      <connect from_port="test set" to_op="Apply Model (3)" to_port="unlabelled data"/>
                      <connect from_op="Apply Model (3)" from_port="labelled data" to_op="Perf (LinearRegression)" to_port="labelled data"/>
                      <connect from_op="Perf (LinearRegression)" from_port="performance" to_port="averagable 1"/>
                      <portSpacing port="source_model" spacing="0"/>
                      <portSpacing port="source_test set" spacing="0"/>
                      <portSpacing port="source_through 1" spacing="0"/>
                      <portSpacing port="source_through 2" spacing="0"/>
                      <portSpacing port="sink_averagable 1" spacing="0"/>
                      <portSpacing port="sink_averagable 2" spacing="0"/>
                    </process>
                  </operator>
                  <connect from_port="input 1" to_op="Validation (3)" to_port="training"/>
                  <connect from_op="Validation (3)" from_port="averagable 1" to_port="output 1"/>
                  <portSpacing port="source_input 1" spacing="0"/>
                  <portSpacing port="source_input 2" spacing="0"/>
                  <portSpacing port="sink_output 1" spacing="0"/>
                  <portSpacing port="sink_output 2" spacing="0"/>
                </process>
              </operator>
              <operator activated="true" class="multiply" compatibility="5.1.008" expanded="true" height="94" name="Multiply" width="90" x="425" y="30"/>
              <connect from_port="input 1" to_op="Select Subprocess" to_port="input 1"/>
              <connect from_op="Select Subprocess" from_port="output 1" to_op="Multiply" to_port="input"/>
              <connect from_op="Multiply" from_port="output 1" to_port="performance"/>
              <connect from_op="Multiply" from_port="output 2" to_port="result 1"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_result 1" spacing="0"/>
              <portSpacing port="sink_result 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="single" to_op="Loop Parameters" to_port="input 1"/>
          <connect from_op="Loop Parameters" from_port="result 1" to_port="output 1"/>
          <portSpacing port="source_single" spacing="0"/>
          <portSpacing port="sink_output 1" spacing="0"/>
          <portSpacing port="sink_output 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="flatten_collection" compatibility="5.1.008" expanded="true" height="60" name="Flatten Collection" width="90" x="514" y="30"/>
      <connect from_op="Retrieve" from_port="output" to_op="Collect" to_port="input 1"/>
      <connect from_op="Retrieve (2)" from_port="output" to_op="Collect" to_port="input 2"/>
      <connect from_op="Collect" from_port="collection" to_op="Loop Collection" to_port="collection"/>
      <connect from_op="Loop Collection" from_port="output 1" to_op="Flatten Collection" to_port="collection"/>
      <connect from_op="Flatten Collection" from_port="flat" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
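The suggestion to combine the process with macros and logging can be illustrated outside RapidMiner as well. The Python below is a hypothetical stand-in, not RapidMiner's Log operator or macro API: `ExperimentLog`, `set_macro`, `log`, `comparison_table`, and `format_table` are names invented for illustration. The idea sketched is that inside the two loops you set a "dataset" and a "learner" macro, log the accuracy each iteration produces, and afterwards pivot the logged rows into an Experimenter-style table with the best learner per data set marked.

```python
from collections import defaultdict


class ExperimentLog:
    """Hypothetical stand-in for a log table: one row per loop iteration."""

    def __init__(self):
        self.rows = []
        self.macros = {}

    def set_macro(self, key, value):
        # Like a process macro, the value stays set until it is overwritten.
        self.macros[key] = value

    def log(self, **values):
        # Record the current macro values together with the measured values.
        self.rows.append({**self.macros, **values})


def comparison_table(rows, row_key="dataset", col_key="learner", value_key="accuracy"):
    """Pivot logged rows into {dataset: {learner: value}} and find the best learner per data set."""
    table = defaultdict(dict)
    for r in rows:
        table[r[row_key]][r[col_key]] = r[value_key]
    best = {ds: max(scores, key=scores.get) for ds, scores in table.items()}
    return dict(table), best


def format_table(table, best):
    """Render the pivot as text; '*' marks the best learner for each data set."""
    learners = sorted({name for scores in table.values() for name in scores})
    lines = ["dataset".ljust(12) + "".join(name.ljust(16) for name in learners)]
    for ds in sorted(table):
        cells = []
        for name in learners:
            value = table[ds].get(name)
            text = "-" if value is None else f"{value:.3f}" + ("*" if name == best[ds] else "")
            cells.append(text.ljust(16))
        lines.append(ds.ljust(12) + "".join(cells))
    return "\n".join(lines)
```

In use, the outer loop would call `set_macro("dataset", ...)`, the inner loop `set_macro("learner", ...)` followed by `log(accuracy=...)`, and `format_table(*comparison_table(log.rows))` would produce the final comparison once both loops have finished.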

With kind regards,
Sebastian Land

lmsasu · April 2011

Thanks, Sebastian.

Altair RapidMiner Community
