Relative Contribution of Variables

DDelen · February 2016

I want to measure the relative contribution of each input variable to the predictive power/accuracy of a model (any classification or regression model). In some commercial tools such as SPSS Modeler, this is done automatically by a so-called leave-one-out process. In each iteration, one input variable is left out and the model is tested on a holdout sample (or via cross-validation), and the accuracy is recorded (e.g., variable left out = A, accuracy 82%). This is repeated for each input variable. At the end you have a list of accuracies, one for each variable's absence from the model. The lower the accuracy, the higher the contribution/importance of the variable that was left out. Once done, these accuracies can be inverted into relative importance measures (and optionally normalized) and shown in a horizontal bar chart illustrating the relative contribution of all variables.
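To make the last step concrete, here is a minimal Python sketch of one way to turn leave-one-out accuracies into normalized importances. It assumes a baseline accuracy from a model trained on all variables is also available, and it measures importance as the drop from that baseline; this is one possible reading of "inverting" the accuracies, not necessarily what SPSS Modeler does. The function name `relative_importance` and all numbers are made up for illustration.

```python
def relative_importance(baseline, left_out_scores):
    """Turn leave-one-out scores into normalized importances.

    baseline: score of the model trained on all variables.
    left_out_scores: maps each variable name to the score obtained
    when that variable is left out.
    """
    # Importance = drop in score caused by removing the variable.
    # Negative drops (the model improved without it) are clipped to 0.
    drops = {name: max(baseline - score, 0.0)
             for name, score in left_out_scores.items()}
    total = sum(drops.values())
    if total == 0:
        # No variable hurts the score when removed: nothing to rank.
        return {name: 0.0 for name in drops}
    # Normalize so the importances sum to 1.
    return {name: drop / total for name, drop in drops.items()}


# Made-up accuracies, for illustration only.
scores = {"A": 0.82, "B": 0.88, "C": 0.90}
importance = relative_importance(baseline=0.90, left_out_scores=scores)
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.2f}")
```

Clipping negative drops at zero is a design choice: a variable whose removal improves the score is treated as contributing nothing rather than given a negative importance.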

I tried to do this in RapidMiner 7.0 with the Loop Attributes operator. It did not work! I could not set it up properly because I am not very familiar with RapidMiner constructs such as loop operators. The short descriptions were not sufficient for me to understand and use them properly for this process.

Can anyone create a simple process for a small data set like Golf, using Decision Trees and X-Validation, that implements the variable contribution procedure I described, and post it here so that we can all learn/benefit from it?

Thank you.

earmijo · February 2016

I'll get you started. The first Validation operator computes the AUC with all variables. Then a Loop Attributes operator drops the variables one at a time and computes the AUC again for each.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.5.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.5.002" expanded="true" height="60" name="Golf" width="90" x="45" y="75">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="x_validation" compatibility="6.5.002" expanded="true" height="112" name="Validation" width="90" x="179" y="75">
        <parameter key="use_local_random_seed" value="true"/>
        <process expanded="true">
          <operator activated="true" class="naive_bayes" compatibility="6.5.002" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
          <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="6.5.002" expanded="true" height="76" name="Apply Model" width="90" x="45" y="75">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance_binominal_classification" compatibility="6.5.002" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_averagable 1" spacing="0"/>
          <portSpacing port="sink_averagable 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_attributes" compatibility="6.5.002" expanded="true" height="94" name="Loop Attributes" width="90" x="380" y="210">
        <parameter key="attributes" value="Humidity|Temperature|"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="include_special_attributes" value="true"/>
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="6.5.002" expanded="true" height="76" name="Select Attributes" width="90" x="179" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="%{loop_attribute}"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="6.5.002" expanded="true" height="112" name="Validation (2)" width="90" x="380" y="30">
            <parameter key="use_local_random_seed" value="true"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="6.5.002" expanded="true" height="76" name="Naive Bayes (2)" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="Naive Bayes (2)" to_port="training set"/>
              <connect from_op="Naive Bayes (2)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="6.5.002" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="6.5.002" expanded="true" height="76" name="Performance (2)" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <connect from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Validation (2)" to_port="training"/>
          <connect from_op="Validation (2)" from_port="averagable 1" to_port="result 1"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Golf" from_port="output" to_op="Validation" to_port="training"/>
      <connect from_op="Validation" from_port="training" to_op="Loop Attributes" to_port="example set"/>
      <connect from_op="Validation" from_port="averagable 1" to_port="result 1"/>
      <connect from_op="Loop Attributes" from_port="result 1" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="90"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
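
For readers who do not have RapidMiner at hand, the control flow of the process above can be sketched in plain Python: score the full variable set once, then loop over the variables and re-score with each one removed. This is only an illustration of the structure, not a translation of the operators. `leave_one_variable_out` and `toy_score` are hypothetical names, the weights inside `toy_score` are invented, and the Golf attribute names are assumed.

```python
def leave_one_variable_out(variables, score_fn):
    """Score the full variable set, then each set with one variable removed."""
    # Baseline: all variables included (the first Validation operator).
    results = {"All Variables": score_fn(list(variables))}
    # Loop: drop one variable at a time and score again.
    for dropped in variables:
        kept = [v for v in variables if v != dropped]
        results[dropped] = score_fn(kept)
    return results


def toy_score(kept):
    # Stand-in for "train and cross-validate a model on these variables".
    # The weights are invented; a real run would return an AUC or accuracy.
    weights = {"Outlook": 0.20, "Temperature": 0.05,
               "Humidity": 0.10, "Wind": 0.05}
    return round(0.50 + sum(weights[v] for v in kept), 2)


# Attribute names assumed for the Golf sample data set.
golf_variables = ["Outlook", "Temperature", "Humidity", "Wind"]
for name, score in leave_one_variable_out(golf_variables, toy_score).items():
    print(f"{name}: {score}")
```

In a real setting, `score_fn` would wrap whatever learner and validation scheme you use, ideally with a fixed random seed so that the scores are comparable across iterations.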

DDelen · February 2016

Indeed, this is a very good start. Now all I need to do is collect the results into a table (and then plot a bar chart). Do you know if that is possible? Thank you!

earmijo · February 2016

It is not my best effort but it gets the job done. Other people may suggest ways of simplifying it. Export the results to CSV or Excel and do the bar graphs there.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="6.5.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="6.5.002" expanded="true" height="60" name="Retrieve Golf" width="90" x="45" y="30">
        <parameter key="repository_entry" value="//Samples/data/Golf"/>
      </operator>
      <operator activated="true" class="subprocess" compatibility="6.5.002" expanded="true" height="76" name="AllVariables" width="90" x="179" y="30">
        <process expanded="true">
          <operator activated="true" class="x_validation" compatibility="6.5.002" expanded="true" height="112" name="Validation" width="90" x="45" y="75">
            <parameter key="use_local_random_seed" value="true"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="6.5.002" expanded="true" height="76" name="Naive Bayes" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="6.5.002" expanded="true" height="76" name="Apply Model" width="90" x="45" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="6.5.002" expanded="true" height="76" name="Performance" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
              <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="performance_to_data" compatibility="6.5.002" expanded="true" height="76" name="Performance to Data" width="90" x="179" y="210"/>
          <operator activated="true" class="generate_empty_attribute" compatibility="6.5.002" expanded="true" height="76" name="Generate Empty Attribute" width="90" x="313" y="210">
            <parameter key="name" value="nombre"/>
            <parameter key="value_type" value="text"/>
          </operator>
          <operator activated="true" class="set_data" compatibility="6.5.002" expanded="true" height="76" name="Set Data" width="90" x="447" y="210">
            <parameter key="example_index" value="1"/>
            <parameter key="attribute_name" value="nombre"/>
            <parameter key="value" value="All Variables"/>
            <list key="additional_values"/>
          </operator>
          <operator activated="true" class="remember" compatibility="6.5.002" expanded="true" height="60" name="Remember" width="90" x="514" y="120">
            <parameter key="name" value="performances"/>
          </operator>
          <connect from_port="in 1" to_op="Validation" to_port="training"/>
          <connect from_op="Validation" from_port="training" to_port="out 1"/>
          <connect from_op="Validation" from_port="averagable 1" to_op="Performance to Data" to_port="performance vector"/>
          <connect from_op="Performance to Data" from_port="example set" to_op="Generate Empty Attribute" to_port="example set input"/>
          <connect from_op="Generate Empty Attribute" from_port="example set output" to_op="Set Data" to_port="example set input"/>
          <connect from_op="Set Data" from_port="example set output" to_op="Remember" to_port="store"/>
          <portSpacing port="source_in 1" spacing="0"/>
          <portSpacing port="source_in 2" spacing="0"/>
          <portSpacing port="sink_out 1" spacing="0"/>
          <portSpacing port="sink_out 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="loop_attributes" compatibility="6.5.002" expanded="true" height="76" name="Loop Attributes" width="90" x="313" y="30">
        <parameter key="attributes" value="Humidity|Temperature|"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="include_special_attributes" value="true"/>
        <process expanded="true">
          <operator activated="true" class="select_attributes" compatibility="6.5.002" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="%{loop_attribute}"/>
            <parameter key="invert_selection" value="true"/>
          </operator>
          <operator activated="true" class="recall" compatibility="6.5.002" expanded="true" height="60" name="Recall" width="90" x="313" y="30">
            <parameter key="name" value="performances"/>
          </operator>
          <operator activated="true" class="x_validation" compatibility="6.5.002" expanded="true" height="112" name="Validation (2)" width="90" x="179" y="30">
            <parameter key="use_local_random_seed" value="true"/>
            <process expanded="true">
              <operator activated="true" class="naive_bayes" compatibility="6.5.002" expanded="true" height="76" name="Naive Bayes (2)" width="90" x="112" y="30"/>
              <connect from_port="training" to_op="Naive Bayes (2)" to_port="training set"/>
              <connect from_op="Naive Bayes (2)" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true">
              <operator activated="true" class="apply_model" compatibility="6.5.002" expanded="true" height="76" name="Apply Model (2)" width="90" x="45" y="75">
                <list key="application_parameters"/>
              </operator>
              <operator activated="true" class="performance_binominal_classification" compatibility="6.5.002" expanded="true" height="76" name="Performance (2)" width="90" x="179" y="30"/>
              <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
              <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
              <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
              <connect from_op="Performance (2)" from_port="performance" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="performance_to_data" compatibility="6.5.002" expanded="true" height="76" name="Performance to Data (2)" width="90" x="45" y="165"/>
          <operator activated="true" class="generate_empty_attribute" compatibility="6.5.002" expanded="true" height="76" name="Generate Empty Attribute (2)" width="90" x="179" y="165">
            <parameter key="name" value="nombre"/>
            <parameter key="value_type" value="text"/>
          </operator>
          <operator activated="true" class="set_data" compatibility="6.5.002" expanded="true" height="76" name="Set Data (2)" width="90" x="313" y="165">
            <parameter key="example_index" value="1"/>
            <parameter key="attribute_name" value="nombre"/>
            <parameter key="value" value="%{loop_attribute}"/>
            <list key="additional_values"/>
          </operator>
          <operator activated="true" class="union" compatibility="6.5.002" expanded="true" height="76" name="Union" width="90" x="447" y="120"/>
          <operator activated="true" class="remember" compatibility="6.5.002" expanded="true" height="60" name="Remember (2)" width="90" x="581" y="120">
            <parameter key="name" value="performances"/>
          </operator>
          <connect from_port="example set" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Validation (2)" to_port="training"/>
          <connect from_op="Recall" from_port="result" to_op="Union" to_port="example set 1"/>
          <connect from_op="Validation (2)" from_port="averagable 1" to_op="Performance to Data (2)" to_port="performance vector"/>
          <connect from_op="Performance to Data (2)" from_port="example set" to_op="Generate Empty Attribute (2)" to_port="example set input"/>
          <connect from_op="Generate Empty Attribute (2)" from_port="example set output" to_op="Set Data (2)" to_port="example set input"/>
          <connect from_op="Set Data (2)" from_port="example set output" to_op="Union" to_port="example set 2"/>
          <connect from_op="Union" from_port="union" to_op="Remember (2)" to_port="store"/>
          <portSpacing port="source_example set" spacing="0"/>
          <portSpacing port="sink_example set" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="recall" compatibility="6.5.002" expanded="true" height="60" name="Recall (2)" width="90" x="447" y="30">
        <parameter key="name" value="performances"/>
      </operator>
      <connect from_op="Retrieve Golf" from_port="output" to_op="AllVariables" to_port="in 1"/>
      <connect from_op="AllVariables" from_port="out 1" to_op="Loop Attributes" to_port="example set"/>
      <connect from_op="Recall (2)" from_port="result" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
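
If you would rather not leave the bar chart to Excel, the collected table can be written to CSV and drawn as a rough text chart with a few lines of Python. This is a sketch under assumptions: the rows below are invented, `nombre` is the label column the process creates, and `AUC` is an assumed name for the score column (check the actual column names in your result table). `to_csv` and `text_bars` are hypothetical helper names.

```python
import csv
import io

# Invented rows in the shape the process collects: one row for the full
# model plus one per left-out variable.
rows = [
    {"nombre": "All Variables", "AUC": 0.90},
    {"nombre": "Humidity", "AUC": 0.80},
    {"nombre": "Temperature", "AUC": 0.85},
]


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["nombre", "AUC"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def text_bars(rows, width=40):
    """Draw one horizontal bar per row, scaled so a score of 1.0 is `width` wide."""
    lines = []
    for row in rows:
        bar = "#" * round(row["AUC"] * width)
        lines.append(f"{row['nombre']:<15}{bar} {row['AUC']:.2f}")
    return "\n".join(lines)


print(to_csv(rows))
print(text_bars(rows))
```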

DDelen · February 2016

This is great! Thank you.
