Answers
I don't believe this is possible. But I'm also not sure what you really intended by this request, because by definition, k-fold cross-validation requires that every example appear exactly once in a test set (and in the training set for the other k-1 folds).
As you already know, the model produced by cross-validation is based on the entire dataset. The cross-validation procedure is simply designed to estimate how the model might perform on unseen data in a more statistically robust way than the older approach of a static two-way split into a training versus testing set. So why would you need to extract the specific example sets used in cross-validation? The entire dataset is ultimately used both for training and testing in cross-validation.
If you really need to do this, then I think you are going to have to set up a kind of manual cross-validation by creating static segments and then building the model and running the test statistics on each segment separately using loops. But it seems like a lot of effort to build manually what cross-validation already does automatically.
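As a hedged illustration of that manual approach (outside RapidMiner), here is a sketch in Python with scikit-learn: an explicit k-fold loop that keeps the exact test segment and performance of each fold. The dataset and model choices are only placeholders.

```python
# Manual k-fold cross-validation that records, per fold, which examples
# were used for testing and how the fold-local model performed.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
kf = KFold(n_splits=5, shuffle=True, random_state=42)

fold_results = []  # one entry per fold: (test indices, fold accuracy)
for train_idx, test_idx in kf.split(X):
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X[train_idx], y[train_idx])
    acc = accuracy_score(y[test_idx], model.predict(X[test_idx]))
    fold_results.append((test_idx, acc))

# Sanity check: every example appears in exactly one test fold.
all_test = np.concatenate([idx for idx, _ in fold_results])
assert sorted(all_test) == list(range(len(X)))
```

This makes explicit what the built-in operator does internally: each example lands in exactly one test segment, which is why extracting "the" test set of cross-validation is ambiguous.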
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
You add a Store operator on the training and test side, and then some macro logic.
I'm not sure why you would want to store it, but hopefully the attached example will give you some ideas.
May I ask why you want to do that?
1) Because I want to use the predictions
Then you can use the operator X-Prediction.
2) Because you want to do something else
A possibility here is to define the k different samples yourself outside RapidMiner and then define a batch variable. After that you can use Batch-X-Validation.
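The same idea of handing predefined partitions to the cross-validation routine can be sketched in Python with scikit-learn, where `PredefinedSplit` plays the role of the batch variable. The random fold assignment below is purely illustrative.

```python
# Cross-validation over user-defined partitions: each row carries a
# "batch" id, and the splitter tests on one batch at a time.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import PredefinedSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Assign each example to one of 3 batches (here randomly; in practice
# this could encode any grouping you defined outside the tool).
rng = np.random.default_rng(0)
batch = rng.integers(0, 3, size=len(X))

scores = cross_val_score(DecisionTreeClassifier(random_state=0),
                         X, y, cv=PredefinedSplit(batch))
print(scores)  # one accuracy value per predefined batch
```

Because you control the batch column, you also know exactly which examples formed each test set, which answers the original question of extracting the fold memberships.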
Hi,
a few remarks:
RapidMiner has two operators: X-Validation to get a performance estimate, and X-Prediction to get a scored sample. Sadly, there is no built-in operator to do both things at once. I am using the attached building block for this.
Why shouldn't I do this? Well, to be honest, it is very dangerous. People tend to look at the scored data set and build new variables that fix issues with individual examples. This is obviously overtraining by hand and should be treated with care, or better, avoided.
Why should I do this? Well, I personally use it in regression problems to get a scatterplot of true vs. predicted values. In this scatterplot you can see biases (sometimes only in certain regions), nonlinearities, and so on. I think this is very useful.
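A rough Python equivalent of getting both outputs from one cross-validation is scikit-learn's `cross_val_predict`, which returns the out-of-fold prediction for every row; the dataset and model below are placeholders.

```python
# One cross-validation pass yielding both a scored sample (out-of-fold
# predictions for every example) and an overall performance estimate.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import cross_val_predict
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score

X, y = load_diabetes(return_X_y=True)

# Each prediction comes from the fold in which that row was held out.
pred = cross_val_predict(Ridge(), X, y, cv=5)

print("cross-validated R^2:", r2_score(y, pred))
# Plotting y against pred (e.g. matplotlib's plt.scatter(y, pred))
# exposes regional bias or nonlinearity that a single score hides.
```

The scatter of `y` against `pred` is exactly the true-vs-predicted diagnostic described above.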
~Martin
Dortmund, Germany