How to apply a Score dataset with no Target values to a model in RapidMiner

Seyhan Member Posts: 6 Contributor II
edited November 2018 in Help
Hi All,

I was wondering whether anybody knows how to apply a score dataset to a data mining model in RapidMiner.

I can easily use train and test datasets to measure the classification accuracy of a data mining model.

But I do not know how to use a score dataset that has no values for the target attribute.

I have a score dataset but do not know whether there is a way to apply it to a data mining model in RapidMiner. Unfortunately, I cannot skip the scoring step: I must use the score dataset to make sure the model I created works well.

Regards,

Seyhan
:-\

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this should be quite simple: as long as the score dataset consists of the same regular attributes as the training data (special attributes like the label are not needed), you can simply feed it into an Apply Model operator. If you also load your previously trained model into this operator, it will calculate the scores for you.
    Here's a little example:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="370" width="614">
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Training Data" width="90" x="45" y="30"/>
          <operator activated="true" class="generate_data" expanded="true" height="60" name="Score Data" width="90" x="112" y="255"/>
          <operator activated="true" class="linear_regression" expanded="true" height="76" name="Linear Regression" width="90" x="246" y="30"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="514" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Training Data" from_port="output" to_op="Linear Regression" to_port="training set"/>
          <connect from_op="Score Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Linear Regression" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
      Sebastian
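
    The phrase "load your previously trained model" can also mean a model saved by an earlier process. The following is only a sketch of such a scoring-only process, assuming the model was saved to the repository beforehand with a Store operator. Both repository paths are hypothetical placeholders, and the score data is assumed to have the same regular attributes as the training data and no label.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="370" width="614">
          <!-- Hypothetical repository entry: a model saved earlier with a Store operator -->
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve Model" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Local Repository/models/trained_model"/>
          </operator>
          <!-- Hypothetical repository entry: score data with the same regular attributes, no label -->
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve Score Data" width="90" x="45" y="165">
            <parameter key="repository_entry" value="//Local Repository/data/score_data"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="246" y="30">
            <list key="application_parameters"/>
          </operator>
          <!-- The stored model goes to the model port, the unlabelled score data to the data port -->
          <connect from_op="Retrieve Model" from_port="output" to_op="Apply Model" to_port="model"/>
          <connect from_op="Retrieve Score Data" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    The result delivered at "result 1" should be the score data with a prediction attribute added. No performance can be computed for it, because there is no label to compare against.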
  • Seyhan Member Posts: 6 Contributor II
    Thanks.

    But I still could not run the XML you added. I am not an expert on RapidMiner and do not understand half of it, so I have added the XML of my model below.

    I would appreciate it if you could let me know where to add the score dataset to the model applier, since the applier has no subsection of its own.

    Regards,

    Seyhan

    Code

    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="C:\PAKDD2010\sample_modelV3.csv"/>
            <parameter key="label_name" value="TARGET_LABEL"/>
        </operator>
        <operator name="Bootstrapping" class="Bootstrapping">
        </operator>
        <operator name="Nominal2Numerical" class="Nominal2Numerical">
        </operator>
        <operator name="XValidation" class="XValidation" expanded="yes">
            <operator name="AdaBoost" class="AdaBoost" expanded="yes">
                <operator name="KernelNaiveBayes" class="KernelNaiveBayes">
                </operator>
            </operator>
            <operator name="OperatorChain" class="OperatorChain" expanded="yes">
                <operator name="ModelApplier" class="ModelApplier">
                    <parameter key="keep_model" value="true"/>
                    <list key="application_parameters">
                    </list>
                    <parameter key="create_view" value="true"/>
                </operator>
                <operator name="Performance" class="Performance">
                </operator>
                <operator name="ResultWriter" class="ResultWriter">
                    <parameter key="result_file" value="G:\Rapping\model_results.csv"/>
                </operator>
            </operator>
        </operator>
    </operator>
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    here's your modified process. You can use the model output of the XValidation to get a model trained on the complete data set that was passed into the XValidation.
    Please take a look at the XValidation and AdaBoost operators: you forgot to connect the outputs of the inner operators to the endpoints of their subprocesses.

    I strongly recommend taking a look at the sample processes delivered with RapidMiner and at the videos linked on our website to get an understanding of how RapidMiner works.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <process expanded="true" height="460" width="966">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="CSVExampleSource" width="90" x="45" y="30"/>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="180" y="30">
            <parameter key="name" value="TARGET_LABEL"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="sample_bootstrapping" expanded="true" height="76" name="Bootstrapping" width="90" x="315" y="30"/>
          <operator activated="true" class="nominal_to_numerical" expanded="true" height="94" name="Nominal2Numerical" width="90" x="450" y="30"/>
          <operator activated="true" class="x_validation" expanded="true" height="112" name="XValidation" width="90" x="585" y="30">
            <process expanded="true" height="460" width="458">
              <operator activated="true" class="adaboost" expanded="true" height="76" name="AdaBoost" width="90" x="45" y="30">
                <process expanded="true" height="460" width="966">
                  <operator activated="true" class="naive_bayes_kernel" expanded="true" height="76" name="KernelNaiveBayes" width="90" x="45" y="30"/>
                  <connect from_port="training set" to_op="KernelNaiveBayes" to_port="training set"/>
                  <connect from_op="KernelNaiveBayes" from_port="model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                </process>
              </operator>
              <connect from_port="training" to_op="AdaBoost" to_port="training set"/>
              <connect from_op="AdaBoost" from_port="model" to_port="model"/>
              <portSpacing port="source_training" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_through 1" spacing="0"/>
            </process>
            <process expanded="true" height="460" width="458">
              <operator activated="true" class="apply_model" expanded="true" height="76" name="ModelApplier" width="90" x="45" y="30">
                <list key="application_parameters"/>
                <parameter key="create_view" value="true"/>
              </operator>
              <operator activated="true" class="performance" expanded="true" height="76" name="Performance" width="90" x="180" y="30"/>
              <operator activated="true" class="write_as_text" expanded="true" height="76" name="ResultWriter" width="90" x="319" y="30">
                <parameter key="result_file" value="G:\Rapping\model_results.csv"/>
              </operator>
              <connect from_port="model" to_op="ModelApplier" to_port="model"/>
              <connect from_port="test set" to_op="ModelApplier" to_port="unlabelled data"/>
              <connect from_op="ModelApplier" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
              <connect from_op="Performance" from_port="performance" to_op="ResultWriter" to_port="input 1"/>
              <connect from_op="ResultWriter" from_port="input 1" to_port="averagable 1"/>
              <portSpacing port="source_model" spacing="0"/>
              <portSpacing port="source_test set" spacing="0"/>
              <portSpacing port="source_through 1" spacing="0"/>
              <portSpacing port="sink_averagable 1" spacing="0"/>
              <portSpacing port="sink_averagable 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Score Set" width="90" x="45" y="165"/>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="782" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="CSVExampleSource" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Bootstrapping" to_port="example set input"/>
          <connect from_op="Bootstrapping" from_port="example set output" to_op="Nominal2Numerical" to_port="example set input"/>
          <connect from_op="Nominal2Numerical" from_port="example set output" to_op="XValidation" to_port="training"/>
          <connect from_op="XValidation" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="XValidation" from_port="averagable 1" to_port="result 1"/>
          <connect from_op="Score Set" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
    Greetings,
     Sebastian
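
    One thing to check in a process like this: the training data passes through Nominal2Numerical, so a score set that still contains nominal attributes would no longer have the same regular attributes as the data the model was trained on. The following is only a sketch of one way to handle that, assuming the Nominal2Numerical operator in RapidMiner 5 exposes a "preprocessing model" output port that can be fed into a second Apply Model. For brevity it trains KernelNaiveBayes directly instead of using XValidation, AdaBoost and Bootstrapping, and the "Apply Preprocessing" operator name is made up for illustration. As in the process above, the file parameters of both read_csv operators are left unset.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Root">
        <process expanded="true" height="460" width="966">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="CSVExampleSource" width="90" x="45" y="30"/>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="180" y="30">
            <parameter key="name" value="TARGET_LABEL"/>
            <parameter key="target_role" value="label"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" expanded="true" height="94" name="Nominal2Numerical" width="90" x="315" y="30"/>
          <!-- Plain learner for brevity; the process above wraps AdaBoost in XValidation here -->
          <operator activated="true" class="naive_bayes_kernel" expanded="true" height="76" name="KernelNaiveBayes" width="90" x="450" y="30"/>
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Score Set" width="90" x="45" y="165"/>
          <!-- Assumption: applies the conversion learned on the training data to the score set -->
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Preprocessing" width="90" x="450" y="165">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="apply_model" expanded="true" height="76" name="Apply Model" width="90" x="585" y="165">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="CSVExampleSource" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Nominal2Numerical" to_port="example set input"/>
          <connect from_op="Nominal2Numerical" from_port="example set output" to_op="KernelNaiveBayes" to_port="training set"/>
          <connect from_op="Nominal2Numerical" from_port="preprocessing model" to_op="Apply Preprocessing" to_port="model"/>
          <connect from_op="Score Set" from_port="output" to_op="Apply Preprocessing" to_port="unlabelled data"/>
          <connect from_op="Apply Preprocessing" from_port="labelled data" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="KernelNaiveBayes" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    If the score set contains only numerical attributes, this extra step is unnecessary and the score set can go straight into Apply Model, as in the process above.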