Generating new data out of modeling results

Dolf · January 2013

Hi,

I am a new user of RapidMiner and have little experience in data mining. I am looking for a way to select, let's say, the best 20 % of a customer base, based on the results of a decision tree, a neural net or a similar model.

Is there an operator that can write a new table based on the result of a previous modeling operator? Ideally it would add something like a scoring attribute, so that I can generate a new table and manually select the top-rated records.

I would be grateful for any help.

Dolf · January 2013

I'll try to illustrate my request with an example. Let's say I have a data table with several attributes for 1000 customers. Now I would like to know the probability that each of them will buy a particular product. I want to pick the 20 % with the highest response probability, based on a decision tree.

In cases like this, the KNIME decision tree gives me the option to append columns with the normalized class distribution to the table, so that I can pick my best 20 % from it. In contrast, the decision tree operator of RapidMiner delivers only a tree, which doesn't solve my problem.

MariusHelf · January 2013

Hi,

In RapidMiner we separate the steps of model creation (training) and model application. From your description it seems that so far you have only done the training step, which results in a decision tree. Now you can apply that decision tree to new data with the Apply Model operator.
The result will be an example set with additional attributes: the prediction (e.g. true or false) and the so-called confidences, one per class, which makes three attributes for a two-class problem. A confidence is a measure of how sure the model is that the input data belongs to a certain class.

Please have a look at the attached process for a basic example of model training and application. In addition to the Decision Tree and Apply Model operators, the process uses the Split Data operator to divide the input data into a set for training and a set for application.

For a deeper understanding of RapidMiner's concepts I would like to direct your attention to our video tutorials and other documentation resources on our website at http://rapid-i.com. You'll find all the documentation in the Documentation menu at the top of the website.

Best regards,
Marius

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.000" expanded="true" name="Process">
    <process expanded="true" height="549" width="567">
      <operator activated="true" class="retrieve" compatibility="5.3.000" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="120">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="split_data" compatibility="5.3.000" expanded="true" height="94" name="Split Data" width="90" x="179" y="120">
        <enumeration key="partitions">
          <parameter key="ratio" value="0.7"/>
          <parameter key="ratio" value="0.3"/>
        </enumeration>
      </operator>
      <operator activated="true" class="decision_tree" compatibility="5.3.000" expanded="true" height="76" name="Decision Tree" width="90" x="313" y="30"/>
      <operator activated="true" class="apply_model" compatibility="5.3.000" expanded="true" height="76" name="Apply Model" width="90" x="447" y="120">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Split Data" to_port="example set"/>
      <connect from_op="Split Data" from_port="partition 1" to_op="Decision Tree" to_port="training set"/>
      <connect from_op="Split Data" from_port="partition 2" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Decision Tree" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 1"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="90"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
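
To illustrate the selection step that follows model application, here is a minimal sketch in plain Python, outside RapidMiner. It assumes the scored example set has been exported as a list of rows and that the confidence for the class of interest is stored under a name like `confidence(yes)`; the sample rows and the helper name `top_percent` are made up for illustration.

```python
def top_percent(rows, confidence_key, percent=20):
    """Return the `percent` % of rows with the highest confidence, best first.

    The number of rows kept is rounded down (integer division).
    """
    ranked = sorted(rows, key=lambda row: row[confidence_key], reverse=True)
    keep = len(ranked) * percent // 100
    return ranked[:keep]


# Hypothetical scored customers, standing in for the output of model application.
scored = [
    {"id": 1, "confidence(yes)": 0.35},
    {"id": 2, "confidence(yes)": 0.91},
    {"id": 3, "confidence(yes)": 0.12},
    {"id": 4, "confidence(yes)": 0.67},
    {"id": 5, "confidence(yes)": 0.50},
]

for row in top_percent(scored, "confidence(yes)", percent=20):
    print(row["id"], row["confidence(yes)"])
```

Inside RapidMiner the same idea would amount to sorting the example set by that confidence attribute in descending order and keeping the first 20 % of the rows.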

Dolf · January 2013

Hi Marius,

Thank you very much.

I have another problem. In my example process I discretize attribute_1 into 10 bins. If I switch to the results view and activate the bar chart with these 10 bins on the x-axis, the bars are listed in alphabetical order (range1, range10, range2...). If I use the Replace operator and replace the values manually, e.g. range1 with 01, it also replaces range10 with 010 :-[

Is there a way to put the bars into the correct order? Or better: is there a way to name the bins as I want right from the start (as I can do in KNIME ;D)?

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.008">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.008" expanded="true" name="Process">
    <process expanded="true" height="549" width="681">
      <operator activated="true" class="retrieve" compatibility="5.2.008" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="210">
        <parameter key="repository_entry" value="//Samples/data/Sonar"/>
      </operator>
      <operator activated="true" class="discretize_by_bins" compatibility="5.2.008" expanded="true" height="94" name="Discretize" width="90" x="246" y="210">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="attribute_1"/>
        <parameter key="number_of_bins" value="10"/>
      </operator>
      <connect from_op="Retrieve Sonar" from_port="output" to_op="Discretize" to_port="example set input"/>
      <connect from_op="Discretize" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
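
The `010` result suggests that the replacement matches `range1` as a substring of `range10`. A minimal sketch of one way around this, written in plain Python rather than as a RapidMiner process: capture the bin number with an anchored regular expression and zero-pad it, so that alphabetical order agrees with numeric order. The helper name `pad_bin_label` is hypothetical, and the sketch assumes the bin names start with `range` followed by the bin number, as in the post above.

```python
import re


def pad_bin_label(label, width=2):
    """Zero-pad the number after a leading 'range', e.g. 'range1' -> 'range01'."""
    return re.sub(
        r"^range(\d+)",
        lambda match: "range" + match.group(1).zfill(width),
        label,
    )


labels = [f"range{i}" for i in range(1, 11)]

# Plain alphabetical sorting puts range10 right after range1.
print(sorted(labels))

# After padding, alphabetical sorting follows the bin number.
print(sorted(pad_bin_label(label) for label in labels))
```

Because the pattern captures the whole number, `range10` stays `range10` instead of being rewritten together with `range1`.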
