"Help - Clustering?"

JEdward · May 2011

I'm very new to this data mining lark, so apologies in advance.

I have an example set containing only "yes" data, and I have been asked to score records in a new example set based on their similarity to records in the "yes" set. I don't really know what I'm doing, but I have a feeling clustering might be involved somehow. So far, all I have done is create clusters from the "yes" set and then label the new records with a prediction of which cluster they would fall into.
That is not quite what I'm after; the desired result is to give each record a label from 1 to 10 indicating how closely that record matches the "yes" set.

Any pointers would be appreciated.
Thanks,
JEdward

IngoRM · May 2011

Hi,

well, this sounds (if I got it right) like a scenario where 1-class modeling might be most appropriate. You could try the 1-class SVM offered by RapidMiner. First you train a model on the "yes" data set, and then you apply the trained model to your prediction data set. Afterwards you can rescale the predictions from [0, 1] to [1, 10] and round them to integers. That's it.
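
The rescaling step can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not a RapidMiner process: it assumes the confidence value lies in [0, 1], and the function name `to_score` and its parameters are hypothetical.

```python
import math


def to_score(confidence: float, low: int = 1, high: int = 10) -> int:
    """Map a confidence in [0, 1] linearly onto the integers low..high.

    Hypothetical helper: assumes the model's confidence is already in [0, 1].
    """
    # Clamp first so slightly out-of-range values still give a valid score.
    clamped = min(1.0, max(0.0, confidence))
    # Linear rescale: 0.0 maps to low, 1.0 maps to high.
    scaled = low + clamped * (high - low)
    # Round half up; Python's built-in round() rounds halves to even instead.
    return int(math.floor(scaled + 0.5))


if __name__ == "__main__":
    for c in (0.0, 0.25, 0.5, 1.0):
        print(c, "->", to_score(c))
```

With this mapping, a confidence of 0 gives a score of 1 and a confidence of 1 gives a score of 10. Inside RapidMiner the same formula would presumably go into an attribute-generation step after Apply Model.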

Cheers,
Ingo

JEdward · May 2011

Thanks Ingo,
That sounds exactly like what I'm looking for. I'll give it a try.

JEdward.

JEdward · May 2011

Hello,

When I try to store the labelled data in the repository, I receive a 'ConcurrentModificationException' error.
I think this is caused by the Apply Model operator creating two special attributes, 'confidence(inside)' and 'prediction(LabelT)', as these are the only things that differ from the original dataset.

Can anyone point me in the right direction to resolve this?

Thanks,
JEdward.

land · May 2011

Hi,
Please post the process as well as the stack trace for this exception, and we will see if we can help you.

Greetings,
Sebastian

JEdward · May 2011

Hi Sebastian,

Here's the process attached.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input>
      <location>ProcessApplic</location>
      <location>Model</location>
    </input>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="612" width="710">
      <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="246" y="210">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.1.006" expanded="true" height="60" name="Store" width="90" x="447" y="165">
        <parameter key="repository_entry" value="LabelledData"/>
      </operator>
      <connect from_port="input 1" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_port="input 2" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Store" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="source_input 2" spacing="0"/>
      <portSpacing port="source_input 3" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Not sure what you mean by stack trace. Is this it? (copied from the log window).


May 20, 2011 10:25:12 AM INFO: Process //RapidMinerLocalRepository/Process/3_ApplyModel starts
May 20, 2011 10:26:57 AM SEVERE: Process failed: Cannot store data in repository at entry 'LabelledData'. Reason: Cannot store data at 'U:\RapidMinerRepository\Process\LabelledData.ioo': java.util.ConcurrentModificationException.
May 20, 2011 10:26:57 AM SEVERE: Here:           Process[1] (Process)
           subprocess 'Main Process'
             +- Apply Model[1] (Apply Model)
       ==>   +- Store[1] (Store)

Thanks,
JEdward

JEdward · May 2011

Hi,

I have solved the problem by changing the process to rename the attribute confidence(inside).

Could it be that the brackets in the name caused the problems for the Store operator? I had to type the attribute names into the Rename and Select Attributes operators by hand, because they are not available in the menus and drop-down lists after being created by Apply Model.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input>
      <location>ProcessApplic</location>
      <location>Model</location>
    </input>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="612" width="710">
      <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="112" y="165">
        <list key="application_parameters"/>
      </operator>
      <operator activated="true" class="rename" compatibility="5.1.006" expanded="true" height="76" name="Rename (2)" width="90" x="246" y="210">
        <parameter key="old_name" value="confidence(inside)"/>
        <parameter key="new_name" value="confidence"/>
        <list key="rename_additional_attributes"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="447" y="255">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="prediction(LabelT)"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="store" compatibility="5.1.006" expanded="true" height="60" name="Store" width="90" x="581" y="210">
        <parameter key="repository_entry" value="LabelledData"/>
      </operator>
      <connect from_port="input 1" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_port="input 2" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_op="Rename (2)" to_port="example set input"/>
      <connect from_op="Rename (2)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Store" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="source_input 2" spacing="0"/>
      <portSpacing port="source_input 3" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
