
"How to get the standard deviation from clustered data?"

stever1k Member Posts: 10 Contributor II
edited May 2019 in Help
Hi,

After clustering, my data has the following format:

id A B C Cluster
a x y z  0
.. .... .... 1
.. .... .... 1
.. .... .... 2
.. .... .... 0
.. .... ....
.. .... .... N

The clustering algorithm found several clusters and added a new attribute (column) named cluster. I now want to calculate the standard deviation of the attributes A, B and C for cluster 0, and likewise for clusters 1 through N. Any ideas how to do this?

cordially,
Stever

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Stever,
    this is a typical use case for the Aggregation operator. You can group the examples by cluster and then calculate an aggregation function over each attribute. I have done so in this process:
    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate sample data; replace this with your own data source -->
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="gaussian mixture clusters"/>
        </operator>
        <!-- Cluster the examples; this adds the "cluster" attribute -->
        <operator name="KMeans" class="KMeans">
            <parameter key="k" value="3"/>
        </operator>
        <!-- Change the role of "cluster" (no target role is given, so the operator's default applies) -->
        <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
            <parameter key="name" value="cluster"/>
        </operator>
        <!-- Standard deviation of each listed attribute, computed per cluster; list your own attributes (e.g. A, B, C) here -->
        <operator name="Aggregation" class="Aggregation">
            <list key="aggregation_attributes">
              <parameter key="att1" value="standard_deviation"/>
              <parameter key="att2" value="standard_deviation"/>
              <parameter key="att3" value="standard_deviation"/>
              <parameter key="att4" value="standard_deviation"/>
              <parameter key="att5" value="standard_deviation"/>
            </list>
            <parameter key="group_by_attributes" value="cluster"/>
        </operator>
    </operator>
    It should be easy to adapt it to your needs.

    Greetings,
      Sebastian
  • stever1k Member Posts: 10 Contributor II
    Thanks a lot Sebastian, that is EXACTLY what I'm looking for. My problem was that I was searching for a suitable operator inside the preprocessing->attributes tree instead of the OLAP one!

    best regards,
    Stever
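
For readers who want to cross-check the aggregation result outside RapidMiner, the group-by idea from the answer above can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `stddev_by_cluster`, the column names and the sample rows are made up, and it uses the sample standard deviation (`statistics.stdev`); whether RapidMiner's `standard_deviation` aggregation uses the sample or the population formula is not stated in this thread, so swap in `statistics.pstdev` if the numbers differ.

```python
from collections import defaultdict
from statistics import stdev


def stddev_by_cluster(rows, attributes, cluster_key="cluster"):
    """Return {cluster: {attribute: sample standard deviation}}.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    # Group the rows (examples) by their cluster label.
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)

    result = {}
    for cluster, members in groups.items():
        # stdev() needs at least two values, so singleton clusters get None.
        result[cluster] = {
            att: stdev([m[att] for m in members]) if len(members) > 1 else None
            for att in attributes
        }
    return result


# Made-up sample data in the shape described in the question.
rows = [
    {"id": "a", "A": 1.0, "B": 2.0, "C": 0.0, "cluster": 0},
    {"id": "b", "A": 3.0, "B": 2.0, "C": 4.0, "cluster": 0},
    {"id": "c", "A": 10.0, "B": 5.0, "C": 1.0, "cluster": 1},
    {"id": "d", "A": 14.0, "B": 7.0, "C": 1.0, "cluster": 1},
]

result = stddev_by_cluster(rows, ["A", "B", "C"])
for cluster, stats in sorted(result.items()):
    print(cluster, stats)
```

Each cluster label maps to one dictionary of per-attribute standard deviations, which mirrors the one-row-per-group output of the aggregation.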