KMeans - different output on the same data
Hi,
I have generated a dataset (required for KMeans) and made a copy of it, so I now have two copies of the same dataset. I then apply the "KMeans" operator to both example sets, but the cluster centroids and the cluster groupings differ between the two. Why is this? Does it depend on some seed value?
Please find the code below:
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
    <parameter key="target_function" value="sum classification"/>
    <parameter key="number_examples" value="50"/>
  </operator>
  <operator name="IOMultiplier" class="IOMultiplier">
    <parameter key="io_object" value="ExampleSet"/>
  </operator>
  <operator name="KMeans" class="KMeans">
  </operator>
  <operator name="IOSelector" class="IOSelector">
    <parameter key="io_object" value="ExampleSet"/>
    <parameter key="select_which" value="2"/>
  </operator>
  <operator name="KMeans (2)" class="KMeans">
  </operator>
</operator>
If I re-run the process, the same example sets are generated, with the same cluster centroids and groupings for the first and second example sets. Why is this? I then enabled the "use_local_random_seed" option in KMeans, which made the cluster centroids and groupings identical for both copies of the data.
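One plausible explanation of this behavior, sketched as a simplified Python model (not RapidMiner's actual implementation), rests on one assumption: the process owns a single global random generator that is reset to a fixed seed at the start of every run, and any operator without a local seed draws its random numbers from it. Under that assumption the two KMeans operators pick their initial centroids from consecutive stretches of the same random sequence, so they can end up with different clusterings within one run, while the run as a whole repeats exactly. Giving each operator its own generator with the same local seed makes both copies start, and therefore finish, identically. The names `kmeans` and `run_process`, the seed values (2001 is arbitrary) and the uniform 2-D data standing in for ExampleSetGenerator are all hypothetical.

```python
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def _sqdist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans(data: Sequence[Point], k: int, rng: random.Random,
           max_steps: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's k-means; the initial centroids are k examples chosen by rng."""
    centroids = [data[i] for i in rng.sample(range(len(data)), k)]
    labels: List[int] = []
    for _ in range(max_steps):
        # Assignment step: each example goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: _sqdist(p, centroids[c]))
                      for p in data]
        if new_labels == labels:
            break  # assignments no longer change: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return labels, centroids


def run_process(global_seed: int = 2001,
                local_seed: Optional[int] = None
                ) -> List[Tuple[List[int], List[Point]]]:
    """Hypothetical model of the process: one generator, reset on every run."""
    global_rng = random.Random(global_seed)
    # Stand-in for ExampleSetGenerator: 50 random 2-D examples.
    data = [(global_rng.uniform(-10, 10), global_rng.uniform(-10, 10))
            for _ in range(50)]
    copy = list(data)  # stand-in for IOMultiplier: two identical example sets
    results = []
    for example_set in (data, copy):
        # Without a local seed, both operators share (and advance) global_rng.
        rng = global_rng if local_seed is None else random.Random(local_seed)
        results.append(kmeans(example_set, k=2, rng=rng))
    return results


if __name__ == "__main__":
    first_run, second_run = run_process(), run_process()
    print("identical across runs:", first_run == second_run)
    print("operators agree (global generator):", first_run[0] == first_run[1])
    a, b = run_process(local_seed=42)
    print("operators agree (local seed):", a == b)
```

In this model the first comparison is always true, the last is always true, and the middle one depends on whether the two different starting points happen to converge to the same clustering.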
Questions:
1. What exactly does "use_local_random_seed" do?
2. The cluster centroids and groupings of the first and second example sets are the same no matter how many times we run the process, yet the two KMeans operators applied to the same data within a single run always give different results. Does this mean that RM always applies seed "A" to the first KMeans operator it encounters and seed "B" to the second?
3. Once "use_local_random_seed" is enabled, how do we choose the seed value? What are its minimum and maximum values?
4. For simplicity, the shorter process below can be considered as well.
<operator name="Root" class="Process" expanded="yes">
  <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
    <parameter key="target_function" value="sum classification"/>
    <parameter key="number_examples" value="50"/>
  </operator>
  <operator name="KMeans" class="KMeans" breakpoints="before">
  </operator>
  <operator name="KMeans (2)" class="KMeans" breakpoints="before">
  </operator>
</operator>
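Under the same kind of simplified model (hypothetical Python, not RapidMiner code), the practical effect of a local seed is isolation: an operator with its own seeded generator always draws the same numbers, no matter how many random numbers other operators consumed before it, whereas an operator sharing the global generator depends on its position in that sequence. The rule that -1 means "use the global generator" is an assumption modeled on the local random seed of -1 mentioned in the follow-up below; apart from that, the model accepts any integer as a seed and does not attempt to reproduce RapidMiner's actual parameter bounds. `operator_generator`, `initial_centroid_indices` and `run` are illustrative names.

```python
import random
from typing import List, Tuple


def operator_generator(local_seed: int, global_rng: random.Random) -> random.Random:
    """Hypothetical rule: -1 shares the global generator; any other value
    gives the operator a fresh generator seeded with that value."""
    return global_rng if local_seed == -1 else random.Random(local_seed)


def initial_centroid_indices(n_examples: int, k: int, local_seed: int,
                             global_rng: random.Random) -> List[int]:
    """Indices of the k examples an operator would pick as initial centroids."""
    rng = operator_generator(local_seed, global_rng)
    return sorted(rng.sample(range(n_examples), k))


def run(local_seed: int, upstream_draws: int = 0,
        global_seed: int = 2001) -> Tuple[List[int], List[int]]:
    """Two clustering operators in one run, optionally preceded by other
    operators that consume random numbers from the global generator."""
    global_rng = random.Random(global_seed)  # reset at the start of each run
    for _ in range(upstream_draws):
        global_rng.random()  # an upstream operator using randomness
    first = initial_centroid_indices(50, 2, local_seed, global_rng)
    second = initial_centroid_indices(50, 2, local_seed, global_rng)
    return first, second


if __name__ == "__main__":
    print("local seed 7:", run(7), run(7, upstream_draws=3))
    print("global (-1): ", run(-1), run(-1, upstream_draws=3))
```

In this model the particular seed value is arbitrary: any fixed value gives reproducible results, and changing it simply selects a different, equally valid starting point.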
Many Thanks,
Shubha.
Answers
I encountered another problem with KMedoids.
In my process, the operators "KMedoids", "KMedoids (2)", "KMedoids (3)" and "KMedoids (4)" all have the same settings, yet "KMedoids" behaves differently from the other KMedoids operators.
The centroid values of cluster 0 produced by "KMedoids" differ from those produced by "KMedoids (2)", "KMedoids (3)" and "KMedoids (4)", which all agree with each other. When I added further KMedoids operators after "KMedoids (4)", they still reproduced the centroid values of "KMedoids (2)".
So I then created "KMedoids (5)" by copying the "KMedoids" operator. To my surprise, its centroid values for cluster 0 were the same as those of "KMedoids" instead of "KMedoids (2)".
Yet "KMedoids", "KMedoids (2)", "KMedoids (3)", "KMedoids (4)" and "KMedoids (5)" are all configured identically, with a local random seed of -1.
Thanks,
Shubha