K-Means and Optimizing K
Dear All,
I looked through the example setups but couldn't find anything similar.
I am trying to optimize K-Means (that is, find the optimal number of clusters k) through cross-validation. I tried using an XValidation operator, but I cannot get it to work. Here is the setup that I would like to change:
<operator name="Root" class="Process" expanded="yes">
    <operator name="CSVExampleSource" class="CSVExampleSource">
        <parameter key="filename" value="/data-binary.csv"/>
        <parameter key="label_name" value="class"/>
    </operator>
    <operator name="CorrelationMatrix" class="CorrelationMatrix">
    </operator>
    <operator name="OperatorChain" class="OperatorChain" expanded="yes">
        <operator name="KMeans" class="KMeans">
            <parameter key="k" value="12"/>
            <parameter key="max_runs" value="50"/>
            <parameter key="max_optimization_steps" value="500"/>
            <parameter key="use_local_random_seed" value="true"/>
            <parameter key="local_random_seed" value="8"/>
        </operator>
        <operator name="ClusterModelWriter" class="ClusterModelWriter">
            <parameter key="cluster_model_file" value="/models/clusterout.clm"/>
        </operator>
        <operator name="ClusterCentroidEvaluator" class="ClusterCentroidEvaluator">
            <parameter key="keep_example_set" value="true"/>
        </operator>
    </operator>
    <operator name="ClusterModelReader" class="ClusterModelReader">
        <parameter key="cluster_model_file" value="/models/clusterout.clm"/>
    </operator>
</operator>
Could someone please help?
Answers
The problem is that unsupervised learning can't really do any performance estimation. That's why it's called unsupervised: we simply don't know what the true solution is. So we cannot compare one clustering to another and say: hey, that one is the true one and the other one is rubbish.
That's why you are running into problems.
There are, however, some measures that serve as heuristics for the goodness of a clustering, but keep in mind that heuristics may lead to non-optimal solutions. You can plug in these heuristics the same way you plug in performance evaluators for regression and classification.
Here's a small example for RapidMiner 5.0 that shows how this works and that heuristics may fail:
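To make the point about heuristics concrete outside of RapidMiner, here is a minimal sketch in plain Python (standard library only). It is not a RapidMiner process and not the operator implementation. It assumes a toy data set of three Gaussian blobs and uses the average squared distance of each point to its centroid as the heuristic; the function names `make_blobs`, `kmeans` and `avg_within_sq_distance` are made up for this illustration. The score keeps shrinking as k grows and reaches zero when every point is its own cluster, so simply picking the k with the best score gives a useless answer.

```python
import math
import random


def make_blobs(seed=0, per_blob=20):
    """Toy data (an assumption for this sketch): three 2-D Gaussian blobs."""
    rng = random.Random(seed)
    centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]
    return [(rng.gauss(cx, 1.0), rng.gauss(cy, 1.0))
            for cx, cy in centers for _ in range(per_blob)]


def assign(points, centroids):
    """Put every point into the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        clusters[nearest].append(p)
    return clusters


def kmeans(points, k, max_steps=500, seed=8):
    """Plain Lloyd's algorithm with k randomly chosen data points as start."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_steps):
        clusters = assign(points, centroids)
        # New centroid = mean of the cluster; an empty cluster keeps its old one.
        updated = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
                   for i, cl in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


def avg_within_sq_distance(centroids, clusters):
    """Heuristic: mean squared distance of each point to its own centroid."""
    total = sum(math.dist(p, c) ** 2
                for c, cl in zip(centroids, clusters) for p in cl)
    return total / sum(len(cl) for cl in clusters)


points = make_blobs()
scores = {k: avg_within_sq_distance(*kmeans(points, k))
          for k in (1, 2, 3, 6, len(points))}
for k, score in scores.items():
    print(f"k={k:2d}  avg within-cluster squared distance = {score:.3f}")
```

In practice one would look for the k where the improvement levels off, or use a measure that also penalizes clusters lying close together, rather than taking the minimum of this score.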
Greetings,
Sebastian