"De-normalizing K-means Centroids"
DancingSheep
Hello,
I'm using k-means and I've got a problem.
I need to cluster some data after normalizing it, but then I would like to see the centroids as if they were from the de-normalized set.
I've already seen this topic http://rapid-i.com/rapidforum/index.php/topic,3613.msg13557.html#msg13557, but it doesn't work. While the data set gets de-normalized, the centroids stay the same.
Here is my code so far:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
    <process expanded="true" height="521" width="681">
      <operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
        <parameter key="csv_file" value="/Users/GiO/Desktop/csv/AlmostFull.csv"/>
        <parameter key="column_separators" value=","/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
        <parameter key="name" value="favgame"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="value_type" value="polynominal"/>
        <parameter key="invert_selection" value="true"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="5.1.006" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="30">
        <parameter key="attribute_filter_type" value="no_missing_values"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.1.006" expanded="true" height="94" name="Normalize" width="90" x="45" y="165"/>
      <operator activated="true" class="k_means" compatibility="5.1.006" expanded="true" height="76" name="Clustering" width="90" x="179" y="165">
        <parameter key="k" value="3"/>
      </operator>
      <operator activated="true" class="denormalize" compatibility="5.1.006" expanded="true" height="76" name="De-Normalize" width="90" x="45" y="300"/>
      <operator activated="true" class="apply_model" compatibility="5.1.006" expanded="true" height="76" name="Apply Model" width="90" x="313" y="300">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Normalize" from_port="preprocessing model" to_op="De-Normalize" to_port="model input"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <connect from_op="Clustering" from_port="clustered set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="De-Normalize" from_port="model output" to_op="Apply Model" to_port="model"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="252"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
Answers
Extract the cluster centroids and then apply the de-normalising step to the output from this.
regards
Andrew
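The arithmetic behind this suggestion can be sketched outside RapidMiner. The following is a minimal NumPy illustration, not the RapidMiner implementation. It assumes the normalization is a z-transformation (subtract the mean, divide by the standard deviation, per attribute). The data, the cluster assignment `labels`, and the helper `denormalize_centroids` are all made up for the example.

```python
import numpy as np

def denormalize_centroids(centroids_z, mean, std):
    """Map centroids from z-score space back to the original units.

    Assumes the data was normalized as (x - mean) / std per attribute,
    so the inverse is z * std + mean.
    """
    centroids_z = np.asarray(centroids_z, dtype=float)
    return centroids_z * np.asarray(std, dtype=float) + np.asarray(mean, dtype=float)

# Hypothetical data set: 6 examples, 2 numeric attributes on different scales.
data = np.array([
    [1.0, 100.0],
    [2.0, 110.0],
    [3.0, 120.0],
    [10.0, 500.0],
    [11.0, 510.0],
    [12.0, 520.0],
])

# Normalization statistics. Which standard deviation is used does not matter
# here, as long as the same values are used to normalize and to de-normalize.
mean = data.mean(axis=0)
std = data.std(axis=0)
normalized = (data - mean) / std

# Assumed cluster assignment, standing in for the output of k-means.
labels = np.array([0, 0, 0, 1, 1, 1])

# Centroids in normalized space: the mean of each cluster's normalized rows.
centroids_z = np.array([normalized[labels == k].mean(axis=0) for k in range(2)])

# Centroids expressed in the original units.
centroids = denormalize_centroids(centroids_z, mean, std)
print(centroids)
```

Because the z-transformation is affine, each de-normalized centroid coincides with the mean of the original, un-normalized rows in its cluster. This is why de-normalizing only the example set leaves the centroids in the cluster model unchanged: the inverse transformation has to be applied to the centroid values themselves.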
Could you provide the code for the new connections?
Thanks
Andrew
Thanks for your help!
@awchisholm
I have a similar query. I used the Normalize operator and got decent regression model performance. When I de-normalized it and checked the performance, I found it was the same as for a model trained without the Normalize operator. I would have expected it to give a better result than the model trained without normalization.
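One possible explanation, sketched here under the assumption that the model is an ordinary least-squares linear regression with an intercept (the post does not say which learner was used): such a model gives the same predictions whether or not the attributes are z-normalized, because the coefficients simply rescale to compensate. The data and the helper `fit_predict` below are made up for the illustration.

```python
import numpy as np

def fit_predict(X_train, y_train, X_test):
    """Fit ordinary least squares with an intercept and predict on X_test."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    return A_test @ coef

# Hypothetical data: two attributes on very different scales.
rng = np.random.default_rng(0)
X = rng.normal(loc=[50.0, 0.5], scale=[10.0, 0.1], size=(40, 2))
y = 3.0 * X[:, 0] - 20.0 * X[:, 1] + rng.normal(scale=1.0, size=40)

X_train, X_test = X[:30], X[30:]
y_train = y[:30]

# Normalization statistics come from the training data only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0)

pred_raw = fit_predict(X_train, y_train, X_test)
pred_norm = fit_predict((X_train - mean) / std, y_train, (X_test - mean) / std)

# The two sets of predictions agree up to floating-point error.
same = np.allclose(pred_raw, pred_norm)
print(same)
```

Learners that depend on distances or on regularization penalties (for example k-NN or ridge regression) are not invariant in this way, so normalization can change their results.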