"Calculating K-means Performance"
DancingSheep
Member Posts: 9 Contributor II
Hello,
I have the following flow which uses a simple k-means and works perfectly (for my purpose at least!).
Now I'd like to calculate its performance, but connecting Cluster Count Performance to Extract Cluster Prototypes' exa port gave -0.000 for every cluster.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.1.006" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="parallelize_main_process" value="false"/>
<process expanded="true" height="521" width="681">
<operator activated="true" class="read_csv" compatibility="5.1.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
<parameter key="csv_file" value="/Users/GiO/Desktop/csv/AlmostFull.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="trim_lines" value="false"/>
<parameter key="use_quotes" value="true"/>
<parameter key="quotes_character" value="&quot;"/>
<parameter key="escape_character_for_quotes" value="\"/>
<parameter key="skip_comments" value="false"/>
<parameter key="comment_characters" value="#"/>
<parameter key="parse_numbers" value="true"/>
<parameter key="decimal_character" value="."/>
<parameter key="grouped_digits" value="false"/>
<parameter key="grouping_character" value=","/>
<parameter key="date_format" value=""/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations"/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<parameter key="encoding" value="SYSTEM"/>
<list key="data_set_meta_data_information"/>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.1.006" expanded="true" height="76" name="Set Role" width="90" x="179" y="30">
<parameter key="name" value="favgame"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="5.1.006" expanded="true" height="76" name="Select Attributes" width="90" x="313" y="30">
<parameter key="attribute_filter_type" value="value_type"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="polynominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="5.1.006" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="30">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="no_missing_values"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="true"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
<operator activated="true" class="k_means" compatibility="5.1.006" expanded="true" height="76" name="Clustering" width="90" x="246" y="390">
<parameter key="add_cluster_attribute" value="true"/>
<parameter key="add_as_label" value="false"/>
<parameter key="remove_unlabeled" value="false"/>
<parameter key="k" value="3"/>
<parameter key="max_runs" value="10"/>
<parameter key="max_optimization_steps" value="100"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<operator activated="true" class="extract_prototypes" compatibility="5.1.006" expanded="true" height="76" name="Extract Cluster Prototypes" width="90" x="380" y="300"/>
<operator activated="true" class="sort" compatibility="5.1.006" expanded="true" height="76" name="Sort" width="90" x="380" y="390">
<parameter key="attribute_name" value="cluster"/>
<parameter key="sorting_direction" value="increasing"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Clustering" to_port="example set"/>
<connect from_op="Clustering" from_port="cluster model" to_op="Extract Cluster Prototypes" to_port="model"/>
<connect from_op="Clustering" from_port="clustered set" to_op="Sort" to_port="example set input"/>
<connect from_op="Extract Cluster Prototypes" from_port="example set" to_port="result 1"/>
<connect from_op="Extract Cluster Prototypes" from_port="model" to_port="result 2"/>
<connect from_op="Sort" from_port="example set output" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="270"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="54"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
I have no idea if I'm doing something wrong or if that's the supposed result. Could you give any suggestions?
Thanks
EDIT: I couldn't care less for sorting the result anymore, feel free to delete it when working with my code.
Answers
You can use the "Map Clustering on Labels" operator to see how closely the clusters match the existing labels. You can then feed the result to a performance operator to get a confusion matrix. To count clusters, Cluster Count Performance should be placed before Extract Cluster Prototypes, but in this case it will always return the value of k set in the k-means operator.
Here's an example using the Iris data set.
regards
Andrew
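For readers without RapidMiner at hand, the idea in the answer above (map each cluster onto a label, then build a confusion matrix from the mapped predictions) can be sketched in plain Python. This is only an illustration under assumptions, not the process posted in the thread and not how the RapidMiner operators are implemented: the function names `map_clusters_to_labels` and `confusion_matrix` are hypothetical, the toy data is made up, and the mapping is found by brute force over permutations, which is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations


def map_clusters_to_labels(clusters, labels):
    """Return the one-to-one cluster -> label mapping with the most matches.

    Brute force over permutations (assumption: few clusters, and at least
    as many distinct labels as clusters).
    """
    cluster_ids = sorted(set(clusters))
    label_ids = sorted(set(labels))
    pair_counts = Counter(zip(clusters, labels))
    best_mapping, best_hits = None, -1
    for perm in permutations(label_ids, len(cluster_ids)):
        hits = sum(pair_counts[(c, l)] for c, l in zip(cluster_ids, perm))
        if hits > best_hits:
            best_mapping, best_hits = dict(zip(cluster_ids, perm)), hits
    return best_mapping


def confusion_matrix(predicted, actual):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical toy data: cluster ids from a clustering run and the true labels.
clusters = ["cluster_0", "cluster_0", "cluster_1", "cluster_1", "cluster_2", "cluster_2"]
labels = ["setosa", "setosa", "virginica", "versicolor", "versicolor", "versicolor"]

mapping = map_clusters_to_labels(clusters, labels)
predicted = [mapping[c] for c in clusters]
matrix = confusion_matrix(predicted, labels)
accuracy = sum(p == a for p, a in zip(predicted, labels)) / len(labels)

print(mapping)
print(dict(matrix))
print(accuracy)
```

On this toy data one versicolor example ends up in the cluster mapped to virginica, so it shows up as the only off-diagonal entry in the confusion matrix.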
Now, I have an issue with Map Clustering on Labels: I have 7 possible labels and only k = 3 (I'd like to try 3 <= k <= 6). Any solution for this?
EDIT: Wait! I might have found what I was looking for... Time to check!
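When there are more labels than clusters (7 labels and k = 3 in the question above), a one-to-one mapping necessarily leaves some labels without a cluster. One common alternative, sketched here in plain Python and not taken from the thread, is to give each cluster its majority label and report purity, the fraction of examples that carry their cluster's majority label. The names `label_counts_per_cluster` and `purity` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def label_counts_per_cluster(clusters, labels):
    """Group label counts by cluster id."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return counts


def purity(clusters, labels):
    """Fraction of examples that carry the majority label of their cluster."""
    counts = label_counts_per_cluster(clusters, labels)
    majority_total = sum(max(c.values()) for c in counts.values())
    return majority_total / len(labels)


# Hypothetical toy data: 4 distinct labels but only 2 clusters.
clusters = [0, 0, 0, 1, 1, 1, 1, 1]
labels = ["game_a", "game_a", "game_b", "game_c", "game_c", "game_c", "game_d", "game_d"]

majority = {
    cluster: counts.most_common(1)[0][0]
    for cluster, counts in label_counts_per_cluster(clusters, labels).items()
}
score = purity(clusters, labels)

print(majority)
print(score)
```

Purity tends to rise as k grows, because smaller clusters are more likely to be dominated by one label, so it is mainly useful for comparing runs with the same k.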
I need to check the performance of cluster prediction; here is what I made:
It doesn't work, but I don't understand the error.
Difficult to tell, but it's probably because you are setting the role of the cluster attribute to label.
Delete the "Set Role (2)" operator.
regards
Andrew
There is already a label - favgame. The clustering algorithm creates clusters and the "Map clustering on labels" operator tries to map the labels to the clusters. If there is no attribute with the role cluster, it cannot work.
regards
Andrew