How to evaluate which algorithm performs the best clustering?

halaalrobassy Member Posts: 16

Contributor II

April 2019 edited June 2019 in Help

I have a dataset and want to cluster one feature into three clusters. I chose the k-means, k-medoids, and x-means algorithms to perform the clustering, and I want to evaluate which algorithm clusters best.
I put the three algorithms in a Loop Parameters operator, but I don't know where to put the Cluster Distance Performance operator. I want to see the average within-centroid distance and Davies-Bouldin measures for each model and use them to choose the model that performs the best clustering.

<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">

<context>

<input/>

<output/>

<macros/>

</context>

<operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Process">

<parameter key="logverbosity" value="init"/>

<parameter key="random_seed" value="2001"/>

<parameter key="send_mail" value="never"/>

<parameter key="notification_email" value=""/>

<parameter key="process_duration_for_mail" value="30"/>

<parameter key="encoding" value="SYSTEM"/>

<process expanded="true">

<operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Engineering_majors1" width="90" x="45" y="34">

<parameter key="repository_entry" value="../data/Engineering_majors1"/>

</operator>

<operator activated="true" class="select_attributes" compatibility="9.2.001" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">

<parameter key="attribute_filter_type" value="single"/>

<parameter key="attribute" value="c_cons_sum"/>

<parameter key="attributes" value=""/>

<parameter key="use_except_expression" value="false"/>

<parameter key="value_type" value="attribute_value"/>

<parameter key="use_value_type_exception" value="false"/>

<parameter key="except_value_type" value="time"/>

<parameter key="block_type" value="attribute_block"/>

<parameter key="use_block_type_exception" value="false"/>

<parameter key="except_block_type" value="value_matrix_row_start"/>

<parameter key="invert_selection" value="false"/>

<parameter key="include_special_attributes" value="false"/>

</operator>

<operator activated="true" class="normalize" compatibility="7.5.003" expanded="true" height="103" name="Normalize" width="90" x="313" y="136">

<parameter key="return_preprocessing_model" value="false"/>

<parameter key="create_view" value="true"/>

<parameter key="attribute_filter_type" value="single"/>

<parameter key="attribute" value="c_cons_sum"/>

<parameter key="attributes" value=""/>

<parameter key="use_except_expression" value="false"/>

<parameter key="value_type" value="numeric"/>

<parameter key="use_value_type_exception" value="false"/>

<parameter key="except_value_type" value="real"/>

<parameter key="block_type" value="value_series"/>

<parameter key="use_block_type_exception" value="false"/>

<parameter key="except_block_type" value="value_series_end"/>

<parameter key="invert_selection" value="false"/>

<parameter key="include_special_attributes" value="false"/>

<parameter key="method" value="range transformation"/>

<parameter key="min" value="0.0"/>

<parameter key="max" value="1.0"/>

<parameter key="allow_negative_values" value="false"/>

</operator>

<operator activated="true" class="loop_parameters" compatibility="6.0.003" expanded="true" height="103" name="Loop Parameters" width="90" x="447" y="136">

<list key="parameters">

<parameter key="Select Subprocess.select_which" value="[1.0;4;4;linear]"/>

</list>

<parameter key="error_handling" value="fail on error"/>

<parameter key="synchronize" value="false"/>

<process expanded="true">

<operator activated="true" class="multiply" compatibility="9.2.001" expanded="true" height="82" name="Multiply" width="90" x="45" y="85"/>

<operator activated="true" class="select_subprocess" compatibility="9.2.001" expanded="true" height="103" name="Select Subprocess" width="90" x="246" y="85">

<parameter key="select_which" value="4"/>

<process expanded="true">

<operator activated="true" class="concurrency:k_means" compatibility="9.2.001" expanded="true" height="82" name="Clustering" width="90" x="45" y="85">

<parameter key="add_cluster_attribute" value="true"/>

<parameter key="add_as_label" value="true"/>

<parameter key="remove_unlabeled" value="false"/>

<parameter key="k" value="5"/>

<parameter key="max_runs" value="10"/>

<parameter key="determine_good_start_values" value="true"/>

<parameter key="measure_types" value="BregmanDivergences"/>

<parameter key="mixed_measure" value="MixedEuclideanDistance"/>

<parameter key="nominal_measure" value="NominalDistance"/>

<parameter key="numerical_measure" value="EuclideanDistance"/>

<parameter key="divergence" value="SquaredEuclideanDistance"/>

<parameter key="kernel_type" value="radial"/>

<parameter key="kernel_gamma" value="1.0"/>

<parameter key="kernel_sigma1" value="1.0"/>

<parameter key="kernel_sigma2" value="0.0"/>

<parameter key="kernel_sigma3" value="2.0"/>

<parameter key="kernel_degree" value="3.0"/>

<parameter key="kernel_shift" value="1.0"/>

<parameter key="kernel_a" value="1.0"/>

<parameter key="kernel_b" value="0.0"/>

<parameter key="max_optimization_steps" value="100"/>

<parameter key="use_local_random_seed" value="false"/>

<parameter key="local_random_seed" value="1992"/>

</operator>

<connect from_port="input 1" to_op="Clustering" to_port="example set"/>

<connect from_op="Clustering" from_port="cluster model" to_port="output 1"/>

<connect from_op="Clustering" from_port="clustered set" to_port="output 2"/>

<portSpacing port="source_input 1" spacing="0"/>

<portSpacing port="source_input 2" spacing="0"/>

<portSpacing port="sink_output 1" spacing="0"/>

<portSpacing port="sink_output 2" spacing="0"/>

<portSpacing port="sink_output 3" spacing="0"/>

</process>

<process expanded="true">

<operator activated="true" class="k_medoids" compatibility="7.5.003" expanded="true" height="82" name="K-Medoids" width="90" x="45" y="187">

<parameter key="add_cluster_attribute" value="true"/>

<parameter key="add_as_label" value="true"/>

<parameter key="remove_unlabeled" value="false"/>

<parameter key="k" value="3"/>

<parameter key="max_runs" value="10"/>

<parameter key="max_optimization_steps" value="100"/>

<parameter key="use_local_random_seed" value="true"/>

<parameter key="local_random_seed" value="1992"/>

<parameter key="measure_types" value="MixedMeasures"/>

<parameter key="mixed_measure" value="MixedEuclideanDistance"/>

<parameter key="nominal_measure" value="NominalDistance"/>

<parameter key="numerical_measure" value="EuclideanDistance"/>

<parameter key="divergence" value="GeneralizedIDivergence"/>

<parameter key="kernel_type" value="radial"/>

<parameter key="kernel_gamma" value="1.0"/>

<parameter key="kernel_sigma1" value="1.0"/>

<parameter key="kernel_sigma2" value="0.0"/>

<parameter key="kernel_sigma3" value="2.0"/>

<parameter key="kernel_degree" value="3.0"/>

<parameter key="kernel_shift" value="1.0"/>

<parameter key="kernel_a" value="1.0"/>

<parameter key="kernel_b" value="0.0"/>

</operator>

<connect from_port="input 1" to_op="K-Medoids" to_port="example set"/>

<connect from_op="K-Medoids" from_port="cluster model" to_port="output 1"/>

<connect from_op="K-Medoids" from_port="clustered set" to_port="output 2"/>

<portSpacing port="source_input 1" spacing="0"/>

<portSpacing port="source_input 2" spacing="0"/>

<portSpacing port="sink_output 1" spacing="0"/>

<portSpacing port="sink_output 2" spacing="0"/>

<portSpacing port="sink_output 3" spacing="0"/>

</process>

<process expanded="true">

<operator activated="true" class="x_means" compatibility="9.2.001" expanded="true" height="82" name="X-Means" width="90" x="45" y="34">

<parameter key="add_cluster_attribute" value="true"/>

<parameter key="add_as_label" value="true"/>

<parameter key="remove_unlabeled" value="false"/>

<parameter key="k_min" value="3"/>

<parameter key="k_max" value="60"/>

<parameter key="determine_good_start_values" value="true"/>

<parameter key="measure_types" value="NumericalMeasures"/>

<parameter key="mixed_measure" value="MixedEuclideanDistance"/>

<parameter key="nominal_measure" value="NominalDistance"/>

<parameter key="numerical_measure" value="EuclideanDistance"/>

<parameter key="divergence" value="GeneralizedIDivergence"/>

<parameter key="kernel_type" value="radial"/>

<parameter key="kernel_gamma" value="1.0"/>

<parameter key="kernel_sigma1" value="1.0"/>

<parameter key="kernel_sigma2" value="0.0"/>

<parameter key="kernel_sigma3" value="2.0"/>

<parameter key="kernel_degree" value="3.0"/>

<parameter key="kernel_shift" value="1.0"/>

<parameter key="kernel_a" value="1.0"/>

<parameter key="kernel_b" value="0.0"/>

<parameter key="clustering_algorithm" value="KMeans"/>

<parameter key="max_runs" value="10"/>

<parameter key="max_optimization_steps" value="100"/>

<parameter key="use_local_random_seed" value="false"/>

<parameter key="local_random_seed" value="1992"/>

</operator>

<connect from_port="input 1" to_op="X-Means" to_port="example set"/>

<connect from_op="X-Means" from_port="cluster model" to_port="output 1"/>

<connect from_op="X-Means" from_port="clustered set" to_port="output 2"/>

<portSpacing port="source_input 1" spacing="0"/>

<portSpacing port="source_input 2" spacing="0"/>

<portSpacing port="sink_output 1" spacing="0"/>

<portSpacing port="sink_output 2" spacing="0"/>

<portSpacing port="sink_output 3" spacing="0"/>

</process>

</operator>

<operator activated="true" class="cluster_distance_performance" compatibility="9.2.001" expanded="true" height="103" name="Performance" width="90" x="380" y="85">

<parameter key="main_criterion" value="Avg. within centroid distance"/>

<parameter key="main_criterion_only" value="false"/>

<parameter key="normalize" value="false"/>

<parameter key="maximize" value="false"/>

</operator>

<connect from_port="input 1" to_op="Multiply" to_port="input"/>

<connect from_op="Multiply" from_port="output 1" to_op="Select Subprocess" to_port="input 1"/>

<connect from_op="Select Subprocess" from_port="output 1" to_op="Performance" to_port="cluster model"/>

<connect from_op="Select Subprocess" from_port="output 2" to_op="Performance" to_port="example set"/>

<connect from_op="Performance" from_port="performance" to_port="performance"/>

<connect from_op="Performance" from_port="example set" to_port="result 1"/>

<connect from_op="Performance" from_port="cluster model" to_port="result 2"/>

<portSpacing port="source_input 1" spacing="0"/>

<portSpacing port="source_input 2" spacing="0"/>

<portSpacing port="sink_performance" spacing="0"/>

<portSpacing port="sink_result 1" spacing="0"/>

<portSpacing port="sink_result 2" spacing="0"/>

<portSpacing port="sink_result 3" spacing="0"/>

</process>

</operator>

<connect from_op="Retrieve Engineering_majors1" from_port="output" to_op="Select Attributes" to_port="example set input"/>

<connect from_op="Select Attributes" from_port="example set output" to_op="Normalize" to_port="example set input"/>

<connect from_op="Normalize" from_port="example set output" to_op="Loop Parameters" to_port="input 1"/>

<connect from_op="Loop Parameters" from_port="result 1" to_port="result 1"/>

<connect from_op="Loop Parameters" from_port="result 2" to_port="result 2"/>

<portSpacing port="source_input 1" spacing="0"/>

<portSpacing port="sink_result 1" spacing="0"/>

<portSpacing port="sink_result 2" spacing="0"/>

<portSpacing port="sink_result 3" spacing="0"/>

</process>

</operator>

</process>

Best Answers

Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

April 2019 Solution Accepted

Just use the Cluster Distance Performance operator after each cluster method and it will give you the metrics you seek.
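To make the two requested metrics concrete, here is a minimal sketch in plain Python (outside RapidMiner) of the textbook definitions for a single numeric attribute. The function names and the toy data are assumptions for illustration only, and RapidMiner's operator may differ in details such as sign convention or the use of squared distances. For both measures, lower values indicate tighter, better separated clusters.

```python
from collections import defaultdict


def group_by_cluster(values, labels):
    """Collect the values that belong to each cluster label."""
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    return groups


def avg_within_centroid_distance(values, labels):
    """Mean absolute distance from each point to its own cluster centroid."""
    groups = group_by_cluster(values, labels)
    total = 0.0
    for members in groups.values():
        centroid = sum(members) / len(members)
        total += sum(abs(v - centroid) for v in members)
    return total / len(values)


def davies_bouldin(values, labels):
    """Textbook Davies-Bouldin index; needs >= 2 clusters with distinct centroids."""
    groups = group_by_cluster(values, labels)
    centroid = {c: sum(m) / len(m) for c, m in groups.items()}
    # Scatter of a cluster: mean distance of its members to the centroid.
    scatter = {
        c: sum(abs(v - centroid[c]) for v in m) / len(m) for c, m in groups.items()
    }
    worst_ratios = []
    for i in groups:
        # For each cluster, take its worst (largest) similarity to any other cluster.
        worst_ratios.append(
            max(
                (scatter[i] + scatter[j]) / abs(centroid[i] - centroid[j])
                for j in groups
                if j != i
            )
        )
    return sum(worst_ratios) / len(worst_ratios)


# Hypothetical toy data: one attribute with three obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]
good_labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
bad_labels = [0, 0, 1, 1, 1, 1, 2, 2, 2]  # the value 3.0 is put in the wrong cluster

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, avg_within_centroid_distance(values, labels), davies_bouldin(values, labels))
```

Running each algorithm's cluster assignments through the same two functions gives directly comparable numbers, which is the role the Cluster Distance Performance operator plays after each clustering operator.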
Of course, the value chosen for k in any of these will have a big impact on the cluster metrics. Do you have an a priori value you are using? Or do you need to run this evaluation over a range of possible k values? In that case you may want to build some loops and log the results.
Additionally, recall that k-means and x-means are actually the same algorithm, with x-means searching for the "best" value of k automatically. So they won't give you different metrics if the k value is the same for both.
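The idea of looping over k and logging the results can be sketched as follows. This is an illustrative stand-in written in plain Python, not RapidMiner's implementation: `kmeans_1d`, its deterministic start values, and the toy data are all assumptions made for the example. Each pass through the loop appends one row (k plus a metric) to a log that can be inspected afterwards.

```python
def kmeans_1d(values, k, max_steps=100):
    """Plain Lloyd-style k-means on one numeric attribute (illustrative only)."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted values.
    cents = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(max_steps):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - cents[c])) for v in values]
        # Update step: recompute each centroid; keep the old one if a cluster is empty.
        new_cents = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_cents.append(sum(members) / len(members) if members else cents[c])
        if new_cents == cents:
            break
        cents = new_cents
    return labels, cents


def avg_within_distance(values, labels, cents):
    """Mean absolute distance from each value to its assigned centroid."""
    return sum(abs(v - cents[lab]) for v, lab in zip(values, labels)) / len(values)


# Hypothetical toy data with three obvious groups.
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0, 20.0, 21.0, 22.0]

# The "log": one row per candidate k.
log_rows = []
for k in range(2, 6):
    labels, cents = kmeans_1d(values, k)
    log_rows.append({"k": k, "avg_within_distance": avg_within_distance(values, labels, cents)})

for row in log_rows:
    print(f"k={row['k']}  avg_within_distance={row['avg_within_distance']:.3f}")
```

On this toy data the within-cluster distance keeps shrinking as k grows, which is why the logged values are usually read for the point where the improvement levels off rather than for the raw minimum.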

Brian T.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

April 2019 Solution Accepted

There are no major parameters for optimizing k-means other than the selection of k, so if you know that in advance then I don't think there is really anything else for you to do. The only other parameter is the distance measure, and you typically decide in advance which one suits your use case rather than "optimizing" it. If you have both numerical and nominal attributes you will be limited to Mixed Euclidean anyway.
You should be sure to normalize your data before running any k-means or k-medoids!

Above I was referring to whether you already knew the number of clusters (k) you wanted to use. If you already know k, then you also have no need for x-means, since x-means is simply k-means searching across a range of possible k values, so I would drop it from your analysis.
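As a rough illustration of why normalization matters for distance-based clustering, the sketch below applies a range transformation to [0, 1] (the same method name the Normalize operator in the posted process uses) to two attributes on very different scales. The function name, the attribute meanings, and the rows are hypothetical. With a single attribute, as in the original question, this rescaling changes all distances by the same factor, so the effect mainly shows up once several attributes are combined.

```python
import math


def range_transform(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute carries no distance information.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def euclidean(a, b):
    """Euclidean distance between two equally long tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical rows: (age in years, income in currency units).
rows = [(25, 30000), (60, 30500), (27, 32000), (40, 50000)]

# Normalize each attribute (column) separately, then rebuild the rows.
ages = range_transform([r[0] for r in rows])
incomes = range_transform([r[1] for r in rows])
normalized = list(zip(ages, incomes))

# Distances from row 0 to rows 1 and 2, before and after normalization.
raw_d01, raw_d02 = euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2])
norm_d01, norm_d02 = euclidean(normalized[0], normalized[1]), euclidean(normalized[0], normalized[2])

print("raw:       ", raw_d01, raw_d02)
print("normalized:", norm_d01, norm_d02)
```

Before normalization the income column dominates, so the 60-year-old in row 1 looks closer to row 0 than the 27-year-old in row 2 does; after normalization both attributes contribute on the same scale and the ordering reverses.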


Answers

halaalrobassy Member Posts: 16 Contributor II

April 2019

Thanks a lot @Telcontar120.
Yes, I have an a priori number of clusters for the column values. I just want to ask what you mean by logging the results. I also want to optimize the performance of the clustering. Is that possible? If yes, can you tell me how, please?

halaalrobassy Member Posts: 16 Contributor II

April 2019

Thank you, your advice really helped me.
