Cluster Model Visualizer after K-Means Clustering (Overview)
Hey Everybody,
I'm new to RapidMiner, but I think it's an amazing tool. Please excuse my English; I'm from Germany.
As part of my master's thesis I need to prove that there are clusters in a certain finance-related dataset.
I use this design:
And get these results:
Perfectly fine for me until I reach the point of explaining the results.
I would like to know why exactly three attributes are shown [ xxx is on average xx.xx% smaller / larger].
Is using three simply the standard method? That would be perfectly fine for me.
Can I adjust the number of attributes shown?
Maybe it's obvious to you, but please help me understand this.
Sincerely yours,
Max
Best Answer
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
Hi @offizielleemail,
Good question. The cluster model visualization in Auto Model, or the one generated by the Model Visualizer, shows the top 3 most influential factors from the input data. This is only meant to simplify the model explanation: showing everything would not be elegant if you have, say, 150 variables.
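To make the "[ xxx is on average xx.xx% smaller / larger]" wording more concrete, here is a minimal Python sketch of one plausible reading: compare each attribute's mean inside a cluster with its mean over the whole data set, express the gap as a percentage, and keep the three largest gaps. This is an assumption for illustration only, not RapidMiner's documented formula. The function names `top_differences` and `describe` and the sample data are hypothetical.

```python
from statistics import mean


def top_differences(rows, labels, cluster, top_n=3):
    """Rank attributes by the relative gap (in percent) between the
    cluster mean and the overall mean. Assumed reading, not RapidMiner's code."""
    members = [r for r, l in zip(rows, labels) if l == cluster]
    ranked = []
    for attribute in rows[0]:
        overall = mean(r[attribute] for r in rows)
        if overall == 0:
            continue  # relative difference is undefined for a zero mean
        in_cluster = mean(r[attribute] for r in members)
        pct = (in_cluster - overall) * 100.0 / abs(overall)
        ranked.append((attribute, pct))
    # Largest absolute deviation first, then keep only the top_n entries.
    ranked.sort(key=lambda item: abs(item[1]), reverse=True)
    return ranked[:top_n]


def describe(attribute, pct):
    """Format one entry in the style of the visualizer's statements."""
    direction = "larger" if pct > 0 else "smaller"
    return f"{attribute} is on average {abs(pct):.2f}% {direction}"


# Hypothetical toy data: four records, two clusters.
rows = [
    {"revenue": 100.0, "debt": 10.0, "margin": 4.0, "staff": 20.0},
    {"revenue": 120.0, "debt": 10.0, "margin": 5.0, "staff": 20.0},
    {"revenue": 300.0, "debt": 40.0, "margin": 6.0, "staff": 20.0},
    {"revenue": 280.0, "debt": 60.0, "margin": 5.0, "staff": 20.0},
]
labels = [0, 0, 1, 1]

for attribute, pct in top_differences(rows, labels, cluster=0):
    print(describe(attribute, pct))
```

For cluster 0 of this toy data the sketch prints that debt is on average 66.67% smaller, revenue 45.00% smaller, and margin 10.00% smaller, while the constant attribute staff is not listed. Under this reading the percentage describes how a cluster differs from the overall average, not how good the cluster is. The `top_n` parameter belongs to the sketch only and is not a RapidMiner setting.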
From the Auto Model help documentation for clustering:
Results: Clustering
This is the final step of Auto Model, where you can inspect the generated models together with other results. The output depends on the data and the choices you made. For example, if you deactivated the calculation of correlations or k-Means, those results will not be displayed.
Please note that the results are calculated in the background. However, you can immediately start to inspect the results as they are completed. You can stop background execution by pressing the Stop button at the bottom. Calculations which are not completed when execution is stopped won't be available. You can go back and make changes after the execution is finished or after you stopped it.
We at RapidMiner do not believe in black boxes. This is why you can always open the process which created the model and all related results. Simply click on a model result and on Open Process at the bottom of the screen. This will show you the process which performs all necessary data preprocessing and model optimization. You can use this process for deploying the model or as a starting point for further optimizations.
We will now discuss the possible results in detail below.
General
This section shows generic information which is independent of the models.
Data: the data set after it has been transformed for modeling.
Correlations: a matrix showing the correlations between Attributes.
Cluster Results
All other sections in the results menu are reserved for the cluster models. Each cluster model gets a section of its own and in general provides the entries below.
Summary: shows the size of all found clusters together with some information about the clusters and their quality.
Heat Map: identifies the most important Attributes for each cluster.
Cluster Tree: displays a decision tree describing the main differences between the clusters.
Centroid Chart: shows the values for the cluster centroids in a parallel chart.
Centroid Table: shows the values for the cluster centroids in a table.
Scatter Plot: with a choice of cluster, displays a scatter plot in terms of the two most important Attributes.
Clustered Data: displays a table with all the data, including the cluster label for each data point.
HTH,
YY
Answers
Based on your information, I want to ask about the [ xxx is on average xx.xx% smaller / larger] statements. They don't reflect the quality of the attribute or cluster, right? They just tell us which attribute is the "differentiator"? And what does the % mean? Thank you...
Regards, Merin