technical question about the combined use of clustering and classification

Belle · May 2020

Hi there! I'm new to RapidMiner and have run into a problem combining clustering and classification.

Basically, I want to build k-means clusters from my initial dataset, then train classification models and evaluate their performance for EACH of the clusters. I know how to use the operators for cluster analysis and for classification separately, but I have no idea how to arrange the operators to combine them. I have tried several things, such as placing the k-means operator before or inside the Cross Validation operator, but either the process fails to run or I cannot get a performance result for each cluster. Can anyone help?
Any response would be greatly appreciated.

Thank you!

lionelderkrikor · May 2020

Hi @Belle,

Are you using one of the performance operators dedicated to clustering (presumably Cluster Distance Performance for k-Means)?

Image: https://us.v-cdn.net/6030995/uploads/editor/xk/9yy939uv5l78.png

Regards,

Lionel

Belle · May 2020

Hi @lionelderkrikor,

Thank you for your reply.

And yes, I tried "Cluster Distance Performance" in my process, but found that it only evaluates the clustering itself (e.g. it reports the Davies-Bouldin index of the clustering), while the result I want is the classification performance (say, accuracy) within each cluster. Am I misunderstanding those operators?
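For readers outside RapidMiner, the per-cluster accuracy asked about here can be sketched in plain Python. This is only an illustration under assumptions: the function name `accuracy_per_cluster` and the toy data are hypothetical, and the predictions are assumed to come from whatever classifier was trained beforehand.

```python
from collections import defaultdict

def accuracy_per_cluster(clusters, labels, predictions):
    """Return {cluster_id: accuracy} for three parallel sequences."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for cluster, label, prediction in zip(clusters, labels, predictions):
        total[cluster] += 1
        if label == prediction:
            correct[cluster] += 1
    return {cluster: correct[cluster] / total[cluster] for cluster in total}

# Hypothetical toy data: cluster assignment, true label, classifier prediction
clusters = [0, 0, 0, 1, 1]
labels = ["a", "b", "a", "b", "b"]
predictions = ["a", "a", "a", "b", "a"]

per_cluster = accuracy_per_cluster(clusters, labels, predictions)
for cluster_id, acc in sorted(per_cluster.items()):
    print(f"cluster {cluster_id}: accuracy = {acc:.2f}")
```

The point of the sketch is that a clustering metric such as Davies-Bouldin never looks at the label, whereas this grouping compares label and prediction separately inside each cluster.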

Thanks!

lionelderkrikor · May 2020

@Belle

I think you have to generate a "prediction" attribute from your clustering results, so that each cluster is mapped to one of the classes of your label.

EDIT :
I'm using the Iris dataset. To be more precise about the methodology: I cluster the examples, and then label each cluster with the majority label of the labelled examples in that cluster.

You can see what I mean by opening and running the process in the attached file.
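Outside RapidMiner, the majority-label idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the attached process: the helper name `majority_label_per_cluster` and the toy cluster assignments are assumptions made up for the example.

```python
from collections import Counter, defaultdict

def majority_label_per_cluster(clusters, labels):
    """Map each cluster id to the most frequent true label inside it."""
    counts = defaultdict(Counter)
    for cluster, label in zip(clusters, labels):
        counts[cluster][label] += 1
    return {cluster: counter.most_common(1)[0][0]
            for cluster, counter in counts.items()}

# Hypothetical toy data in the spirit of Iris (not real clustering output)
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
labels = ["versicolor", "versicolor", "virginica",
          "setosa", "setosa",
          "virginica", "virginica", "versicolor"]

mapping = majority_label_per_cluster(clusters, labels)

# Use the mapped label as the "prediction" and score it against the label
predicted = [mapping[c] for c in clusters]
accuracy = sum(p == l for p, l in zip(predicted, labels)) / len(labels)
print(mapping, accuracy)
```

A cluster that mixes two classes still receives a single label (the majority one), so the minority examples in that cluster count as errors in the accuracy.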

Hope this helps,

Regards,

Lionel

Belle · May 2020

Hi @lionelderkrikor,

Big thanks for your explanation and example!

But I have two questions about the process you provided:

1. In the training section of the Cross Validation operator, only a clustering operator is used to train the model. Why don't we need a classification model (e.g. a decision tree or neural net)? The dataset contains a label attribute, so I would have expected supervised learning. (As I imagined it, to do classification in each of the clusters I would need both a clustering operator and a classification model.)

2. In the testing section of the Cross Validation operator, you use Generate Attributes to assign a label to each cluster. Does that mean that instead of assigning the label using the classification model, we should assign the label manually? (I also found an inconsistency here: cluster 0 contains both Iris-versicolor and Iris-virginica, but you assign cluster 0 only to Iris-versicolor.)

Thank you so much!

Belle

Telcontar120 · May 2020

Take a look at the Map Clustering on Labels operator. It will do what you are looking for (I think), but you need to have the same number of classes in your label as you have clusters.
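To illustrate why the number of classes has to match the number of clusters, here is a rough Python sketch of a one-to-one cluster-to-label mapping. It is not RapidMiner's implementation: the function `best_one_to_one_mapping` is hypothetical, and the brute-force search over permutations is an assumption that is only practical for a handful of classes.

```python
from itertools import permutations

def best_one_to_one_mapping(clusters, labels):
    """Try every one-to-one cluster -> class assignment and keep the best."""
    cluster_ids = sorted(set(clusters))
    classes = sorted(set(labels))
    if len(cluster_ids) != len(classes):
        raise ValueError("a one-to-one mapping needs as many clusters as classes")
    best_mapping, best_hits = None, -1
    for ordering in permutations(classes):
        candidate = dict(zip(cluster_ids, ordering))
        hits = sum(candidate[c] == l for c, l in zip(clusters, labels))
        if hits > best_hits:
            best_mapping, best_hits = candidate, hits
    return best_mapping, best_hits / len(labels)

# Hypothetical toy data: class "x" is the majority in both clusters
clusters = [0, 0, 0, 1, 1, 1]
labels = ["x", "x", "x", "x", "x", "y"]

mapping, accuracy = best_one_to_one_mapping(clusters, labels)
print(mapping, accuracy)
```

Unlike a plain majority vote, a one-to-one mapping never gives two clusters the same class, so in this toy data cluster 1 ends up mapped to "y" even though most of its examples are "x".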

lionelderkrikor · May 2020

Hi @Belle,

To answer your question:

Does that mean that instead of assigning the label using the classification model, we should assign the label manually

Yes, that is effectively what I did by hand in the process I shared in my previous post. The Map Clustering on Labels operator performs this mapping automatically, as @Telcontar120 said, but I was not aware of this operator.
In conclusion, I learn new things every day on RapidMiner...

Thanks for sharing this operator, Brian!

Regards,

Lionel


Altair RapidMiner Community
