"confusion matrix in rapidminer for clustering"

SamiRami Member Posts: 5 Contributor I
edited May 2019 in Help
Hi,
In RapidMiner, how can I compute the confusion matrix for clustering results, in order to evaluate the performance of a clustering algorithm such as k-medoids? Assume the actual classes are provided with the data.
Thanks.

Answers

  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @SamiRami

    A confusion matrix is not strictly applicable to clustering: its purpose is to show the difference between a model's predictions and the actual values of the target variable in supervised classification, while clustering is unsupervised by nature.

    However, if your data is labelled with the actual classes (or clusters) and also contains a predicted class value (the cluster value produced by a model), you can use the PERFORMANCE (CLASSIFICATION) operator to generate a confusion matrix.
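
To make the idea concrete outside RapidMiner, here is a minimal Python sketch of what a confusion matrix over an "actual" column and a "predicted" column amounts to. The function name `confusion_matrix` and the toy data are made up for illustration, the predicted values are assumed to already use the same names as the actual classes, and this is not a description of how the PERFORMANCE (CLASSIFICATION) operator is implemented.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix), where matrix[i][j] counts the rows whose
    actual class is labels[i] and whose predicted class is labels[j]."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))  # missing pairs count as 0
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


# Hypothetical toy data: three classes, eight examples.
actual = ["a", "a", "a", "b", "b", "c", "c", "c"]
predicted = ["a", "a", "b", "b", "b", "c", "a", "c"]

labels, matrix = confusion_matrix(actual, predicted)
for label, row in zip(labels, matrix):
    print(label, row)
```

Rows are actual classes and columns are predicted classes in this sketch; a tool may display the transposed layout.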
  • SamiRami Member Posts: 5 Contributor I
    edited February 2019
    I have both the actual and the predicted classes. I need to evaluate the performance of the clustering algorithm (external evaluation with confusion matrix, precision, recall, f-measure ...).
    I am not sure PERFORMANCE (CLASSIFICATION) can solve my issue, although some of its outputs are "weighted mean recall" and "weighted mean precision". As far as I understand, that process is for two-class problems.
    How can I measure clustering performance for multiple classes using external validity indexes?
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Clustering is of course unsupervised, so it isn't built to predict a specific label. Having said that, this follow-up question is not uncommon, so RapidMiner includes an operator that allows you to do this, called "Map Clustering on Labels". It is designed for this purpose, but you'll need to have the same number of clusters as you have classes you are trying to predict. Take a look at the sample process for more help.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
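
As a rough illustration of what mapping clusters onto labels involves, the Python sketch below tries every one-to-one assignment of cluster ids to class names and keeps the one that agrees with the most examples. This is an assumption about the general idea only, not the algorithm the "Map Clustering on Labels" operator uses; `map_clusters_to_labels` and the toy data are hypothetical, and the brute-force search is only practical for a handful of clusters.

```python
from collections import Counter
from itertools import permutations


def map_clusters_to_labels(actual, clusters):
    """Return a dict {cluster id: class name} for the one-to-one assignment
    that matches the largest number of examples."""
    class_names = sorted(set(actual))
    cluster_ids = sorted(set(clusters))
    if len(class_names) != len(cluster_ids):
        raise ValueError("need the same number of clusters as classes")
    counts = Counter(zip(clusters, actual))  # co-occurrence counts
    best = max(
        permutations(class_names),
        key=lambda perm: sum(counts[(c, l)] for c, l in zip(cluster_ids, perm)),
    )
    return dict(zip(cluster_ids, best))


# Hypothetical toy data: cluster ids carry no meaning on their own.
actual = ["a", "a", "a", "b", "b", "c", "c", "c"]
clusters = ["cluster_2", "cluster_2", "cluster_0", "cluster_0",
            "cluster_0", "cluster_1", "cluster_2", "cluster_1"]

mapping = map_clusters_to_labels(actual, clusters)
predicted = [mapping[c] for c in clusters]  # clusters renamed to class names
print(mapping)
```

Once the clusters are renamed this way, the resulting column can be treated like a prediction and compared with the actual classes.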
  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @SamiRami

    I'd add one point here: technically, you can use the PERFORMANCE (CLASSIFICATION) operator on an arbitrary dataset. You only need to make sure that there is an attribute with the role 'label', which holds the actual class, and another attribute with the role 'prediction', which holds the class predicted by the model. If you already have a dataset like this, you can use the SET ROLE operator to define the label and prediction columns respectively.
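
On the multi-class question: precision, recall and f-measure are not limited to two classes, because each class can be scored one-vs-rest from the same label and prediction columns. The Python sketch below shows this under stated assumptions; `per_class_metrics` and the toy data are hypothetical, and the plain averages at the end are simple unweighted means that may not match how RapidMiner defines its "weighted mean" figures.

```python
def per_class_metrics(actual, predicted):
    """Return {class: {"precision", "recall", "f1"}}, scoring each class
    one-vs-rest. Undefined ratios (zero denominator) are reported as 0.0."""
    labels = sorted(set(actual) | set(predicted))
    pairs = list(zip(actual, predicted))
    metrics = {}
    for label in labels:
        tp = sum(a == label and p == label for a, p in pairs)
        fp = sum(a != label and p == label for a, p in pairs)
        fn = sum(a == label and p != label for a, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        metrics[label] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical toy data: three classes, eight examples.
actual = ["a", "a", "a", "b", "b", "c", "c", "c"]
predicted = ["a", "a", "b", "b", "b", "c", "a", "c"]

metrics = per_class_metrics(actual, predicted)
mean_precision = sum(m["precision"] for m in metrics.values()) / len(metrics)
mean_recall = sum(m["recall"] for m in metrics.values()) / len(metrics)
for label, m in metrics.items():
    print(label, m)
```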
  • SamiRami Member Posts: 5 Contributor I
    I am just starting to test RapidMiner...
    Can you please tell me which operators are needed, in sequence, along with their parameter settings?
    Appreciate it ... 
  • kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
    Hi @SamiRami

    It would be easier to help if you could share the actual dataset on which you want to produce the confusion matrix and evaluate the performance metrics.