How to perform precision and recall with k-means and DBSCAN algorithms?
Hi all,
I want to compute precision and recall for the k-means and DBSCAN algorithms. I've added a target label (Workaround) to the sample data set. Because I'm using Map Clustering on Labels, I'm only able to set k=2. With other values it doesn't work, because the number of clusters has to match the number of label values. Is there another way in RM to compute precision/recall for clustering algorithms without Map Clustering on Labels, so I can play with the value of k?
I'm hoping that somebody can help me out. Thanks in advance
Regards,
Patrick
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 1,635, Unicorn)
To get a supervised ML performance metric like precision or recall for an unsupervised ML method like clustering, you need to map the clusters to labels so they can be evaluated as predictions versus a known actual state. So if you have only 2 label values, you can have only two clusters if you want to use the "Map Clustering on Labels" operator (because it treats those clusters as label values so they can be mapped).
You could theoretically do this with more than two clusters, but you would then need to map the extra clusters manually to your two labels, so in the end you would still be effectively measuring the performance of only two clusters (or "superclusters" since they are just combinations of smaller clusters).
Alternatively you could increase the number of label values, so if you had three label values then you could support 3 clusters, etc.
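For illustration outside RapidMiner, the "supercluster" idea can be sketched in plain Python: assign each cluster to the label value that is most common among its members, then score the mapped clusters as predictions. This is only a sketch under assumptions. The majority-vote rule, the function names `map_clusters_to_labels` and `precision_recall`, and the toy data are all made up for the example, and this is not necessarily how "Map Clustering on Labels" works internally.

```python
from collections import Counter, defaultdict


def map_clusters_to_labels(cluster_ids, labels):
    """Map each cluster id to the most common true label among its members.

    Majority voting is an assumption made for this sketch.
    """
    counts = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, labels):
        counts[cluster][label] += 1
    return {cluster: c.most_common(1)[0][0] for cluster, c in counts.items()}


def precision_recall(actual, predicted, positive):
    """Precision and recall for one label value treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy data: cluster ids from a run with k=4, and a two-valued label.
cluster_ids = [0, 0, 0, 1, 1, 2, 2, 2, 3, 3, 3]
labels = ["yes", "yes", "no", "no", "no", "yes", "yes", "yes", "no", "no", "yes"]

# Four clusters collapse into two "superclusters", one per label value.
mapping = map_clusters_to_labels(cluster_ids, labels)
predicted = [mapping[c] for c in cluster_ids]

precision, recall = precision_recall(labels, predicted, positive="yes")
print(mapping)
print(precision, recall)
```

Keep in mind that a majority-vote mapping is fitted on the same labels it is then scored against, so the numbers tend to look better as k grows. They are more useful as a rough comparison between settings than as a true predictive performance.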
Answers
You can always set the roles of your attributes yourself. Set up one attribute with the role "label" and another with the role "prediction", and Performance should work on the example set. It might need confidences, too, for some measures like AUC, so you might want to generate those using Generate Attributes.
Regards,
Balázs
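To make the label/prediction idea above concrete, here is a rough Python sketch of what a performance calculation does once one column is treated as the label and another as the prediction. It is not RapidMiner code. The column names `workaround` and `cluster_as_prediction`, the function `per_class_precision_recall`, and the rows are hypothetical, chosen only for the example.

```python
from collections import Counter


def per_class_precision_recall(rows, label_col, prediction_col):
    """Per-class precision and recall from a label column and a prediction column."""
    # Count how often each (actual, predicted) pair occurs.
    pair_counts = Counter((row[label_col], row[prediction_col]) for row in rows)
    classes = sorted({a for a, _ in pair_counts} | {p for _, p in pair_counts})
    report = {}
    for cls in classes:
        tp = pair_counts[(cls, cls)]
        predicted_as_cls = sum(n for (_, p), n in pair_counts.items() if p == cls)
        actually_cls = sum(n for (a, _), n in pair_counts.items() if a == cls)
        report[cls] = {
            "precision": tp / predicted_as_cls if predicted_as_cls else 0.0,
            "recall": tp / actually_cls if actually_cls else 0.0,
        }
    return report


# Hypothetical example set: (actual label, cluster already renamed to a label value).
pairs = [
    ("yes", "yes"), ("yes", "yes"), ("yes", "yes"), ("yes", "no"),
    ("no", "no"), ("no", "no"), ("no", "yes"), ("no", "yes"),
]
rows = [{"workaround": a, "cluster_as_prediction": p} for a, p in pairs]

report = per_class_precision_recall(rows, "workaround", "cluster_as_prediction")
for cls, scores in report.items():
    print(cls, scores)
```

Because the function only looks at the two named columns, any attribute can play the prediction role, as long as its values use the same names as the label values.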
Try rerunning your k-means with k=2 and then doing the clustering mapping. That should allow you to get the performance metrics you want.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts