[Solved] Add clustering label to dataset
aryan_hosseinza
Member Posts: 74 Contributor II
in Help
Hi everybody,
I am doing clustering in RapidMiner. My original dataset doesn't have an attribute for the cluster id, but I want the cluster attribute to be added to my dataset.
How can I do that? After clustering, I only have access to the model, not the dataset.
Thanks in advance
Answers
For most of our clustering operators you should see two output ports, which offer the clustering model and the clustered set. The clustered set is your input example set plus the cluster id.
If you only have the clustering model, use the "Apply Model" operator to apply the model to a dataset and generate your desired cluster attribute.
Best Regards
Marcin
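As an illustration of the answer above outside RapidMiner, here is a minimal sketch in Python using pandas and scikit-learn. It is an analogy, not the RapidMiner implementation: the toy data and the column names `x`, `y`, and `cluster` are assumptions made up for this example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data; the column names are illustrative only.
data = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 8.0, 8.2, 7.9],
    "y": [1.0, 0.9, 1.1, 8.0, 8.1, 7.8],
})
features = ["x", "y"]

# Fit a clustering model on the attribute columns.
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data[features])

# Analogous to the "clustered set" output: the input data plus a cluster id.
clustered = data.assign(cluster=model.labels_)

# Analogous to "Apply Model": score a dataset with an already fitted model.
applied = data.assign(cluster=model.predict(data[features]))

print(clustered)
```

Both routes leave the original columns untouched and only append the cluster column.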
Another question: when I extract the cluster prototypes, my label attribute vanishes. What is the reason, and how can I preserve it?
If you want the clustered data (including the label, if it was in the data before the clustering), just use the second output of the clustering operator.
Best, Marius
1. Join the extracted prototypes with the original data, using all attributes as key attributes.
2. Set the role of the label to regular before clustering. Warning: that way the label will be considered for clustering, which is not always what you want.
Happy Mining!
~Marius
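Option 1 can be sketched in Python with pandas and scikit-learn as a rough analogy; this is not the RapidMiner process itself. One assumption to flag: k-means centroids are averages and usually do not coincide with any actual example, so an exact join on all attributes would match nothing. The sketch therefore uses, as a hypothetical prototype for each cluster, the actual row nearest to the centroid. The data and column names are invented for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical toy data with a label column that is NOT used for clustering.
data = pd.DataFrame({
    "x": [1.0, 1.2, 0.8, 8.0, 8.2, 7.9],
    "y": [1.0, 0.9, 1.1, 8.0, 8.1, 7.8],
    "label": ["a", "a", "a", "b", "b", "b"],
})
features = ["x", "y"]

model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data[features])

# transform() returns the distance of every row to every centroid;
# argmin over rows picks, per cluster, the position of the nearest actual row.
nearest_rows = model.transform(data[features]).argmin(axis=0)

# Prototypes carry only the clustering attributes, so the label is gone here.
prototypes = data.iloc[nearest_rows][features].reset_index(drop=True)

# Join with the original data, using all attributes as keys, to get it back.
restored = prototypes.merge(data, on=features, how="left")

print(restored)
```

If the original data contains duplicate rows, the join returns one output row per match, so deduplicating first may be necessary.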
But there's a problem: the dataset is very large (550K examples with more than 700 attributes), so I guess joining is not feasible.
Do you have any idea how I can downsample such a large dataset? Or maybe there is another way that avoids extracting prototypes?
I appreciate your help,
Thanks
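One common approach to a dataset of this size, sketched here in Python with pandas and scikit-learn rather than RapidMiner, is to fit the clustering model on a random sample and then apply the fitted model to all rows. This is only a sketch under stated assumptions: the synthetic data, the sample size, and the names `att1` to `att5` are made up, and whether a sample is representative enough depends on the real data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for a large dataset: two well-separated groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.5, size=(500, 5))
group_b = rng.normal(10.0, 0.5, size=(500, 5))
columns = [f"att{i}" for i in range(1, 6)]
data = pd.DataFrame(np.vstack([group_a, group_b]), columns=columns)

# Downsample: fit the model on a random subset only.
sample = data.sample(n=100, random_state=0)
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(sample)

# Apply the fitted model to the full dataset to get a cluster id per row.
clustered = data.assign(cluster=model.predict(data))

print(clustered["cluster"].value_counts())
```

This avoids a join entirely, since the cluster id is written next to the original rows and any label column kept out of the fit stays in place.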