RapidMiner 5 documentation?
Hi all.
My first message in this forum.
I am also a beginner in data mining. I have been studying the theoretical part and now I would like to try my first practical example.
To the best of my knowledge, there is no RapidMiner manual for version 5, right?
Do you know of any places with tutorials or documentation for version 5?
I have a data set with animal attributes where Animal_Type (e.g. mammal, bird, fish, etc.) is the class attribute to be used for prediction.
I would like to use clustering to see RapidMiner group the animals and create a model for prediction.
Note: I know that decision trees are commonly used, but for now I would like to use clustering.
I already have a RapidMiner operator that opens the CSV data set (ReadCSV) and a ChangeAttributeRole operator that sets Animal_Type as the "label" attribute, and I can see the output when I run the process.
Then I added the k-means clustering algorithm with k = 2 (after adding a nominal-to-numerical operator), and I see the final two clusters of animals. But as the initial data set has four different classes, this is not correct. So I configured the clustering algorithm with k = 4 and ran it again. The animals are now grouped into four clusters, but the clusters do not make sense. Why does the clustering algorithm not choose Animal_Type as the field to split them into different clusters? (I know that there is a centroid.) So I am thinking that k-means is not the right algorithm. Could you please point me in the right direction?
Next, I would like to do the same using decision trees (it seems that decision trees are easy to use in RapidMiner) and then find out which method is more accurate (clustering or decision trees). Which is the right RapidMiner operator to compute model accuracy (clustering vs. decision tree) for this data set? A simple example process would be very much appreciated.
Thank you.
I.M.
Answers
Regarding the documentation: We will have brand-new documentation soon, but it will be in German at first. The English version will follow.
1st question (Why does k-means not use animal_type): You marked animal_type as the label. The clustering algorithm does not see the label attribute at all; it only uses the regular attributes. After all, telling the clustering algorithm beforehand what the clustering actually is would not make much sense, would it?
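To illustrate the point outside of RapidMiner, here is a minimal pure-Python sketch of k-means. It is not RapidMiner's implementation: the toy rows, the attribute names and the "start from the first k rows" initialisation are assumptions made only for this example. Note that the label column is never passed to the algorithm, so the clusters follow attribute similarity only.

```python
from math import dist

def kmeans(points, k, iterations=20):
    """Plain k-means on numeric attribute vectors; labels are never passed in."""
    # Assumption for reproducibility: start from the first k rows.
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Hypothetical toy data: (has_feathers, lives_in_water, legs) plus the label.
rows = [
    ((0, 0, 4), "mammal"),  # dog
    ((1, 0, 2), "bird"),    # sparrow
    ((0, 1, 0), "fish"),    # trout
    ((0, 0, 4), "mammal"),  # cat
    ((0, 1, 0), "mammal"),  # dolphin
]
features = [attrs for attrs, _ in rows]   # regular attributes: used
labels = [label for _, label in rows]     # label: set aside, never used

clusters = kmeans(features, k=3)
print(list(zip(labels, clusters)))
```

In this toy data the dolphin lands in the same cluster as the trout, because its regular attributes look like a fish's. The algorithm has no way of knowing that it is labelled as a mammal.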
2nd question (How to compare clustering and prediction): If you do things like this, always consider whether the comparison makes sense. You can use the "Map Clustering on Labels" operator to turn the cluster attribute into a "best fitting" prediction attribute. After that, you can use the regular performance operators to compute whatever performance measure you are interested in.
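The idea of turning clusters into a "best fitting" prediction can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the operator's actual algorithm: it tries every one-to-one assignment of cluster ids to label values (feasible only for a handful of clusters) and keeps the one that matches the most rows. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def map_clusters_to_labels(clusters, labels):
    """Return the one-to-one cluster -> label mapping with the most matching rows.

    Assumes there are no more clusters than distinct labels.
    """
    cluster_ids = sorted(set(clusters))
    label_names = sorted(set(labels))
    counts = Counter(zip(clusters, labels))  # rows per (cluster, label) pair
    best_mapping, best_hits = {}, -1
    for perm in permutations(label_names, len(cluster_ids)):
        hits = sum(counts[(c, l)] for c, l in zip(cluster_ids, perm))
        if hits > best_hits:
            best_mapping, best_hits = dict(zip(cluster_ids, perm)), hits
    return best_mapping

def accuracy(predicted, actual):
    """Fraction of rows where the prediction equals the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical clustering result and true labels for five animals.
clusters = [0, 1, 2, 0, 2]
labels = ["mammal", "bird", "fish", "mammal", "mammal"]

mapping = map_clusters_to_labels(clusters, labels)
predicted = [mapping[c] for c in clusters]
print(mapping, accuracy(predicted, labels))
```

Once the clusters have been turned into predictions this way, the same accuracy measure can be applied to a decision tree's predictions, which is what makes the two comparable at all.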
Cheers,
Simon
Sorry for bringing up an old thread:
Any news on the new (German) documentation?
Yes, it is currently being laid out for final publication and translation. I will let you know as soon as we have published it.
Greetings,
Sebastian
We expect the translation to be finished at the beginning of July.
Cheers,
Ingo