Replace missing values with average in each cluster
painfulover
Member Posts: 1 Learner I
in Help
Hello,
I'm new to RapidMiner and I would like to replace missing values based on clustering. I have run k-means on the columns that have no missing values and divided the original example set into 5 clusters. Now I would like to know how to replace each row's missing values with the averages of the cluster it belongs to, instead of the averages of the whole attributes. I can only find a way to do the latter, using the operator [Replace Missing Values].
Thank you very much.
Best Answers
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
This process is a bit involved. You get a "cluster model" from the clustering operator that you can apply to the data with missing values. However, you need to choose a clustering operator that can itself work with missing values. Then you would aggregate the clustered original data (the non-missing data), grouping by cluster to get the averages. You can join the result with the data that has missing values and use, e.g., Loop Attributes to fill in the missing values using Generate Attributes with a formula like if(missing(%{attr}), eval("average(" + %{attr} + ")"), %{attr}).
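For readers who think in code, the aggregate-then-fill logic above can be sketched outside RapidMiner. This is only a minimal illustration in plain Python, not a RapidMiner process: the function names `cluster_means` and `fill_from_cluster`, the column names, and the sample rows are all made up for the example, and missing values are assumed to be represented as `None`.

```python
from collections import defaultdict
from statistics import mean


def cluster_means(rows, cluster_key, attrs):
    """Average each attribute per cluster, skipping missing (None) values."""
    values = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for attr in attrs:
            if row[attr] is not None:
                values[row[cluster_key]][attr].append(row[attr])
    return {
        cluster: {attr: mean(vals) for attr, vals in per_attr.items()}
        for cluster, per_attr in values.items()
    }


def fill_from_cluster(rows, cluster_key, attrs):
    """Return copies of the rows with missing values replaced by the cluster average."""
    means = cluster_means(rows, cluster_key, attrs)
    filled = []
    for row in rows:
        new_row = dict(row)
        for attr in attrs:
            if new_row[attr] is None:
                # Stays None if no row in this cluster has a value for attr.
                new_row[attr] = means.get(row[cluster_key], {}).get(attr)
        filled.append(new_row)
    return filled


# Hypothetical data: "cluster" is the label assigned by the clustering step.
rows = [
    {"cluster": "cluster_0", "income": 10.0, "age": 30.0},
    {"cluster": "cluster_0", "income": 20.0, "age": None},
    {"cluster": "cluster_0", "income": None, "age": 40.0},
    {"cluster": "cluster_1", "income": 100.0, "age": 60.0},
    {"cluster": "cluster_1", "income": None, "age": 70.0},
]
filled = fill_from_cluster(rows, "cluster", ["income", "age"])
```

The two functions roughly correspond to the Aggregate step (averages per cluster) and the join plus conditional fill. The input rows are left unchanged.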
It is much easier to use the Impute Missing Values operator that automates the selection of missing values, building a model for predicting them (you can select the model type) and putting the predicted values into the missing cells. There is an example process in the operator help that shows you how to use it, with k-NN as the example learning algorithm.
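To give a rough idea of what model-based imputation with k-NN does, here is a small plain-Python sketch. It is not the operator's implementation: `knn_impute`, the column names, and the sample rows are hypothetical, the feature columns are assumed to be complete, and missing values are assumed to be `None`.

```python
import math


def knn_impute(rows, target, features, k=3):
    """Fill missing target values with the mean target of the k nearest complete rows."""
    complete = [r for r in rows if r[target] is not None]
    result = []
    for row in rows:
        new_row = dict(row)
        if row[target] is None and complete:
            point = [row[f] for f in features]
            # Rank complete rows by Euclidean distance on the feature columns.
            nearest = sorted(
                complete,
                key=lambda other: math.dist(point, [other[f] for f in features]),
            )[:k]
            new_row[target] = sum(n[target] for n in nearest) / len(nearest)
        result.append(new_row)
    return result


# Hypothetical data: "z" has one missing value, "x" and "y" are complete.
rows = [
    {"x": 0.0, "y": 0.0, "z": 1.0},
    {"x": 0.0, "y": 1.0, "z": 2.0},
    {"x": 1.0, "y": 0.0, "z": 3.0},
    {"x": 10.0, "y": 10.0, "z": 50.0},
    {"x": 0.5, "y": 0.5, "z": None},
]
imputed = knn_impute(rows, target="z", features=["x", "y"], k=3)
```

Unlike a cluster average, each missing cell here gets its own estimate from its nearest neighbours, so the distant row with z = 50.0 does not influence the filled value.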
Regards,
Balázs
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
I would just use Group into Collection and create a collection (list) of example sets, where each example set contains only one cluster. You can then use Loop Collection and, inside it, Replace Missing Values to replace the missing values with the respective means. Afterwards you just Append the resulting collection again.
Cheers,
Martin
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
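The split, replace, and append idea can also be sketched in plain Python, purely as an illustration of the data flow. The helper names below only mirror the operator names and are not a RapidMiner API; the sample rows are made up, and missing values are assumed to be `None`.

```python
from statistics import mean


def group_into_collection(rows, cluster_key):
    """Split the rows into one list per cluster."""
    groups = {}
    for row in rows:
        groups.setdefault(row[cluster_key], []).append(row)
    return list(groups.values())


def replace_missing_with_mean(rows, attrs):
    """Within one group, replace missing values with that group's mean."""
    result = [dict(row) for row in rows]
    for attr in attrs:
        present = [r[attr] for r in result if r[attr] is not None]
        if not present:
            continue  # nothing to average in this group
        avg = mean(present)
        for r in result:
            if r[attr] is None:
                r[attr] = avg
    return result


def impute_per_cluster(rows, cluster_key, attrs):
    """Loop over the collection, fix each group, then append the groups again."""
    collection = group_into_collection(rows, cluster_key)
    processed = [replace_missing_with_mean(group, attrs) for group in collection]
    return [row for group in processed for row in group]


# Hypothetical data with an "id" column so rows can be matched after appending.
rows = [
    {"id": 1, "cluster": "a", "x": 1.0},
    {"id": 2, "cluster": "b", "x": 10.0},
    {"id": 3, "cluster": "a", "x": None},
    {"id": 4, "cluster": "b", "x": None},
    {"id": 5, "cluster": "a", "x": 3.0},
]
result = impute_per_cluster(rows, "cluster", ["x"])
by_id = {row["id"]: row for row in result}
```

In this sketch the appended output is ordered by cluster rather than in the original row order, so an ID column is useful if the original order matters.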