Impute Missing Values
Working with my students on dealing with missing and imbalanced data in RM, we found that the Impute Missing Values operator, as used in the Tutorial Process for that operator, removes the label role from the class attribute (of the Labor-Negotiations dataset) and transfers it to the duration attribute.
You can easily check the attributes and their roles by comparing the operator's outside input with the inside input of the k-NN (or any other learner inside the operator).
I was not able to explain this behaviour (although of course it is easy to work around by using Set Role twice).
Does anybody know the formal explanation?
Best Answers
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
As explained in the first reply, the learner is built iteratively, one column at a time. When you impute column A, the operator automatically sets column A as the label, because the missing values in column A are what needs to be predicted. When you impute column B in the next round, the learner uses the non-missing values of column B as the label to predict the missing values in column B. This repeats, with a different column set as the label in each step, for column C, column D, column E, and so on, until the missing values in all columns have been imputed.
More explanation and implementation details can be found on the GitHub open source page here
https://github.com/rapidminer/rapidminer-studio/blob/master/src/main/java/com/rapidminer/operator/preprocessing/filter/MissingValueImputation.java
What is a role? Check this out: https://community.rapidminer.com/discussion/54761/roles-and-labels-a-quick-guide
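The column-by-column loop described above can be sketched in plain Python. This is only an illustration of the idea, not the operator's actual code: the function names `knn_predict` and `impute_iteratively`, the dictionary-of-lists table layout, the simple averaging k-NN, and the choice to reuse values imputed in earlier rounds as features are all assumptions made for this sketch.

```python
import math


def knn_predict(train_rows, train_labels, query, k=1):
    """Average the labels of the k training rows nearest to the query."""
    def dist(a, b):
        # Euclidean distance over the features present in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared))

    ranked = sorted(zip(train_rows, train_labels), key=lambda pair: dist(pair[0], query))
    nearest = ranked[:k]
    return sum(label for _, label in nearest) / len(nearest)


def impute_iteratively(table, k=1):
    """table maps column name -> list of floats, with None marking a missing value."""
    result = {name: list(values) for name, values in table.items()}
    names = list(table)
    n_rows = len(next(iter(table.values())))
    for label_name in names:
        # In this round, label_name temporarily plays the label role.
        feature_names = [n for n in names if n != label_name]
        features = [[result[f][i] for f in feature_names] for i in range(n_rows)]
        known = [i for i in range(n_rows) if table[label_name][i] is not None]
        missing = [i for i in range(n_rows) if table[label_name][i] is None]
        if not known:
            continue  # nothing to learn from for this column
        train_rows = [features[i] for i in known]
        train_labels = [table[label_name][i] for i in known]
        for i in missing:
            result[label_name][i] = knn_predict(train_rows, train_labels, features[i], k)
    return result


table = {
    "a": [1.0, 5.0, None, 1.5],
    "b": [10.0, 50.0, 45.0, None],
}
imputed = impute_iteratively(table)
print(imputed)
```

Because a different column takes the label role in every round, the roles seen inside the loop at any given moment reflect only the column currently being imputed, which matches the role change observed on the inner learner's input.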
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
You should insert breakpoints before the learner inside the nested subprocess and check the refreshed metadata in each iteration. In your screenshot, the metadata is valid for the first iteration only.
mlubicz Member, University Professor Posts: 17 University Professor
Thank you for your comprehensive replies, including directing me to the operator code on GitHub, which clarifies a lot, particularly "setting one of the regular attributes to label under the assumption that all attributes are from the same type".
I think we can mark the question as solved from a practical point of view. It could still be interesting to investigate the case where the above assumption does not hold and the learner accepts only attributes of a specific type, like DT for a selected criterion. At the least, an explanation in the IMV operator description in Help would be useful, if IMV cannot be enabled to impute missing values for attributes of that specific type.
Answers
Check out the operator info here