Replace missing values based on other attributes
Hey all,
I am new to RapidMiner and I have a question regarding data preparation. It looks a bit like this question, but I can't figure out how to apply it to my situation.
I have a dataset regarding accidents, and this dataset contains the following attributes:
Total Fatal Injuries | Total Serious Injuries | Total Minor Injuries | Total Uninjured
Missing              | Missing                | Missing              | 1
Missing              | Missing                | 2                    | Missing
2                    | 2                      | Missing              | Missing
1                    | Missing                | Missing              | Missing
Missing              | Missing                | Missing              | Missing
I would like to fill in the missing values with "0", but only in rows where at least one of the four attributes contains a value. Rows where all four attributes are missing should stay as they are.
I am not very experienced with RapidMiner, and I'm learning through a book called "Data Mining for the Masses". Unfortunately, the book doesn't go into detail on this kind of problem.
I already tried to use the Generate Attributes operator with the following expression, but I am not skilled enough to get it to work:
if([ Total Uninjured ]>0, if(missing([ Total Fatal Injuries ])), then(replace(0)))
I tried to tell the program that if "Total Uninjured" is greater than 0 and "Total Fatal Injuries" is missing, it should replace "Total Fatal Injuries" with 0.
Any help would be greatly appreciated!
Answers
Hi @1640607mortel!
You can filter the examples with custom filters to pick only those examples that have at least one non-missing value, replace the missing values in those, and then append the unmatched examples (i.e. the ones where everything is missing). Here is an example process. I added an ID to keep track of where the originally "all missing" rows were.
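For anyone who wants to check the logic outside RapidMiner, the split / replace / append steps described above can be sketched roughly in plain Python. This is not RapidMiner code and not the attached process: `None` stands in for a missing value, and the `id` field and the function name `fill_partially_missing` are made up for illustration.

```python
# Plain-Python sketch of the split / replace / append idea (not RapidMiner code).
# None stands in for a missing value; the "id" field and helper name are illustrative.

COLUMNS = [
    "Total Fatal Injuries",
    "Total Serious Injuries",
    "Total Minor Injuries",
    "Total Uninjured",
]

def fill_partially_missing(rows):
    """Replace missing values with 0 only in rows that have at least one value."""
    # Step 1: tag each row with an ID so the original order can be restored.
    tagged = [dict(row, id=i) for i, row in enumerate(rows, start=1)]

    # Step 2: split into rows with at least one value and rows that are all missing.
    matched = [r for r in tagged if any(r[c] is not None for c in COLUMNS)]
    unmatched = [r for r in tagged if all(r[c] is None for c in COLUMNS)]

    # Step 3: replace missing values with 0 in the matched rows only.
    for r in matched:
        for c in COLUMNS:
            if r[c] is None:
                r[c] = 0

    # Step 4: append the untouched all-missing rows and restore the order by ID.
    return sorted(matched + unmatched, key=lambda r: r["id"])

# The five example rows from the question.
raw = [
    (None, None, None, 1),
    (None, None, 2, None),
    (2, 2, None, None),
    (1, None, None, None),
    (None, None, None, None),
]
rows = [dict(zip(COLUMNS, values)) for values in raw]
result = fill_partially_missing(rows)
for r in result:
    print(r["id"], [r[c] for c in COLUMNS])
```

The ID plays the same role here as in the process: after appending, sorting by it puts the all-missing rows back where they originally were.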
I hope that helped!
Cheers
Jan
Hi,
I'm on a computer without RM, but from memory I think the Generate Aggregation operator will do the trick: it can create a new attribute that tells you whether the other four attributes are all missing or not. Then you can use that attribute to filter.
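As a rough illustration of that idea (again plain Python rather than RapidMiner, and only a sketch under assumptions), a helper attribute can count the non-missing values per row, and the fill is then applied only where that count is above zero. The name `non_missing_count`, the shortened column names and both function names are hypothetical.

```python
# Plain-Python sketch of the "helper attribute, then filter" idea (not RapidMiner code).
# None stands in for a missing value; "non_missing_count" is a hypothetical attribute name.

def add_non_missing_count(row, columns):
    """Return a copy of the row with an extra count of its non-missing values."""
    count = sum(1 for c in columns if row[c] is not None)
    return {**row, "non_missing_count": count}

def fill_where_count_positive(rows, columns):
    """Fill missing values with 0 only in rows whose count is above zero."""
    filled = []
    for row in rows:
        row = add_non_missing_count(row, columns)
        if row["non_missing_count"] > 0:
            # Only the listed columns are filled; the helper attribute is left alone.
            row = {k: (0 if k in columns and v is None else v) for k, v in row.items()}
        filled.append(row)
    return filled

columns = ["Fatal", "Serious", "Minor", "Uninjured"]  # shortened names for the sketch
example = [
    {"Fatal": 2, "Serious": 2, "Minor": None, "Uninjured": None},
    {"Fatal": None, "Serious": None, "Minor": None, "Uninjured": None},
]
for row in fill_where_count_positive(example, columns):
    print(row)
```

Computing the count before any replacement matters: once zeros are filled in, a row no longer shows which of its values were originally missing.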
Regards,
Sebastian