"discarding attributes with many missing values"
Hi there
Just enquiring if there is a pre-processing operator that discards attributes having more missing values than a specified threshold (given as a percentage for instance).
Thanks!
Dan
Answers
I think you can use the Remove Useless Attributes operator if the missing values exceed the number of identical nominal values.
In any case, you could post a feature request on our bugtracker, since I think a dedicated "less than x% missing values" filter makes perfect sense.
Greetings,
Sebastian
You are right, such an operator would be nice. I have uploaded a process with our new Community Extension which performs exactly the desired task. It is called "Discard Attribute with More than x% Missing Values (Loops + Macros)", and you can download and execute the process with a few clicks once you have installed our new myExperiment Community Extension from the help menu of RapidMiner.
This process loops over all attributes and calculates the fraction of missing values for each attribute. If this fraction is larger than the fraction defined in the first "Set Macro" operator (macro: max_unknown), the attribute is removed from the example set.
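For anyone who wants to see the logic outside of RapidMiner, here is a minimal sketch of the same idea in plain Python. It is not the uploaded process itself: the function names and the sample data are made up for illustration, rows are assumed to be dictionaries, and a value of None is assumed to mean "missing". Only the threshold name max_unknown is taken from the macro above.

```python
from typing import Any, Dict, List, Optional

# Assumption: one example (row) is a dict mapping attribute name -> value,
# and None stands for a missing value.
Row = Dict[str, Optional[Any]]


def missing_fraction(rows: List[Row], attribute: str) -> float:
    """Return the fraction of rows in which `attribute` is missing."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if row.get(attribute) is None)
    return missing / len(rows)


def discard_sparse_attributes(rows: List[Row], max_unknown: float) -> List[Row]:
    """Drop every attribute whose missing fraction is larger than max_unknown."""
    # Collect attribute names in first-seen order.
    attributes: List[str] = []
    for row in rows:
        for name in row:
            if name not in attributes:
                attributes.append(name)

    # Keep an attribute unless its missing fraction exceeds the threshold.
    keep = [a for a in attributes if missing_fraction(rows, a) <= max_unknown]
    return [{a: row.get(a) for a in keep} for row in rows]


if __name__ == "__main__":
    # Hypothetical sample data: "city" is missing in 3 of 4 rows.
    sample = [
        {"age": 31, "city": None, "income": 4200},
        {"age": 45, "city": None, "income": None},
        {"age": 28, "city": "Dortmund", "income": 3900},
        {"age": 52, "city": None, "income": 5100},
    ]
    cleaned = discard_sparse_attributes(sample, max_unknown=0.5)
    print(list(cleaned[0].keys()))
```

As in the process, an attribute is removed only when its fraction is strictly larger than the threshold, so an attribute sitting exactly at max_unknown is kept.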
Cheers,
Ingo
Thanks,
-Gagi
In fact, Ingo uploaded a complete process, not a single operator. You can download the Community Extension as usual with the update manager, and you don't have to sign into the community itself to download publicly available processes.
Greetings,
Sebastian
For some reason I did not see the list of public processes. This will help a lot.
While this works, it seems very cumbersome. Is there any way to extract the meta data and filter based on the number of missing values?
Thanks,
-Gagi
I guess Ingo wouldn't have posted this process if an easier way existed without coding on either your side or ours. If you find an easier solution, or if you extend RapidMiner on your own, please keep the community informed about this issue.
Greetings,
Sebastian
I must be missing something. Would transposing the data and applying Ingo's stuff not work?
Just a thought.
Ciao
Thanks for your suggestion. However, I ran into two problems:
1. The Community Extension is installed, but no operators were added.
2. At https://www.myexperiment.org/workflows/1276/versions/1.html, only a txt file is downloaded, and it cannot be opened as XML.