Remove attributes with missing values exceeding a given threshold (percentage)
Hi, I'm new to RapidMiner. I'm trying to do something very simple, but I'm stuck. My data set has many attributes, and I want to remove the columns in which more than a given percentage of values are missing (because I would not be able to use fixed values or infer their values). I tried the Remove Useless Attributes operator, but I still have columns with almost 90% missing values, so it didn't work as I wanted. Can you help me achieve this? It should be something trivial; I remember that in KNIME there was a specific option in the filter node to specify the percentage threshold.
Thank you!
Answers
There are probably a few different ways of doing it, but the easiest I can come up with is using the "Remove Useless Attributes" operator. Please take a look at the example process below (just copy it and paste it into your XML panel, then click the green checkmark):
It is very easy with Turbo Prep:
- Open your dataset with Turbo Prep
- Click on CLEANSE
- Click on REMOVE LOW QUALITY
- Set the Max missing (%) value
- Click on COMMIT CLEANSE
Hope this helps,
Regards,
Lionel
If you don't have access to Turbo Prep, your task can easily be performed with a very simple Python script.
To execute this process, you will need to:
- Install Python on your computer.
- Install the Python Scripting extension from the Marketplace.
- Set the maximum percentage of missing values allowed per attribute (to do this, set the macro called thr in the Set Macros operator).
The Process:
Lionel
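As a rough illustration of the Python approach described above (this is a minimal sketch, not Lionel's original process), the core of such a script could look like the following. It assumes the data arrives as a pandas DataFrame through the rm_main entry point used by the Execute Python operator. The helper name drop_sparse_columns and the hard-coded THR value are hypothetical; in the process described above, the threshold would come from the thr macro instead.

```python
import pandas as pd


def drop_sparse_columns(df: pd.DataFrame, max_missing_pct: float) -> pd.DataFrame:
    """Keep only the columns whose share of missing values is at or below the threshold.

    Hypothetical helper for illustration; max_missing_pct is a percentage (0-100).
    """
    # isna() marks missing cells; the column-wise mean is the fraction missing.
    missing_pct = df.isna().mean() * 100
    # Boolean mask over columns: True means the column is kept.
    return df.loc[:, missing_pct <= max_missing_pct]


# Assumption: a fixed threshold standing in for the thr macro mentioned above.
THR = 50.0


def rm_main(data):
    # Entry point called by the Execute Python operator with the input example set.
    return drop_sparse_columns(data, THR)
```

With THR set to 50, a column with 90% missing values would be dropped, while a column with 40% missing values would be kept.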
There is an operator in the Toolbox called Select Attributes (Missings), or something like that, which does the trick.
BR
Martin
Dortmund, Germany
The operator is called Filter Attributes with Missing Values.
Thanks,
Regards,
Lionel