select attributes with at least one entry
sgenzer
Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
Hi... I'm having a brain freeze with what should be a very simple thing. I have a data set with 250+ nominal and numerical attributes, many of which are completely empty (all missing values). Others have perhaps one or two entries, with the rest missing. How do I eliminate all attributes, nominal and numerical, that contain ONLY missing values? I have tried every combination of operators (including Ingo's workaround posted a while ago) to no avail.
Scott
Best Answer
MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
Hi,
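Have a look at the attached process. It counts the non-missing values of each attribute, so you can filter on > X or < X entries. The process uses > 1 as an example; a threshold of > 0 keeps every attribute that has at least one entry.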
~Martin
<?xml version="1.0" encoding="UTF-8"?><process version="7.4.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.4.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="7.4.000" expanded="true" height="68" name="Retrieve Golf" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Samples/data/Golf"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="7.4.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
<list key="function_descriptions">
<parameter key="only_missing" value="str(0/0)"/>
</list>
</operator>
<operator activated="true" class="aggregate" compatibility="7.4.000" expanded="true" height="82" name="Aggregate" width="90" x="380" y="136">
<parameter key="use_default_aggregation" value="true"/>
<parameter key="default_aggregation_function" value="count (ignoring missings)"/>
<list key="aggregation_attributes">
<parameter key="Wind" value="count (ignoring missings)"/>
</list>
</operator>
<operator activated="true" class="rename_by_replacing" compatibility="7.4.000" expanded="true" height="82" name="Rename by Replacing" width="90" x="514" y="34">
<parameter key="replace_what" value="countWithOutMissings\((.+)\)"/>
<parameter key="replace_by" value="$1"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="7.4.000" expanded="true" height="82" name="Select Attributes" width="90" x="648" y="34">
<parameter key="attribute_filter_type" value="numeric_value_filter"/>
<parameter key="numeric_condition" value=">1"/>
</operator>
<operator activated="true" class="data_to_weights" compatibility="7.4.000" expanded="true" height="82" name="Data to Weights" width="90" x="782" y="34"/>
<operator activated="true" class="select_by_weights" compatibility="7.4.000" expanded="true" height="103" name="Select by Weights" width="90" x="916" y="136"/>
<connect from_op="Retrieve Golf" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Aggregate" to_port="example set input"/>
<connect from_op="Aggregate" from_port="example set output" to_op="Rename by Replacing" to_port="example set input"/>
<connect from_op="Aggregate" from_port="original" to_op="Select by Weights" to_port="example set input"/>
<connect from_op="Rename by Replacing" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Data to Weights" to_port="example set"/>
<connect from_op="Data to Weights" from_port="weights" to_op="Select by Weights" to_port="weights"/>
<connect from_op="Select by Weights" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
- Sr. Director Data Solutions, Altair RapidMiner -
Dortmund, Germany
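For readers who want to see the idea behind the process without opening RapidMiner, here is a minimal sketch in plain Python. It is not RapidMiner code and not part of Martin's answer. It only mirrors the two steps the process performs: count the non-missing values per attribute, then keep the attributes whose count passes a threshold. The function names, the `min_entries` parameter, the sample rows, and the use of `None` for a missing value are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional

# Hypothetical representation: one dict per example, None marks a missing value.
Row = Dict[str, Optional[Any]]


def count_non_missing(rows: List[Row]) -> Dict[str, int]:
    """Count the non-missing values of every attribute (nominal or numerical)."""
    counts: Dict[str, int] = {}
    for row in rows:
        for name, value in row.items():
            counts.setdefault(name, 0)
            if value is not None:
                counts[name] += 1
    return counts


def select_attributes(rows: List[Row], min_entries: int = 1) -> List[Row]:
    """Keep only attributes with at least `min_entries` non-missing values."""
    counts = count_non_missing(rows)
    keep = [name for name, n in counts.items() if n >= min_entries]
    return [{name: row.get(name) for name in keep} for row in rows]


# Made-up sample data with mixed nominal and numerical attributes.
rows: List[Row] = [
    {"Outlook": "sunny", "Temperature": 85, "only_missing": None, "Note": None},
    {"Outlook": "rain", "Temperature": None, "only_missing": None, "Note": "windy"},
    {"Outlook": None, "Temperature": 70, "only_missing": None, "Note": None},
]

filtered = select_attributes(rows)
print(list(filtered[0].keys()))  # ['Outlook', 'Temperature', 'Note']
```

Because the count does not depend on the value type, nominal and numerical attributes are handled the same way, which is the property the thread is after. Raising `min_entries` corresponds to raising the threshold in the Select Attributes step.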
Answers
Remove Useless Attributes doesn't work on missings? What if you first run a Replace Missing Values with a constant value and then use it?
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
That works very nicely. Thanks, Martin.
Scott
Yes, I tried that for a while. It does not appear to work because of the mixture of nominal and numerical attributes. Martin's solution takes care of that.