How to compare data before and after missing values handling?
Dear everyone,
I'm learning RapidMiner using an NBA dataset from data.world. I noticed that there is missing data in the 3P% column. I filtered out these 11 rows by selecting missing_attributes in the filter at the top right.
So I used Replace Missing Values to set the missing data to 0. The process ran successfully, but what I want to know is: how can I show only these 11 rows after replacing the missing values with 0? After the replacement, I can no longer filter the data by selecting missing_attributes.
Can anyone help me with this? I've been stuck for several days... Do I need to change my process, or are there other solutions?
My process:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve" width="90" x="45" y="34">
<parameter key="repository_entry" value="//PredictNBARookie/Data/nba_logreg"/>
</operator>
</process>
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<operator activated="true" class="replace_missing_values" compatibility="8.0.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<parameter key="return_preprocessing_model" value="false"/>
<parameter key="create_view" value="false"/>
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="3P%"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="default" value="zero"/>
<list key="columns"/>
</operator>
</process>
Thanks in advance!
Best, Lee
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @EdisonLee,
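I used the Generate Attributes operator to create a copy of your 3P% attribute, named 3P%_back_up. The copy is taken from the "original" output port of Replace Missing Values, so it still contains the missing values.
Then I used the Join operator to join this new attribute to your dataset, using an ID generated on both sides as the key.
Here are the results after filtering:
[screenshot not preserved]
You can find the process here: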
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="34">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\NBA_missing_values\nba_logreg.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="windows-1252"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Name.true.polynominal.attribute"/>
<parameter key="1" value="GP.true.integer.attribute"/>
<parameter key="2" value="MIN.true.real.attribute"/>
<parameter key="3" value="PTS.true.real.attribute"/>
<parameter key="4" value="FGM.true.real.attribute"/>
<parameter key="5" value="FGA.true.real.attribute"/>
<parameter key="6" value="FG%.true.real.attribute"/>
<parameter key="7" value="3P Made.true.real.attribute"/>
<parameter key="8" value="3PA.true.real.attribute"/>
<parameter key="9" value="3P%.true.real.attribute"/>
<parameter key="10" value="FTM.true.real.attribute"/>
<parameter key="11" value="FTA.true.real.attribute"/>
<parameter key="12" value="FT%.true.real.attribute"/>
<parameter key="13" value="OREB.true.real.attribute"/>
<parameter key="14" value="DREB.true.real.attribute"/>
<parameter key="15" value="REB.true.real.attribute"/>
<parameter key="16" value="AST.true.real.attribute"/>
<parameter key="17" value="STL.true.real.attribute"/>
<parameter key="18" value="BLK.true.real.attribute"/>
<parameter key="19" value="TOV.true.real.attribute"/>
<parameter key="20" value="TARGET_5Yrs.true.real.attribute"/>
</list>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="8.1.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="246" y="34">
<list key="columns">
<parameter key="3P%" value="zero"/>
</list>
</operator>
<operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID" width="90" x="648" y="34"/>
<operator activated="true" class="generate_attributes" compatibility="8.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="447" y="187">
<list key="function_descriptions">
<parameter key="3P%_back_up" value="[3P%]"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="581" y="187">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="3P%_back_up"/>
</operator>
<operator activated="true" class="generate_id" compatibility="8.1.000" expanded="true" height="82" name="Generate ID (2)" width="90" x="715" y="187"/>
<operator activated="true" class="concurrency:join" compatibility="8.1.000" expanded="true" height="82" name="Join" width="90" x="849" y="34">
<list key="key_attributes"/>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="original" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
<connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Does this process meet your needs?
Regards,
Lionel
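To show only the 11 affected rows, one option is to append a Filter Examples operator after the Join and keep the rows where 3P%_back_up is missing. The fragment below is a sketch and not part of the process posted above: the operator name, its placement, and the filter entry format are assumptions based on how RapidMiner 8.x typically exports Filter Examples, so check them against your own installation.

```xml
<!-- Hypothetical addition: keep only rows whose backup value is missing -->
<operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" name="Filter Examples">
<parameter key="condition_class" value="custom_filters"/>
<list key="filters_list">
<!-- assumed entry format: attribute.condition.value (value is empty for is_missing) -->
<parameter key="filters_entry_key" value="3P%_back_up.is_missing."/>
</list>
</operator>
<!-- these two connections would replace the existing Join -> result 1 connection -->
<connect from_op="Join" from_port="join" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
```

In the GUI, the equivalent is to drag Filter Examples after Join, open its filter dialog, and set the condition to 3P%_back_up "is missing". The result should then contain only the rows whose 3P% value was originally missing, with the replaced value 0 shown next to the empty backup value.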
Answers
Hi @lionelderkrikor
Thank you for helping me. This is a very nice way to achieve my goal, and I can easily understand how you did it. But I don't know why I couldn't get your process to run on my computer. How should I connect the operators?
Thanks,
Lee
Hi @EdisonLee,
It's weird: it seems that the Join operator is considered "deprecated" by RapidMiner.
Try the following steps:
- Delete this Join operator.
- Search for the Join operator using the operator search box.
- Drag and drop the Join operator into the process window.
- Manually connect the Join operator to the two Generate ID operators.
I hope it helps,
Best regards,
Lionel
Dear @lionelderkrikor
The process worked after I followed your instructions. Your solution really answers my question. Thanks again for showing me a different way to think about data processing in RapidMiner. :)
Best Regards,
Lee