Create the right data out of a warehouse return dataset.
First of all, please excuse my bad English. We have to write a thesis that includes a data analysis with RapidMiner.
I have a dataset with 20146 customers. The set includes about 60 attributes, but only 3 of them are relevant. Let me try to explain it this way: the whole dataset is about return rates in warehouse trade. In simple words, how many articles did a customer order, and how many of them did he return instead of buying?
He gave us the following parameters: a return rate above 40% is high, a return rate below 18% is low, and the 22 percentage points in the middle (18% to 40%) are neither high nor low. So we have 3 different classes of customers. They are supposed to be classified with the values H (high return rate), N (low return rate) and U (unidentified).
For each customer we have the customer number (KDNR), the delivered quantity of products (LIEFER_MENGE) and the returned quantity of products (RETOUREN_MENGE).
The output data needs to look like this: <Customer Number>, <Class (H/N/U)>
230823, N
230824, H
230825, U
I managed to create a dataset that includes the customer number and the classes N and H, but I can't define the class for the 22 percentage points between high and low. I tried the if function, Generate Attributes, and so on. Another problem is that when I use Generate Attributes, it only gives me true or false, and that doesn't really help me.
Does anyone have an idea of how to solve this? I am pretty desperate. I hope you can help me.
Regards
<?xml version="1.0" encoding="UTF-8"?><process version="9.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.1.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.1.000" expanded="true" height="68" name="Retrieve retouren_train" width="90" x="45" y="34">
<parameter key="repository_entry" value="//Local Repository/retouren_train"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="179" y="85">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="RETOUREN_MENGE|LIEFER_MENGE|KDNR"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">
<list key="function_descriptions">
<parameter key="Niedrigretournierer" value="RETOUREN_MENGE/LIEFER_MENGE*100&lt;18"/>
<parameter key="Hochretournierer" value="RETOUREN_MENGE/LIEFER_MENGE*100&gt;40"/>
</list>
<parameter key="keep_all" value="true"/>
</operator>
<operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace" width="90" x="648" y="187">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Hochretournierer"/>
<parameter key="attributes" value="Hochretournierer"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="true"/>
<parameter key="replace_what" value="true"/>
<parameter key="replace_by" value="H"/>
</operator>
<operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace (2)" width="90" x="246" y="238">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Niedrigretournierer"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="replace_what" value="true"/>
<parameter key="replace_by" value="N"/>
</operator>
<operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace (3)" width="90" x="313" y="340">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Hochretournierer"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="replace_what" value="false"/>
<parameter key="replace_by" value="N"/>
</operator>
<operator activated="true" class="replace" compatibility="9.1.000" expanded="true" height="82" name="Replace (4)" width="90" x="514" y="238">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="Niedrigretournierer"/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
<parameter key="replace_what" value="false"/>
<parameter key="replace_by" value="H"/>
</operator>
<operator activated="true" class="set_role" compatibility="9.1.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="391">
<parameter key="attribute_name" value="KDNR"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles">
<parameter key="Hochretournierer" value="prediction"/>
<parameter key="Niedrigretournierer" value="prediction"/>
</list>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.1.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="581" y="391">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value="KDNR|Niedrigretournierer"/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="attribute_value"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="time"/>
<parameter key="block_type" value="attribute_block"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="value_matrix_row_start"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="rename" compatibility="9.1.000" expanded="true" height="82" name="Rename" width="90" x="681" y="289">
<parameter key="old_name" value="Niedrigretournierer"/>
<parameter key="new_name" value="Einteilung Hoch-/Niedrigretournierer"/>
<list key="rename_additional_attributes"/>
</operator>
<connect from_op="Retrieve retouren_train" from_port="output" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_op="Replace" to_port="example set input"/>
<connect from_op="Replace" from_port="example set output" to_op="Replace (2)" to_port="example set input"/>
<connect from_op="Replace (2)" from_port="example set output" to_op="Replace (3)" to_port="example set input"/>
<connect from_op="Replace (3)" from_port="example set output" to_op="Replace (4)" to_port="example set input"/>
<connect from_op="Replace (4)" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Thanks for sharing the process. I cannot run it to check the logic without the input data, but judging from your formula, I think a nested if() expression will work for your case.
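As a rough sketch of what such a nested if() could look like in the Generate Attributes operator from the posted process: the attribute name "Klasse" is only a placeholder, and the thresholds are the ones from the question. This has not been run against the data, and customers with LIEFER_MENGE equal to 0 are not handled.

```xml
<!-- Sketch only: one nominal attribute instead of the two true/false attributes.
     "Klasse" is a hypothetical attribute name; 40 and 18 are the thresholds from the question. -->
<operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">
  <list key="function_descriptions">
    <!-- rate above 40 gives H, rate below 18 gives N, everything else (18 to 40) gives U -->
    <parameter key="Klasse" value="if(RETOUREN_MENGE/LIEFER_MENGE*100 &gt; 40, &quot;H&quot;, if(RETOUREN_MENGE/LIEFER_MENGE*100 &lt; 18, &quot;N&quot;, &quot;U&quot;))"/>
  </list>
  <parameter key="keep_all" value="true"/>
</operator>
```

Because the expression returns the class label directly, the four Replace operators should no longer be needed; the second Select Attributes would then keep KDNR and the new attribute.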
HTH!
YY
Another solution is to use the Discretize by User Specification operator.
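A minimal sketch of how this might be set up, assuming the return rate is first computed as its own numeric attribute: "Retourenquote" is a made-up attribute name, the operator coordinates are arbitrary, and the parameter keys of the Discretize operator are written from memory, so please verify them in RapidMiner Studio. The two operators still need a connect entry between them.

```xml
<!-- Step 1: compute the return rate in percent. "Retourenquote" is a hypothetical name. -->
<operator activated="true" class="generate_attributes" compatibility="9.1.000" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="85">
  <list key="function_descriptions">
    <parameter key="Retourenquote" value="RETOUREN_MENGE/LIEFER_MENGE*100"/>
  </list>
  <parameter key="keep_all" value="true"/>
</operator>
<!-- Step 2: bin the rate. Each entry is a class name together with its upper limit. -->
<operator activated="true" class="discretize_by_user_specification" compatibility="9.1.000" expanded="true" height="103" name="Discretize" width="90" x="447" y="85">
  <parameter key="attribute_filter_type" value="single"/>
  <parameter key="attribute" value="Retourenquote"/>
  <list key="classes">
    <parameter key="N" value="18.0"/>
    <parameter key="U" value="40.0"/>
    <parameter key="H" value="Infinity"/>
  </list>
</operator>
```

It is worth checking which class a customer with a rate of exactly 18 or exactly 40 ends up in, since the question defines the classes with strict "<18" and ">40" comparisons and the operator's boundary handling may differ.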
Regards,
Lionel
He gave us a second dataset, but in that dataset the attributes "RETOUREN_MENGE" and "LIEFER_MENGE" are missing; it has about 69 other attributes. So we have two datasets: the first one already contains the outcome data that we want, and the second one doesn't. He wants us to make a prediction from the first dataset onto the second one. So basically we need to find some kind of correlation or something similar in the first dataset and transfer that knowledge onto the second dataset, so that I can get the same outcome there. I attached the two datasets. Maybe you can give me a hint on how to do that?