Delete Examples with 2 missing attributes
Hello,
I know how to delete missing values of a column in different ways. However, I only want to remove the Examples which have two missing attributes:
Size  Item 1  Item 2
1     ?       milk
2     cookie  milk
2     ?       ?
2     cookie  chocolate
2     cookie  crackers
2     cookie  ?
2     cookie  raspberries
After that, I would like to combine the two tables to find out how often cookie and milk occur together, both as a percentage and as an absolute frequency.
How can I use FP-Growth for this?
Thank you in advance!
<?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="9.0.002" expanded="true" height="68" name="Read Excel" width="90" x="112" y="136">
<parameter key="excel_file" value="\\ADS.DLH.DE\LHuser$\LHT\HAM98\U717465\Documents\02_Data_Mining\01_rapidminer\closed_events_q-star.xlsx"/>
<list key="annotations"/>
<parameter key="date_format" value="MMM d, yyyy h:mm:ss a z"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Event ID.true.polynominal.attribute"/>
<parameter key="1" value="Event Title.true.polynominal.attribute"/>
<parameter key="2" value="Event Description.true.polynominal.attribute"/>
<parameter key="3" value="Event resp\. dept\..true.polynominal.attribute"/>
<parameter key="4" value="Risk Level.true.polynominal.attribute"/>
<parameter key="5" value="Severity Level.true.polynominal.attribute"/>
<parameter key="6" value="Severity Driver.true.polynominal.attribute"/>
<parameter key="7" value="Closed date event.true.date_time.attribute"/>
<parameter key="8" value="Total Event Time.true.integer.attribute"/>
<parameter key="9" value="Total Investigation Time.true.integer.attribute"/>
<parameter key="10" value="Total Implement\. Time.true.integer.attribute"/>
<parameter key="11" value="Resp\. for coordination.true.polynominal.attribute"/>
<parameter key="12" value="Resp\. for investigation.true.polynominal.attribute"/>
<parameter key="13" value="Source.true.polynominal.attribute"/>
<parameter key="14" value="Event type.true.polynominal.attribute"/>
<parameter key="15" value="Investigation type.true.polynominal.attribute"/>
<parameter key="16" value="Related requirements.true.polynominal.attribute"/>
<parameter key="17" value="CNQ.true.integer.attribute"/>
<parameter key="18" value="A/C Reg.true.polynominal.attribute"/>
<parameter key="19" value="Engine type.true.polynominal.attribute"/>
<parameter key="20" value="PNR.true.polynominal.attribute"/>
<parameter key="21" value="Customer/ Operator.true.polynominal.attribute"/>
<parameter key="22" value="MOR relevant.true.polynominal.attribute"/>
<parameter key="23" value="Repetitive Event.true.polynominal.attribute"/>
<parameter key="24" value="Reason for no or discont\. Investigation.true.polynominal.attribute"/>
<parameter key="25" value="Implemented CA/PA.true.polynominal.attribute"/>
<parameter key="26" value="Implemented Correction.true.polynominal.attribute"/>
<parameter key="27" value="Date of report.true.date_time.attribute"/>
<parameter key="28" value="Coordination closed date.true.date_time.attribute"/>
<parameter key="29" value="Investigation closed date.true.date_time.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="false"/>
</operator>
<operator activated="true" class="select_attributes" compatibility="9.0.002" expanded="true" height="82" name="Select Attributes (2)" width="90" x="313" y="136">
<parameter key="attribute_filter_type" value="subset"/>
<parameter key="attributes" value="Risk Level|Severity Level"/>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Select Attributes (2)" to_port="example set input"/>
<connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Best Answer
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hello, @t_liebe,
Use the Filter Examples operator with the following configuration:
Note the Match all option at the bottom left of the dialog. You must select it, because it combines the conditions with AND. Otherwise the filter will also match records where only one of the two attributes is missing.
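To illustrate the difference between the AND and OR logic outside of RapidMiner, here is a minimal Python sketch using the toy table from the question. It is not the operator's implementation: `None` stands in for the `?` values, the helper names `both_missing` and `any_missing` are made up for this example, and the final count is a plain co-occurrence tally rather than FP-Growth itself.

```python
# Toy data copied from the question; None stands for a missing value ("?").
rows = [
    (1, None, "milk"),
    (2, "cookie", "milk"),
    (2, None, None),
    (2, "cookie", "chocolate"),
    (2, "cookie", "crackers"),
    (2, "cookie", None),
    (2, "cookie", "raspberries"),
]


def both_missing(row):
    """AND logic: true only if Item 1 and Item 2 are both missing."""
    _, item1, item2 = row
    return item1 is None and item2 is None


def any_missing(row):
    """OR logic: true if at least one of the two items is missing."""
    _, item1, item2 = row
    return item1 is None or item2 is None


# Remove only the rows where both items are missing.
kept = [row for row in rows if not both_missing(row)]

# For contrast: OR logic would also drop rows with a single missing item.
kept_with_or = [row for row in rows if not any_missing(row)]

# Absolute and relative frequency of cookie and milk occurring together,
# counted over the rows that survived the AND filter.
pair = {"cookie", "milk"}
pair_count = sum(1 for _, a, b in kept if {a, b} == pair)
pair_share = pair_count / len(kept)

print(len(kept), len(kept_with_or), pair_count, round(pair_share, 3))
# prints: 6 4 1 0.167
```

With AND, only the `2 ? ?` row is removed and six rows remain. With OR, the two rows with a single `?` would be lost as well, leaving four.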
Answers
Thank you for your quick answer!