deleting data based on 2 conditions - not just filter

thesoletraveler · May 2018

Hi there

I have only just started using RapidMiner, so my question is quite basic. Apologies.

I would like to reduce my data so that it includes only rows that meet specific conditions. When I use the filter operators, they return exactly the rows I want removed. My dilemma is this: I want to delete every row that is a Yes, or a 1, in the first column and also has a value greater than 150 in the second column. The filter returns the 80 of the 750 entries that meet those conditions, but I want those deleted so that I keep the other 670 entries, not the 80. I hope I'm being clear.

Thank you. I have spent hours trying different operators and searching the community and YouTube.

lionelderkrikor · May 2018

Hi again @thesoletraveler,

If I understand correctly, you just have to apply the two filter conditions in Filter Examples and check "invert filter". The operator then keeps every row that does not match both conditions.

Does this process meet your need?

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="8.2.000" expanded="true" height="68" name="Read CSV" width="90" x="112" y="85">
        <parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Tests_Rapidminer\Double_filters\diabetes.csv"/>
        <parameter key="column_separators" value=","/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <parameter key="encoding" value="windows-1252"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Pregnancies.true.integer.attribute"/>
          <parameter key="1" value="Glucose.true.integer.attribute"/>
          <parameter key="2" value="BloodPressure.true.integer.attribute"/>
          <parameter key="3" value="SkinThickness.true.integer.attribute"/>
          <parameter key="4" value="Insulin.true.integer.attribute"/>
          <parameter key="5" value="BMI.true.real.attribute"/>
          <parameter key="6" value="DiabetesPedigreeFunction.true.real.attribute"/>
          <parameter key="7" value="Age.true.integer.attribute"/>
          <parameter key="8" value="Outcome.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="8.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="85">
        <parameter key="invert_filter" value="true"/>
        <list key="filters_list">
          <parameter key="filters_entry_key" value="Outcome.eq.1"/>
          <parameter key="filters_entry_key" value="Insulin.gt.150"/>
        </list>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>

I hope it helps,

Regards,

Lionel

lionelderkrikor · May 2018

Hi @thesoletraveler,

Can you share your dataset and explain, with an example, what you want to obtain?

Regards,

Lionel

thesoletraveler · May 2018

Hi

In the Outcome column of the raw data set you will see either 1 (true) or 0 (false), which I know how to switch. The Insulin column contains many different values. Based on the literature (and my directive), it is argued that anyone with insulin over 150 is on insulin therapy. So when building a model to predict diabetes, I want to eliminate anyone who is 1 or true (they have diabetes) and has an insulin reading above 150: they already have diabetes and are receiving therapy, so their records are not accurate data to include when building my model.

As mentioned previously, I want to delete these 70+ data results so I have data from approx 680 (needing additional work in themselves) who may or may not have diabetes and who are not receiving insulin therapy.

I believe it adds far more value to my model if insulin results are more relevant. Thanks in advance for your assistance!

C

MartinLiebig · May 2018

Hi @thesoletraveler

Another way to solve this is the expression option of Filter Examples, which is very flexible and powerful.
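
A minimal sketch of what that expression variant might look like, assuming the attribute names Outcome and Insulin from Lionel's process, and assuming the parameter keys `condition_class` and `parameter_expression` as Filter Examples usually exports them (worth checking against your RapidMiner version). The expression keeps every row that is not both Outcome = 1 and Insulin > 150, so invert filter stays unchecked. It would take the place of the Filter Examples operator in that process:

```xml
<!-- Sketch only: the parameter keys are assumed from typical Filter Examples exports; verify in your version. -->
<operator activated="true" class="filter_examples" compatibility="8.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="85">
  <!-- Keep a row unless it has Outcome = 1 AND Insulin > 150 -->
  <parameter key="parameter_expression" value="!(Outcome == 1 &amp;&amp; Insulin &gt; 150)"/>
  <parameter key="condition_class" value="expression"/>
</operator>
```

The same result should also be reachable by writing the expression without the leading `!` and checking invert filter instead.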

Best,

Martin

thesoletraveler · May 2018

Thank you so much Lionel. So much to learn in this program! I really appreciate this community and their time which has been helpful for me in my analytics study. All the best.

C
