Dealing with Imbalanced Data

earmijoearmijo Member Posts: 271 Unicorn
edited November 2018 in Help
I'm studying the consequences of imbalanced data. I'm trying to replicate some earlier papers on the topic (e.g. Japkowicz 2002).

This is what I need to do, but I'm stuck:

1) Take the original dataset

2) Split it according to the value of the label (call the two new example sets Common and Rare).

3) Resample (bootstrap) the Rare ExampleSet until it has the same size as the Common ExampleSet.

4) Join the resampled Rare with the old Common.
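
For reference, the four steps above can be sketched outside the tool in a few lines of Python. This is only an illustrative sketch, not a RapidMiner process: the function name `oversample_rare`, the `label_key` parameter, and the representation of examples as dictionaries are all assumptions made for the example, and it handles only a two-class label.

```python
import random
from collections import Counter


def oversample_rare(rows, label_key="label", seed=0):
    """Bootstrap the rare class up to the size of the common class.

    `rows` is a list of dicts; `label_key` names the label field.
    Both names are hypothetical and chosen only for this sketch.
    """
    rng = random.Random(seed)

    # Step 1: take the original dataset and count the label values.
    counts = Counter(row[label_key] for row in rows)
    if len(counts) != 2:
        raise ValueError("this sketch expects exactly two label values")
    (common_label, _), (rare_label, _) = counts.most_common(2)

    # Step 2: split by label value into Common and Rare.
    common = [row for row in rows if row[label_key] == common_label]
    rare = [row for row in rows if row[label_key] == rare_label]

    # Step 3: resample Rare with replacement until it matches Common in size.
    resampled_rare = rng.choices(rare, k=len(common))

    # Step 4: join the resampled Rare with the untouched Common.
    return common + resampled_rare
```

Calling `oversample_rare(rows)` returns a new list in which both label values occur equally often; the original `rows` list is left unchanged.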

I can do it outside Rapid-I, but I was wondering if it can be done with a few operators.

Thanks in advance for any help,

\E

Answers

  • earmijoearmijo Member Posts: 271 Unicorn
    Almost immediately after posting my question I found a way to do it. It is not very elegant, and I doubt it scales well to a huge dataset, but it works fine for me. It is an example of oversampling the small class. I'll share it with you:
    <operator name="Root" class="Process" expanded="yes">
        <!-- Generate a sample dataset with the label values "ok" and "terminate" -->
        <operator name="ChurnReductionExampleSetGenerator" class="ChurnReductionExampleSetGenerator">
        </operator>
        <!-- Duplicate the ExampleSet so each class can be filtered from its own copy -->
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <!-- First copy: keep only the small class ("terminate") -->
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label = terminate"/>
        </operator>
        <!-- Bootstrap the small class; the ratio 13.28 is specific to this dataset -->
        <operator name="Bootstrapping" class="Bootstrapping">
            <parameter key="sample_ratio" value="13.28"/>
        </operator>
        <!-- Second copy: keep only the large class ("ok") -->
        <operator name="IOSelector (2)" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="select_which" value="2"/>
        </operator>
        <operator name="ExampleFilter (2)" class="ExampleFilter">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label = ok"/>
        </operator>
        <!-- Join the resampled small class with the untouched large class -->
        <operator name="ExampleSetMerge" class="ExampleSetMerge">
        </operator>
    </operator>
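
The hard-coded `sample_ratio` of 13.28 is presumably the size of the large class divided by the size of the small class for this particular dataset; the post does not say how it was obtained, so treat that reading as an assumption. Under that assumption, a small Python helper (the name `bootstrap_ratio` is hypothetical) shows how the value could be computed for another dataset before entering it in the Bootstrapping operator:

```python
from collections import Counter


def bootstrap_ratio(labels, rare_label):
    """Return the factor by which the rare class must grow to match the rest.

    Assumes the bootstrapping ratio is taken relative to the size of the
    rare-class example set, which is an assumption of this sketch.
    """
    counts = Counter(labels)
    n_rare = counts[rare_label]
    if n_rare == 0:
        raise ValueError("rare_label does not occur in labels")
    n_common = len(labels) - n_rare
    return n_common / n_rare
```

For instance, a label column with 90 "ok" and 10 "terminate" values would call for a ratio of 9.0.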
  • haddockhaddock Member Posts: 849 Maven
    Actually, this issue has already been covered several times, once even by me:

    http://rapid-i.com/rapidforum/index.php/topic,1246.msg4786.html#msg4786