Specifying Prior Probabilities

tobyb Member Posts: 11 Contributor II
edited November 2018 in Help
Is there a way to specify prior probabilities in RapidMiner? For example, let's say I have a dataset that is 80% one class and 20% another. A subset is created that contains 50% of each class. I would like to be able to specify that the prior probabilities were 80% and 20%.

Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    You could do this by filtering and counting using data macros, but a quick and sneaky fix sometimes has its place, like this...
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="EqualLabelWeighting" class="EqualLabelWeighting">
        </operator>
    </operator>
    Good weekend to all!

  • keith Member Posts: 157 Maven
    haddock wrote:

    Hi there,

    You could do this by filtering and counting using data macros, but a quick and sneaky fix sometimes has its place, like this...
    <process omitted>

    I'm probably missing something obvious, but it seems like this is backwards.  The original question was about data with a true (prior) probability of 80/20, but with the minority label oversampled such that the training data was 50/50.  Wouldn't EqualLabelWeighting be more like taking an 80/20 sample to a 50/50 prior?

    Keith
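
To make the direction Keith describes concrete, here is a minimal Python sketch (not RapidMiner code) of example weights that take a 50/50 sample back to the 80/20 prior. The function name `prior_weights` and the labels `"a"` and `"b"` are made up for illustration; the only idea used is that each example's weight is its class prior divided by that class's share of the sample.

```python
from collections import Counter

def prior_weights(labels, priors):
    """Return one weight per example so that the weighted class shares match `priors`.

    Hypothetical helper: weight = desired prior / observed share of that class.
    """
    counts = Counter(labels)
    n = len(labels)
    return [priors[y] / (counts[y] / n) for y in labels]

# A balanced 50/50 subset, as in the original question.
labels = ["a"] * 50 + ["b"] * 50
priors = {"a": 0.8, "b": 0.2}

weights = prior_weights(labels, priors)

# Weighted share of class "a" after reweighting.
total = sum(weights)
share_a = sum(w for w, y in zip(weights, labels) if y == "a") / total
```

With these weights every "a" example counts four times as much as a "b" example, which is the reverse of what equal label weighting does to an 80/20 sample.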


  • haddock Member Posts: 849 Maven
    Hi Keith,

    Have you not heard? Backwards is the new forwards! Perhaps I should have been more explicit; we can use the fact that we know the number of classes and the 'equal weight' number to keep track of the original distribution. In the binominal case we simply divide 0.5 by the weight to produce the count, like this...
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="EqualLabelWeighting" class="EqualLabelWeighting">
        </operator>
        <operator name="AttributeConstruction" class="AttributeConstruction">
            <list key="function_descriptions">
              <parameter key="Count" value="0.5/weight"/>
            </list>
        </operator>
    </operator>
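
The arithmetic behind `0.5/weight` can be checked outside RapidMiner with a small Python sketch. It assumes, as the formula above implies, that equal label weighting spreads a total weight of 1 evenly over the classes, so in the two-class case each class receives 0.5 and every example gets 0.5 divided by its class count. The function `equal_label_weights` is a stand-in written for this example, not the operator's actual implementation.

```python
from collections import Counter

def equal_label_weights(labels, total_weight=1.0):
    """Assumed behaviour: each class gets total_weight / number_of_classes,
    split evenly among the examples of that class."""
    counts = Counter(labels)
    per_class = total_weight / len(counts)
    return [per_class / counts[y] for y in labels]

# An 80/20 dataset, as in the original question.
labels = ["pos"] * 80 + ["neg"] * 20
weights = equal_label_weights(labels)

# Invert the weighting: count = 0.5 / weight in the two-class case.
recovered = {y: round(0.5 / w) for y, w in zip(labels, weights)}
```

Dividing each recovered count by the number of examples then gives the original class proportions, which is how the weight column keeps track of the 80/20 distribution.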