
Optimise Parameters operator

neilduggan Member Posts: 18 Contributor II
Hi

I am using the Optimise Parameters operator on some Decision Tree analysis - I have a query on what parameters to select within this operator:

Inside the Optimise Parameters operator, I have an X-Validation operator and in this I have Decision Tree on the training side and an Apply Model operator & a Performance operator on the testing side.

The results from the Performance operator look something like this:

Accuracy:
                 True Yes    True No    Class precision
Predicted Yes:   100         1          99.01%
Predicted No:    40          460        92.00%
Class recall:    71.43%      99.78%

In the Optimise Parameters operator, I have selected the DT criterion values (accuracy, gini_index, gain_ratio and information_gain) as the parameters to optimise, but I'm not sure if this is correct. Should I be choosing something in the Performance operator instead? Ideally, I would like a result that balances the two class recall values (71.43% and 99.78% in the example above) as much as possible.

Any advice appreciated

Thanks

Neil
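
For readers following along, the figures in the question can be re-derived from the four confusion matrix counts. The sketch below is plain Python, not RapidMiner code. The `balanced_accuracy` value (the mean of the two class recalls) is an assumption about what "balancing" the recalls could mean; it is not a measure named anywhere in this thread.

```python
# Counts taken from the confusion matrix in the question.
tp = 100  # predicted Yes, actually Yes
fp = 1    # predicted Yes, actually No
fn = 40   # predicted No, actually Yes
tn = 460  # predicted No, actually No


def ratio(numerator, denominator):
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


accuracy = ratio(tp + tn, tp + fp + fn + tn)
precision_yes = ratio(tp, tp + fp)
precision_no = ratio(tn, tn + fn)
recall_yes = ratio(tp, tp + fn)
recall_no = ratio(tn, tn + fp)

# Assumption: one possible "balanced" score is the mean of the class recalls.
balanced_accuracy = (recall_yes + recall_no) / 2

for name, value in [
    ("accuracy", accuracy),
    ("precision (Yes)", precision_yes),
    ("precision (No)", precision_no),
    ("recall (Yes)", recall_yes),
    ("recall (No)", recall_no),
    ("balanced accuracy", balanced_accuracy),
]:
    print(f"{name:<18} {value:.2%}")
```

Because the No class is much larger, overall accuracy stays high even though about 29% of the true Yes examples are missed, which is why a criterion that looks at both recalls may suit the goal better.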

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,

    The optimize operator optimizes for the main criterion of your Performance operator, which I guess is accuracy in your case. If you want to optimize for something else, you need to select a different main criterion there (or define the specific value yourself; Data to Performance is helpful for that).

    In your case the problem might be class imbalance. Have you considered weights? And have you considered switching to AUC as the measure?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MBA_Data_Miner Member Posts: 21 Contributor II
    I'd be curious to see how a weighting scheme would work for balancing data. I have used sampling to balance datasets before but not weights.
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,

    Use a Generate Attributes operator to create a new attribute. Set the weight to 1 for one class and to 10 for the other. Afterwards, set the role of this attribute to weight. Then every example of the second class counts 10x.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
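
As a rough illustration of the weighting idea in the last answer, here is a minimal plain-Python sketch of what "counts 10x" does to the class totals. It is not RapidMiner code and does not show how the Decision Tree operator uses weights internally. The class sizes (140 Yes, 461 No) come from the confusion matrix in the question; putting the weight of 10 on the Yes class is an assumption, and `CLASS_WEIGHTS`, `example_weights` and `weighted_class_totals` are hypothetical names.

```python
from collections import Counter

# Class sizes from the question's confusion matrix: 140 true Yes, 461 true No.
labels = ["Yes"] * 140 + ["No"] * 461

# Assumption: the rarer class ("Yes") gets weight 10, the other class weight 1.
CLASS_WEIGHTS = {"Yes": 10.0, "No": 1.0}


def example_weights(labels, class_weights):
    """Return one weight per example, looked up from its class label."""
    return [class_weights[label] for label in labels]


def weighted_class_totals(labels, weights):
    """Sum the example weights per class."""
    totals = Counter()
    for label, weight in zip(labels, weights):
        totals[label] += weight
    return dict(totals)


weights = example_weights(labels, CLASS_WEIGHTS)
totals = weighted_class_totals(labels, weights)
print(totals)

# Weight on "Yes" that would make the two weighted class totals equal.
balancing_weight = labels.count("No") / labels.count("Yes")
print(f"balancing weight for Yes: {balancing_weight:.2f}")
```

With these counts a weight of 10 makes Yes the heavier class (1400 against 461), while a weight of roughly 3.3 would make the two classes count equally, so the exact value is something to tune rather than fix in advance.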