
Optimization Grid with Random Forest - Not Working.

CarlN Member Posts: 6 Contributor II
RapidMiner Unicorns 🦄,

I am trying to run an optimization grid with my Random Forest model and I am getting an error. It states that the gain_ratio criterion cannot be used for numeric labels (see pictures below). I checked all my parameters and I am not using gain_ratio in the optimization grid (see pictures below). So, specifically, how do you use an optimization grid with cross validation and a random forest predicting a real number in RapidMiner?

Can you send a basic working example of this workflow process with well-documented comments explaining each step?



Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    Can you show us your optimization settings? Likely you use least_square there.

    Also, be careful using Explain Predictions inside the Cross Validation. This can take an enormous amount of time.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • CarlN Member Posts: 6 Contributor II
    Please see below. Also, I am sending the results of the optimization to a log. Let me know what the issue is, or share an example workflow process showing how this works in RapidMiner.




  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    You have a numeric label and are trying to vary the split criterion among [information_gain, gain_ratio, gini_index, accuracy]. This cannot work, since none of those criteria are applicable to numeric labels.
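
To make the point concrete, here is a minimal Python sketch (not RapidMiner code) of the check that effectively fails here. The function name `invalid_criteria_for_numeric_label` is hypothetical and exists only for illustration; the criterion names are the ones mentioned in this thread, and the assumption is that least_square is the only criterion usable with a numeric label.

```python
# Hypothetical illustration, not part of RapidMiner.
# Assumption from this thread: a numeric label requires least_square.
NUMERIC_LABEL_CRITERIA = {"least_square"}


def invalid_criteria_for_numeric_label(grid_criteria):
    """Return the criteria in the grid that cannot be used with a numeric label."""
    return [c for c in grid_criteria if c not in NUMERIC_LABEL_CRITERIA]


# The criterion values varied in the optimization grid from the question.
grid = ["information_gain", "gain_ratio", "gini_index", "accuracy"]

# Every value in this grid is rejected, so any run the grid tries will fail.
print(invalid_criteria_for_numeric_label(grid))
# → ['information_gain', 'gain_ratio', 'gini_index', 'accuracy']
```

Even if the operator itself is set to least_square, a grid entry that varies the criterion overrides that setting on each run, which would explain an error naming a criterion that was not selected on the operator.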

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • CarlN Member Posts: 6 Contributor II
    Okay, thanks for the explanation, but the solution is not clear from your response.

    Specifically, what configuration/setup steps are needed to make the grid optimization operator work and simply find the optimal parameters for the Random Forest model? Do you have a sample workflow showing how this can work?
  • BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    Just select correct and applicable settings for the optimization. Leave the criterion alone (it has to be least_square for numerical prediction) and optimize parameters like the number of trees and the maximum depth.
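
For readers who want to see the same idea outside RapidMiner, the following is a rough analogue in Python with scikit-learn, offered only as a sketch. It is not the RapidMiner process: the synthetic data, the grid values, and the fold count are all assumptions chosen to keep the example small. The key point carries over: the split criterion is left out of the grid (the regressor keeps its default squared-error criterion), and only the number of trees and the maximum depth are varied inside a cross-validation.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold

# Synthetic data with a numeric (real-valued) label, standing in for the real dataset.
X, y = make_regression(n_samples=120, n_features=5, noise=10.0, random_state=0)

# Only parameters that are valid for regression are varied.
# The criterion is deliberately absent, so the default squared-error criterion is used.
param_grid = {
    "n_estimators": [20, 50],  # number of trees
    "max_depth": [3, 6],       # maximum depth
}

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid=param_grid,
    cv=KFold(n_splits=3, shuffle=True, random_state=0),  # cross validation
    scoring="neg_mean_squared_error",
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Cross-validated MSE:", -search.best_score_)
```

The grid here has 2 x 2 = 4 combinations, each evaluated with 3-fold cross validation. In RapidMiner the corresponding setup would be a grid optimization operator wrapping a cross validation that contains the Random Forest, with only those two parameters selected in the grid.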

    Regards,
    Balázs
  • CarlN Member Posts: 6 Contributor II
    I am using least_square in the Random Forest operator and it's still giving me an error (see below). I still don't understand why it's not working. Please give me specific, step-by-step instructions to make this work. Thank you very much.

