
Auto Model: Performance is worse when auto feature selection / generation turned on?

cramsden Member Posts: 42 Learner III
Hello, I am new to the machine learning world and am teaching myself by experimenting with RapidMiner Studio. I have just noticed something that doesn't make sense to me and am hoping someone could explain it.

I loaded the same data set into Auto Model and first ran it with 'automatic feature selection / generation' turned off, then ran it again with feature selection / generation turned on.

When 'automatic feature selection / generation' was turned on, the model performed worse than when it was off. I am confused about why adding feature selection / generation could make a model worse. If no features improve the model's performance, wouldn't they simply be rejected, so that the original model comes out? In that case, performance should only stay the same or get better.

Again, I am very new to this and a bit confused here, so any help would be greatly appreciated!

Thank you

Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    Just a thought: it might be that the feature selection (FS) is actually overtraining (overfitting) the model, and thus the testing error is worse.

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • cramsden Member Posts: 42 Learner III
    My thought was that if FS didn't make any improvements, it would just stick with the basic model? Maybe I am confused about how the training / testing works, though. Again, I am very new to this and just playing around.
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    I usually think of model fitting and feature selection as one thing that needs to be seen as a whole. Adding FS to your model increases the 'degrees of freedom' of your model-generation method, and more degrees of freedom means more opportunities to overfit.

    BR,
    Martin
  • cramsden Member Posts: 42 Learner III
    Ah, OK, so the results I am seeing for the models (correlation, RMSE, etc.) are all based on a subset of the data that is held back?

    I was thinking it was picking the best model based on those results, so that must be where I was confused.