Ideal ratio with respect to scoring dataset and training dataset

Abi Member Posts: 1 Learner III

Like the 70-30 ratio for training and testing, is there a suggested ratio between the training and scoring datasets?

(The goal is to reduce the training data to the right proportion for the best scoring results.)

Answers

  • hbajpai Member Posts: 102 Unicorn
    Hey @Abi,

    Scoring is typically done in real time rather than in batch. I assume you mean the ratio between the train, dev/hold-out, and test sets. A rule of thumb: if you have fewer than 100k rows, the split could be 60%/20%/20% or 70%/15%/15%. But if you have 1 million or more rows, it could be 98%/1%/1% or even 99.5%/0.4%/0.1%.

    As for reducing the total number of rows: one trick is to validate the final model first and then retrain it on the whole dataset.


    Best,
    Harshit
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    "Scoring is typically real time rather than batch."
    I would challenge you on this. In customer analytics it's often fine to run scoring once a day or once a week.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • varunm1 Member Posts: 1,207 Unicorn
    edited April 2020
    Totally agree with @Telcontar120 on cross-validation (CV). If you cannot afford to run CV because of time constraints, very large data, or specific needs, then another validation scheme, similar to the one used in Auto Model (AM), can be used.
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing
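
The rule of thumb from the first answer can be sketched in a few lines of plain Python. This is only an illustration, not a RapidMiner process: the function names `suggested_fractions` and `split_indices` are made up for this example, and the thread gives no figure for datasets between 100k and 1 million rows, so falling back to 70/15/15 there is an assumption.

```python
import random


def suggested_fractions(n_rows):
    """Return (train, dev, test) fractions based on the thread's rule of thumb."""
    if n_rows >= 1_000_000:
        # Very large data: a small slice is already enough for dev and test.
        return (0.98, 0.01, 0.01)
    # Below 100k rows the thread suggests 60/20/20 or 70/15/15.
    # Assumption: the same split is reused between 100k and 1M rows.
    return (0.70, 0.15, 0.15)


def split_indices(n_rows, fractions, seed=0):
    """Shuffle the row indices and cut them into train, dev, and test lists."""
    train_frac, dev_frac, _ = fractions
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split repeatable
    n_train = round(n_rows * train_frac)
    n_dev = round(n_rows * dev_frac)
    train = indices[:n_train]
    dev = indices[n_train:n_train + n_dev]
    test = indices[n_train + n_dev:]  # whatever is left goes to the test set
    return train, dev, test


train, dev, test = split_indices(1000, suggested_fractions(1000))
print(len(train), len(dev), len(test))
```

Once the final model has been validated on the dev and test rows, the retraining trick from the same answer amounts to fitting that model one more time on all rows (`train + dev + test`) before using it for scoring.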
