The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

Is it a must to use cross validation for higher accuracy as ive not learned before ?

Joannach0ngJoannach0ng Member Posts: 7 Learner I
edited August 2019 in Help
what is the meaning of training and testing for split validation ?thank you for your help!

Answers

  • varunm1varunm1 Member Posts: 1,207 Unicorn
    Hello @Joannach0ng

    Split Validation: In split validation, we divide the data set into X: Y (train: test) ratio such as 70 percent of data for training and 30 percent for testing. The model will be trained on the 70 percent dataset where it will try to learn patterns in data to come to a decision-making point that says which data sample belongs to which class (label). Then once the training is done, to test the performance of an algorithm on unseen data, 30 percent of data is used to make predictions based on training.

    Advantages: 1. Computationally inexpensive, 2. Time-efficient for very large datasets. 
    Disadvantages: 1. Performance metrics might vary (huge variations) if test set changes 2. Models might overfit during training 

    Cross-Validation: In cross-validation, the total data sets are divided into multiple folds. For example, in a 5 fold cross-validation the dataset is divided into 5 subsets (20% data in each subset). Once divided, the model is trained on first four folds and then tested on last fold and the performance metrics of this are stored, the model again is trained on last four folds and tested on the first fold and the metrics are stored and this happens until all the folds (subsets) of data are tested and the final performance is the aggregation of all subset testings. 

    Advantage: 1. Reliable performance 2. Reduces overfitting
    Disadvantages: Computationally expensive and time taking

    Accuracy: Cross-validation doesn't increase or decrease accuracy but it is a reliable method to estimate the model. There are chances your performance might improve or severely reduced based on the data. 

    For an in-depth understanding or cross-validation please refer to below thread.

    https://community.rapidminer.com/discussion/55112/cross-validation-and-its-outputs-in-rm-studio

    Hope this helps
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

Sign In or Register to comment.