Is cross-validation a must for higher accuracy? I have not learned about it before.
Joannach0ng
Member Posts: 7 Learner I
What is the meaning of training and testing in split validation? Thank you for your help!
Answers
Split Validation: In split validation, we divide the dataset in an X:Y (train:test) ratio, for example 70 percent of the data for training and 30 percent for testing. The model is trained on the 70 percent, where it tries to learn patterns in the data so that it can decide which class (label) each sample belongs to. Once training is done, the remaining 30 percent, which the model has not seen, is used to make predictions and to measure how well the algorithm performs on unseen data.
Advantages: 1. Computationally inexpensive, 2. Time-efficient for very large datasets.
Disadvantages: 1. Performance metrics can vary widely if the test set changes, 2. The model might overfit the training data.
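To make the 70:30 idea concrete, here is a minimal sketch in plain Python rather than a RapidMiner process. Everything in it is an assumption for illustration: the synthetic one-feature dataset, the toy nearest-centroid classifier, and the helper names (split_rows, fit_centroids, predict, accuracy) are hypothetical and not part of any library.

```python
import random

def split_rows(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows, then cut them into a training and a testing part."""
    rng = random.Random(seed)
    shuffled = rows[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_centroids(train_rows):
    """Toy model (assumption): remember the mean feature value of each label."""
    sums, counts = {}, {}
    for x, label in train_rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for x, label in rows if predict(centroids, x) == label)
    return correct / len(rows)

# Synthetic data (assumption): 50 samples per class, one numeric feature each.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), "a") for _ in range(50)]
data += [(rng.gauss(3.0, 1.0), "b") for _ in range(50)]

train, test = split_rows(data, train_fraction=0.7, seed=1)
model = fit_centroids(train)            # learning uses the 70 percent only
test_accuracy = accuracy(model, test)   # scoring uses the unseen 30 percent
print(len(train), len(test), test_accuracy)
```

Changing the seed passed to split_rows changes which samples land in the test set, so the printed accuracy can shift from run to run. That is the first disadvantage listed above.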
Cross-Validation: In cross-validation, the whole dataset is divided into multiple folds. For example, in a 5-fold cross-validation the dataset is divided into 5 subsets (20% of the data in each subset). Once divided, the model is trained on the first four folds and tested on the last fold, and the performance metrics are stored. The model is then trained on the last four folds and tested on the first fold, and those metrics are stored as well. This repeats until every fold (subset) has been used for testing once, and the final performance is the aggregation of the results from all folds.
Advantages: 1. More reliable performance estimate, 2. Reduces the risk of an overfitted model going unnoticed
Disadvantages: Computationally expensive and time-consuming
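The fold rotation described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not how RapidMiner implements it: the synthetic data, the toy nearest-centroid model, and the helper names (k_fold_indices, cross_validate) are hypothetical, and the "aggregation" is assumed here to be a simple mean of the per-fold accuracies.

```python
import random
from statistics import mean

def k_fold_indices(n_rows, k, seed=0):
    """Shuffle the row indices and deal them into k roughly equal folds."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_centroids(train_rows):
    """Toy model (assumption): remember the mean feature value of each label."""
    sums, counts = {}, {}
    for x, label in train_rows:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(centroids, x):
    """Predict the label whose centroid lies closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))

def cross_validate(rows, k=5, seed=0):
    """Return one accuracy per fold; each fold is the test set exactly once."""
    scores = []
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train = [rows[i] for i in range(len(rows)) if i not in held_out]
        test = [rows[i] for i in test_idx]
        centroids = fit_centroids(train)   # train on the other k-1 folds
        correct = sum(1 for x, label in test if predict(centroids, x) == label)
        scores.append(correct / len(test))
    return scores

# Synthetic data (assumption): 50 samples per class, one numeric feature each.
rng = random.Random(42)
data = [(rng.gauss(0.0, 1.0), "a") for _ in range(50)]
data += [(rng.gauss(3.0, 1.0), "b") for _ in range(50)]

fold_scores = cross_validate(data, k=5, seed=1)
print(fold_scores)        # one accuracy per fold
print(mean(fold_scores))  # aggregated estimate (mean is an assumption here)
```

The model is fitted five times instead of once, which is where the extra computation comes from. In exchange, every sample is used for testing exactly once, so the averaged score depends less on one lucky or unlucky split.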
Accuracy: Cross-validation doesn't increase or decrease accuracy, but it is a reliable method for estimating how well the model performs. Compared with a single split, the reported performance might come out higher or considerably lower, depending on the data.
For an in-depth understanding of cross-validation, please refer to the thread below.
https://community.rapidminer.com/discussion/55112/cross-validation-and-its-outputs-in-rm-studio
Hope this helps
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing