Test and training data sets
abeetbhat1995
Should I make two data sets if I want to use algorithms? If I want to build a data set on my own, should I create a single Excel file, or two Excel files with one as the training data set and the other as the test data set? And if they are two different files, what should differ between the training data set and the test data set?
Answers
Hi @abeetbhat1995,
1. You can create either:
- one Excel file with the training set in sheet 1 and the test set in sheet 2 (in this case, don't forget to specify the sheet number in each of the two Read Excel operators),
or
- two Excel files (one for the training set and one for the test set).
2. Your training set and test set must contain the same attributes, and your training set must also contain the label.
Example:
Training set:
Att1  Att2  Att3  label
a     b     c     2
j     k     l     3
m     n     o     4

Test set:
Att1  Att2  Att3
z     y     x
t     u     v
g     h     i
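To make point 2 concrete, here is a minimal Python sketch (not RapidMiner code) that checks whether a test set fits a training set. Plain lists of dicts stand in for the two Excel sheets or files, and `check_compatible` is a hypothetical helper. It assumes the label column is literally named `label`, as in the example, and it tolerates a test set that also carries the label, which you would need if you wanted to measure performance on it.

```python
# Sketch only: lists of dicts stand in for the two Excel sheets/files.
# check_compatible is a hypothetical helper, not a RapidMiner operator.

training_set = [
    {"Att1": "a", "Att2": "b", "Att3": "c", "label": 2},
    {"Att1": "j", "Att2": "k", "Att3": "l", "label": 3},
    {"Att1": "m", "Att2": "n", "Att3": "o", "label": 4},
]

test_set = [
    {"Att1": "z", "Att2": "y", "Att3": "x"},
    {"Att1": "t", "Att2": "u", "Att3": "v"},
    {"Att1": "g", "Att2": "h", "Att3": "i"},
]


def check_compatible(train, test, label="label"):
    """Return True if every training row has the same columns including the label,
    and every test row has those same attributes (the label is optional there)."""
    if not train or not test:
        return False
    train_columns = set(train[0])
    if label not in train_columns:
        return False  # a training set without a label cannot train a model
    if any(set(row) != train_columns for row in train):
        return False  # training rows disagree on their columns
    attributes = train_columns - {label}
    # Ignore the label in test rows, then require exactly the training attributes.
    return all(set(row) - {label} == attributes for row in test)


print(check_compatible(training_set, test_set))  # True
```

A test row missing `Att3`, or a "training" set without the `label` column, makes the check return `False`.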
3. An example of a simple fictitious process:
Regards,
Lionel
You may want to look at the training video series on modeling and validation on this page: https://rapidminer.com/training/videos/
RapidMiner has a lot of built-in functionality for model validation that you should take advantage of. Cross-validation in particular is considered best practice and should be part of your workflow. Because it repeatedly partitions your labeled data into training and test folds internally, it does not require you to split that data into separate training and test sets yourself.
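The idea behind k-fold cross-validation can be shown with a small, generic Python sketch. This is not RapidMiner's implementation. The function names are hypothetical, the folds are contiguous (no shuffling or stratification), and a trivial majority-class predictor stands in for a real model. What it shows is that all labeled rows stay in one data set, each fold serves exactly once as the test set while the remaining folds form the training set, and the k scores are averaged.

```python
from collections import Counter


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(labels, k):
    """Estimate the accuracy of a majority-class baseline with k-fold cross-validation."""
    folds = k_fold_indices(len(labels), k)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        # Training data = every row outside the current test fold.
        train_labels = [labels[i] for i in range(len(labels)) if i not in held_out]
        # "Train": remember the most frequent label in the training folds.
        prediction = Counter(train_labels).most_common(1)[0][0]
        # "Test": score that prediction on the held-out fold only.
        correct = sum(labels[i] == prediction for i in test_idx)
        scores.append(correct / len(test_idx))
    # Average the per-fold scores into one performance estimate.
    return sum(scores) / len(scores)


print(cross_validate(["a"] * 8 + ["b"] * 2, k=5))  # 0.8
```

In practice you would shuffle or stratify before building the folds, so that, for example, all the `"b"` rows do not end up in the same fold as they do here.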
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts