Performance result: Training vs Test
HeikoeWin786
Member Posts: 64 Contributor I
in Help
Dear all,
I am new to RapidMiner and I want to run a Naive Bayes classifier (NBC) on an airline dataset. The dataset has labelled sentiment data (pos, neg, and neutral). I divided the dataset with a 75/25 split and performed the text processing (i.e. Nominal to Text, Data to Documents, and Process Documents with tokenization and stopword filtering). However, when I looked at the word list output from the Process Documents operator, I found that the neg, pos, and neutral columns all have zero values. Then, after I applied the NBC, I got an accuracy of 87% for training but 0.00% for the test dataset.
Can you please kindly help me to understand what I am missing here?
Thanks a lot in advance!
Best Answer
Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
It's not really possible to diagnose the problem just from looking at this screenshot.
Here are some things you could check.
Did you perform the text preprocessing before or after the split? If before, there should be no issue. If after, you probably need to apply the wordlist from the training set to the test set; otherwise the model inputs will not be consistent.
What tool did you use to get the sentiment? I recommend Extract Sentiment which is now part of the Text Processing extension.
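To illustrate the wordlist point outside of RapidMiner, here is a minimal plain-Python sketch. It is not the RapidMiner implementation; the helper names (tokenize, build_wordlist, vectorize) and the sample sentences are made up for illustration. The idea is that the vocabulary is built from the training documents only, and the test documents are then mapped onto those same columns.

```python
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real tokenization."""
    return text.lower().split()

def build_wordlist(train_docs):
    """Collect the vocabulary from the training documents only."""
    vocab = sorted({tok for doc in train_docs for tok in tokenize(doc)})
    return {word: idx for idx, word in enumerate(vocab)}

def vectorize(doc, wordlist):
    """Count occurrences of each wordlist term; words not in the wordlist are dropped."""
    counts = Counter(tokenize(doc))
    vec = [0] * len(wordlist)
    for word, n in counts.items():
        if word in wordlist:
            vec[wordlist[word]] = n
    return vec

# Hypothetical example data, not taken from the airline dataset.
train_docs = ["great flight", "terrible delay", "great crew"]
test_docs = ["great delay", "lost luggage"]

wordlist = build_wordlist(train_docs)                       # fitted on training data only
train_vectors = [vectorize(d, wordlist) for d in train_docs]
test_vectors = [vectorize(d, wordlist) for d in test_docs]  # same columns as training
```

Because both sets are vectorized against the same wordlist, every test vector has the same length and column order as the training vectors. A test document made up entirely of unseen words simply becomes an all-zero vector.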
Answers
Thanks a lot.
I revisited the whole process. I split the data, and for the test data I used the word list output from the text preprocessing of the training dataset. Then I got a result. But the result for the training data and the test data is the same. Is this normal?
E.g. Train data --> Text preprocessing (store the word list output) --> NBC
Test data --> Text preprocessing (input the word list output from the above step) --> NBC
The accuracy is 65% for both processes. Is that ideal?
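For comparison, a common way to set up this kind of check is to fit the model once on the training data and then score that same fitted model on both sets. The plain-Python sketch below shows that pattern with a small multinomial Naive Bayes using add-one smoothing. It is only an assumption about what the flow above is meant to do, not a reproduction of the RapidMiner process, and the function names and sample sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model (word counts per class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict(model, doc):
    """Return the class with the highest log posterior for one document."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word in vocab:  # ignore words never seen in training
                # add-one (Laplace) smoothing
                score += math.log((word_counts[label][word] + 1)
                                  / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(model, docs, labels):
    """Fraction of documents whose predicted label matches the true label."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(labels)

# Hypothetical example data, not taken from the airline dataset.
train_docs = ["great flight", "friendly crew", "terrible delay", "lost luggage"]
train_labels = ["pos", "pos", "neg", "neg"]
test_docs = ["great crew", "terrible luggage"]
test_labels = ["pos", "neg"]

model = train_nb(train_docs, train_labels)  # fitted once, on training data only
train_acc = accuracy(model, train_docs, train_labels)
test_acc = accuracy(model, test_docs, test_labels)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The point of the pattern is that the test documents never take part in fitting; they are only passed to the already fitted model for scoring.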
thanks and regards,
Heikoe
Thanks a lot for the explanation. Yes, I followed the same process. Every time, my results for test and training (SVM or NBC) come out almost the same.
I was a bit unsure whether that is ideal, which is why I asked.
thanks much,
Heikoe