Error when applying a trained model to a new unlabeled data set
I want to apply a Naive Bayes model to a new (unlabeled) data set. The model has already been trained and tested via cross-validation. However, when I try to apply the model to a brand-new data set, I get an error message.
Here is an overview of my process and the error I get:
The "Retrieve aggregate" is the new (unlabeled) data set, which I want to predict using my trained model.
"Process Documents from Data" contains a "Tokenize" operator.
The subprocesses within the Cross Validation operator are:
I am new to RapidMiner and I have no clue as to why I get this error.
I would greatly appreciate your help, as I need to carry on with my research.
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
@Stann,
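Yes, it is possible:
As I said, apply the same preprocessing steps in your test set "branch",
and connect the word list output (wor) of the Process Documents from Data operator in your training "branch" to the word list input (wor) of the Process Documents from Data operator in your test set "branch". That way the test set is vectorized with the training word list, so it gets the same attributes as the training set.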
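To illustrate the idea outside RapidMiner, here is a minimal Python sketch of what reusing a word list amounts to. It is not RapidMiner's implementation: the functions `tokenize`, `build_word_list` and `vectorize`, and the sample documents, are hypothetical names and data made up for this example. The word list is built from the training documents only, and the new documents are counted against that same list, so tokens that never appeared in training are simply ignored.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-letters (a rough stand-in for a Tokenize step)."""
    return [tok for tok in re.split(r"[^a-z]+", text.lower()) if tok]


def build_word_list(documents):
    """Collect the sorted vocabulary of the training documents only."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})


def vectorize(text, word_list):
    """Count only tokens that are in word_list; unseen tokens are dropped."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in word_list]


# Hypothetical example data.
train_docs = ["good movie", "bad movie", "good plot"]
new_docs = ["good soundtrack", "terrible movie movie"]

# The word list comes from the training branch ...
word_list = build_word_list(train_docs)

# ... and is reused for both branches, so both get identical attributes.
train_vectors = [vectorize(doc, word_list) for doc in train_docs]
new_vectors = [vectorize(doc, word_list) for doc in new_docs]

print(word_list)
print(new_vectors)
```

Here "soundtrack" and "terrible" do not become attributes, because they are not in the training word list; the new documents end up with exactly the columns the model was trained on.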
Regards,
Lionel
Answers
The attributes have to be strictly the same in your training set and in your unlabeled test set.
Thus you have to apply strictly the same preprocessing steps to your unlabeled test set (that is, you have to apply
the Nominal to Text and Process Documents from Data operators to your test set). Currently you are applying the raw test set to your model...
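As a rough illustration of why the raw test set does not fit the model, the following Python sketch compares two sets of attribute names. The helper `attribute_mismatch` and the attribute names are hypothetical and chosen only for this example; they are not taken from the process in the question.

```python
def attribute_mismatch(train_attributes, new_attributes):
    """Report attributes the model expects but the new set lacks, and vice versa."""
    train, new = set(train_attributes), set(new_attributes)
    return {
        "missing_in_new": sorted(train - new),
        "unexpected_in_new": sorted(new - train),
    }


# Hypothetical attribute names: after text processing, each token is a column.
train_attributes = ["good", "bad", "movie", "plot"]

# The raw, unprocessed set still has a single text column (assumed name).
raw_new_attributes = ["text"]

report = attribute_mismatch(train_attributes, raw_new_attributes)
print(report)
```

None of the token attributes the model was trained on exist in the raw set, which is the kind of mismatch that identical preprocessing is meant to remove.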
Hope this helps,
Regards,
Lionel
Having the exact same attributes would be impossible, as each attribute is a token (word) that appeared in the initial text documents. Since the new (unlabeled) data set contains different text documents than the training set, the attributes would always differ, because the text documents in the new data set are composed of "new" tokens.
Having said that, is there still a way to apply the model to a new (unlabeled) set?