NominalToNumerical inconsistency with different sources
pablo_admig
The situation can be reproduced with the "Apply to Test Set" template, e.g. by using a data set with one nominal column and replacing the kNN model with a Neural Network.
So, in order to use the Neural Network (or any algorithm that does not support nominal attributes), I have to convert that attribute to a numerical one with the NominalToNumerical operator, and RapidMiner creates a "mapping" for each category. For example, the operator reads the category "Sunny" in that column and assigns it the number 1, reads the category "Cloudy" and assigns it the number 2, and so on.
The problem arises when this mapping is not the same in the training and test sets. I need two NominalToNumerical operators (one for the training set and one for the test set), and they are not linked, so each one numbers the categories in the order they appear in its own table. For example, if the first record of the training set is "Sunny", it is converted to 1. But if the first record of the test set is "Cloudy", it is converted to 1 as well! So for the neural network Cloudy = Sunny, which turns this into a serious problem.
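A minimal Python sketch of the behaviour described above, assuming each conversion simply numbers the categories in order of first appearance, starting at 1 as in the example in this post. This is a simplified model of the problem, not RapidMiner's actual code, and `encode_by_first_occurrence` is a hypothetical helper.

```python
def encode_by_first_occurrence(values):
    """Assign 1, 2, 3, ... to categories in the order they first appear."""
    mapping = {}
    codes = []
    for v in values:
        if v not in mapping:
            # A new category gets the next free number.
            mapping[v] = len(mapping) + 1
        codes.append(mapping[v])
    return mapping, codes


train = ["Sunny", "Cloudy", "Rain", "Sunny"]
test = ["Cloudy", "Sunny", "Rain"]

# Two independent conversions, like two unlinked NominalToNumerical operators.
train_map, train_codes = encode_by_first_occurrence(train)
test_map, test_codes = encode_by_first_occurrence(test)

print(train_map)  # {'Sunny': 1, 'Cloudy': 2, 'Rain': 3}
print(test_map)   # {'Cloudy': 1, 'Sunny': 2, 'Rain': 3}
```

Looking only at the numbers, 1 means "Sunny" in the training data but "Cloudy" in the test data.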
I would like to know whether there is a solution for this within the RapidMiner environment.
Thanks in advance,
Pablo.
Answers
Yes, there is a solution: as far as I know, you don't have to worry about this.
The neural net model, like all other models, keeps the header information of the example set used for training. This header also contains the mapping that was used, i.e. the fact that "Sunny" was assigned to "1" and so on. When the model is applied, an incoming test-set value such as "1" is first translated back to "Cloudy" (since that was the mapping used for the test set), and "Cloudy" is then translated again to "2" based on the training header information before the model is actually applied. So there is actually no serious problem, at least as long as no bug prevents this automatic nominal remapping, as was the case a couple of years ago.
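The remapping step described above can be sketched in Python as follows. This is a simplified illustration assuming both mappings are plain dictionaries from category to number; `remap_to_training` is a hypothetical helper, not part of RapidMiner.

```python
def remap_to_training(codes, test_map, train_map):
    """Translate codes from the test-set mapping into the training-set mapping.

    Each code is first decoded back to its category using the test mapping,
    then re-encoded with the mapping stored at training time.
    """
    # Invert the test mapping: number -> category.
    test_decode = {code: value for value, code in test_map.items()}
    return [train_map[test_decode[c]] for c in codes]


train_map = {"Sunny": 1, "Cloudy": 2, "Rain": 3}
test_map = {"Cloudy": 1, "Sunny": 2, "Rain": 3}
test_codes = [1, 2, 3]  # Cloudy, Sunny, Rain in the test set's own numbering

print(remap_to_training(test_codes, test_map, train_map))  # [2, 1, 3]
```

The test value 1 is decoded to "Cloudy" and re-encoded as 2, exactly the path described in the answer. In this sketch, a category that never appeared in the training data raises a `KeyError`, since the training mapping has no number for it.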
If you want to transform the values yourself, to be absolutely sure without relying on the automatic mechanism described above, you could of course first use the "Map" operator to map the nominal values to "nominal" numbers, and then use "Parse Numbers" to transform them into real numbers. But I would not bother with this.
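The manual alternative can be sketched in plain Python: one fixed mapping is defined once and applied to both data sets, first replacing each category with a numeric string (the role "Map" plays) and then parsing that string into a number (the role "Parse Numbers" plays). The mapping values and the helper `map_and_parse` are hypothetical, chosen only for illustration; this is not RapidMiner's API.

```python
# Hypothetical fixed mapping, chosen once and shared by both data sets.
FIXED_MAPPING = {"Sunny": "1", "Cloudy": "2", "Rain": "3"}


def map_and_parse(values, mapping):
    """Replace each category with a numeric string, then parse it to a number."""
    mapped = [mapping[v] for v in values]  # step 1: nominal -> "nominal" number
    return [float(s) for s in mapped]      # step 2: string -> real number


train = ["Sunny", "Cloudy", "Rain", "Sunny"]
test = ["Cloudy", "Sunny", "Rain"]

print(map_and_parse(train, FIXED_MAPPING))  # [1.0, 2.0, 3.0, 1.0]
print(map_and_parse(test, FIXED_MAPPING))   # [2.0, 1.0, 3.0]
```

Because both sets use the same dictionary, "Cloudy" becomes 2.0 in the training and the test data alike, regardless of the row order.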
Cheers,
Ingo
I tested this in detail with a simple example, and you are right: the prediction is the same. However, if I look at the outputs of the conversions in the training and test sets (with the label from the model), I can see the "inconsistency". That is, if I look at the numbers instead of the categorical values together with their associated label, the label calculation is consistent, but the input columns (transformed to numerical) in the table with the label are not.
Is it clear?
Regards,
Pablo.
Yes, I see. But rest assured: those "inconsistencies" only exist until the model is applied, since applying it resolves them. So sometimes it's easier not to look into too many details.
Cheers,
Ingo