How to retain real nominal values after nominal2numerical and processing?
I use these operators for 'Training' with the input data set:
- ExampleSource
- Nominal2Numerical
- LinearRegression
- ModelWriter
and I use the following ones for 'Testing' with the model created in the previous execution:
- ExampleSource
- Nominal2Numerical
- ModelLoader
- ModelApplier
In the output, the columns that held nominal values show the internal index numbers assigned to each nominal value instead of the values themselves.
For example, if I have a column 'login id' with values 'A1', 'A2', 'B3' etc., the output is generated with '0', '1', '2', etc. Basically, I lose the real nominal values and have to do the mapping manually to work out which output record belongs to which input record from the 'Testing' phase.
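To make the effect concrete, here is a minimal sketch in plain Python (not RapidMiner code). The helper `nominal_to_index` is hypothetical, and the first-seen index order is an assumption; the operator's internal index assignment may differ. It shows why the output contains 0, 1, 2 and that the original values can only be recovered if the mapping is kept.

```python
def nominal_to_index(values):
    """Map each distinct nominal value to an integer index (first-seen order, an assumption)."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)
    return [mapping[value] for value in values], mapping


login_ids = ["A1", "A2", "B3", "A1"]
encoded, mapping = nominal_to_index(login_ids)

# Without the mapping, the encoded column alone cannot be read back.
index_to_value = {index: value for value, index in mapping.items()}
decoded = [index_to_value[index] for index in encoded]

print(encoded)
print(decoded)
```

Doing this inversion by hand is the manual mapping step described above.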
Answers
If I understand your problem correctly, you can get around it by saving a copy of the original nominal value before you use Nominal2Numerical. So your process might look like:
- ExampleSource
- AttributeCopy
- ChangeAttributeRole
- Nominal2Numerical
- ModelLoader
- ModelApplier
In AttributeCopy you can create a copy of "login_id" and call it, for example, "login_id_copy". In ChangeAttributeRole you can then turn "login_id_copy" into, for example, an id attribute, which will be unaffected by Nominal2Numerical.
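The idea can be sketched in plain Python (an illustration only, not RapidMiner's API). The helpers `copy_attribute` and `encode_regular_nominals` and the `hours` column are hypothetical; the sketch assumes, as described above, that attributes with an id role are skipped by the encoding step.

```python
def copy_attribute(rows, source, target):
    """Return new rows where `target` holds a copy of `source`."""
    return [{**row, target: row[source]} for row in rows]


def encode_regular_nominals(rows, id_attributes):
    """Replace string values with integer indices, skipping id attributes."""
    mappings = {}
    encoded_rows = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if isinstance(value, str) and name not in id_attributes:
                mapping = mappings.setdefault(name, {})
                # Existing values keep their index; new values get the next one.
                new_row[name] = mapping.setdefault(value, len(mapping))
            else:
                new_row[name] = value
        encoded_rows.append(new_row)
    return encoded_rows


rows = [
    {"login_id": "A1", "hours": 2.0},
    {"login_id": "B3", "hours": 5.5},
]
with_copy = copy_attribute(rows, "login_id", "login_id_copy")
encoded = encode_regular_nominals(with_copy, id_attributes={"login_id_copy"})

for row in encoded:
    print(row)
```

Each output row still carries the readable "login_id_copy" next to the index-coded "login_id", so results can be matched to input records without a manual lookup.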
Regards,
Andreas
More importantly: Do you think your process setup makes a lot of sense? A linear regression over login ids doesn't look particularly promising to me. Maybe try Nominal2Binominal.
I may have got your question wrong. Perhaps RapidMiner is not confusing the internal indices between the two processes, and your point is only that you dislike the "output", i.e. the fact that your predictions are 0, 1, 2, ... rather than login ids. Well, that corresponds exactly to the fact that linear regression on nominal data disguised as numbers doesn't make much sense.
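As a rough illustration of the difference, here is a plain-Python sketch of dummy (one-indicator-per-value) coding, which is the general idea behind the Nominal2Binominal suggestion. The helper `one_hot` is hypothetical and the exact columns the operator produces may differ. Unlike index coding, it does not impose an artificial order such as A1 < A2 < B3 on the values.

```python
def one_hot(values):
    """Return the sorted categories and one 0/1 indicator row per value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


categories, indicator_rows = one_hot(["A1", "B3", "A2", "A1"])

print(categories)
for row in indicator_rows:
    print(row)
```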
Best,
Simon
But now, I tried to remove the unwanted attribute ('login_id' which is shown with internal index values) from the output in the whole process.
I used
- AttributeFilter (attribute_name_filter, 'login_id', invert_filter=true)
- ChangeAttributeName (old_name='login_id_copy', new_name='login_id')
The 'AttributeFilter' operation works fine, i.e. it removes the 'login_id' column. But 'ChangeAttributeName' fails with the exception: 'Cannot rename attribute. Duplicate name: login_id'. So it looks like the attribute removed by 'AttributeFilter' is still kept internally, and that causes the error during the 'ChangeAttributeName' operation. How do we solve it?
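For clarity, this is the end result I am after, sketched in plain Python with one dict per record. It does not reproduce or explain RapidMiner's behaviour; the helpers `remove_attribute` and `rename_attribute` and the `prediction` column are hypothetical.

```python
def remove_attribute(row, name):
    """Return a copy of the row without the named attribute."""
    return {key: value for key, value in row.items() if key != name}


def rename_attribute(row, old_name, new_name):
    """Rename an attribute, refusing to overwrite an existing name."""
    if new_name in row:
        raise ValueError(f"Cannot rename attribute. Duplicate name: {new_name}")
    return {(new_name if key == old_name else key): value
            for key, value in row.items()}


row = {"login_id": 0, "login_id_copy": "A1", "prediction": 1.7}

# Drop the index-coded column first, then give the copy its original name.
cleaned = rename_attribute(remove_attribute(row, "login_id"),
                           "login_id_copy", "login_id")
print(cleaned)
```

In this sketch the rename only fails if the old 'login_id' is still present when it runs.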