How to convert numerical values in result file back to original nominal values of input
Hung_Bui_221
Member Posts: 5 Learner I
Hello everyone! I am a beginner who started studying RM a few months ago. For a group assignment I need to detect the outliers in the Bank Marketing dataset. This is my process (image below).
The dataset has more than 40,000 examples, and the outlier detection operator seems too slow when the data contains both nominal and numerical values, so I decided to convert all nominal values to numerical.
After running this process, I obtained a result file, and I would like to convert all the numerical values I changed back to the original nominal values of the input file. Converting them manually is the last resort, so I wonder whether I can do it quickly with RM operators or something else.
Please help me find the best way to handle this case as soon as possible. Thank you very much.
Best Answers
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
Do you have an ID in your data? If not, you can use the Generate ID operator to create one. Then use Join on that ID to get back the original data and add the generated outlier score to it.
By the way, Local Outlier Factor is a nearest neighbor-based method, so it works best with normalized input. Use the Normalize operator before applying it and you should get better results. The join-based method for getting the original data back is applicable there, too.
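For readers who want to see the ID-and-join idea outside RapidMiner, here is a minimal pandas sketch. It is only an illustration: the column names, the tiny sample data, and the placeholder score (an absolute z-score on one column, not Local Outlier Factor) are all assumptions, not part of the original process.

```python
import pandas as pd

# Hypothetical miniature of the bank marketing data (column names are assumptions).
original = pd.DataFrame({
    "age": [40, 50, 60, 35],
    "job": ["admin", "technician", "retired", "admin"],
    "balance": [1200.0, 300.0, 98000.0, 450.0],
})

# Counterpart of Generate ID: give every row a stable identifier
# before any conversion happens.
original.insert(0, "id", range(1, len(original) + 1))

# Counterpart of Nominal to Numerical: a numeric-only copy that keeps the id.
numeric = pd.get_dummies(original, columns=["job"], dtype=float)

# Placeholder outlier score (NOT LOF): absolute z-score of the balance column.
scores = numeric[["id"]].copy()
balance = numeric["balance"]
scores["outlier_score"] = ((balance - balance.mean()) / balance.std()).abs()

# Counterpart of Join: attach the score to the untouched original data via the id.
result = original.merge(scores, on="id", how="inner")
print(result)
```

The result keeps the original nominal column (`job` here) next to the score, so nothing has to be converted back by hand.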
Regards,
Balázs
BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
Hi!
Normalizing changes all numeric attributes to be roughly between 0 and 1 (or -1 and 1), depending on the method.
Nearest-neighbor methods combine the values of all attributes into a single distance. This means that an attribute with large numerical values (e.g. money amounts) would dominate all the other attributes (age in years, 0/1 from the nominal to numerical transformation, etc.) and determine the neighborhood alone. Normalizing avoids this and gives all attributes a comparable influence on the distance calculations.
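A small numeric sketch of this effect, using NumPy rather than RapidMiner. The four customers and the choice of range (min-max) normalization are assumptions for illustration only; the Normalize operator offers several methods.

```python
import numpy as np

# Hypothetical customers: columns are [age in years, account balance].
X = np.array([
    [25.0, 1000.0],   # customer 0
    [65.0, 1100.0],   # customer 1: 40 years older, similar balance
    [26.0, 1500.0],   # customer 2: 1 year older, 500 more balance
    [45.0, 50000.0],  # customer 3: sets the upper end of the balance range
])

def euclidean(a, b):
    """Plain Euclidean distance between two rows."""
    return float(np.sqrt(np.sum((a - b) ** 2)))

# Raw distances: the balance column dominates, so customer 1 looks closer
# to customer 0 than customer 2 does.
raw_d01 = euclidean(X[0], X[1])
raw_d02 = euclidean(X[0], X[2])

# Range normalization: rescale every column to [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# Normalized distances: age now counts as much as balance,
# so customer 2 becomes the nearer neighbor of customer 0.
norm_d01 = euclidean(X_norm[0], X_norm[1])
norm_d02 = euclidean(X_norm[0], X_norm[2])

print(raw_d01 < raw_d02, norm_d02 < norm_d01)
```

Note that the normalized ages are real numbers between 0 and 1 instead of whole years. That change of type is expected: the rescaling keeps the order of the values within each attribute and only changes their scale.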
Regards,
Balázs
Answers
After I used the Normalize operator on all attributes, the data types and the values changed. For example, the Age attribute first contained the customers' ages (40, 50, 60 years old...), but afterwards its data type was real and its values were different (attached image).
I wonder if this affects the result. Please tell me more. Thank you again.