Dummy Encoding in RapidMiner
Hi, I am new to RapidMiner and building my first predictive model. While working on the feature engineering part, I applied dummy encoding to one of the categorical columns, and it gave me one column for each category present in that column. As I understand it, it should ideally give n-1 columns, since otherwise multicollinearity will increase. Is there any trick to get rid of this issue? Do I need to manually delete one of the generated columns after applying dummy encoding?
Guys, please share your thoughts.
Regards,
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member; Posts: 1,635; Unicorn)
Most of the modern ML algorithms implemented in RapidMiner include adjustments for perfect multicollinearity if needed, so dummy coding is actually just fine. But the Nominal to Numerical operator supports the n-1 encoding approach as well: select the "effect coding" option in the coding type parameter instead of dummy coding, and then specify the omitted categories in the resulting "comparison groups" dialog box. This is tedious for a large number of attributes, though, so if you can use dummy coding, that is preferable.
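To make the n versus n-1 point concrete outside of RapidMiner, here is a minimal plain-Python sketch. It is not what the Nominal to Numerical operator does internally. The function `dummy_encode`, the `color` attribute, the sample values, and the `name_value` column naming are all made up for illustration. It shows that the full set of n dummy columns sums to 1 in every row, which is exactly what makes them collinear with an intercept, and that dropping one comparison category leaves n-1 columns. Note that this sketch only drops the column; true effect coding would additionally code the omitted category as -1 in the remaining columns.

```python
def dummy_encode(name, values, drop=None):
    """Dummy-encode a list of category labels into 0/1 columns.

    If `drop` names a category, its column is omitted (n-1 encoding),
    so that category acts as the comparison (reference) group.
    Column names of the form "<name>_<category>" are an arbitrary choice.
    """
    categories = sorted(set(values))
    if drop is not None:
        if drop not in categories:
            raise ValueError(f"unknown category: {drop!r}")
        categories.remove(drop)
    return {
        f"{name}_{cat}": [1 if v == cat else 0 for v in values]
        for cat in categories
    }


# Hypothetical sample data: one nominal attribute with three categories.
colors = ["red", "green", "blue", "green", "red"]

full = dummy_encode("color", colors)                  # n columns
reduced = dummy_encode("color", colors, drop="blue")  # n-1 columns

# With all n columns, each row sums to 1, duplicating an intercept column.
row_sums = [sum(col[i] for col in full.values()) for i in range(len(colors))]

print(list(full))     # ['color_blue', 'color_green', 'color_red']
print(row_sums)       # [1, 1, 1, 1, 1]
print(list(reduced))  # ['color_green', 'color_red']
```

If you ever prototype the same thing in pandas, `pandas.get_dummies(..., drop_first=True)` gives the equivalent n-1 result, although it always drops the first category rather than letting you pick one.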
Answers