Dummy Encoding in RapidMiner

Adi1215 · July 2020

Hi, I am new to RapidMiner and building my first predictive model. While working on the feature engineering part, I applied dummy encoding to one of the categorical columns, and it produced one new column for every category present in that column. As I understand it, it should produce n-1 columns; otherwise multicollinearity will increase. Is there any trick to avoid this issue? Do I need to manually delete one of the generated columns after applying dummy encoding?

Guys, please share your thoughts.

Regards,

Telcontar120 · July 2020

Most of the modern ML algorithms implemented in RapidMiner include adjustments for perfect multicollinearity if needed, so dummy coding is actually just fine. That said, the Nominal to Numerical operator supports the n-1 encoding approach as well: select the "effect coding" option in the coding type parameter instead of dummy coding, then specify the omitted categories in the resulting "comparison groups" dialog box. This is tedious for a large number of attributes, though, so if you can use dummy coding, that is preferable.
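
For anyone who wants to see the difference outside RapidMiner, here is a minimal plain-Python sketch of the three layouts: full dummy coding (n columns), dummy coding with a comparison group dropped (n-1 columns), and effect coding (n-1 columns, with the comparison group coded as -1). The `encode` function, its parameters, and the color data are made up for illustration; this is not RapidMiner's API and only mimics the idea behind the coding types.

```python
def encode(values, coding="dummy", comparison_group=None):
    """Encode nominal values as rows of indicator columns.

    Hypothetical helper for illustration only (not a RapidMiner call).
    - coding="dummy", no comparison_group: one column per category (n columns)
    - coding="dummy", with comparison_group: that category's column is omitted (n-1)
    - coding="effect": comparison_group is required and is coded -1 in every column
    """
    categories = sorted(set(values))
    if coding not in ("dummy", "effect"):
        raise ValueError("coding must be 'dummy' or 'effect'")
    if coding == "effect" and comparison_group is None:
        raise ValueError("effect coding needs a comparison group")
    if comparison_group is not None and comparison_group not in categories:
        raise ValueError("comparison group is not one of the categories")

    # The comparison group (if any) gets no column of its own.
    columns = [c for c in categories if c != comparison_group]

    rows = []
    for v in values:
        if coding == "effect" and v == comparison_group:
            rows.append([-1] * len(columns))
        else:
            rows.append([1 if v == c else 0 for c in columns])
    return columns, rows


colors = ["red", "green", "blue", "green"]

# Full dummy coding: 3 categories -> 3 columns; every row sums to 1,
# which is exactly what makes the columns collinear with an intercept.
full_cols, full_rows = encode(colors)

# n-1 dummy coding: "blue" is the comparison group and becomes the all-zero row.
reduced_cols, reduced_rows = encode(colors, comparison_group="blue")

# Effect coding: same n-1 columns, but "blue" rows are -1 everywhere.
effect_cols, effect_rows = encode(colors, coding="effect", comparison_group="blue")

print(full_cols, full_rows)
print(reduced_cols, reduced_rows)
print(effect_cols, effect_rows)
```

In the full layout each row sums to 1, so any one column is fully determined by the others plus a constant term. Dropping a comparison group removes that redundancy, which is the n-1 behaviour asked about above.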

Adi1215 · July 2020

Thanks for this. I'll try this out and let you know.

Altair RapidMiner Community
