Should you normalize dummy coded variables in clustering?

Curious Member Posts: 12 Learner I
edited June 2019 in Help
Can you keep them as dummies and only normalize numeric variables?

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    I would say this depends on the normalization. If you normalize the other columns to the range between 0 and 1, you can keep the dummies as is, since they already take only the values 0 and 1. Otherwise I would personally normalize all columns the same way (e.g. z-transformation), dummies included.
    Hope this helps,
    Ingo
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    I usually use PCA after dummy coding to get rid of the problem.
    Best,
    Martin 
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    @mschmitz But doesn't that get rid of your underlying attributes as well and replace them with synthetic principal components? That's probably not a helpful feature for clustering, or at least it wouldn't be for most of the clustering projects I have worked on.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    @Telcontar120,
    I later join the original data back to the clustering results and start to interpret from there.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
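
To make the normalization point from the first answer concrete, here is a minimal sketch in plain Python (standard library only, not a RapidMiner process). The column names `income` and `is_member` and their values are made-up toy data, and the helpers `min_max` and `z_transform` are hypothetical names introduced only for this illustration. It shows that range normalization leaves a 0/1 dummy unchanged, while a z-transformation has to be applied to the dummy as well if all columns are meant to share the same spread.

```python
from statistics import mean, pstdev


def min_max(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_transform(values):
    """Center to mean 0 and scale to population standard deviation 1."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:  # constant column: nothing to scale
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical toy data: one numeric attribute and one dummy-coded attribute.
income = [20000.0, 35000.0, 50000.0, 80000.0, 120000.0]
is_member = [0, 1, 0, 0, 1]

# Option 1: range-normalize the numeric column only.
# The dummy already lives in [0, 1], so scaling it would change nothing.
income_01 = min_max(income)
is_member_01 = min_max(is_member)

# Option 2: z-transform every column, dummies included.
income_z = z_transform(income)
is_member_z = z_transform(is_member)

print("range-normalized income:", [round(v, 3) for v in income_01])
print("dummy after min-max    :", is_member_01)
print("std of raw dummy       :", round(pstdev(is_member), 3))
print("std of z-scored income :", round(pstdev(income_z), 3))
print("std of z-scored dummy  :", round(pstdev(is_member_z), 3))
```

The raw dummy has a standard deviation of at most 0.5, so leaving it untouched next to z-scored numeric columns would give it less weight in a distance-based clustering than those columns get. That is one way to read the advice to normalize all columns the same way.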