Clustering Dummy Variables
mario_sark
Member Posts: 13 Contributor I
Dear all,
I am working on segmenting a list of customers into clusters based on several variables, but some of these variables are dummy variables. Below is the list of variables I will use for clustering:
Unpaid: Yes/No (dummy)
Deposit: continuous (some customers have zero deposits)
Term Deposits: continuous (some customers have zero term deposits)
Number of Returned Checks: discrete (some customers have zero)
Insurance Product: discrete (some customers have zero); this can be transformed into Yes/No
Credit Card Spending: continuous (some customers have zero because they do not hold credit cards)
Number of Products (Loans): the number of car loans, personal loans, housing loans, ... (some customers have zero)
What is the best algorithm in RapidMiner that I can use to cluster these customers into segments and highlight the less profitable group?
As far as I know, K-means can handle only continuous variables, and I am hesitant to normalize the dummy variables in the data set.
I hope you can help with this.
Thank you in advance,
Mario
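On the normalization concern above, a minimal sketch may help. It assumes range (min-max) normalization and uses NumPy rather than RapidMiner; the column layout and customer values are made up for illustration. Under that assumption, a 0/1 dummy column comes out unchanged, while the continuous and count columns are brought onto the same [0, 1] scale.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns become all zeros."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_range = X.max(axis=0) - col_min
    col_range[col_range == 0] = 1.0  # avoid division by zero for constant columns
    return (X - col_min) / col_range

# Hypothetical customers: [unpaid (0/1 dummy), deposit, returned_checks]
customers = np.array([
    [1,     0.0, 3],
    [0,  2500.0, 0],
    [0, 10000.0, 1],
    [1,   500.0, 0],
])

scaled = min_max_scale(customers)
print(scaled)
```

The first column of `scaled` is identical to the original dummy column, so this kind of normalization does not distort Yes/No variables. Z-score standardization, by contrast, would change the 0/1 values.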
Answers
Thank you for your reply. The list of customers I am going to cluster contains around 70,000 customers.
I was wondering whether there is any algorithm other than K-means.
I am also looking forward to reading about other possibilities.
Thank you,
Mario
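One commonly discussed alternative for mixed dummy and continuous data is to cluster on a mixed-type dissimilarity such as the Gower distance, for example with a medoid-based method. The sketch below is only an illustration in Python with NumPy, not a RapidMiner process; the function name, the column layout, and the customer values are assumptions. It scores a dummy variable as 0 when two customers match and 1 when they differ, scores a numeric variable as the absolute difference divided by that variable's range, and averages the per-variable scores.

```python
import numpy as np

def gower_distance(a, b, ranges, is_binary):
    """Gower-style dissimilarity between two records with mixed variable types."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Dummy variables: 0 if equal, 1 if different.
    # Numeric variables: absolute difference scaled by the variable's range.
    per_feature = np.where(is_binary, a != b, np.abs(a - b) / ranges)
    return per_feature.mean()

# Hypothetical customers: [unpaid (0/1 dummy), deposit, returned_checks]
customers = np.array([
    [1,     0.0, 3],
    [0,  2500.0, 0],
    [0, 10000.0, 1],
    [1,   500.0, 0],
])
is_binary = np.array([True, False, False])
ranges = customers.max(axis=0) - customers.min(axis=0)  # nonzero for every column here

d = gower_distance(customers[0], customers[3], ranges, is_binary)
print(d)
```

With around 70,000 customers, a full pairwise distance matrix would have roughly 4.9 billion entries, so in practice such a measure would probably be applied to a sample or used with a method that does not need the full matrix.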
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts