Questions about customer clustering/segmentation
Hello,
I’m new to RapidMiner. I did all the tutorials, but when I try my own cases, it’s a bit difficult to find the right operators and parameters.
I want to cluster my customers (CustomerID) in three groups based on their transactions.
The transaction attributes are:
Date of transaction (datatype: date)
Value of transaction (datatype: integer)
Number of transactions (datatype: integer)
I would like to give customers with the following features a higher rate (weight):
- more than one transaction
- a transaction value higher than the average
- recent transactions (i.e. transactions in the last month)
Is there any way to create a process in RapidMiner that reflects my requirements?
Which operator would be best for that use case?
Thanks in advance for your help, and sorry for my poor English!
Franzi
Best Answer
yyhuang (Administrator, RapidMiner employee)
The 'Generate Attributes' operator is your friend here. To give customers with those features a higher rate (weight), you can create several indicator attributes. For instance, to tag the customers who have more than one transaction:
attribute name    function expression
AnyTransaction    if([Number of transactions] > 1, 1, 0)
(Attribute names that contain spaces are written in square brackets in the expression.)
You can refer to the tutorial process for Generate Attributes and get inspired by the example function expressions.
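Outside RapidMiner, the same indicator idea can be sketched in plain Python. This is only an illustration: the field names, the sample customers, the 30-day window, and the simple sum used as a score are all assumptions, and it assumes one row per customer with the date of the most recent transaction.

```python
from datetime import date, timedelta

# Hypothetical sample data; the field names are assumptions, not from the thread.
customers = [
    {"customer_id": "C1", "last_transaction": date(2024, 5, 28), "value": 500, "num_transactions": 4},
    {"customer_id": "C2", "last_transaction": date(2024, 1, 10), "value": 120, "num_transactions": 1},
    {"customer_id": "C3", "last_transaction": date(2024, 5, 20), "value": 90, "num_transactions": 2},
]

def add_indicators(rows, today, recent_days=30):
    """Return copies of the rows with three 0/1 indicator attributes and their sum."""
    average_value = sum(r["value"] for r in rows) / len(rows)
    cutoff = today - timedelta(days=recent_days)
    result = []
    for r in rows:
        flags = {
            # 1 if the customer has more than one transaction
            "multiple_transactions": int(r["num_transactions"] > 1),
            # 1 if the transaction value is above the average over all customers
            "above_average_value": int(r["value"] > average_value),
            # 1 if the last transaction falls inside the recent window
            "recent_transaction": int(r["last_transaction"] >= cutoff),
        }
        # Unweighted sum of the flags; replace with weighted terms if some rules matter more.
        result.append({**r, **flags, "score": sum(flags.values())})
    return result

scored = add_indicators(customers, today=date(2024, 6, 1))
for row in scored:
    print(row["customer_id"], row["score"])
```

Each flag corresponds to one indicator attribute you would create with Generate Attributes. The score can then be used directly to rank customers or serve as an extra input for clustering.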
Happy RapidMining!
Answers
Dear Franzi,
My key question for you is: do you want to classify/cluster by your own rules, or by computer-generated rules based on statistical reasoning?
RapidMiner has a lot of operators that group customers together by their attributes. They find the grouping rules that are best according to some statistical measure. Most likely the result will be similar to the groups you had in mind, but not necessarily.
The operators for this would be K-Means, K-Medoids, DBSCAN or maybe Agglomerative Clustering. Please be aware that all of these operators use a distance measure and thus need normalized data; otherwise the attribute with the largest numeric range dominates the distance. You can normalize your data with the Normalize operator.
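To make the "normalize, then cluster" step concrete, here is a minimal sketch in plain Python. It is not what the RapidMiner operators do internally; the sample values, the column choice (value, number of transactions, days since last transaction), and the deterministic start from the first k points are assumptions made for illustration.

```python
import math

def z_score_normalize(rows):
    """Scale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*rows))
    means = [sum(c) / len(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(columns, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def k_means(points, k, iterations=100):
    """Plain k-means; starts from the first k points so the run is repeatable."""
    centroids = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iterations):
        # Assign every point to its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return labels

# Hypothetical customers: [transaction value, number of transactions, days since last transaction]
rows = [
    [900, 8, 5],
    [100, 1, 200],
    [400, 3, 40],
    [950, 9, 3],
    [120, 1, 180],
    [380, 4, 45],
]

labels = k_means(z_score_normalize(rows), k=3)
print(labels)
```

Without the normalization step, the transaction value and the day count would outweigh the number of transactions simply because their numeric ranges are much larger.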
Best,
Martin
Dortmund, Germany
Thanks a lot! The "Generate Attributes" operator helped me out.