handling empty values
Hello,
I'm currently working on my master's thesis. Part of the work is a customer segmentation by means of a cluster analysis.
One variable for the clustering should be the chronological sequence of product categories purchased. For example, customer 10 first bought from product category A, then from product category C, and then an article from group X. In other words, changes in the customers' purchasing behavior over time should be included in the analysis.
But I don't know the best way to map this in the data.
My idea was to split this criterion into several variables, creating a new variable each time a customer buys from a new category. That gives me the variables "first selling category" through "10th selling category". These variables take the category names as values.
The problem with this approach is that each customer buys a different number of product categories. If a customer buys from 3 different product categories, the first 3 columns hold the desired values and the remaining columns have no value.
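To make the layout concrete, here is a minimal sketch in Python with pandas of how such a table could be built and where the empty values come from. The purchase log, the column names (`customer`, `category`, `category_1` to `category_10`) and the sample rows are assumptions for illustration only, not my real data, and the sketch assumes the log is already sorted chronologically.

```python
import pandas as pd

# Hypothetical purchase log: one row per purchase, already in chronological order.
purchases = pd.DataFrame({
    "customer": [10, 10, 10, 11, 11, 12],
    "category": ["A", "C", "X", "B", "A", "C"],
})

# Keep only the first purchase of each category per customer.
first_buys = purchases.drop_duplicates(subset=["customer", "category"]).copy()

# Position of each new category in the customer's sequence (1 = first category bought).
first_buys["position"] = first_buys.groupby("customer").cumcount() + 1

# One column per position; customers with fewer categories get NaN in the remaining columns.
wide = first_buys.pivot(index="customer", columns="position", values="category")
wide = wide.reindex(columns=range(1, 11))
wide.columns = [f"category_{i}" for i in wide.columns]

# Number of empty "nth selling category" columns per customer.
missing_per_customer = wide.isna().sum(axis=1)
print(wide)
print(missing_per_customer)
```

In this made-up example, customer 10 fills the first 3 columns and leaves the other 7 empty, which is exactly the situation described above.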
Because the various clustering algorithms cannot handle missing values, I am now at a loss.
Is there another way to map this criterion, or a way to run the clustering with empty values?
I would be very happy about a tip.
Thank you in advance for your help.
Markus