Determining new sales regions
Hello,
I'm pretty new to data mining and would like to hear the opinion of the experts here. Maybe you can help.
I've got a scenario where a shop owner from Hamburg wants to open another store in Berlin and wants to know which city district would be suitable for it.
I have a data set about urban districts with values for the employment rate, ages of the inhabitants, marital status, purchasing power, ... Here's an example:
*******************
*******************
There is also a set of customer data from the store in Hamburg (CustomerNo, address, district).
My goal is to determine which district in Berlin is the most suitable for the shop owner to open another shop, based on the district data set and his customer data.
My approach would be:
- get the top district of the customer data (e.g. Hamburg St. Pauli)
- determine via cluster analysis which district in Berlin is similar to Hamburg St. Pauli
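To make the two steps above concrete, here is a rough sketch in Python rather than RM5. It is only an illustration under assumptions: the customer records, the attribute values, and the helper names (`top_district`, `standardize`, `rank_similar`) are all made up. It also ranks the Berlin districts by plain Euclidean distance on standardized attributes instead of running an actual clustering algorithm.

```python
from collections import Counter
from statistics import mean, pstdev

# Hypothetical customer records (customer_no, district); values are made up.
customers = [
    (1, "St. Pauli"), (2, "St. Pauli"), (3, "Altona"),
    (4, "St. Pauli"), (5, "Eimsbuettel"),
]

# Hypothetical district attributes, all invented for illustration:
# [employment rate, median age, purchasing power index]
hamburg = {
    "St. Pauli": [0.71, 34.0, 92.0],
    "Altona": [0.75, 38.0, 101.0],
    "Eimsbuettel": [0.78, 40.0, 108.0],
}
berlin = {
    "Kreuzberg": [0.69, 33.0, 90.0],
    "Charlottenburg": [0.77, 44.0, 115.0],
    "Pankow": [0.80, 39.0, 104.0],
}

def top_district(records):
    """Step 1: return the district with the most customers."""
    return Counter(district for _, district in records).most_common(1)[0][0]

def standardize(rows):
    """Z-score each attribute so no single scale dominates the distance."""
    columns = list(zip(*rows.values()))
    mu = [mean(c) for c in columns]
    sd = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns
    return {name: [(x - m) / s for x, m, s in zip(values, mu, sd)]
            for name, values in rows.items()}

def rank_similar(reference, candidates, all_rows):
    """Step 2: sort candidate districts by distance to the reference district."""
    z = standardize(all_rows)
    ref = z[reference]
    dist = {c: sum((a - b) ** 2 for a, b in zip(z[c], ref)) ** 0.5
            for c in candidates}
    return sorted(dist.items(), key=lambda kv: kv[1])

best = top_district(customers)
ranking = rank_similar(best, berlin.keys(), {**hamburg, **berlin})
print(best, ranking)
```

The standardization step matters because attributes such as the employment rate and the purchasing power index live on very different scales. Without it, the attribute with the largest numbers would decide the ranking almost alone.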
My questions:
1. Would cluster analysis be a suitable way to solve this problem?
2. If so, which clustering algorithm is suitable for this kind of data?
3. If not, what other methods would be more suitable?
4. The data set with the district data has many attributes. Is a high number of attributes only a performance issue, or is there a danger of having "too much data to analyse"? I have seen that there are some operators in RM5 to remove uninteresting attributes.
Thanks
Edit: If this is the wrong forum for this question, I apologize.
Answers
I'm just trying to give my view here.
I think your problem looks a bit more like a classification problem. That doesn't stop you from using a clustering technique, though, because RM can map a clustering onto a classification scheme.
Think of it like this:
From your data, work out which districts are suitable for a store in reality and which are not, then add a label (e.g. "suitable", "not suitable"). The labeled data is the training set for classification, and the unlabeled data is your test set. For example, if you consider Hamburg St. Pauli and Hamburg Altona suitable, label them "suitable", and if you consider Hamburg xxx and Hamburg yyy unsuitable, label them "not suitable". The labels should all be based on reality. The Berlin districts then form the test set, with the district name as the id.
After that you can build a model and validate it to get the best accuracy. The best model can then easily be used for prediction. Looking at the structure and type of your data, I think a neural net model would be a wise choice. As for a suitable clustering method, I'm sorry, I'm not good at clustering.
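The label, validate, predict workflow could look roughly like the sketch below. Everything in it is an assumption for illustration: the attribute values and the "Berlin A" / "Berlin B" names are invented, and a small k-nearest-neighbour classifier stands in for the neural net only to keep the example short and free of extra libraries. Validation here is leave-one-out, one of several possible schemes.

```python
from collections import Counter
from statistics import mean, pstdev

# Made-up labeled Hamburg districts: ([employment rate, median age,
# purchasing power index], label). Labels would come from real experience.
train = [
    ([0.71, 34.0, 92.0], "suitable"),
    ([0.73, 35.0, 95.0], "suitable"),
    ([0.70, 33.0, 90.0], "suitable"),
    ([0.80, 45.0, 118.0], "not suitable"),
    ([0.82, 47.0, 120.0], "not suitable"),
    ([0.79, 44.0, 115.0], "not suitable"),
]
# Unlabeled Berlin districts (placeholder names) keyed by their id.
berlin = {"Berlin A": [0.72, 34.5, 93.0], "Berlin B": [0.81, 46.0, 119.0]}

def fit_scaler(rows):
    """Per-attribute mean and standard deviation of the given rows."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns]

def scale(x, mu, sd):
    return [(v - m) / s for v, m, s in zip(x, mu, sd)]

def knn_predict(x, examples, k=3):
    """Majority label among the k nearest labeled examples."""
    nearest = sorted(
        examples, key=lambda e: sum((a - b) ** 2 for a, b in zip(e[0], x))
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def leave_one_out_accuracy(data, k=3):
    """Hold out each example once and predict it from the remaining ones."""
    hits = 0
    for i, (x, y) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        mu, sd = fit_scaler([features for features, _ in rest])
        scaled = [(scale(features, mu, sd), label) for features, label in rest]
        hits += knn_predict(scale(x, mu, sd), scaled, k) == y
    return hits / len(data)

accuracy = leave_one_out_accuracy(train)

# Fit on all labeled data, then predict the unlabeled Berlin districts.
mu, sd = fit_scaler([features for features, _ in train])
scaled_train = [(scale(features, mu, sd), label) for features, label in train]
predictions = {name: knn_predict(scale(x, mu, sd), scaled_train)
               for name, x in berlin.items()}
print(accuracy, predictions)
```

With only a handful of labeled districts the accuracy estimate is very rough, so it is better read as a sanity check than as a real measure of model quality.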
I hope this helps. Finally, sorry for my English. I'm still learning, I'm Indonesian. ;D
Regards,
Dimas Yogatama