Determining new sales regions
I'm pretty new to data mining and would like to hear the opinions of the experts here. Maybe you can help.
I've got a scenario where a shop owner from Hamburg wants to open another store in Berlin and wants to know which city district would be suitable for it.
I have a data set about urban districts with values for the employment rate, ages of the inhabitants, marital status, purchasing power, ... Here's an example:
There is also a set of customer data from the store in Hamburg (CustomerNo, address, district).
My goal is to determine which district in Berlin is the most suitable for the shop owner to open another shop, based on the district data set and his customer data.
My approach would be:
- find the district with the most customers in the customer data (e.g. Hamburg St. Pauli)
- determine via cluster analysis which district in Berlin is most similar to Hamburg St. Pauli
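To make the two steps above concrete, here is a minimal Python sketch of what I have in mind. It is a plain nearest-neighbour similarity lookup on standardized attributes rather than a real clustering, and everything in it is an assumption for illustration: the district names other than St. Pauli, the three attributes, all numbers, and the helper names top_district, make_scaler and most_similar.

```python
from collections import Counter
from math import sqrt
from statistics import mean, pstdev

# Hypothetical customer records: (CustomerNo, district). Values are made up.
customers = [(1, "St. Pauli"), (2, "Altona"), (3, "St. Pauli"),
             (4, "St. Pauli"), (5, "Eimsbuettel")]

# Hypothetical district attributes, made up for illustration:
# [employment rate, median age, purchasing power index]
hamburg = {
    "St. Pauli": [0.71, 34.0, 92.0],
    "Altona": [0.75, 38.0, 105.0],
    "Eimsbuettel": [0.78, 40.0, 110.0],
}
berlin = {
    "Kreuzberg": [0.70, 33.0, 90.0],
    "Zehlendorf": [0.80, 46.0, 130.0],
    "Marzahn": [0.68, 42.0, 85.0],
}


def top_district(records):
    """Return the district that occurs most often in the customer records."""
    return Counter(district for _, district in records).most_common(1)[0][0]


def make_scaler(rows):
    """Build a z-score scaler so that no attribute dominates by its unit."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stdevs = [pstdev(c) or 1.0 for c in columns]
    return lambda v: [(x - m) / s for x, m, s in zip(v, means, stdevs)]


def most_similar(target, candidates, scale):
    """Return the candidate with the smallest Euclidean distance to target."""
    t = scale(target)

    def distance(name):
        c = scale(candidates[name])
        return sqrt(sum((a - b) ** 2 for a, b in zip(t, c)))

    return min(candidates, key=distance)


top = top_district(customers)
scale = make_scaler(list(hamburg.values()) + list(berlin.values()))
best = most_similar(hamburg[top], berlin, scale)
print(top, "->", best)
```

With these made-up numbers the lookup picks the Berlin district whose standardized attributes are closest to those of the top Hamburg district. A clustering would instead group all districts and check which Berlin districts end up in the same cluster as St. Pauli.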
1. Would a clustering analysis be a suitable way to solve this problem?
2. If so, which clustering algorithm is suitable for this kind of data?
3. If not, what other methods would be more suitable?
4. The data set with the district data has many attributes. Is a high number of attributes only a performance issue, or is there a danger of having "too much data to analyse"? I have seen that there are some operators in RM5 to remove uninteresting attributes.
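To illustrate what I mean by removing uninteresting attributes in question 4, here is a small Python sketch of one possible filter: drop any attribute that is almost perfectly correlated with one already kept. This is only my assumption of what such a filter could look like, not a description of what the RM5 operators do. The attribute names, the values and the 0.95 threshold are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long lists (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def drop_correlated(columns, threshold=0.95):
    """Keep an attribute only if |r| with every already kept one is below threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical district attributes, one list entry per district.
columns = {
    "employment_rate": [0.71, 0.75, 0.78, 0.70, 0.80, 0.68],
    # Redundant by construction: 1 - employment_rate.
    "unemployment_rate": [0.29, 0.25, 0.22, 0.30, 0.20, 0.32],
    "median_age": [34.0, 38.0, 40.0, 33.0, 46.0, 42.0],
}

kept = drop_correlated(columns)
print(kept)
```

Here unemployment_rate carries no information beyond employment_rate, so it is dropped, while median_age is kept. My worry is that such redundant attributes would count twice in a distance-based comparison.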
Edit: If this is the wrong forum for this question, I apologize.