
# Normalization Issue

Member Posts: 12 Learner I
Hello Rapid Miner Community,

I'm currently working on a clustering model.
I cluster different countries according to certain determinants.
However, the determinants are composed of different numbers of factors. For example, the determinant "degree of economic integration" is composed of the factors Trade Freedom and Trading across Borders, while the determinant "transport infrastructure" consists of only one factor, the LPI index.
I use the normalization operator to bring the factors onto a common scale.
However, each determinant (degree of economic integration and transport infrastructure) should be weighted equally. Because one determinant consists of more factors than the other, it is currently overweighted.
My question is how I should proceed in RapidMiner to weight each determinant equally without having to aggregate the individual factors of a determinant.

Thank you for your support and hints.

Best regards, Carlo

RapidMiner Certified Analyst, Member Posts: 344 Unicorn
Hi @Carlo

the task seems interesting, but I don't understand what you mean by determinants. Could you provide some further explanation?

It seems that there are categories and subcategories, but I still don't get what the connection with the clustering is.

Regards,
Sebastian

Member Posts: 12 Learner I

I'm sorry, I might have expressed myself a little awkwardly.
There are five determinants altogether; these are effectively the main categories.
Each determinant consists of a different number of factors (subcategories).
I'll try to illustrate it with two determinants:
• The determinant or main category transport infrastructure consists of one factor, namely the LPI index.
• The determinant or main category homogeneity of demand consists of the factors or subcategories purchasing power, market size and article turnover.
To ensure that each subcategory has the same weight within its determinant, I normalize the factors (since purchasing power is given on a scale of 1 to 10 and market size on a scale of 1 to 10 million).
In the second step (and this is my problem) I would now like to balance the main categories as well. Since one determinant consists of only one factor and the other of three factors, the second currently carries more weight. I do not know how to proceed and would be very pleased to hear your opinions.

I hope I explained it better this time.
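
One way to picture the imbalance described above: with a Euclidean distance on normalized factors, every factor adds one squared term, so a determinant with three factors contributes three terms and a determinant with one factor contributes only one. The following is a minimal sketch in plain Python (not a RapidMiner process), assuming z-score normalization and a Euclidean-distance clustering such as k-means. Multiplying each normalized factor by 1/sqrt(k), where k is the number of factors in its determinant, gives every determinant the same total weight in the squared distance without aggregating the factors. The country values and the names `DETERMINANTS`, `zscore` and `balance` are made up for illustration.

```python
import math

# Hypothetical grouping: determinant -> its factors (names follow the thread).
DETERMINANTS = {
    "transport_infrastructure": ["lpi_index"],
    "homogeneity_of_demand": ["purchasing_power", "market_size", "article_turnover"],
}

# Made-up example values, one list entry per country.
data = {
    "lpi_index": [3.1, 4.2, 2.7, 3.8],
    "purchasing_power": [4.0, 9.0, 2.0, 7.0],
    "market_size": [2.0e6, 9.5e6, 0.5e6, 6.0e6],
    "article_turnover": [120.0, 560.0, 80.0, 300.0],
}


def zscore(values):
    """Standardize a column to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def balance(data, determinants):
    """Z-score each factor, then scale it by 1/sqrt(k) for its determinant."""
    balanced = {}
    for factors in determinants.values():
        scale = 1.0 / math.sqrt(len(factors))
        for name in factors:
            balanced[name] = [scale * z for z in zscore(data[name])]
    return balanced


balanced = balance(data, DETERMINANTS)

# After balancing, the variances of a determinant's factors sum to 1,
# so each determinant carries the same weight in squared Euclidean distance.
for det, factors in DETERMINANTS.items():
    total = sum(sum(v * v for v in balanced[f]) / len(balanced[f]) for f in factors)
    print(det, round(total, 6))
```

In RapidMiner the same effect could presumably be obtained by multiplying each normalized attribute by such a constant (for example with Generate Attributes) before the clustering operator; that mapping is an assumption, not something confirmed in this thread.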

Moderator, Member Posts: 1,207 Unicorn
Hello Carlo,

Not sure if I understood this correctly, but if you have an issue with the number of dimensions (Attributes) per determinant, why not apply dimensionality reduction techniques like PCA?

Thanks
Regards,
Varun
https://www.varunmandalapu.com/
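
The PCA suggestion above could be read as: reduce each determinant's normalized factors to a single component, so that every determinant enters the clustering as exactly one attribute. Below is a rough sketch of that reading in plain Python, assuming z-scored factors and using power iteration to find the first principal component; the data and the names `zscore` and `first_component` are hypothetical. Note that this does collapse the factors of a determinant into one attribute, which the original question hoped to avoid.

```python
import math


def zscore(values):
    """Standardize a column to mean 0 and (population) standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]


def first_component(columns, iterations=200):
    """Return the leading principal axis and the row scores along it.

    `columns` must already be centered. Power iteration is used for brevity;
    it assumes the starting direction is not orthogonal to the leading axis.
    """
    k, n = len(columns), len(columns[0])
    # Covariance matrix of the centered columns.
    cov = [[sum(a * b for a, b in zip(columns[i], columns[j])) / n
            for j in range(k)] for i in range(k)]
    w = [1.0 / math.sqrt(k)] * k  # starting direction
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    scores = [sum(w[i] * columns[i][r] for i in range(k)) for r in range(n)]
    return w, scores


# Made-up values for the three factors of "homogeneity of demand".
purchasing_power = [4.0, 9.0, 2.0, 7.0]
market_size = [2.0e6, 9.5e6, 0.5e6, 6.0e6]
article_turnover = [120.0, 560.0, 80.0, 300.0]

columns = [zscore(c) for c in (purchasing_power, market_size, article_turnover)]
axis, scores = first_component(columns)

# Rescale the component to unit variance so that every determinant ends up
# as one attribute on the same scale before clustering.
demand_attribute = zscore(scores)
print(axis)
print(demand_attribute)
```

A determinant with a single factor, such as transport infrastructure with the LPI index, would simply keep its normalized factor as its one attribute.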