
Normalization Issue

Carlo Member Posts: 12 Learner I
Hello RapidMiner Community,

I'm currently working on a clustering model.
I cluster different countries according to certain determinants.
However, each determinant is composed of a different number of factors. For example, the determinant "degree of economic integration" is composed of the factors Trade Freedom and Trading Across Borders, while the determinant "transport infrastructure" consists only of the factor LPI index.
I use the normalization operator to bring the factors, which are measured on different scales, onto a common scale.
However, each determinant (degree of economic integration and transport infrastructure) should be weighted equally. Since one determinant consists of more factors than the other, it is currently overweighted.
My question is how I should proceed in RapidMiner to weight each determinant equally without having to aggregate the individual factors of a determinant.

Thank you for your support and hints.

Best regards, Carlo

Answers

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn
    Hi @Carlo

    The task seems interesting, but I don't understand what you mean by determinants. Could you explain a bit further?

    It seems that there are categories and subcategories, but I still don't get what the connection with the clustering is.

    Regards,
    Sebastian

  • Carlo Member Posts: 12 Learner I

    I'm sorry, I might have expressed myself a little awkwardly.
    There are 5 determinants altogether; these are effectively the main categories.
    Each determinant consists of a different number of factors (subcategories).
    I will try to illustrate this with two determinants:
    • The determinant (main category) transport infrastructure consists of one factor, namely the LPI index.
    • The determinant (main category) homogeneity of demand consists of the factors (subcategories) purchasing power, market size and article turnover.
    To ensure that each subcategory has the same weight within its determinant, I normalize the factors (since purchasing power is given on a scale of 1 to 10 and market size on a scale of 1 to 10 million).
    In the second step (and this is my problem) I would now like to balance the main categories as well. Since one determinant consists of only one factor and the other of three factors, I do not know how to proceed, and I would be very grateful for your opinions.

    I hope I explained it better this time :smile:
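
One possible way to balance the main categories without aggregating their factors, sketched here in Python rather than as a RapidMiner process: z-normalize every factor, then multiply each factor by 1/sqrt(k), where k is the number of factors in its determinant. Because k-means style clustering works with squared Euclidean distances, a factor scaled by 1/sqrt(k) contributes with weight 1/k, so every determinant carries the same total weight. The attribute names, the grouping dictionary and the country values below are made up for illustration; in RapidMiner the same scaling could presumably be applied after normalization with an operator such as Generate Attributes, which is an assumption and not something confirmed in this thread.

```python
import math

# Hypothetical grouping: determinant name -> names of its factors (attributes).
DETERMINANTS = {
    "transport_infrastructure": ["lpi_index"],
    "homogeneity_of_demand": ["purchasing_power", "market_size", "article_turnover"],
}


def z_scores(values):
    """Standardize a column to mean 0 and population standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std if std > 0 else 0.0 for v in values]


def weight_by_determinant(table, determinants):
    """Z-normalize each factor, then scale it by 1/sqrt(k), where k is the
    number of factors in the determinant the factor belongs to."""
    weighted = {}
    for factors in determinants.values():
        scale = 1.0 / math.sqrt(len(factors))
        for name in factors:
            weighted[name] = [scale * z for z in z_scores(table[name])]
    return weighted


def determinant_variance(weighted, factors):
    """Total variance a determinant contributes after weighting.
    The weighted columns have mean 0, so variance is the mean of squares."""
    return sum(
        sum(v * v for v in weighted[name]) / len(weighted[name])
        for name in factors
    )


# Made-up example data: one list entry per country.
countries = {
    "lpi_index": [4.2, 3.1, 2.5, 3.8],
    "purchasing_power": [9.0, 4.0, 2.0, 7.0],
    "market_size": [8_000_000, 1_500_000, 300_000, 5_000_000],
    "article_turnover": [120.0, 45.0, 20.0, 90.0],
}

weighted = weight_by_determinant(countries, DETERMINANTS)

for determinant, factors in DETERMINANTS.items():
    print(determinant, round(determinant_variance(weighted, factors), 6))
```

After this step each determinant contributes a total variance of 1 to the distance calculation, regardless of how many factors it contains, and the individual factors are still available as separate attributes for the clustering operator.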

  • varunm1 Member Posts: 1,207 Unicorn
    Hello Carlo,

    Not sure if I understood this correctly, but if you have an issue with the number of dimensions (Attributes) per determinant, why not apply dimensionality reduction techniques like PCA?

    Thanks
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing
