NORMALIZE FOR AN ATTRIBUTE THAT TAKES A VALUE EITHER 0 OR 1

Maria_L · April 2021

Hello, everybody!

I've recently started using RapidMiner and teaching myself data mining and analysis.
While testing an example data set that came from a questionnaire, I spotted an attribute that takes only the values 0 or 1, while all the other attributes take values in the range 1 to 5. I cannot exclude any of the attributes from my analysis, so I'm thinking of normalizing. What do you suggest I do?
I tried the range method from 0.0 to 1.0 for all attributes, but is that right, considering that my disputed attribute doesn't take values from 0 to 1 but is EITHER 0 OR 1?

MartinLiebig · April 2021

Hi,

sorry, but what do you expect here? Range normalization makes sure that the smallest value in your data set is 0 and the biggest one is 1. So if you come in with an attribute that is only 0 and 1, it can only map it to 0 and 1.

Best,

Martin

BalazsBarany · April 2021

Hi @Maria_L,

you don't need to normalize the data for most machine learning algorithms. But of course you can do it even for these.

You can easily normalize the data to a range of 0 to 1. As Martin wrote, the values of the 0/1 attribute won't change; the others will be on the same magnitude (0, 0.25, 0.5, 0.75, 1 for answers from 1 to 5). This might help you understand your models better.
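To make the effect concrete, here is a minimal sketch of range (min-max) normalization in plain Python. It is not RapidMiner's implementation; the function name `min_max_normalize` and the sample answers are made up for illustration.

```python
def min_max_normalize(values, new_min=0.0, new_max=1.0):
    """Rescale values so the observed minimum maps to new_min
    and the observed maximum maps to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map everything to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Hypothetical questionnaire columns (illustrative values only).
likert = [1, 2, 3, 4, 5]   # a 1-to-5 question
yes_no = [0, 1, 1, 0]      # a NO (0) / YES (1) question

likert_scaled = min_max_normalize(likert)
yes_no_scaled = min_max_normalize(yes_no)

print(likert_scaled)
print(yes_no_scaled)
```

The 0/1 column comes out unchanged, while the 1-to-5 column is squeezed into the same 0-to-1 interval, so both end up on a comparable magnitude.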

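Of course, you could also go the other way: do the range transformation on the 0/1 attribute only and just multiply by 5, so that it takes the values 0 and 5 and sits on roughly the same magnitude as the 1-to-5 attributes.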

Be careful when normalizing. You might have an attribute in which the "extreme" answers (the lowest or the highest possible value, such as 1 or 5) never occur. It would then be changed in a different way: its actual maximum would become 1 even if it is below 5, and so on. So you would change the scale of this one attribute compared to the others.
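The pitfall can be sketched in a few lines of Python. This is an illustration under assumptions, not RapidMiner code: the helper `scale_with_bounds` and the answer list are hypothetical, and the fixed bounds 1 and 5 are taken from the questionnaire scale described in this thread.

```python
def scale_with_bounds(values, lo, hi):
    """Map the interval [lo, hi] linearly onto [0, 1]."""
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical question where nobody answered 1 or 5.
answers = [2, 3, 4, 4, 3]

# Bounds taken from the data: the observed 2 becomes 0, the observed 4 becomes 1.
observed = scale_with_bounds(answers, min(answers), max(answers))

# Bounds taken from the questionnaire scale: 1 becomes 0, 5 becomes 1.
fixed = scale_with_bounds(answers, 1, 5)

print(observed)
print(fixed)
```

With observed bounds, an answer of 4 is treated as the top of the scale; with the fixed bounds it stays at 0.75, which keeps this attribute comparable with the other 1-to-5 questions.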

Regards,
Balázs

Maria_L · April 2021

Hi!

Thanks for replying, but I'm not sure I understood your answer.
My attribute takes the value 0 or 1 because it refers to a question in the questionnaire that is answered EITHER YES (1) OR NO (0). All the other questions in the same questionnaire take answers in the range 1 to 5.
So I'm wondering whether there's any logic in transforming the YES OR NO question, which is either 1 or 0, to the range 1 to 5.
Between those two options, I prefer to normalize all the other questions (1 to 5) to the range 0.0 to 1.0.
What do you think?

Thank you all in advance!

Maria_L · April 2021

Hi @BalazsBarany,

Thank you for your kind reply! Good manners are always the best attributes!

I've been in this field for less than a week and I really want to learn. Also, my background isn't a mathematical one.

So, if I understood correctly, you suggest that I could apply the range transformation to just that single attribute, to a range from 1 to 5?

I also have another question. For another paper, I have to extract some association rules from a dataset. I'm asked to list the 10 strongest ones associated with one particular attribute. So I ran a process, and it turns out that there are only two association rules that refer strictly to this attribute, i.e. rules like
[X = '(2.5-inf)'] ---> [Y = '(2.5-inf)'], [Z = '(2.5-inf)'] (confidence: 0.9)

All the other results look like
[X = '(2.5-inf)'], [Y = '(2.5-inf)'] ---> [Z = '(2.5-inf)'] (confidence: 0.9)

So, my question is: when the attribute in question is X, should I include in my result list only rules like the first example, or am I allowed to list all the others in which X appears in combination with other attributes?

Thank you in advance!

BalazsBarany · April 2021

Hi Maria_L,

just go over your attribute list, check the minimum and maximum values, and decide accordingly, using the criteria I listed.
The most important thing is to understand which attributes have to be changed and how. Afterwards, check that the transformation matches your expectations: e.g. min is 0, max is 1, etc.

You can change the filter parameters to get more association rules, ordered by confidence for example.

Whether only the X => rule is relevant, or also the X, Y => rule, depends on the question or problem you're trying to solve. What is relevant in the real world? How often is the X, Y rule seen? It's up to you to decide which rule is more important.
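The two readings of "rules associated with X" can be compared with a small Python sketch. It assumes the rules have been exported as (premise, conclusion, confidence) triples; that representation, the function names, and the sample rules are hypothetical and only mirror the shapes quoted earlier in the thread.

```python
# Each rule: (premise attributes, conclusion attributes, confidence).
# Illustrative rules only, not results from a real process.
rules = [
    ({"X"}, {"Y", "Z"}, 0.90),   # X alone in the premise
    ({"X", "Y"}, {"Z"}, 0.90),   # X combined with Y in the premise
    ({"Y"}, {"Z"}, 0.85),        # X not involved at all
]


def rules_with_premise_exactly(rules, attr):
    """Strict reading: the premise consists of attr and nothing else."""
    return [r for r in rules if r[0] == {attr}]


def rules_mentioning(rules, attr):
    """Loose reading: attr appears anywhere in the rule."""
    return [r for r in rules if attr in r[0] or attr in r[1]]


def top_by_confidence(rules, n=10):
    """Return the n rules with the highest confidence."""
    return sorted(rules, key=lambda r: r[2], reverse=True)[:n]


strict = top_by_confidence(rules_with_premise_exactly(rules, "X"))
loose = top_by_confidence(rules_mentioning(rules, "X"))

print(len(strict), len(loose))
```

Which list to report is still a judgment call about the problem; the sketch only shows how much the result list changes between the two readings.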

Regards,
Balázs


Altair RapidMiner Community
