
NORMALIZE FOR AN ATTRIBUTE THAT TAKES A VALUE EITHER 0 OR 1

Maria_L Member Posts: 3 Learner I
Hello, everybody!

I've recently started using RapidMiner and teaching myself data mining and analysis.
While testing an example set of data from a questionnaire, I spotted an attribute that takes the value either 0 or 1, while all the other attributes take values in a range from 1 to 5. I cannot exclude any of the attributes from my analysis, so I'm thinking of normalizing. What do you suggest I do?
I tried the range method from 0.0 to 1.0 for all attributes, but is that right, considering my disputed attribute doesn't take values across the range 0 to 1, but is EITHER 0 OR 1?

Best Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,525 RM Data Scientist
    Solution Accepted
    Hi,
    Sorry, but what do you expect here? Range normalization makes sure that the smallest value in your data set becomes 0 and the biggest one becomes 1. So if you come in with an attribute that is only 0 and 1, it can only be mapped to 0 and 1.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi @Maria_L,

    you don't need to normalize the data for most machine learning algorithms. But of course you can do it even for these.

    You can easily normalize the data to a range of 0 to 1. As Martin wrote, the values of the 0/1 attribute won't change, and the others will be on the same magnitude (0, 0.25, 0.5, 0.75, 1 for a 1 to 5 scale). This might help you understand your models better.

    Of course you could do the range transformation on the 0/1 attribute and just multiply by 5.

    Be careful when normalizing. You might have an attribute in which the "extreme" answers (0?, 1, 5) never occur. It would then be transformed differently: its actual maximum (e.g. 4 instead of 5) would become 1, and so on. So you would change the scale of this one attribute compared to the others.

    Regards,
    Balázs
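
The two points made above (a 0/1 attribute is unchanged by range normalization, and an attribute that never reaches the extremes gets stretched) can be sketched in plain Python. This is only an illustration, not RapidMiner's Normalize operator: the function name `range_normalize`, its `lo`/`hi` parameters, and the sample answer lists are all made up for the example.

```python
def range_normalize(values, lo=None, hi=None):
    """Min-max scale values to [0, 1].

    lo/hi default to the observed minimum and maximum; pass them
    explicitly to use the known bounds of the scale instead.
    """
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        # A constant attribute carries no range; map everything to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical answers, invented for illustration only.
yes_no = [0, 1, 1, 0]      # YES/NO question coded as 1/0
likert = [1, 2, 3, 4, 5]   # 1 to 5 question, all answers observed
partial = [2, 3, 4, 3]     # 1 to 5 question, extremes never chosen

print(range_normalize(yes_no))               # [0.0, 1.0, 1.0, 0.0]
print(range_normalize(likert))               # [0.0, 0.25, 0.5, 0.75, 1.0]
print(range_normalize(partial))              # [0.0, 0.5, 1.0, 0.5]
print(range_normalize(partial, lo=1, hi=5))  # [0.25, 0.5, 0.75, 0.5]
```

The third call shows the pitfall: an answer of 4 becomes 1.0 because it is the observed maximum. Passing the known scale bounds, as in the last call, keeps that attribute comparable with the other 1 to 5 questions.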

Answers

  • Maria_L Member Posts: 3 Learner I
    Hi!

    Thanks for replying, but I'm not sure I understood your answer.
    My attribute takes 0 or 1, meaning that in reality it refers to a question in a questionnaire that is answered EITHER YES (1) OR NO (0). All other questions in the same questionnaire take answers in a range from 1 to 5.
    So I'm wondering whether there's any logic in transforming the YES/NO question, which is either 1 or 0, to a range of 1 to 5.
    Between those two options, I prefer to normalize all the other questions (1 to 5) to a range of 0.0 to 1.0.
    What do you think?

    Thank you all in advance!
  • Maria_L Member Posts: 3 Learner I
    Hi @BalazsBarany,

    Thank you for your kind reply! Good manners are always the best attributes!

    I've been in this field for less than a week and I really want to learn. Also, my background isn't a mathematical one.

    So, if I understood correctly, you suggest that I could apply the range transformation to just that single attribute, mapping it to a range from 1 to 5?

    I also have another question. For another paper, I have to extract some association rules from a dataset. I'm asked to report the 10 strongest ones associated with one particular attribute. When I run the process, only two association rules refer strictly to this attribute, such as
     [X = '(2.5-inf)'] --> [Y = '(2.5-inf)'], [Z = '(2.5-inf)'] (confidence: 0.9)

    All other results look like [X = '(2.5-inf)'], [Y = '(2.5-inf)'] --> [Z = '(2.5-inf)'] (confidence: 0.9)

    So my question is: when the attribute in question is X, should I include in my result list only rules like the first example above, or am I allowed to list all the others in which X appears in combination with other attributes?

    Thank you in advance!
  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi Maria_L,

    just go over your attribute list, check the minimum and maximum values, and decide accordingly, using the criteria I listed.
    The most important thing is to understand which attributes have to be changed and how. Afterwards, check that the transformation matches your expectations: e.g. min is 0, max is 1, etc.

    You can change the filter parameters to get more association rules, ordered by confidence for example.

    Whether only the X => rule is relevant, or also the X, Y => rule, depends on the question or problem you're trying to solve. What is relevant in the real world? How often is the X, Y rule seen? It's up to you to decide which rule is more important.

    Regards,
    Balázs
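
The choice discussed above, between rules where X stands alone in the premise and rules where X appears together with other attributes, can be sketched as two filters over a rule list. This is a hypothetical illustration in Python and not RapidMiner output: the `Rule` class and both helper functions are invented names, and the two sample rules simply mirror the shapes quoted in the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    premise: frozenset      # attributes on the left of the arrow
    conclusion: frozenset   # attributes on the right of the arrow
    confidence: float


# The two rule shapes from the question, both with confidence 0.9.
rules = [
    Rule(frozenset({"X"}), frozenset({"Y", "Z"}), 0.9),
    Rule(frozenset({"X", "Y"}), frozenset({"Z"}), 0.9),
]


def alone_in_premise(attr, rules, top=10):
    """Rules whose premise is exactly {attr}, best confidence first."""
    hits = [r for r in rules if r.premise == frozenset({attr})]
    return sorted(hits, key=lambda r: r.confidence, reverse=True)[:top]


def anywhere(attr, rules, top=10):
    """Rules mentioning attr in premise or conclusion, best confidence first."""
    hits = [r for r in rules if attr in r.premise | r.conclusion]
    return sorted(hits, key=lambda r: r.confidence, reverse=True)[:top]


strict = alone_in_premise("X", rules)
broad = anywhere("X", rules)
print(len(strict), len(broad))  # 1 2
```

Which list to report is the judgment call described in the reply: the strict filter answers "what follows from X alone", while the broad one also keeps rules where X only contributes alongside other attributes.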