
Compare attribute columns based on value ranges?

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

Hi,

I want to compare the values of two attribute columns from two different Excel files, e.g. radius1 and radius2.

I want to treat two rows as equal (meaning their ID is the same) if their values lie within a certain range of each other, e.g. radius1 = 1.77 and radius2 = 1.78.

As a formula: if radius1 is between 0.98*radius2 and 1.02*radius2, then they are equal.

Then I want to join all rows whose entries match under the formula above.

Is it possible to identify equality based on ranges like this?

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi!

     

    If you don't have too much data, you could do a Cartesian Join, then use Generate Attributes for calculating the difference and then Filter Examples for only keeping the examples with a small difference.

     

    If your example sets have many lines, Cartesian Join will create a huge data set. In that case, you might want to try this Generic Join approach with the built-in scripting:

    http://datascientist.at/2016/06/generic-joins-in-rapidminer/#english

     

    Regards,

     

    Balázs
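
For readers who want to see the logic outside RapidMiner, here is a minimal Python sketch of both ideas above. It is not the Generic Join script from the linked post. `cross_join_match` mirrors the Cartesian Join plus filter, and `sorted_range_match` is a swapped-in alternative (sort plus binary search) that avoids building every pair. The function names, the dictionary row layout, and the sample rows are assumptions made for illustration, and the second function assumes positive radii.

```python
from bisect import bisect_left, bisect_right

TOLERANCE = 0.02  # the 2% band from the question: 0.98*radius2 .. 1.02*radius2


def in_range(radius1, radius2, tol=TOLERANCE):
    """True if radius1 lies between (1 - tol) * radius2 and (1 + tol) * radius2."""
    low, high = sorted(((1 - tol) * radius2, (1 + tol) * radius2))
    return low <= radius1 <= high


def cross_join_match(left, right, tol=TOLERANCE):
    """Compare every left row with every right row and keep pairs in range.

    This builds len(left) * len(right) comparisons, so it only suits small data.
    """
    return [
        (l, r)
        for l in left
        for r in right
        if in_range(l["radius1"], r["radius2"], tol)
    ]


def sorted_range_match(left, right, tol=TOLERANCE):
    """Same pairs as cross_join_match, found by scanning a sorted window.

    Assumes all radii are positive, so the lower bound is below the upper one.
    """
    left_sorted = sorted(left, key=lambda row: row["radius1"])
    keys = [row["radius1"] for row in left_sorted]
    matches = []
    for r in right:
        low = (1 - tol) * r["radius2"]
        high = (1 + tol) * r["radius2"]
        start = bisect_left(keys, low)    # first key >= low
        end = bisect_right(keys, high)    # first key > high
        matches.extend((l, r) for l in left_sorted[start:end])
    return matches


if __name__ == "__main__":
    # Hypothetical sample rows standing in for the two Excel files.
    left = [
        {"id": "A", "radius1": 1.77},
        {"id": "B", "radius1": 2.50},
        {"id": "C", "radius1": 0.99},
    ]
    right = [
        {"id": "X", "radius2": 1.78},
        {"id": "Y", "radius2": 1.00},
        {"id": "Z", "radius2": 3.00},
    ]
    pairs = cross_join_match(left, right)
    print([(l["id"], r["id"]) for l, r in pairs])  # [('A', 'X'), ('C', 'Y')]
```

The sorted variant does the same range test, but each right-hand row only touches the left-hand rows inside its window, which is what keeps it usable when the cross join would get too large.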

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you are only interested in a casewise (row-by-row) comparison of radius1 and radius2 values, then @BalazsBarany's method works equally well without the Cartesian Join: just use Generate Attributes to calculate the difference and filter for the rows that meet your threshold. But if you do want a pairwise comparison of all possible combinations of radius1 and radius2, I hope you have a small dataset! The number of combinations inflates pretty quickly :-) .

     

    Best,

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
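
The casewise variant can be sketched in Python as well, under the assumption that radius1 and radius2 already sit in the same row (for example after a regular join on a shared key). The function name `filter_casewise`, the `rel_diff` column, and the sample rows are hypothetical, and the sketch assumes radius2 is never zero.

```python
def filter_casewise(rows, tol=0.02):
    """Keep rows whose radius1 is within tol (relative) of radius2.

    Mirrors "generate a difference attribute, then filter on a threshold".
    Assumes radius2 is non-zero in every row.
    """
    kept = []
    for row in rows:
        # Relative difference: equivalent to the 0.98*radius2 .. 1.02*radius2 band.
        rel_diff = abs(row["radius1"] - row["radius2"]) / abs(row["radius2"])
        if rel_diff <= tol:
            kept.append({**row, "rel_diff": rel_diff})
    return kept


if __name__ == "__main__":
    # Hypothetical rows with both radii already side by side.
    rows = [
        {"id": 1, "radius1": 1.77, "radius2": 1.78},
        {"id": 2, "radius1": 2.50, "radius2": 3.00},
    ]
    print([row["id"] for row in filter_casewise(rows)])  # [1]
```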