Compare attribute columns based on value ranges?

Fred12 Member Posts: 344 Unicorn
edited November 2018 in Help

Hi,

I want to compare the values of two attribute columns from two different Excel files, e.g. radius1 and radius2.

I want to treat two rows as equal (meaning their ID is the same) if the values lie within a certain range of each other, e.g. radius1 = 1.77 and radius2 = 1.78.

As a formula: if radius1 is between 0.98*radius2 and 1.02*radius2, then they are equal.

Then I want to join all the rows whose entries match the formula above.

Is it possible to identify equality based on ranges like this?
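
For illustration only, the range rule in the question can be written as a small predicate. This is a minimal Python sketch rather than a RapidMiner process; the function name `radii_match` and the `tolerance` parameter are assumptions introduced here, and radii are assumed to be non-negative.

```python
def radii_match(radius1: float, radius2: float, tolerance: float = 0.02) -> bool:
    """Return True if radius1 lies between (1 - tolerance) * radius2
    and (1 + tolerance) * radius2, inclusive.

    Hypothetical helper: the default tolerance of 0.02 mirrors the
    0.98 / 1.02 factors in the question. Assumes non-negative radii.
    """
    lower = (1 - tolerance) * radius2
    upper = (1 + tolerance) * radius2
    return lower <= radius1 <= upper


# The example values from the question fall inside the band.
example_result = radii_match(1.77, 1.78)
```

Note that the band is defined relative to radius2, so the rule is not perfectly symmetric: swapping the two arguments can change the result for values right at the edge of the range.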

Answers

  • BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn

    Hi!

     

    If you don't have too much data, you could do a Cartesian Join, then use Generate Attributes to calculate the difference, and then Filter Examples to keep only the examples with a small difference.

     

    If your example sets have many lines, Cartesian Join will create a huge data set. In that case, you might want to try this Generic Join approach with the built-in scripting:

    http://datascientist.at/2016/06/generic-joins-in-rapidminer/#english

     

    Regards,

     

    Balázs

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    If you are only interested in a casewise comparison of radius1 and radius2 values, then @BalazsBarany's method works equally well without the Cartesian Join: just use Generate Attributes to calculate the difference and filter for the examples that meet your threshold. But if you do want a pairwise comparison of all possible combinations of radius1 and radius2, I hope you have a small dataset! The combinations inflate pretty quickly :-) .

     

    Best,

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
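
The approaches discussed in the answers can be sketched outside RapidMiner as follows. This is a minimal, pure-Python illustration under stated assumptions: the sample rows, the function names, and the `TOLERANCE` constant are all hypothetical, and radii are assumed to be non-negative. `cartesian_range_join` mirrors the Cartesian Join plus filter idea, `casewise_matches` mirrors the row-by-row comparison, and `sorted_range_join` is a swapped-in sort-and-bisect technique for larger data; it is not the Generic Join scripting approach from the linked article.

```python
from bisect import bisect_left, bisect_right
from itertools import product

# Hypothetical sample rows standing in for the two Excel files: (id, radius).
left_rows = [("a1", 1.77), ("a2", 2.50), ("a3", 3.10)]
right_rows = [("b1", 1.78), ("b2", 2.52), ("b3", 4.00)]

TOLERANCE = 0.02  # mirrors the 0.98 / 1.02 factors from the question


def within_range(radius1, radius2, tolerance=TOLERANCE):
    """True if radius1 lies within +/- tolerance (relative) of radius2."""
    return (1 - tolerance) * radius2 <= radius1 <= (1 + tolerance) * radius2


def cartesian_range_join(left, right, tolerance=TOLERANCE):
    """Compare every left row with every right row, then filter.

    Cost grows with len(left) * len(right), so this suits small data only.
    """
    return [
        (l, r)
        for l, r in product(left, right)
        if within_range(l[1], r[1], tolerance)
    ]


def casewise_matches(left, right, tolerance=TOLERANCE):
    """Compare row i of left with row i of right only; no join needed."""
    return [
        (l, r)
        for l, r in zip(left, right)
        if within_range(l[1], r[1], tolerance)
    ]


def sorted_range_join(left, right, tolerance=TOLERANCE):
    """Find the same pairs without building every combination.

    Sorts the left radii once, then uses binary search to locate the
    slice of left rows that falls inside each right row's band.
    """
    left_sorted = sorted(left, key=lambda row: row[1])
    left_radii = [row[1] for row in left_sorted]
    matches = []
    for r in right:
        lo = bisect_left(left_radii, (1 - tolerance) * r[1])
        hi = bisect_right(left_radii, (1 + tolerance) * r[1])
        matches.extend((l, r) for l in left_sorted[lo:hi])
    return matches


pairs = cartesian_range_join(left_rows, right_rows)
```

With the sample rows above, all three functions pair a1 with b1 and a2 with b2. The sorted variant still produces every matching pair, so it only helps when each band contains few candidates.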