Comparing every row of an exampleset with all the rows in another

Pradyumna_26 Member Posts: 7 Contributor I
I have two example sets, say A and B, with the same attribute names. Each row of A needs to be compared with all rows of B and categorized according to a criterion on one particular attribute. My initial thought was to use a Loop Examples operator to iterate over the rows of A, and to retrieve B and apply a Filter Examples operator inside the loop (at every iteration). The problem is that I couldn’t find a way to use macros to set the filter parameter (the attribute value from the row of A in the current iteration). This has been a hurdle for my task for quite a few days now, and any help/insight/suggestion would be greatly appreciated!

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi @Pradyumna_26,

    If the example sets aren't too large, you could use a Cartesian Product (a kind of join that pairs every row of A with every row of B), then Generate Attributes for the necessary comparisons, and finally Filter Examples to keep only what you need.

    If they are too large, you can process A in batches, e.g. of 100 or 1000 rows, joining each batch with the entire B.

    If you want to go with Loop Examples, use Extract Macro inside the loop with the macro type data_value and %{example} as the example index. The extracted macro can then be used in the Filter Examples parameter.

    Regards,

    Balázs
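
For readers who want to see the logic of the Cartesian Product and batching suggestions spelled out, here is a minimal sketch in plain Python. It is not RapidMiner code and uses no RapidMiner API. The attribute name `value`, the sample rows, the comparison `b < a`, and the helper names `batches` and `compare_all` are all hypothetical, chosen only for illustration.

```python
from itertools import islice, product


def batches(rows, size):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def compare_all(a_rows, b_rows, attribute, matches, batch_size=100):
    """Pair every row of A with every row of B, one batch of A at a time.

    Keeps the (a, b) pairs for which matches(a_value, b_value) is true.
    """
    kept = []
    for chunk in batches(a_rows, batch_size):
        # Cartesian product of the current batch of A with all of B.
        for a, b in product(chunk, b_rows):
            # The comparison plays the role of Generate Attributes,
            # the `if` plays the role of Filter Examples.
            if matches(a[attribute], b[attribute]):
                kept.append((a, b))
    return kept


# Hypothetical sample data standing in for example sets A and B.
A = [{"id": "a1", "value": 5}, {"id": "a2", "value": 12}]
B = [{"id": "b1", "value": 4}, {"id": "b2", "value": 10}, {"id": "b3", "value": 20}]

# Assumed criterion: keep pairs where B's value is smaller than A's value.
pairs = compare_all(A, B, "value", lambda a, b: b < a, batch_size=1)
result = [(a["id"], b["id"]) for a, b in pairs]
print(result)
```

Batching only limits how many pairs are held in memory at once; the set of pairs that survive the filter is the same for any batch size.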