ANNOUNCEMENT: RAPIDMINER 9.1 BETA HAS BEEN RELEASED TODAY!   PLEASE DOWNLOAD AND GIVE FEEDBACK. ENJOY AND HAPPY RAPIDMINING!   -- @sgenzer – Community Manager

Vlookup in Rapid Miner?

kyrakyra Member Posts: 5 Contributor I
edited November 9 in Help

Hello!

 

I'm quite new to Rapid Miner and was wondering if there's a VLOOKUP equivalent in it?

I work with two sets of data - the first one being sales file where our supplier lets us know which products have sold, how much etc. The second file is a reference file, which I use to check all product names with codes (ie: Product 1 = GBCIF281939499). 

 

Is there a way to do this in Rapid Miner?

 

Kyra

Tagged:
Gonfiaf_Zuraik

Answers

  • Gonfiaf_ZuraikGonfiaf_Zuraik Member Posts: 8 Contributor I

    hello @kyra

     

    if you choose the "Turbo Prep" in the Views, you will see that one of the actions there is called PIVOT, if you click it, it will give you the opportunity to group the data by the columns you want, and it also gives the possibility to aggregate. try it might be it will help you with your concern :)

    as I am also new to RapidMiner, I would be so pleased if this answer could help :smileyhappy:

     

    Cheers and Good luck

    Jana

     

     

    sgenzermschmitzrfuentealba
  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 225  RM Data Scientist

    Hi @kyra,

     

    the concept of Excel's VLOOKUP is called a "join" everywhere else.

    Use the Join operator with the two input example sets, and set the common attribute (the product code).

    Decide if you only want to keep matching rows (inner join) or also keep entries without matching product codes (left or right join).

     

    Regards,

    Balázs

    sgenzerGonfiaf_Zuraikrfuentealba
  • kyrakyra Member Posts: 5 Contributor I

    Thank you, that was super helpful!

    Is there a way to create a process where Rapid Miner creates a new column with the matching product codes?

    For example - after cross referencing both files, a new column is created in the output file called Final Product Code (for example), where it shows you a list of all the Product Codes it was able to match, and highlights the ones that weren't matching or missing?

  • BalazsBaranyBalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 225  RM Data Scientist

    Hi @kyra!

     

    The Join operator has a setting for keeping both key attributes. (You might have to activate the Expert mode in the Parameters panel.)

    To create a new attribute, you use Generate Attributes. It has a graphical formula editor which lists the functions you'll need (if(), missing() etc.).

     

    Regards,

    Balázs

    sgenzerrfuentealba
  • kyrakyra Member Posts: 5 Contributor I
    Hi Balazs,

    Thanks for your response - perhaps I need to provide more info as I'm still having a bit of trouble creating the workflow i want:

    I work at a record label and have two sets of data that I normally work with:
    1. Statement 
    2. VLOOKUP

    When I receive a sales statement, it breaks down the our sales from that Customer for the month, line by line. 
    For example, we’ll receive a monthly sales statement from (ie) Napster that has a line by line breakdown of what digital albums and tracks were sold for the month of October
    The information they provide are things like the ISRC (track code), sale amount (per track), etc.

    My job is to a) do a VLOOKUP to make sure that the ISRC’s provided are correct and b) extract the data that I need and put it into a separate excel sheet. I put it into a format that is readable by our accounting program.

    We have a lot of Customers and each Customer delivers a different sales statement in terms of format so much of my time is spent dissecting it. What I’d like to do with Rapid Miner is automate that process where I can input a sales statement file & my VLOOKUP file and output the data in the format that is readable by our accounting program.

    The data I would need from a statement would be:
    ISRC
    Amount
    Region

    The problem I’m facing is that the statement’s columns are not necessarily named ‘ISRC’, ‘Amount’ and ‘Region’, but this is what our accounting system reads. So the output file on Rapid Miner needs to have these headers but somehow extract the information from the sales statement that has different headers.


    Ultimately, my question is what are the steps I need to take to achieve this on Rapid Miner?

    I’ve put in a Sales Statement and my VLOOKUP sheet onto Rapid Miner.
    I’ve tried the JOIN operator but not sure what settings they need to be on and not sure I’m missing anything else…(pretty sure I am!) 

    It is quite complicated and I’m happy to jump on a call to further explain as there’s a lot to it…

    Thank you in advance!

  • Knut-RMKnut-RM Administrator, Employee, Member Posts: 122   Unicorn
  • kyrakyra Member Posts: 5 Contributor I
    @Knut-RM Thanks for that - will check that out.
    Am I right in thinking that Turbo Prep is not available in the free version? Is that a paid add on?
  • IngoRMIngoRM Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 1,406  RM Founder
    edited November 14
    Hi @kyra,
    Am I right in thinking that Turbo Prep is not available in the free version? Is that a paid add on?
    Yes, that is correct.  Turbo Prep is part of the first 30 days to all users and afterwards it is part of RapidMiner's commercial offering.  See this link here for more information:
    Best,
    Ingo
    mschmitzsgenzer
Sign In or Register to comment.