Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

[SOLVED] Pivoting several lines into one single field

MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
edited November 2018 in Help
Hi guys,
this is my first post in this forum, I hope it´s on the right spot.

I´m using currently RapidMiner mainly for ETL processes, and it´s truly impressive what RM is capable of. I´m now looking for a particluar feature which I could not find, and I´d appreciate somebody could put me on the right track. Let me make a very simple example.

Let´s say there are currently two columns, Column 1 shows several cheque numbers, column two shows which invoices were paid with each cheque. A cheque can pay one or several invoices (i. e. a typical payment allocation table).

CHEQUE     INVOICE_PAID
C1               Inv1
C1               Inv2
C2               Inv3
C3               Inv4
C3               Inv5
C3               Inv6

What I try to achieve is a table with only two columns, where the first column is grouped by Cheque number, and the second column contains ALL invoices paid by a particular cheque:


CHEQUE     INVOICES_PAID
C1              Inv1, Inv2
C2              Inv3
C3              Inv4, Inv5, Inv6

It´s very important for me that all paid invoices are contained in one single field per record because I want to use it on a report (a statement, to be precise). Unfortunately, I wasn´t able to figure it out myself.
Any help would be greatly appreciated.

Answers

  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Anyone?
  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Something like this would work...

    Loop values for the cheque attribute

    Inside the loop, filter examples where cheque = the current value

    Create a macro containing null

    Start another loop for all filtered examples

    Inside this loop update the macro with the value of the invoice attribute for the current example

    Regards,

    Andrew
  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Hi Andrew,

    thank you very much for your answer.

    Frankly, something really hilarious just happened! Based on your answer I wanted to comment that I believe this functionality should be built into the "aggregate" functionality. I wanted to use the correct terminology, so I looked up the operator in RapidMiner, and - shame on me - I actually figured that it is already there, it´s the aggregation function "concatenation".

    That´s precisely what I was looking for, and SO simple!

    Again Andrew, I do thank you for your answer, after all you pointed me in the right direction, and you also gave me a glimpse of a more flexible approach. (Not that I could do that (yet), but down the road... who knows? ;) )
  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello MacPhotoBiker

    How cool - I've learnt something new too.

    regards

    Andrew
  • rowan_growan_g Member Posts: 47 Contributor II
    Thanks for that solution MacPhotoBiker.
    Did you get the " |" between your values? For some reason I've got those characters and I can't get rid of them. Any ideas?

    Cheers,
  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Hi rowan.g,

    "|" is  simply a field delimiter, and RM decided to choose one that usually does not occur in any other context, which is a very good thing in order to keep the different values (Inv1, Inv2 etc. in my example) clearly separated.

    It's very easy to change those, simply do the following:

    - After the aggregate function, add a "Generate Attribute" operator
    - "Attribute name" is the field you just generated, e.g. "MyListInOneField"
    - "function expression" is "replace(MyListInOneField","|",",")

    This replaces all occurrences of "|" with "," in the field "MyListInOneField".

    In my example, I want one field that reads "Inv. # 4, 5, 6", here's how I do that:

    - My initial "ListOfInvoices" is "4|5|6" -> replace(ListOfInvoices,"|",", ") -> "4, 5, 6"

    - To add "Inv. # " I use the "concat" function: concat("Inv. # ", replace(ListOfInvoices,"|",", ")) -> "Inv. # 4, 5, 6"

    That did the trick for me, I hope it helps.
  • rowan_growan_g Member Posts: 47 Contributor II
    Thanks MacPhotoBiker. Worked perfectly.
  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Glad to hear that it worked.
Sign In or Register to comment.