[SOLVED] Pivoting several lines into one single field

MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
edited November 2018 in Help
Hi guys,
this is my first post in this forum, I hope it´s on the right spot.

I´m using currently RapidMiner mainly for ETL processes, and it´s truly impressive what RM is capable of. I´m now looking for a particluar feature which I could not find, and I´d appreciate somebody could put me on the right track. Let me make a very simple example.

Let´s say there are currently two columns, Column 1 shows several cheque numbers, column two shows which invoices were paid with each cheque. A cheque can pay one or several invoices (i. e. a typical payment allocation table).

CHEQUE     INVOICE_PAID
C1               Inv1
C1               Inv2
C2               Inv3
C3               Inv4
C3               Inv5
C3               Inv6

What I try to achieve is a table with only two columns, where the first column is grouped by Cheque number, and the second column contains ALL invoices paid by a particular cheque:


CHEQUE     INVOICES_PAID
C1              Inv1, Inv2
C2              Inv3
C3              Inv4, Inv5, Inv6

It´s very important for me that all paid invoices are contained in one single field per record because I want to use it on a report (a statement, to be precise). Unfortunately, I wasn´t able to figure it out myself.
Any help would be greatly appreciated.

Answers

  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Anyone?
  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello

    Something like this would work...

    Loop values for the cheque attribute

    Inside the loop, filter examples where cheque = the current value

    Create a macro containing null

    Start another loop for all filtered examples

    Inside this loop update the macro with the value of the invoice attribute for the current example

    Regards,

    Andrew
  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Hi Andrew,

    thank you very much for your answer.

    Frankly, something really hilarious just happened! Based on your answer I wanted to comment that I believe this functionality should be built into the "aggregate" functionality. I wanted to use the correct terminology, so I looked up the operator in RapidMiner, and - shame on me - I actually figured that it is already there, it´s the aggregation function "concatenation".

    That´s precisely what I was looking for, and SO simple!

    Again Andrew, I do thank you for your answer, after all you pointed me in the right direction, and you also gave me a glimpse of a more flexible approach. (Not that I could do that (yet), but down the road... who knows? ;) )
  • awchisholmawchisholm RapidMiner Certified Expert, Member Posts: 458 Unicorn
    Hello MacPhotoBiker

    How cool - I've learnt something new too.

    regards

    Andrew
  • rowan_growan_g Member Posts: 47 Contributor II
    Thanks for that solution MacPhotoBiker.
    Did you get the " |" between your values? For some reason I've got those characters and I can't get rid of them. Any ideas?

    Cheers,
  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Hi rowan.g,

    "|" is  simply a field delimiter, and RM decided to choose one that usually does not occur in any other context, which is a very good thing in order to keep the different values (Inv1, Inv2 etc. in my example) clearly separated.

    It's very easy to change those, simply do the following:

    - After the aggregate function, add a "Generate Attribute" operator
    - "Attribute name" is the field you just generated, e.g. "MyListInOneField"
    - "function expression" is "replace(MyListInOneField","|",",")

    This replaces all occurrences of "|" with "," in the field "MyListInOneField".

    In my example, I want one field that reads "Inv. # 4, 5, 6", here's how I do that:

    - My initial "ListOfInvoices" is "4|5|6" -> replace(ListOfInvoices,"|",", ") -> "4, 5, 6"

    - To add "Inv. # " I use the "concat" function: concat("Inv. # ", replace(ListOfInvoices,"|",", ")) -> "Inv. # 4, 5, 6"

    That did the trick for me, I hope it helps.
  • rowan_growan_g Member Posts: 47 Contributor II
    Thanks MacPhotoBiker. Worked perfectly.
  • MacPhotoBikerMacPhotoBiker Member Posts: 60 Contributor II
    Glad to hear that it worked.
Sign In or Register to comment.