Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.
[SOLVED] Pivoting several lines into one single field
MacPhotoBiker
Member Posts: 60 Contributor II
Hi guys,
this is my first post in this forum, I hope it´s on the right spot.
I´m using currently RapidMiner mainly for ETL processes, and it´s truly impressive what RM is capable of. I´m now looking for a particluar feature which I could not find, and I´d appreciate somebody could put me on the right track. Let me make a very simple example.
Let´s say there are currently two columns, Column 1 shows several cheque numbers, column two shows which invoices were paid with each cheque. A cheque can pay one or several invoices (i. e. a typical payment allocation table).
CHEQUE INVOICE_PAID
C1 Inv1
C1 Inv2
C2 Inv3
C3 Inv4
C3 Inv5
C3 Inv6
What I try to achieve is a table with only two columns, where the first column is grouped by Cheque number, and the second column contains ALL invoices paid by a particular cheque:
CHEQUE INVOICES_PAID
C1 Inv1, Inv2
C2 Inv3
C3 Inv4, Inv5, Inv6
It´s very important for me that all paid invoices are contained in one single field per record because I want to use it on a report (a statement, to be precise). Unfortunately, I wasn´t able to figure it out myself.
Any help would be greatly appreciated.
this is my first post in this forum, I hope it´s on the right spot.
I´m using currently RapidMiner mainly for ETL processes, and it´s truly impressive what RM is capable of. I´m now looking for a particluar feature which I could not find, and I´d appreciate somebody could put me on the right track. Let me make a very simple example.
Let´s say there are currently two columns, Column 1 shows several cheque numbers, column two shows which invoices were paid with each cheque. A cheque can pay one or several invoices (i. e. a typical payment allocation table).
CHEQUE INVOICE_PAID
C1 Inv1
C1 Inv2
C2 Inv3
C3 Inv4
C3 Inv5
C3 Inv6
What I try to achieve is a table with only two columns, where the first column is grouped by Cheque number, and the second column contains ALL invoices paid by a particular cheque:
CHEQUE INVOICES_PAID
C1 Inv1, Inv2
C2 Inv3
C3 Inv4, Inv5, Inv6
It´s very important for me that all paid invoices are contained in one single field per record because I want to use it on a report (a statement, to be precise). Unfortunately, I wasn´t able to figure it out myself.
Any help would be greatly appreciated.
0
Answers
Something like this would work...
Loop values for the cheque attribute
Inside the loop, filter examples where cheque = the current value
Create a macro containing null
Start another loop for all filtered examples
Inside this loop update the macro with the value of the invoice attribute for the current example
Regards,
Andrew
thank you very much for your answer.
Frankly, something really hilarious just happened! Based on your answer I wanted to comment that I believe this functionality should be built into the "aggregate" functionality. I wanted to use the correct terminology, so I looked up the operator in RapidMiner, and - shame on me - I actually figured that it is already there, it´s the aggregation function "concatenation".
That´s precisely what I was looking for, and SO simple!
Again Andrew, I do thank you for your answer, after all you pointed me in the right direction, and you also gave me a glimpse of a more flexible approach. (Not that I could do that (yet), but down the road... who knows? )
How cool - I've learnt something new too.
regards
Andrew
Did you get the " |" between your values? For some reason I've got those characters and I can't get rid of them. Any ideas?
Cheers,
"|" is simply a field delimiter, and RM decided to choose one that usually does not occur in any other context, which is a very good thing in order to keep the different values (Inv1, Inv2 etc. in my example) clearly separated.
It's very easy to change those, simply do the following:
- After the aggregate function, add a "Generate Attribute" operator
- "Attribute name" is the field you just generated, e.g. "MyListInOneField"
- "function expression" is "replace(MyListInOneField","|",",")
This replaces all occurrences of "|" with "," in the field "MyListInOneField".
In my example, I want one field that reads "Inv. # 4, 5, 6", here's how I do that:
- My initial "ListOfInvoices" is "4|5|6" -> replace(ListOfInvoices,"|",", ") -> "4, 5, 6"
- To add "Inv. # " I use the "concat" function: concat("Inv. # ", replace(ListOfInvoices,"|",", ")) -> "Inv. # 4, 5, 6"
That did the trick for me, I hope it helps.