Cumulative summing

PaulM Member Posts: 17 Contributor II
I have a large set of examples of financial deposit amounts, each with a client id and timestamp. All I want to do is add an attribute to each example that represents the total deposits made by the client up to that timestamp.

The only way I could find to do this was to sort the dataset by timestamp and client id and then loop through it example by example, using macros to hold the cumulative total and resetting it each time a new client id is encountered. It works but it is VERY VERY slow: it's been running for 15 hours and is still going. Most processes on this dataset take seconds.

Surely there has to be a better way of doing this?! (preferably using native operators)
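
For readers who want to see the calculation itself outside RapidMiner, here is a minimal Python sketch of the per-client running total described above. It is not a native RapidMiner operator, and the field names (`client_id`, `timestamp`, `amount`) and the sample rows are assumptions made up for illustration. It keeps one running total per client in a dictionary, so a single pass over the timestamp-sorted data is enough and no reset is needed when the client id changes.

```python
from collections import defaultdict
from operator import itemgetter


def add_cumulative_deposits(rows):
    """Return copies of the rows with a per-client running total added.

    Assumes each row is a dict with hypothetical keys
    'client_id', 'timestamp' and 'amount'.
    """
    totals = defaultdict(float)  # running total per client id
    out = []
    # One pass over the data in timestamp order.
    for row in sorted(rows, key=itemgetter("timestamp")):
        totals[row["client_id"]] += row["amount"]
        out.append({**row, "cumulative_deposits": totals[row["client_id"]]})
    return out


# Made-up sample data, for illustration only.
deposits = [
    {"client_id": "A", "timestamp": "2023-01-01T09:00", "amount": 100.0},
    {"client_id": "B", "timestamp": "2023-01-02T10:00", "amount": 50.0},
    {"client_id": "A", "timestamp": "2023-01-03T11:00", "amount": 25.0},
    {"client_id": "B", "timestamp": "2023-01-04T12:00", "amount": 10.0},
]

result = add_cumulative_deposits(deposits)
for r in result:
    print(r["client_id"], r["timestamp"], r["amount"], r["cumulative_deposits"])
```

Apart from the initial sort, the work grows linearly with the number of examples.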

Answers

  • PaulM Member Posts: 17 Contributor II
    Thanks @BalazsBarany - I had a hunch that this might be easier to move to SQL, and it's super helpful to have that confirmed and to have a method to follow. I'll give it a go.

    I appreciate that RapidMiner's model reinforces that examples are independent, but this feels like quite a common use case in the customer lifetime modelling space, so I am surprised there isn't an operator that natively supports it. It would be a great addition IMHO.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    I absolutely agree that this functionality would be a great addition to RapidMiner. It's probably a big project, though. 

    Going through a database (or having data there in the first place) is not a bad solution though. 
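
The exact SQL method referred to in this thread is not reproduced here, so the following is only a sketch of one common way to do this in a database: a window function that sums the amounts per client in timestamp order. It uses Python's built-in `sqlite3` module with an in-memory database, and the table name `deposits`, its columns, and the sample rows are all assumptions for illustration. Window functions need SQLite 3.25 or newer; other databases with window function support accept very similar syntax.

```python
import sqlite3

# In-memory database with a hypothetical 'deposits' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deposits (client_id TEXT, ts TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO deposits VALUES (?, ?, ?)",
    [
        ("A", "2023-01-01T09:00", 100.0),
        ("B", "2023-01-02T10:00", 50.0),
        ("A", "2023-01-03T11:00", 25.0),
        ("B", "2023-01-04T12:00", 10.0),
    ],
)

# Running total per client, ordered by timestamp.
# ROWS (rather than the default RANGE) keeps the total row-by-row
# even if two deposits share the same timestamp.
query = """
SELECT client_id, ts, amount,
       SUM(amount) OVER (
           PARTITION BY client_id
           ORDER BY ts
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_deposits
FROM deposits
ORDER BY client_id, ts
"""
running = conn.execute(query).fetchall()
conn.close()

for row in running:
    print(row)
```

The database does the grouping and ordering itself, so there is no example-by-example loop to maintain.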