
Cumulative summing

PaulM Member Posts: 17 Contributor II
I have a large set of examples of financial deposit amounts, each with a client id and a timestamp. All I want to do is add an attribute to each example that holds the total deposits made by that client up to that timestamp.

The only way I could find to do this was to sort the dataset by timestamp and client id, then loop through it example by example, with macros to hold the cumulative total and reset it each time a new client id is encountered. It works, but it is VERY VERY slow: it's been running for 15 hours and is still going. Most processes on this dataset take seconds.
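For reference, the logic of that loop is a single pass once the data is sorted. Below is a minimal Python sketch of the same computation, outside RapidMiner, assuming each example is a `(client_id, timestamp, amount)` tuple and that timestamps sort chronologically (e.g. ISO-8601 strings or `datetime` objects). The function name `cumulative_deposits` and the sample data are hypothetical.

```python
from itertools import accumulate, groupby
from operator import itemgetter


def cumulative_deposits(rows):
    """Add a per-client running total to (client_id, timestamp, amount) rows.

    Hypothetical helper for illustration; assumes timestamps sort
    chronologically (e.g. ISO-8601 strings or datetime objects).
    """
    # Order by client, then time, so each client's deposits are contiguous and
    # chronological (same effect as sorting by timestamp, then stably by client id).
    ordered = sorted(rows, key=itemgetter(0, 1))
    result = []
    # groupby yields one run per client id, so the running total restarts
    # whenever a new client id is encountered (the job of the reset macro).
    for _, group in groupby(ordered, key=itemgetter(0)):
        group = list(group)
        totals = accumulate(amount for _, _, amount in group)
        for (client_id, ts, amount), total in zip(group, totals):
            result.append((client_id, ts, amount, total))
    return result


deposits = [
    ("A", "2023-01-03", 50.0),
    ("B", "2023-01-01", 20.0),
    ("A", "2023-01-01", 100.0),
    ("B", "2023-01-02", 5.0),
]
for row in cumulative_deposits(deposits):
    print(row)
# ('A', '2023-01-01', 100.0, 100.0)
# ('A', '2023-01-03', 50.0, 150.0)
# ('B', '2023-01-01', 20.0, 20.0)
# ('B', '2023-01-02', 5.0, 25.0)
```

Because Python's sort is stable, two deposits with the same client and timestamp each get their own running total, in their original input order.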

Surely there has to be a better way of doing this?! (preferably using native operators)


Answers

  • PaulM Member Posts: 17 Contributor II
    Thanks @BalazsBarany. I had a hunch that this might be easier to do in SQL, so it's super helpful to have that confirmed and to have the method. I'll give it a go.

    I appreciate that RapidMiner's model reinforces that examples are independent, but this feels like quite a common use case in the customer lifetime modelling space, so I am surprised there isn't an operator that supports it natively. It would be a great addition IMHO.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    I absolutely agree that this functionality would be a great addition to RapidMiner. It's probably a big project, though. 

    Going through a database (or having the data there in the first place) is not a bad solution, though.
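The thread doesn't quote the SQL method itself, so what follows is one common approach, offered as an assumption rather than as the method Balazs described: a window function that sums each client's deposits in timestamp order. To keep the sketch self-contained it runs against an in-memory SQLite database from Python; the table `deposits`, its columns, and the function name `cumulative_deposits_sql` are hypothetical, and window functions need SQLite 3.25 or later. The same `SUM(...) OVER (PARTITION BY ... ORDER BY ...)` syntax also works in PostgreSQL and other databases with window-function support.

```python
import sqlite3


def cumulative_deposits_sql(rows):
    """Compute per-client running totals with a SQL window function.

    Table and column names are hypothetical; needs SQLite 3.25+.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE deposits (client_id TEXT, ts TEXT, amount REAL)")
        conn.executemany("INSERT INTO deposits VALUES (?, ?, ?)", rows)
        query = """
            SELECT client_id, ts, amount,
                   SUM(amount) OVER (
                       PARTITION BY client_id   -- restart the total for each client
                       ORDER BY ts              -- accumulate in time order
                       RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
                   ) AS running_total
            FROM deposits
            ORDER BY client_id, ts
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


rows = [
    ("A", "2023-01-03", 50.0),
    ("B", "2023-01-01", 20.0),
    ("A", "2023-01-01", 100.0),
    ("B", "2023-01-02", 5.0),
]
print(cumulative_deposits_sql(rows))
# [('A', '2023-01-01', 100.0, 100.0), ('A', '2023-01-03', 50.0, 150.0),
#  ('B', '2023-01-01', 20.0, 20.0), ('B', '2023-01-02', 5.0, 25.0)]
```

The `RANGE` frame (also the default when only `ORDER BY` is given) treats deposits with the same client and timestamp as peers, so they all get the same total, which includes every deposit at that timestamp. Switching to `ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW` counts them one at a time instead, but then the order among ties is undefined unless you add a tie-breaking column to `ORDER BY`.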