Hi guys, I need some help here.


I have data organised at 1-second intervals. For each second, it records the user, the product that was traded, and the value of that position.

I have multiple users, and each user can trade various products.


How do I:


1. classify users into user groups?


2. obtain the change in each user-product valuation?


3. group the resulting valuation changes into customisable blocks of time?


sample data set:
















frank and frank2 would be grouped together as frankie.

The first 2 seconds are 1 block.

The next 2 seconds are another block.


So I can see that in the first 2 seconds, frankie made 0 and melissa made -1.

In the next 2-second block, frank seashells made 6, frank cockels made 3, and melissa made 4.


I would also like to group the products together, so seashells/cockels would be grouped as seafood.

So seafood made -$1 in the first 2 seconds,

and seafood made $10 in the next 2 seconds.


Thanks guys!



    Hi blancharde,


    I think the way to go is to first use a map or replace operator to unify the names, and afterwards a single aggregate with sum of value, grouped by product, timestamp and user.
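For anyone who wants to see the logic of that suggestion outside the operators, here is a rough sketch in plain Python. It is not the operator setup itself, just the same idea: unify the names through lookup tables, then sum the value per timestamp, user group and product group. The rows, the `USER_GROUPS` and `PRODUCT_GROUPS` tables and the `aggregate` helper are all made up for illustration and are not the data from the original post.

```python
from collections import defaultdict

# Hypothetical rows: (timestamp in seconds, user, product, value).
# These values are invented for the example.
rows = [
    (1, "frank", "seashells", 10),
    (1, "frank2", "cockels", 5),
    (1, "melissa", "seashells", 7),
    (2, "frank", "seashells", 12),
    (2, "frank2", "cockels", 4),
    (2, "melissa", "seashells", 6),
]

# Assumed lookup tables. A name that is missing from a table keeps its own name.
USER_GROUPS = {"frank": "frankie", "frank2": "frankie"}
PRODUCT_GROUPS = {"seashells": "seafood", "cockels": "seafood"}


def aggregate(rows):
    """Sum value per (timestamp, user group, product group)."""
    totals = defaultdict(int)
    for ts, user, product, value in rows:
        key = (ts, USER_GROUPS.get(user, user), PRODUCT_GROUPS.get(product, product))
        totals[key] += value
    return dict(totals)


totals = aggregate(rows)
for key in sorted(totals):
    print(key, totals[key])
```

Dropping one of the lookups from the key (for example, keeping the raw product name) gives the per-product view instead of the grouped one.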



    Hi Martin, thanks for your help.


    What about finding the difference from one timestamp to the next, for each product and user?


    Can I create a new column called value_difference?
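As a rough illustration of what such a column could contain, here is a minimal sketch in plain Python, not tied to any particular tool. It computes `value_difference` per user and product from one timestamp to the next, then sums those differences into blocks of a chosen length. The rows, the helper names `add_value_difference` and `sum_by_block`, and the choice to give the first observation of each user-product pair a difference of 0 are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rows: (timestamp in seconds, user, product, value).
# These values are invented for the example.
rows = [
    (1, "frank", "seashells", 10),
    (2, "frank", "seashells", 12),
    (3, "frank", "seashells", 15),
    (4, "frank", "seashells", 18),
    (1, "melissa", "seashells", 7),
    (2, "melissa", "seashells", 6),
    (3, "melissa", "seashells", 8),
    (4, "melissa", "seashells", 10),
]


def add_value_difference(rows):
    """Append value_difference: change since the previous row of the same user and product."""
    last_value = {}
    out = []
    for ts, user, product, value in sorted(rows):  # sorted by timestamp first
        key = (user, product)
        # Assumption: the first observation of a pair has no previous value, so use 0.
        diff = value - last_value[key] if key in last_value else 0
        last_value[key] = value
        out.append((ts, user, product, value, diff))
    return out


def sum_by_block(diff_rows, block_seconds, start=1):
    """Sum value_difference per (block index, user, product)."""
    totals = defaultdict(int)
    for ts, user, product, _value, diff in diff_rows:
        block = (ts - start) // block_seconds  # 0 for the first block, 1 for the next, ...
        totals[(block, user, product)] += diff
    return dict(totals)


diff_rows = add_value_difference(rows)
blocks = sum_by_block(diff_rows, block_seconds=2)
for key in sorted(blocks):
    print(key, blocks[key])
```

Changing `block_seconds` changes the block length, and applying user or product lookup tables before the final sum would give the grouped totals.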

