Question about group-by windowing(?)

user194372 Member Posts: 14 Contributor II
edited April 2021 in Help

Hello, everyone

How are you?


I have one question.


My data looks like this:


city / date / amount

NY / 20210401 / 100

NY / 20210402 / 150

NY / 20210403 / 50

NY / 20210404 / 30

LA / 20210401 / 40

LA / 20210402 / 20

LA / 20210403 / 50

Chicago / 20210401 / 30

Chicago / 20210402 / 40

Huston / 20210401 / 30

Huston / 20210402 / 20

Huston / 20210403 / 40

....

....

....

This data contains over a thousand cities (a huge number of cities).


I want to add a new variable (previous amount) to the data above, like this:


city / date / amount / previous amount

NY / 20210401 / 100 / NA

NY / 20210402 / 150 / 100

NY / 20210403 / 50 / 150

NY / 20210404 / 30 / 50

LA / 20210401 / 40 / NA

LA / 20210402 / 20 / 40

LA / 20210403 / 50 / 20

Chicago / 20210401 / 30 / NA

Chicago / 20210402 / 40 / 30

Huston / 20210401 / 30 / NA

Huston / 20210402 / 20 / 30

Huston / 20210403 / 40 / 20


That is, I want to generate a lagged variable separately for each city.


So I used the operators "Loop by value", "Filter example", and "Windowing",

but this required a huge amount of memory and was very slow.


Could you please help me with this task?

How can I do this using RapidMiner operators?


Thank you in advance



 

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,527 RM Data Scientist
    Hi,
    please try Group Into Collection followed by Loop Collection with Windowing inside. That can be more efficient. Keep in mind, though, that Windowing multiplies the size of your data by window_size, so the result can become large.

    ~Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • user194372 Member Posts: 14 Contributor II

    Hello, mschmitz:


    Thank you so much for your help and care.

    It helped me a lot.

    Have a nice day and see you again.

  • yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    Try the "lag" operator as well. You can lag by 1 or by -1 unit for each city. 
  • user194372 Member Posts: 14 Contributor II

    Hello, yyhuang.

    Thank you for your comment.

    Right, there is a "lag" operator. I remember it now.

    Have a nice day and see you, yyhuang.
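
For anyone who wants to check the expected result outside RapidMiner, the per-city lag asked about in this thread can be sketched in plain Python. This is only an illustration of the idea, not how the RapidMiner operators work internally. The function name add_previous_amount and the tuple layout (city, date, amount) are assumptions made for this example, and None stands in for NA. It keeps just one remembered amount per city instead of building a window for every row, so memory grows with the number of cities rather than with rows times window size.

```python
def add_previous_amount(rows):
    """Append each city's previous amount (by date) to every row.

    rows: list of (city, date, amount) tuples, where date is a
    YYYYMMDD string so that string order equals chronological order.
    Returns a new list of (city, date, amount, previous_amount) tuples
    in the same order as the input. The first row of each city gets None.
    """
    previous = [None] * len(rows)
    last_amount = {}  # city -> most recent amount seen so far

    # Visit rows in date order, but write results back by original
    # position so the output keeps the input's row order.
    for i in sorted(range(len(rows)), key=lambda idx: rows[idx][1]):
        city, _date, amount = rows[i]
        previous[i] = last_amount.get(city)  # None if the city is new
        last_amount[city] = amount

    return [row + (prev,) for row, prev in zip(rows, previous)]


# Sample rows taken from the question above.
rows = [
    ("NY", "20210401", 100),
    ("NY", "20210402", 150),
    ("NY", "20210403", 50),
    ("NY", "20210404", 30),
    ("LA", "20210401", 40),
    ("LA", "20210402", 20),
    ("LA", "20210403", 50),
]

for row in add_previous_amount(rows):
    print(row)
# First two lines printed:
# ('NY', '20210401', 100, None)
# ('NY', '20210402', 150, 100)
```

The output matches the "previous amount" column in the question: NA (None) for each city's first date, and the preceding day's amount otherwise.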
