
Data transformation

SagioProject (Member)
edited November 2018 in Help
Hi everybody,

For a university project I need to transform a dataset, but I really don't know how to do it in RapidMiner. Could you help me out?

The dataset contains event logs captured from a website. Attributes are timestamp, ip_address, browser_info and some other less important ones.

I generated a new attribute Date, which is only the date, without time.

Then I generated a new attribute Session_ID by concatenating the Date, ip_address and browser_info attributes.

The examples with the same Session_ID are events that occurred on the same day, by the same ip address and by the same browser.
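The Date and Session_ID construction can be sketched in Python as follows. The sample events are made up, and the "|" separator is an assumption added here: a plain concatenation also works, but a separator prevents two different IP/browser combinations from producing the same string.

```python
from datetime import datetime

# Hypothetical sample events; the keys mirror the attribute names in the post.
events = [
    {"timestamp": datetime(2018, 11, 5, 9, 0), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2018, 11, 5, 9, 10), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2018, 11, 5, 9, 5), "ip_address": "10.0.0.2", "browser_info": "Chrome"},
]


def add_session_id(event: dict) -> dict:
    """Return a copy of the event with Date and Session_ID attributes added."""
    date = event["timestamp"].date()  # drop the time-of-day part
    # Assumed "|" separator so that different IP/browser combinations
    # cannot concatenate to the same string.
    session_id = "|".join([date.isoformat(), event["ip_address"], event["browser_info"]])
    return {**event, "Date": date, "Session_ID": session_id}


enriched = [add_session_id(e) for e in events]
for e in enriched:
    print(e["Session_ID"])
```

Here the first two events share the Session_ID 2018-11-05|10.0.0.1|Firefox, while the third gets its own.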

What I want to do now is split up these sessions. If there is a gap of 30 minutes or more between two successive events, I want them to be split into two different groups. I want to do that by generating a new attribute Session_in_day, which can be 1, 2, 3, ... according to the "smaller session" the example belongs to.
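A minimal Python sketch of this splitting rule, assuming each event is reduced to a hypothetical (Session_ID, timestamp) pair: events are put in time order within each Session_ID, and the counter increases whenever the gap to the previous event is 30 minutes or more. It illustrates the logic only and is not a RapidMiner operator.

```python
from collections import defaultdict
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)  # a gap of 30 minutes or more starts a new sub-session


def assign_session_in_day(rows):
    """rows: list of (session_id, timestamp) tuples in any order.

    Returns a list of Session_in_day numbers aligned with the input order.
    """
    # Group (timestamp, original index) pairs by Session_ID.
    by_session = defaultdict(list)
    for idx, (session_id, ts) in enumerate(rows):
        by_session[session_id].append((ts, idx))

    result = [0] * len(rows)
    for items in by_session.values():
        items.sort()  # chronological order within one Session_ID
        counter, prev_ts = 1, None
        for ts, idx in items:
            if prev_ts is not None and ts - prev_ts >= GAP:
                counter += 1  # pause detected: next "smaller session"
            result[idx] = counter
            prev_ts = ts
    return result


rows = [
    ("A", datetime(2018, 11, 5, 9, 0)),
    ("A", datetime(2018, 11, 5, 9, 20)),
    ("A", datetime(2018, 11, 5, 9, 50)),  # exactly 30 minutes after 9:20
    ("B", datetime(2018, 11, 5, 9, 10)),
    ("A", datetime(2018, 11, 5, 11, 0)),
]
print(assign_session_in_day(rows))  # [1, 1, 2, 1, 3]
```

Because the comparison is `>=`, an event exactly 30 minutes after its predecessor opens a new Session_in_day, matching "30 minutes or more".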

In MATLAB I was more or less able to write a program to do this, but I have no clue how to do it in RapidMiner. Anyone?

Answers

  • MartinLiebig (Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor)
    Hi

    you can create a new attribute that indicates whether there was a pause or not.
    To do so, I would recommend using the Lag operator from the Time Series extension. Sort by timestamp, use the Lag operator to get a new column with the previous timestamp, and use Generate Attributes with

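    if(date_diff([timestamp-1], timestamp) >= 1800000, "jump", "nojump")
    or whatever you are comfortable with. The square brackets are needed because the attribute name contains a minus sign, and date_diff returns milliseconds, so 1800000 corresponds to your 30-minute gap.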


    Cheers,

    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
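Outside RapidMiner, the lag-and-flag idea from the answer can be sketched in Python. This is a hedged illustration, not the Time Series extension's implementation: it assumes the rows are already sorted by Session_ID and then by timestamp, so that a previous timestamp belonging to another session never triggers a jump, and it turns the flags into the requested Session_in_day numbers by counting jumps. The function names and sample rows are hypothetical.

```python
from datetime import datetime

GAP_MS = 30 * 60 * 1000  # 30 minutes in milliseconds, matching date_diff's unit


def lag_flags(rows):
    """rows: (session_id, timestamp) tuples sorted by session_id, then timestamp.

    Mimics Lag + Generate Attributes: compare each timestamp with the previous
    row's timestamp, but only inside the same Session_ID.
    """
    flags = []
    prev_sid, prev_ts = None, None
    for sid, ts in rows:
        if sid == prev_sid and (ts - prev_ts).total_seconds() * 1000 >= GAP_MS:
            flags.append("jump")
        else:
            flags.append("nojump")  # includes the first event of each Session_ID
        prev_sid, prev_ts = sid, ts
    return flags


def session_in_day(rows, flags):
    """Running count of "jump" flags, restarted at 1 for every new Session_ID."""
    numbers, counter, prev_sid = [], 0, None
    for (sid, _), flag in zip(rows, flags):
        if sid != prev_sid:
            counter = 1
        elif flag == "jump":
            counter += 1
        numbers.append(counter)
        prev_sid = sid
    return numbers


rows = [
    ("A", datetime(2018, 11, 5, 9, 0)),
    ("A", datetime(2018, 11, 5, 9, 45)),
    ("A", datetime(2018, 11, 5, 9, 50)),
    ("B", datetime(2018, 11, 5, 9, 10)),
]
flags = lag_flags(rows)
print(flags)  # ['nojump', 'jump', 'nojump', 'nojump']
print(session_in_day(rows, flags))  # [1, 2, 2, 1]
```

Keeping the flag and the counter as separate steps mirrors the answer's pipeline: the jump/nojump attribute comes first, and Session_in_day is derived from it afterwards.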