Data transformation

SagioProject Member Posts: 2 Contributor I
edited November 2018 in Help
Hi everybody,

For a university project I need to transform a dataset, but I really don't know how to do it in RapidMiner. Could you help me out?

The dataset contains event logs captured from a website. Attributes are timestamp, ip_address, browser_info and some other less important ones.

I generated a new attribute Date, which is only the date, without time.

Then I generated a new attribute Session_ID by concatenating the Date, ip_address and browser_info attributes.

Examples with the same Session_ID are events that occurred on the same day, from the same IP address and with the same browser.

What I now want to do is split up these sessions. If there is a gap of 30 minutes or more between two successive events, I want them to be split into two different groups. I want to do that by generating a new attribute Session_in_day, which can be 1, 2, 3, ... according to the "smaller session" the example is in.

In MATLAB I was more or less able to write a program to do this, but I have no clue how to do this in RapidMiner. Anyone?

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,326  RM Data Scientist
    Hi

    You can create a new attribute that indicates whether there was a pause or not.
    To do so, I would recommend the time series extension's Lag operator. Sort by timestamp, use the Lag operator to get a new column with the previous timestamp, and use Generate Attributes with

    if(date_diff(timestamp,timestamp-1)>XX,"jump","nojump")
    or whatever you are comfortable with.


    Cheers,

    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
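
For readers who want to check the logic outside RapidMiner, here is a minimal Python sketch of the same lag-and-compare idea, carried one step further to the Session_in_day counter asked for in the question. It is not a RapidMiner process. The function name add_session_in_day, the "|" separator used to build Session_ID, and the sample events are all assumptions made for illustration; the 30-minute threshold and the attribute names come from the question.

```python
from datetime import datetime, timedelta

GAP = timedelta(minutes=30)  # threshold taken from the question


def add_session_in_day(events, gap=GAP):
    """Return copies of the events with Date, Session_ID and Session_in_day added.

    Each event is a dict with 'timestamp' (datetime), 'ip_address' and
    'browser_info'. The result is sorted by Session_ID, then timestamp.
    """
    rows = []
    for e in events:
        date = e["timestamp"].date().isoformat()
        # Hypothetical separator; any unambiguous concatenation works.
        session_id = f'{date}|{e["ip_address"]}|{e["browser_info"]}'
        rows.append({**e, "Date": date, "Session_ID": session_id})

    # Sorting makes the previous row act as the "lagged" example.
    rows.sort(key=lambda r: (r["Session_ID"], r["timestamp"]))

    prev = None
    for r in rows:
        if prev is None or prev["Session_ID"] != r["Session_ID"]:
            r["Session_in_day"] = 1  # first event of a new Session_ID
        elif r["timestamp"] - prev["timestamp"] >= gap:
            r["Session_in_day"] = prev["Session_in_day"] + 1  # "jump"
        else:
            r["Session_in_day"] = prev["Session_in_day"]  # "nojump"
        prev = r
    return rows


# Made-up sample data, only to exercise the function.
sample = [
    {"timestamp": datetime(2018, 11, 5, 9, 0), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2018, 11, 5, 9, 10), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2018, 11, 5, 9, 40), "ip_address": "10.0.0.1", "browser_info": "Firefox"},
    {"timestamp": datetime(2018, 11, 5, 9, 5), "ip_address": "10.0.0.2", "browser_info": "Firefox"},
]

for row in add_session_in_day(sample):
    print(row["Session_ID"], row["timestamp"].time(), row["Session_in_day"])
```

The counter restarts at 1 whenever Session_ID changes and increases by one on every "jump", which is the step that turns the jump/nojump flag into the numbered sub-sessions.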