Data Preprocessing Ideas

pix123 Member Posts: 27 Contributor II
edited December 2018 in Help

I am working with a relatively clean dataset: it has no missing values, and most of the attributes are numeric, with one date-time stamp attribute recorded every 30 minutes. I need to apply some pre-processing techniques to it. I have the ideas below but am also looking for other suggestions. Thanks.


- Rename some of the numeric attributes so they are easier to identify

- Set roles


Ultimately I will build a model to predict the temperature using regression models and the date-time stamp. This will be trained and then tested.
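To make the plan above concrete, here is a minimal sketch in plain Python (not RapidMiner) of two preparation steps it implies: turning the date-time stamp into numeric attributes, and splitting the data chronologically so the model is tested on readings later than those it was trained on. The function names, the feature choices, the 80/20 split and the temperature values are all assumptions made up for illustration.

```python
from datetime import datetime, timedelta

def time_features(ts):
    """Turn one timestamp into numeric attributes a regression model can use."""
    return {
        "hour": ts.hour,
        "half_hour_slot": ts.hour * 2 + ts.minute // 30,  # 0..47 within the day
        "day_of_week": ts.weekday(),                      # Monday = 0
        "month": ts.month,
    }

def chronological_split(rows, train_fraction=0.8):
    """Split time-ordered rows so every test row is later than every training row."""
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

# Hypothetical data: one reading every 30 minutes, with made-up temperatures.
start = datetime(2018, 12, 1, 0, 0)
timestamps = [start + timedelta(minutes=30 * i) for i in range(10)]
temperatures = [4.0, 3.8, 3.5, 3.6, 3.9, 4.4, 5.0, 5.7, 6.1, 6.0]

# One row per reading: derived time attributes plus the value to predict.
rows = [dict(time_features(ts), temperature=t)
        for ts, t in zip(timestamps, temperatures)]
train, test = chronological_split(rows)
```

A chronological split is used here instead of a random one because shuffling a time series would let the model see readings from after the ones it is tested on.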



    JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Perhaps try windowing the time data, or add a column that shows whether each numeric value is higher or lower than the value 30 minutes earlier.
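As a rough illustration of both ideas, here is a sketch in plain Python. It is not the RapidMiner Windowing operator, only the general technique: each example holds the previous few readings as inputs and the next reading as the label. The function names, the window size of 3 and the temperature values are hypothetical.

```python
def make_windows(values, window_size=3):
    """Slide a fixed-size window over the series.

    Each example holds the previous `window_size` readings as inputs
    (lag_1 is the most recent) and the reading that follows as the label.
    """
    examples = []
    for i in range(window_size, len(values)):
        example = {f"lag_{k}": values[i - k] for k in range(1, window_size + 1)}
        example["label"] = values[i]
        examples.append(example)
    return examples

def direction_vs_previous(values):
    """Flag each reading as 'higher', 'lower' or 'same' versus the previous one."""
    flags = [None]  # the first reading has nothing to compare against
    for prev, curr in zip(values, values[1:]):
        if curr > prev:
            flags.append("higher")
        elif curr < prev:
            flags.append("lower")
        else:
            flags.append("same")
    return flags

# Hypothetical temperatures sampled every 30 minutes.
temperatures = [4.0, 3.8, 3.5, 3.6, 3.9, 4.4]
windows = make_windows(temperatures, window_size=3)
directions = direction_vs_previous(temperatures)
```

Note that in this layout the label is the temperature to be predicted, while the date-time stamp only determines the order of the rows.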



    pix123 Member Posts: 27 Contributor II

    Hi Edward,


    Thanks for the feedback. I am pretty new to RapidMiner. Can you explain a little more about how windowing works? Does the date-time attribute need to have the role of label? Thanks.



    SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Sammie,


    First of all, you need the Time Series extension. You can find it in the Marketplace (menu Extensions -> Marketplace). Experiment with the operators and their tutorials.


    I think that your question is more about Feature Generation than about RapidMiner. You will probably need to consult some Time Series literature. I can recommend the following:


    Helmut Lütkepohl, New Introduction to Multiple Time Series Analysis, Springer (2006)


    Kind regards,

