Data Preprocessing Ideas

pix123 Member Posts: 27 Contributor II
edited December 2018 in Help

I am working with a relatively clean dataset: it has no missing values, and most of the attributes are numeric, with one date-time stamp attribute recorded every 30 minutes. I need to apply some pre-processing techniques to it. I have the ideas below but am also looking for other suggestions. Thanks.

 

- Rename some of the numeric attributes so they are easier to identify

- Set roles

 

Ultimately, I will build regression models to predict the temperature, using the date-time stamp. The models will be trained and then tested.

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    Perhaps windowing the time data? Or adding a column that shows whether each numeric value is higher or lower than the value 30 minutes earlier?
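
As a rough illustration of both ideas, here is a minimal sketch in plain Python rather than RapidMiner operators. It assumes the readings are already sorted by time and spaced 30 minutes apart. The function name `window_rows`, the field names, and the sample temperatures are all made up for the example.

```python
from datetime import datetime, timedelta


def window_rows(readings, window_size=2):
    """Turn a time-ordered list of (timestamp, value) pairs into windowed rows.

    Each row holds the previous `window_size` values as inputs, the current
    value as the prediction target, and a flag saying whether the current
    value is higher than the value one step (30 minutes) earlier.
    """
    rows = []
    for i in range(window_size, len(readings)):
        timestamp, value = readings[i]
        # Values from the preceding steps, oldest first.
        previous = [v for _, v in readings[i - window_size:i]]
        rows.append({
            "timestamp": timestamp,
            "lags": previous,
            "label": value,
            "higher_than_prev": value > previous[-1],
        })
    return rows


# Hypothetical sample data: six readings, 30 minutes apart.
start = datetime(2018, 12, 1, 0, 0)
temps = [10.0, 10.5, 10.2, 10.2, 11.0, 10.8]
readings = [(start + timedelta(minutes=30 * i), t) for i, t in enumerate(temps)]

rows = window_rows(readings, window_size=2)
for row in rows:
    print(row["timestamp"], row["lags"], row["label"], row["higher_than_prev"])
```

In this sketch the value being predicted ends up as the target of each row, while the timestamp only keeps the rows in order. The first `window_size` readings produce no row because they do not have enough history.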

     

     

     
  • pix123 Member Posts: 27 Contributor II

    Hi Edward,

     

    Thanks for the feedback. I am pretty new to RM. Can you explain a little more about how windowing works? Does the date-time attribute need to have the label role? Thanks.

     

     

  • SGolbert RapidMiner Certified Analyst, Member Posts: 344 Unicorn

    Hi Sammie,

     

    First of all, you need the Time Series extension. You can find it in the Marketplace (in the menu Extensions -> Marketplace). Try experimenting with the operators and their tutorials.

     

    I think that your question is more about feature generation than about RapidMiner. You will probably need to consult some time series literature. I can recommend the following:

     

    Helmut Lütkepohl, New Introduction to Multiple Time Series Analysis, Springer (2006)

     

    Kind regards,

    Sebastian