Data Preprocessing Ideas

pix123 Member Posts: 27 Contributor I
edited December 2018 in Help

I am working with a dataset that is relatively clean: it has no missing values, and most of the attributes are numeric, with one being a date-time stamp recorded every 30 minutes. I need to carry out some pre-processing on it and have the ideas below, but I am also looking for other suggestions. Thanks.

 

- Rename some of the numeric attributes so they are easier to identify

- Set roles

 

Ultimately, I will build regression models that use the date-time stamp to predict the temperature. The models will be trained and then tested.

Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 564   Unicorn

    Perhaps try windowing the time data, or adding a column that shows whether each numeric value is higher or lower than the value 30 minutes earlier.
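
    To illustrate both ideas outside RapidMiner, here is a minimal plain-Python sketch. It is not the Windowing operator from the Time Series extension, only the general technique. The function names (`make_windows`, `direction_vs_previous`), the window size of 3, and the temperature readings are all hypothetical and chosen for illustration.

```python
from datetime import datetime, timedelta


def make_windows(values, window_size, horizon=1):
    """Turn a series into (window, target) pairs.

    Each window holds `window_size` consecutive readings; the target is the
    reading `horizon` steps after the last value in the window.
    """
    rows = []
    for start in range(len(values) - window_size - horizon + 1):
        window = values[start:start + window_size]
        target = values[start + window_size + horizon - 1]
        rows.append((window, target))
    return rows


def direction_vs_previous(values):
    """Label each reading relative to the one before it.

    The first reading has no predecessor, so its label is None.
    """
    labels = [None]
    for prev, cur in zip(values, values[1:]):
        if cur > prev:
            labels.append("higher")
        elif cur < prev:
            labels.append("lower")
        else:
            labels.append("same")
    return labels


# Hypothetical temperature readings taken every 30 minutes.
first_stamp = datetime(2018, 12, 1, 0, 0)
temps = [10.0, 10.5, 10.2, 10.2, 11.0, 11.4]
stamps = [first_stamp + timedelta(minutes=30 * i) for i in range(len(temps))]

windows = make_windows(temps, window_size=3)
directions = direction_vs_previous(temps)

for stamp, temp, label in zip(stamps, temps, directions):
    print(stamp.isoformat(), temp, label)
for window, target in windows:
    print(window, "->", target)
```

    Each window becomes one training row whose attributes are the previous readings and whose target is the next temperature, so an ordinary regression learner can be applied to time-ordered data.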

     

     

     
  • pix123 Member Posts: 27 Contributor I

    Hi Edward,

     

    Thanks for the feedback. I am pretty new to RapidMiner. Could you explain a little more about how windowing works? Does the date-time attribute need to have the role of label? Thanks.

     

     

  • SGolbert RapidMiner Certified Analyst, Member Posts: 341   Unicorn

    Hi Sammie,

     

    First of all, you need the Time Series extension. You can find it in the Marketplace (menu Extensions -> Marketplace). Experiment with the operators and their tutorials.

     

    I think that your question is more about Feature Generation than about RapidMiner. You will probably need to consult some Time Series literature. I can recommend the following:

     

    Helmut Lütkepohl, New Introduction to Multiple Time Series Analysis, Springer (2006)

     

    Kind regards,

    Sebastian
