Algorithms are "cheating" and copying the right label from other instances

sebasvog Member Posts: 7 Newbie
Hi everyone,

I have a problem with my model. It should predict a monthly product volume from a set of given attributes.
My training data covers roughly 60 past months. Each instance in the dataset represents one day. Two of the attributes are "month" and "year". The label is the product volume at the end of the month, so every instance of a given month (~30 days per month, i.e. ~30 instances) has the same label. When I train the algorithm (Deep Learning, evaluated via Cross Validation) and look at the performance measure (relative_error), it seems that the algorithm looks at the attributes "month" and "year" and adopts the label value from another row with the same month and year as its prediction for this instance.

I hope you can follow my description. If anything is unclear, feel free to ask.
I would be very thankful if someone could tell me whether my guess is right and how I can avoid this mistake.

For now I am trying to avoid this by using only the month as an attribute, not month + year.

Thanks for your replies,
Sebastian
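
One way to check the suspicion described above is to count how many test rows share their month with at least one training row. The sketch below is only an illustration on hypothetical toy data (60 months of 30 daily rows each, identified by a month index); the function names are made up for this example and are not RapidMiner operators. It compares a random 10-fold split with a grouped split that keeps all rows of one month in the same fold. The grouped split is an alternative shown for contrast, not something proposed in this thread.

```python
import random

# Hypothetical toy data: 60 months x 30 daily rows, each row is (month_index, day).
N_MONTHS, DAYS_PER_MONTH = 60, 30
rows = [(m, d) for m in range(N_MONTHS) for d in range(DAYS_PER_MONTH)]


def random_kfold(items, k, seed=0):
    """Shuffle the rows and deal them into k folds, ignoring the month."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def grouped_kfold(items, k):
    """Assign all rows of one month to the same fold."""
    folds = [[] for _ in range(k)]
    for month, day in items:
        folds[month % k].append((month, day))
    return folds


def leaked_fraction(folds):
    """Fraction of test rows whose month also appears among the training rows."""
    leaked = total = 0
    for i, test in enumerate(folds):
        train_months = {m for j, fold in enumerate(folds) if j != i for m, _ in fold}
        leaked += sum(1 for m, _ in test if m in train_months)
        total += len(test)
    return leaked / total


random_leak = leaked_fraction(random_kfold(rows, 10))
grouped_leak = leaked_fraction(grouped_kfold(rows, 10))
print(f"random split:  {random_leak:.2%} of test rows have their month in training")
print(f"grouped split: {grouped_leak:.2%} of test rows have their month in training")
```

If nearly every test row has same-month neighbours in the training set, a model that keys on "month" and "year" can reproduce the shared label, and the reported relative_error will look better than it would on a genuinely unseen month.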

Answers

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,120  RM Data Scientist
    Hi,
    I would recommend using a Sliding Window Validation rather than a Cross Validation. This gives you a fair estimate of the performance.

    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
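
The idea behind a sliding window validation can be sketched in a few lines: the model is always trained on a block of earlier months and tested on the months that directly follow, then the window moves forward. The sketch below is a generic illustration, not the implementation of the RapidMiner operator; the function name and the parameters `train_width`, `test_width` and `step` are assumptions chosen for this example, and months are represented by a hypothetical index 0..59.

```python
def sliding_window_splits(n_months, train_width, test_width, step):
    """Yield (train_months, test_months); the test window always follows the training window."""
    start = 0
    while start + train_width + test_width <= n_months:
        train = list(range(start, start + train_width))
        test = list(range(start + train_width, start + train_width + test_width))
        yield train, test
        start += step


# Hypothetical setup: 60 months, train on 36, test on the next 6, move forward by 6.
splits = list(sliding_window_splits(n_months=60, train_width=36, test_width=6, step=6))
for train, test in splits:
    print(f"train months {train[0]}-{train[-1]}, test months {test[0]}-{test[-1]}")
```

Each daily row would then go to the training or test set according to the month it belongs to, so no test month ever appears in training. The split itself does not depend on the learner, so in principle any regression model can be trained on the training rows of each window.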
  • sebasvog Member Posts: 7 Newbie
    Hi Martin,

    thank you very much for your answer. I guess this validation method could help me a lot in estimating the performance of my current model!

    However, I think I have to create a new process with a modified dataset (without year and month as attributes, maybe with only month) to get a valid solution for my problem.

    Regards,
    Sebastian

  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,120  RM Data Scientist
    Hi,
    either that, or change the preprocessing so that you get the month or quarter of the year. That may help.

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
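
As a rough illustration of that preprocessing step, the sketch below derives the month and the quarter of the year from a date and leaves the year out, so the features no longer identify one specific month of one specific year. The function name and the example date are hypothetical; in RapidMiner the same result would be produced with the date-handling operators in the process rather than with code.

```python
from datetime import date


def calendar_features(day):
    """Return month (1-12) and quarter (1-4) of the year for a date, without the year."""
    return {"month": day.month, "quarter": (day.month - 1) // 3 + 1}


# Hypothetical example row date.
features = calendar_features(date(2017, 5, 14))
print(features)
```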
  • sebasvog Member Posts: 7 Newbie
    Hi,

    I tried to apply "Sliding Window Validation" to my model, but it seems that this type of validation is only applicable to time series data.
    I know that my data is "some kind of" time series data, but I am trying to solve the problem with a regression using Neural Networks (Deep Learning).
    So I cannot use Sliding Window Validation, right?

    I also tried to apply time series models (ARIMA) to my data (period=day, period=month), but the results were very bad (I guess I do not have enough historical data, just 60 months).

    Regards,
    Sebastian