I'm missing something when it comes to predicting the label for the future

TigerPaw Member Posts: 3 Contributor I
edited November 2018 in Help
Every so often I pick up RapidMiner, do a few tutorials, and then try something on my own, only to get stuck and put it away for a few months before getting ambitious again. It's always the same problem.

Let's say I have 7 weeks of data with 4 columns (A, B, C, X), where X is the label. I create a model, say a decision tree, and split the data, sending the first 6 weeks to training and the 7th week to testing. I apply the model and like what I see: it predicts the week 7 results with about 93% accuracy. Great!

Here is where I get confused every time. Using this, how can I predict what week 8 will be? Of the columns A, B, and C, the only one I will know in advance is A. I won't know the B or C values, which of course help predict X, until that week comes. I was under the impression that RapidMiner would use the information I can supply and maybe use some type of average or median of the data from prior weeks to fill in any gaps, but I assume I'm way off, because if I pass all 0's in B and C I get no results.

Can someone please help me understand what I'm missing? I'm sure it's an understanding of predictive modeling that will open my eyes to what's going on. Should I be creating a week 8 file with averages from weeks 1 through 7 or something, passing that in as the test data, and using weeks 1 through 7 as training data?

Thanks in advance


  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Is this a time series problem? It sounds like it because of the 'weeks.' 

  • TigerPaw Member Posts: 3 Contributor I

    Yes, I guess you could say it's a time series problem: I want to predict the next week, but I won't have all of that week's attribute values to figure out the label.

  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    You probably already know this, but if you treat it as a time series problem and, say, use an ARIMA operator, you could extrapolate or 'predict' a range of where your label could go.
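To make the "extrapolate a range" idea concrete outside of RapidMiner, here is a minimal Python sketch. It is not ARIMA: it swaps in a much simpler stand-in, a straight-line trend fitted by least squares, with a rough band of plus or minus two standard deviations of the residuals. It also assumes X is numeric, which the thread does not confirm, and the weekly values and the function name `trend_forecast` are made up for illustration.

```python
from statistics import mean, stdev


def trend_forecast(values, steps_ahead=1):
    """Extrapolate a linear trend and return (low, point, high).

    The band is point +/- 2 * stdev(residuals): a rough range,
    not a formal prediction interval and not an ARIMA forecast.
    """
    n = len(values)
    xs = list(range(1, n + 1))  # week numbers 1..n
    x_bar, y_bar = mean(xs), mean(values)

    # Ordinary least-squares slope and intercept.
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar

    # Spread of the historical points around the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, values)]
    spread = stdev(residuals)

    point = intercept + slope * (n + steps_ahead)
    return point - 2 * spread, point, point + 2 * spread


# Hypothetical numeric label X for weeks 1 through 7.
weekly_x = [102.0, 108.0, 105.0, 113.0, 117.0, 115.0, 122.0]
low, point, high = trend_forecast(weekly_x)
print(f"week 8 estimate: {point:.1f} (rough range {low:.1f} to {high:.1f})")
```

With only 7 points any such range is very uncertain, so treat the output as a sanity check rather than a forecast to rely on.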


    If you don't have all the values of A, B, and C, you could create a scoring set with some random values, or even averages computed with an Aggregate operator, and then try to predict X.
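The scoring-set idea can be sketched in a few lines of Python. This is only an illustration of the logic, not of RapidMiner's Aggregate operator: the weekly rows, the column values, and the helper name `build_scoring_row` are all hypothetical. The known attribute (A) is supplied directly, and the unknown ones (B and C) are filled with their averages over weeks 1 through 7.

```python
from statistics import mean

# Hypothetical history: one row per week, attributes A, B, C and label X.
history = [
    {"week": 1, "A": 10, "B": 1.0, "C": 10.0, "X": "low"},
    {"week": 2, "A": 12, "B": 2.0, "C": 20.0, "X": "low"},
    {"week": 3, "A": 11, "B": 3.0, "C": 30.0, "X": "low"},
    {"week": 4, "A": 15, "B": 4.0, "C": 40.0, "X": "high"},
    {"week": 5, "A": 14, "B": 5.0, "C": 50.0, "X": "high"},
    {"week": 6, "A": 16, "B": 6.0, "C": 60.0, "X": "high"},
    {"week": 7, "A": 18, "B": 7.0, "C": 70.0, "X": "high"},
]


def build_scoring_row(history, known, label="X", skip=("week",)):
    """Build one unlabeled row for a future week.

    Attributes present in `known` are used as given; every other
    attribute is filled with its mean over the historical rows.
    """
    attributes = [k for k in history[0] if k != label and k not in skip]
    row = {}
    for name in attributes:
        if name in known:
            row[name] = known[name]
        else:
            row[name] = mean(r[name] for r in history)
    return row


# Week 8: only A is known ahead of time.
week8 = build_scoring_row(history, known={"A": 17})
print(week8)
```

The resulting row has no X column; it is what gets passed to the trained model for scoring. Note that filling B and C with averages gives them the same values for every future week, so in this setup the prediction can only vary with A.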

  • TigerPaw Member Posts: 3 Contributor I

    Thanks, Tom. I'll look into those options and give it a try.

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    You're in good hands, @TigerPaw. @Thomas_Ott is one of our RapidMiner "unicorns". :) Also, don't forget that our YouTube Channel has tons of videos on how to use RapidMiner, and @tftemme is the guru of the Time Series extension, as shown in his most recent blog entry: https://community.rapidminer.com/t5/Community-Blog/Time-Series-Extension-Features-of-Version-0-1-2/ba-p/42585. Lots of resources to take advantage of.


    Good luck.



