I'm missing something when it comes to predicting the label for the future
Every so often I pick up RapidMiner, do a few tutorials, and then try to do something on my own, only to get stuck and put it away for a few months before getting ambitious again. It's always the same problem.

Let's say I have 7 weeks of data with 4 columns (A, B, C, X), where X is the label. I create a model, say a decision tree, and split the data so that the first 6 weeks are used for training and the 7th week for testing. I apply the model and like what I see: it predicts the week 7 results with about 93% accuracy. Great!

Here is where I get confused every time. Using this model, how can I predict what week 8 will be? Of columns A, B, and C, the only one I will know in advance is A. I won't know the B or C values until that week comes, and those of course help predict X. I was under the impression that RapidMiner would use the information I can supply and fill in any gaps with some type of average or median of the data from prior weeks, but I assume I'm way off, because if I pass all 0's in B and C I get no results.

Can someone please help me understand what I'm missing? I'm sure it's something about predictive modeling that will open my eyes to what's going on. Should I be creating a week 8 file with averages from weeks 1 through 7 and passing that in as the test data, with weeks 1 through 7 as the training data?
Thanks in advance
Answers
Is this a time series problem? It sounds like it because of the 'weeks.'
Yes, I guess you can say it's a time series problem as I want to predict the next week but I won't have all the attribute values for the next week to figure out the label.
You probably already know this, but if you treat this as a time series problem and use, say, an ARIMA operator, you could extrapolate or 'predict' a range of where your label could go.
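To make the idea of extrapolating from the label's own history concrete, here is a minimal Python sketch. It is not ARIMA and not a RapidMiner operator: it swaps in a much simpler first-order autoregression, x_t = c + phi * x_(t-1), fitted by ordinary least squares, and it returns a single point forecast rather than a range. It also assumes the label is numeric. The weekly values and the function names `fit_ar1` and `forecast_next` are made up for illustration.

```python
def fit_ar1(series):
    """Fit x_t = c + phi * x_(t-1) by ordinary least squares.

    Simplified stand-in for a real ARIMA fit; needs at least three
    points and a series that is not constant.
    """
    xs = series[:-1]   # previous-week values
    ys = series[1:]    # current-week values
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast_next(series):
    """One-step-ahead point forecast from the fitted AR(1) model."""
    c, phi = fit_ar1(series)
    return c + phi * series[-1]


# Hypothetical numeric label X for weeks 1 through 7.
weekly_x = [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, 22.0]
week8_forecast = forecast_next(weekly_x)
print("Week 8 point forecast:", week8_forecast)
```

With only seven observations any such fit is fragile, so treat the number as a rough direction rather than a prediction to rely on.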
If you don't have all the values of A, B, and C, you could create a scoring set with some random values, or even with averages computed using an Aggregate operator, and then try to predict X.
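The averaged scoring set can be sketched outside RapidMiner in a few lines of Python. This is only an illustration of the idea, not what the Aggregate operator does internally: the history rows, their values, and the helper name `build_scoring_row` are all hypothetical. The known attribute (A) is supplied as-is, and the unknown ones (B and C) are filled with their means over the past weeks.

```python
from statistics import mean

# Hypothetical history: one row per past week, with attributes A, B, C
# and the label X. Only a few weeks are shown to keep the sketch short.
history = [
    {"week": 1, "A": 10, "B": 4.0, "C": 7.0, "X": "low"},
    {"week": 2, "A": 11, "B": 6.0, "C": 9.0, "X": "high"},
    {"week": 3, "A": 12, "B": 5.0, "C": 8.0, "X": "high"},
]


def build_scoring_row(history, known, unknown_columns):
    """Build one unlabeled row for the week to be predicted.

    Known attribute values are copied through; each unknown column is
    filled with the mean of that column over the historical rows.
    """
    row = dict(known)
    for column in unknown_columns:
        row[column] = mean(r[column] for r in history)
    return row


# Week 4 stands in for "next week": A is known, B and C are not yet.
scoring_row = build_scoring_row(history, {"week": 4, "A": 13}, ["B", "C"])
print(scoring_row)
```

The resulting row has no X value; it is what you would pass to the trained model to get a prediction. Keep in mind that the accuracy measured on the held-out week came from real B and C values, so predictions made from averaged inputs will likely be less reliable.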
Thanks Tom, I'll look into those options and give it a try
You're in good hands, @TigerPaw. @Thomas_Ott is one of our RapidMiner "unicorns". Also, don't forget that our YouTube channel has tons of videos on how to use RapidMiner, and @tftemme is the guru of the Time Series extension, as shown in his most recent blog entry: https://community.rapidminer.com/t5/Community-Blog/Time-Series-Extension-Features-of-Version-0-1-2/ba-p/42585. Lots of resources to take advantage of.
Good luck.
Scott