"Time series forecasting with multi-dependent input"
I want to do some time series forecasting. I have read quite a bit about it and found a nice video by Thomas Ott here:
which describes some points pretty well :-)
Now I want to learn from the Data Mining Cup 2012 data. The data I've got is organized like this in a CSV file:
As a first step I implemented a model like the one described by Thomas Ott, so I selected the data for one single item using the "Filter Examples" operator like this:
<operator activated="true" class="filter_examples" compatibility="5.3.013" expanded="true" height="76" name="Nur eine itemID" width="90" x="179" y="210">
<parameter key="condition_class" value="attribute_value_filter"/>
<parameter key="parameter_string" value="itemID=2"/>
</operator>
I also didn't take the "price" into account in my model, so I removed this column using the "Select Attributes" operator.
This left me with only the columns "day; quantity" for one single itemID, and I tried the forecasting on that.
Now I want to extend the model with the following things:
(1) First of all, I somehow want to get the itemID in there, so that the learner fits a function not only of the history of quantities (via windowing) but also of the itemID.
(2) Concerning (1), I wonder about the following: Is it only possible to learn from the data for each itemID separately, i.e. as a loop of the previously implemented model over each itemID? Or would it make sense to train the predictor for itemID X also with the data I've got for the other itemIDs? Maybe there is even some correlation between items, and then I'd have more training data. I don't know if that's possible or if it makes sense, which is why I'd be interested in your opinion.
(3) I'd also like to get the price development of a certain product into the model as an input variable. Currently I'm only considering the day (and maybe the itemID once (1) and (2) are implemented). But my dataset also contains the price of each item, and it would certainly be interesting to know how the quantity trend develops when the prices change. Similarly to (2), it would probably help to take the whole dataset for all itemIDs into consideration, even when learning for one single itemID, to capture such trends.
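To make (3) a bit more concrete, here is a minimal sketch in plain Python (outside RapidMiner) of one possible price input: the relative price change of each item compared to that item's previous day. The function name `price_changes` and the tuple layout `(day, itemID, price, quantity)` are my own assumptions for illustration, and the sketch assumes strictly positive prices and one row per item and day.

```python
from collections import defaultdict


def price_changes(rows):
    """Map (item_id, day) to the relative price change vs. the item's previous row.

    rows: iterable of (day, item_id, price, quantity) tuples (assumed layout).
    The first day of each item has no predecessor and is therefore skipped.
    """
    by_item = defaultdict(list)
    for day, item_id, price, _quantity in rows:
        by_item[item_id].append((day, price))

    changes = {}
    for item_id, series in by_item.items():
        series.sort(key=lambda r: r[0])  # order each item's rows by day
        # Pair every row with its predecessor within the same item.
        for (_, prev_price), (day, price) in zip(series, series[1:]):
            changes[(item_id, day)] = (price - prev_price) / prev_price
    return changes
```

The point of grouping by itemID first is that a price change is only ever computed within one item, never across two different items that happen to be adjacent in the file.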
Thanks in advance for your opinions and every hint :-)
EDIT: I simply tried the Windowing operator, but it also windows my itemIDs, which is obviously not what I want.
I'm getting something like:
row number | day (my id) | item-id0 | item-id1 | item-id2 | price-0 | price-1 | price-2 | quantity-0 | quantity-1 | quantity-2
Actually, I don't need the itemID to be windowed: it is just an input variable for the learner, and its "history" is of no interest.
I guess I'd rather want something like this:
row number | day (my id) | item-id | price-0 | price-1 | price-2 | quantity-0 | quantity-1 | quantity-2
so that for each (day, itemID) tuple I have the history of prices and quantities bought (for that particular itemID) as input variables. With this I could then train the model.
Later on, I could use the quantities and prices of the last couple of days to forecast the quantity bought on the next day (that should be my label).
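To show the layout I have in mind, here is a minimal sketch in plain Python (outside RapidMiner, so this is not how the Windowing operator itself works). The function name `window_per_item`, the tuple layout `(day, itemID, price, quantity)` and the convention that suffix `-0` is the current day and `-k` is k days earlier are my own assumptions. It also assumes one row per item and day with no missing days.

```python
from collections import defaultdict


def window_per_item(rows, width=3):
    """Build one example per (day, item) with lagged prices and quantities.

    rows: iterable of (day, item_id, price, quantity) tuples (assumed layout).
    The label is the quantity of the same item on the following day.
    """
    by_item = defaultdict(list)
    for day, item_id, price, quantity in rows:
        by_item[item_id].append((day, price, quantity))

    examples = []
    for item_id, series in by_item.items():
        series.sort(key=lambda r: r[0])  # order each item's rows by day
        # Need width - 1 earlier rows for the window and one later row for the label.
        for i in range(width - 1, len(series) - 1):
            example = {"day": series[i][0], "itemID": item_id}
            for lag in range(width):
                _, price, quantity = series[i - lag]
                example[f"price-{lag}"] = price
                example[f"quantity-{lag}"] = quantity
            example["label"] = series[i + 1][2]  # next day's quantity
            examples.append(example)
    return examples
```

Because the window is built separately inside each item's series, the itemID appears once per row as a plain input variable and is never windowed itself. The examples of all items end up in one list, so they could be used either for one learner across all items or be split again per itemID.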