ARIMA model

Papad (Computer Science Student, Member, Posts: 68, Guru)
edited June 2019 in Help
Hello everyone,
I found a ready-made process for ARIMA forecasting in RapidMiner and tried to apply it to my dataset. The dataset contains sales for 6 products, the temperature, and the date. I have two years of sales and want to predict sales for next January. So I built a new dataset with the same structure as the old one, filled in the temperature and the date, and left the sales of every product empty so that they can be predicted. When I try to predict with ARIMA, it reports that my dataset contains missing values. That is true, but why doesn't it predict them?
I would also like to ask how I should set the parameters of the Forecast Validation and ARIMA operators (step size, window size, etc.) to get a good prediction.
Thanks in advance.


  • tftemme (Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member, Posts: 135, RM Research)
    Hi @Papad,

    The ARIMA forecast model (like the other forecast models in RapidMiner) behaves a bit differently from machine learning (ML) models, due to the nature of the prediction (a forecast of future values). While ML models predict the values of an attribute, forecast models predict values for new Examples. So the ARIMA operator expects data with one attribute containing the time series whose future values you want to predict (in your case the sales) and, optionally, an indices attribute, which is used to create the future index values (your date attribute). As ARIMA is a univariate method, it cannot use any other attribute, so with ARIMA you cannot include the temperature data.
    After the ARIMA operator has trained the model, you connect the model to the Apply Forecast operator, which then creates the forecasted values.
    Have a look at the tutorial processes of the ARIMA operator for further clarification. If it is still unclear, please post the XML of your process and, if possible, your data (or a sample).
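
    To make the "new Examples" idea concrete, here is a minimal Python sketch (not RapidMiner code). It assumes an evenly spaced date index, uses made-up values, and replaces ARIMA with a naive "repeat the last value" forecast purely for illustration. The helper name `future_indices` is hypothetical.

```python
from datetime import date

def future_indices(dates, horizon):
    """Extend an evenly spaced date index by `horizon` further steps."""
    step = dates[-1] - dates[-2]  # spacing inferred from the last two dates
    return [dates[-1] + step * (i + 1) for i in range(horizon)]

# Illustrative history: one index attribute and one time series attribute.
dates = [date(2018, 12, 29), date(2018, 12, 30), date(2018, 12, 31)]
sales = [120.0, 135.0, 128.0]

new_dates = future_indices(dates, horizon=2)

# Stand-in forecast (NOT ARIMA): repeat the last observed value.
forecast = list(zip(new_dates, [sales[-1]] * len(new_dates)))
```

    The point is the shape of the result: the forecast adds rows after the end of the history instead of filling in a column of existing rows.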

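    ARIMA also cannot handle missing values in the training data, so your training data must not contain any. In your case this means the rows with empty sales for January have to be removed before training; the future values are created by Apply Forecast instead.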
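
    As a rough sketch of that preparation step in plain Python (not RapidMiner), assuming the data is a list of (date, sales) pairs with `None` marking an empty cell. The helper name and the values are hypothetical.

```python
from datetime import date

# Illustrative rows: the last two were appended by hand with empty sales.
rows = [
    (date(2018, 12, 30), 135.0),
    (date(2018, 12, 31), 128.0),
    (date(2019, 1, 1), None),
    (date(2019, 1, 2), None),
]

def split_history_and_future(rows):
    """Split rows into the observed history and the trailing empty dates."""
    last_observed = max(i for i, (_, y) in enumerate(rows) if y is not None)
    history = rows[: last_observed + 1]
    # A gap inside the history is a different problem and must be fixed first.
    if any(y is None for _, y in history):
        raise ValueError("gap inside the history; fill or remove it before training")
    future_dates = [d for d, _ in rows[last_observed + 1 :]]
    return history, future_dates

history, future_dates = split_history_and_future(rows)
horizon = len(future_dates)  # number of values the forecast has to produce
```

    Only `history` would go into training; the number of trailing empty rows tells you how far ahead to forecast.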

    To find the best parameters, I would use a Forecast Validation to calculate the regression performance of the forecast (have a look at the tutorial process of the Forecast Validation for details).
    Then put this Forecast Validation inside an Optimize Grid operator to optimize the parameters of your ARIMA model (p, d, q). Be aware that you do not optimize the parameters of the Forecast Validation itself, because they only define how the performance is validated and are not parameters of the model.
    The Optimize Grid operator should itself sit inside another Forecast Validation, to validate the performance of the optimization.
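
    The nesting described above can be sketched in plain Python. This is only an illustration of the structure, not RapidMiner code: the model is a simple stand-in (mean of the last k values) rather than ARIMA, k plays the role of (p, d, q), and all function names and the synthetic series are assumptions.

```python
from statistics import mean

def forecast_mean_of_last(train, k, horizon):
    """Stand-in model (NOT ARIMA): repeat the mean of the last k values."""
    return [mean(train[-k:])] * horizon

def sliding_windows(series, window, horizon, step):
    """Yield (train, test) pairs of consecutive slices."""
    start = 0
    while start + window + horizon <= len(series):
        yield (series[start:start + window],
               series[start + window:start + window + horizon])
        start += step

def mean_abs_error(series, k, window, horizon, step):
    """Inner forecast validation: average absolute error for one k."""
    errors = []
    for train, test in sliding_windows(series, window, horizon, step):
        pred = forecast_mean_of_last(train, k, horizon)
        errors.extend(abs(p - t) for p, t in zip(pred, test))
    return mean(errors)

def pick_best_k(series, candidates, window, horizon, step):
    """Grid search over the model parameter only, not the validation settings."""
    return min(candidates,
               key=lambda k: mean_abs_error(series, k, window, horizon, step))

def nested_validation(series, candidates, outer_window, inner_window, horizon):
    """Outer forecast validation: scores 'optimize, then forecast' as a whole."""
    errors = []
    outer_step = outer_window + horizon  # no overlapping outer windows
    for train, test in sliding_windows(series, outer_window, horizon, outer_step):
        k = pick_best_k(train, candidates, inner_window, horizon, step=horizon)
        pred = forecast_mean_of_last(train, k, horizon)
        errors.extend(abs(p - t) for p, t in zip(pred, test))
    return mean(errors)

# Synthetic weekly pattern, for illustration only.
series = [float(10 + (i % 7)) for i in range(120)]
score = nested_validation(series, candidates=[1, 3, 7],
                          outer_window=50, inner_window=14, horizon=5)
```

    The inner loop only ever sees the outer training window, so the outer error is an honest estimate of how the tuned model behaves on data it was not tuned on.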

    Be careful when you choose the window size, step size and horizon size of the Forecast Validation operators, because you can easily end up with hundreds or thousands of iterations if they are too small. For testing I would go with larger windows and 'no overlapping windows'.
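
    A quick back-of-the-envelope count shows how fast the iterations grow. The sketch below assumes a plain sliding-window scheme in which every iteration needs window size + horizon size consecutive Examples, and it assumes that 'no overlapping windows' corresponds to a step size of window size + horizon size. The function name and the numbers are illustrative.

```python
def count_validation_iterations(n_examples, window_size, horizon_size, step_size=None):
    """Count sliding-window iterations; step_size=None means no overlap."""
    if step_size is None:
        # Assumption: non-overlapping windows advance by window + horizon.
        step_size = window_size + horizon_size
    usable = n_examples - window_size - horizon_size
    if usable < 0:
        return 0  # not even one full window plus horizon fits
    return usable // step_size + 1

n = 730  # roughly two years of daily data

small = count_validation_iterations(n, window_size=30, horizon_size=7, step_size=1)
large = count_validation_iterations(n, window_size=90, horizon_size=31)
print(small, large)  # 694 6
```

    Keep in mind that with a nested setup the inner validation runs once per grid point per outer iteration, so the counts multiply.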

    Hope this helps,
    Best regards
