How to use ARIMA with the Forecast Validation and Optimize Parameters operators?

ors101 Member Posts: 2 Newbie

I want to build a sales forecast with an ARIMA model. To do that, I would like to train and test my model, and additionally find the best values for p, d and q. Can someone explain how to combine the ARIMA, Forecast Validation and Optimize Parameters operators? Thank you in advance for your help.



    MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
    edited May 2019
    Hi @ors101 You can go to Samples -> Time Series -> templates, open the process //Samples/Time Series/templates/Automized Arima on US - Consumption data, and adjust it to your needs.
    However, I recommend exploring your data first so that you understand what is happening while the operator optimizes the ARIMA model.

    ors101 Member Posts: 2 Newbie
    Hey Marco, thanks a lot for your quick response! Could you please specify how I should explore the data? I thought I could use the AIC criterion to check which combination of p, d and q is best. The "Automized Arima on US - Consumption data" example contains only the Optimize Parameters operator and the ARIMA model, but no Forecast Validation. Have you, or someone else, tried to combine these three operators? Thanks a lot for your help.
    tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Hi @ors101

    You can place the Forecast Validation operator inside an Optimize Parameters operator. Put the ARIMA operator in the training subprocess of the Forecast Validation and a Performance (Regression) operator in its testing subprocess, as in the "Forecast Validation of ARIMA Model for Lake Huron" template process. Then connect the performance output port of the Forecast Validation to the performance port of the Optimize Parameters operator and select p, d and q as the parameters to optimize. A demo process showing how this could look is below.

    Note that the Forecast Validation operator validates the regression performance (for example, the relative error) on an independent test window, whereas AIC, BIC and AICc describe how well the model fits the training data. They are therefore training errors, which can be used to select p, d and q (as the auto.arima function in R does, for example), as shown in the "Automized Arima on US - Consumption data" template process. The performance of the forecast itself, however, should always be evaluated with the Forecast Validation.
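For reference, the usual textbook definitions of the three criteria are sketched below, where k is the number of estimated model parameters, n the number of training observations, and \hat{L} the maximized likelihood on the training data. Whether the ARIMA operator counts k in exactly this way is an assumption. All three quantities depend only on the training data and never see a test window.

```latex
\begin{aligned}
\mathrm{AIC}  &= 2k - 2\ln\hat{L} \\
\mathrm{AICc} &= \mathrm{AIC} + \frac{2k^2 + 2k}{n - k - 1} \\
\mathrm{BIC}  &= k\ln n - 2\ln\hat{L}
\end{aligned}
```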

    <process version="9.2.001">
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process" origin="GENERATED_SAMPLE">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Lake Huron" origin="GENERATED_SAMPLE" width="90" x="112" y="34">
            <parameter key="repository_entry" value="//Samples/Time Series/data sets/Lake Huron"/>
          </operator>
          <operator activated="true" class="concurrency:optimize_parameters_grid" compatibility="9.2.001" expanded="true" height="145" name="Optimize Parameters (Grid)" width="90" x="380" y="34">
            <list key="parameters">
              <parameter key="ARIMA.p:_order_of_the_autoregressive_model" value="[1;5;10;linear]"/>
              <parameter key="ARIMA.d:_degree_of_differencing" value="[0.0;1;10;linear]"/>
              <parameter key="ARIMA.q:_order_of_the_moving-average_model" value="[0.0;5;10;linear]"/>
            </list>
            <parameter key="error_handling" value="fail on error"/>
            <parameter key="log_performance" value="true"/>
            <parameter key="log_all_criteria" value="false"/>
            <parameter key="synchronize" value="false"/>
            <parameter key="enable_parallel_execution" value="true"/>
            <process expanded="true">
              <operator activated="true" class="time_series:forecast_validation" compatibility="9.3.000-SNAPSHOT" expanded="true" height="145" name="Forecast Validation" origin="GENERATED_SAMPLE" width="90" x="246" y="34">
                <parameter key="time_series_attribute" value="Lake surface level / feet"/>
                <parameter key="has_indices" value="true"/>
                <parameter key="indices_attribute" value="Date"/>
                <parameter key="window_size" value="20"/>
                <parameter key="no_overlapping_windows" value="false"/>
                <parameter key="step_size" value="5"/>
                <parameter key="horizon_size" value="5"/>
                <parameter key="enable_parallel_execution" value="true"/>
                <process expanded="true">
                  <operator activated="true" class="time_series:arima_trainer" compatibility="9.3.000-SNAPSHOT" expanded="true" height="103" name="ARIMA" origin="GENERATED_SAMPLE" width="90" x="313" y="34">
                    <parameter key="time_series_attribute" value="Lake surface level / feet"/>
                    <parameter key="has_indices" value="false"/>
                    <parameter key="indices_attribute" value=""/>
                    <parameter key="p:_order_of_the_autoregressive_model" value="1"/>
                    <parameter key="d:_degree_of_differencing" value="0"/>
                    <parameter key="q:_order_of_the_moving-average_model" value="1"/>
                    <parameter key="estimate_constant" value="true"/>
                    <parameter key="main_criterion" value="aic"/>
                  </operator>
                  <connect from_port="training set" to_op="ARIMA" to_port="example set"/>
                  <connect from_op="ARIMA" from_port="forecast model" to_port="model"/>
                  <portSpacing port="source_training set" spacing="0"/>
                  <portSpacing port="sink_model" spacing="0"/>
                  <portSpacing port="sink_through 1" spacing="0"/>
                  <description align="center" color="blue" colored="true" height="198" resized="false" width="265" x="20" y="80">The ExampleSet at the training set output port contains the values of the training window.&lt;br/&gt;&lt;br/&gt;In the next fold of the Forecast Validation, the training window, as well as the test window is shifted by 5 (parameter step size) values.</description>
                </process>
                <process expanded="true">
                  <operator activated="true" class="performance_regression" compatibility="9.2.001" expanded="true" height="82" name="Performance" origin="GENERATED_SAMPLE" width="90" x="380" y="34">
                    <parameter key="main_criterion" value="first"/>
                    <parameter key="root_mean_squared_error" value="true"/>
                    <parameter key="absolute_error" value="false"/>
                    <parameter key="relative_error" value="true"/>
                    <parameter key="relative_error_lenient" value="false"/>
                    <parameter key="relative_error_strict" value="false"/>
                    <parameter key="normalized_absolute_error" value="false"/>
                    <parameter key="root_relative_squared_error" value="false"/>
                    <parameter key="squared_error" value="false"/>
                    <parameter key="correlation" value="false"/>
                    <parameter key="squared_correlation" value="false"/>
                    <parameter key="prediction_average" value="false"/>
                    <parameter key="spearman_rho" value="false"/>
                    <parameter key="kendall_tau" value="false"/>
                    <parameter key="skip_undefined_labels" value="true"/>
                    <parameter key="use_example_weights" value="true"/>
                  </operator>
                  <connect from_port="test set" to_op="Performance" to_port="labelled data"/>
                  <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
                  <connect from_op="Performance" from_port="example set" to_port="test set results"/>
                  <portSpacing port="source_test set" spacing="0"/>
                  <portSpacing port="source_through 1" spacing="0"/>
                  <portSpacing port="sink_test set results" spacing="0"/>
                  <portSpacing port="sink_performance 1" spacing="0"/>
                  <portSpacing port="sink_performance 2" spacing="0"/>
                  <description align="center" color="blue" colored="true" height="140" resized="false" width="265" x="45" y="80">The ExampleSet at the test set output port already contains the values of the test window as well as the values predicted by the forecast model for the test window.</description>
                  <description align="center" color="yellow" colored="false" height="140" resized="false" width="265" x="45" y="230">The role of the truth values attribute is set to Label, while the role of the forecasted values attribute is set to Prediction, thus the Performance operator can be used directly to calculate the performance of the forecast model.</description>
                </process>
              </operator>
              <connect from_port="input 1" to_op="Forecast Validation" to_port="example set"/>
              <connect from_op="Forecast Validation" from_port="model" to_port="output 1"/>
              <connect from_op="Forecast Validation" from_port="performance 1" to_port="performance"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="source_input 2" spacing="0"/>
              <portSpacing port="sink_performance" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve Lake Huron" from_port="output" to_op="Optimize Parameters (Grid)" to_port="input 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="performance" to_port="result 1"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="parameter set" to_port="result 3"/>
          <connect from_op="Optimize Parameters (Grid)" from_port="output 1" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
          <portSpacing port="sink_result 4" spacing="0"/>
        </process>
      </operator>
    </process>
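
The nesting used in the demo process (a parameter search on the outside, a sliding-window validation on the inside) can also be sketched outside RapidMiner. The Python below is only an illustration under stated assumptions: `sliding_windows`, `mean_of_last_k`, `validation_error` and `grid_search` are hypothetical names, the forecaster is a simple stand-in rather than ARIMA, the series is synthetic rather than the Lake Huron data, and the exact way RapidMiner forms windows and averages errors may differ. The window settings mirror the process above (window size 20, step size 5, horizon size 5).

```python
from statistics import mean


def sliding_windows(n_values, window_size=20, step_size=5, horizon_size=5):
    """Yield (train, test) index ranges; the test window directly follows training."""
    start = 0
    while start + window_size + horizon_size <= n_values:
        split = start + window_size
        yield range(start, split), range(split, split + horizon_size)
        start += step_size  # shift both windows by the step size


def mean_of_last_k(train_values, horizon_size, k):
    """Hypothetical stand-in for ARIMA: forecast the mean of the last k values."""
    level = mean(train_values[-k:])
    return [level] * horizon_size


def validation_error(series, k):
    """Inner loop: mean relative error over all held-out test windows.

    Assumes every actual value is non-zero.
    """
    errors = []
    for train, test in sliding_windows(len(series)):
        forecast = mean_of_last_k([series[i] for i in train], len(test), k)
        errors.extend(
            abs(series[i] - predicted) / abs(series[i])
            for i, predicted in zip(test, forecast)
        )
    return mean(errors)


def grid_search(series, candidates):
    """Outer loop: score each candidate on held-out data and keep the best one."""
    scores = {k: validation_error(series, k) for k in candidates}
    best = min(scores, key=scores.get)
    return best, scores


# Synthetic, steadily rising series (not the Lake Huron data).
series = [580.0 + 0.1 * t for t in range(40)]
folds = list(sliding_windows(len(series)))
best_k, scores = grid_search(series, candidates=range(1, 6))
print(len(folds), best_k, scores[best_k])
```

The point of the design is that the score driving the search comes from values the model never saw during fitting, which is the difference from selecting p, d and q by AIC on the training data alone.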
