How can I make a time series prediction with a binomial label?

kokszoskaroj Member Posts: 2 Newbie
I have a binomial label, and I want to predict it with a logistic regression on a windowed example set. My problem is that the Windowing operator can't handle binomial variables. How can I create windowed variables and keep the binomial variable at the same time?

Thanks, 
Daniel

Answers

  • hughesfleming68 Member Posts: 323 Unicorn
    edited March 2019
    Hi Daniel, you can try the Value Series extension, which has many useful tools. I am sure the new time series operators will be updated at some point, but until then that is your best option. The Windowing operator in the extension can work with binomial labels.

    Regards,

    Alex
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    FYI, Windowing is now part of the "new" set of time series operators in the core.

    Scott
  • hughesfleming68 Member Posts: 323 Unicorn
    edited April 2019
    Hi Scott @sgenzer, the last time I checked, the new Windowing operator only worked with numerical labels, so time series classification problems still need the Value Series extension. I will check again with 9.21 and see if it works.

    regards,

    Alex
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    edited April 2019
    Ahh, my bad @hughesfleming68, I should have read this more carefully. Just to be clear: creating a logistic regression model from time series data works fine for you without windowing, but fails with an error when you use the "Process Windows" operator. Correct? I ask because I created a quick process and it works for me. See this process:

    <?xml version="1.0" encoding="UTF-8"?>
    <process version="9.2.001">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" breakpoints="after" class="subprocess" compatibility="9.2.001" expanded="true" height="82" name="Subprocess" width="90" x="45" y="34">
            <process expanded="true">
              <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve Lake Huron" width="90" x="45" y="34">
                <parameter key="repository_entry" value="//Samples/Time Series/data sets/Lake Huron"/>
              </operator>
              <operator activated="true" class="generate_attributes" compatibility="9.2.001" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
                <list key="function_descriptions">
                  <parameter key="hiLo" value="if([Lake surface level / feet]&lt;579,&quot;Low&quot;,&quot;High&quot;)"/>
                </list>
                <parameter key="keep_all" value="true"/>
              </operator>
              <operator activated="true" class="nominal_to_binominal" compatibility="9.2.001" expanded="true" height="103" name="Nominal to Binominal" width="90" x="313" y="34">
                <parameter key="return_preprocessing_model" value="false"/>
                <parameter key="create_view" value="false"/>
                <parameter key="attribute_filter_type" value="single"/>
                <parameter key="attribute" value="hiLo"/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="nominal"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="file_path"/>
                <parameter key="block_type" value="single_value"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="single_value"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="transform_binominal" value="false"/>
                <parameter key="use_underscore_in_name" value="false"/>
              </operator>
              <operator activated="false" class="guess_types" compatibility="9.2.001" expanded="true" height="82" name="Guess Types" width="90" x="313" y="289">
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="decimal_point_character" value="."/>
              </operator>
              <operator activated="true" class="set_role" compatibility="9.2.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
                <parameter key="attribute_name" value="hiLo"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles"/>
              </operator>
              <connect from_op="Retrieve Lake Huron" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
              <connect from_op="Generate Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
              <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
            <description align="center" color="transparent" colored="false" width="126">Lake Huron with binominal label</description>
          </operator>
          <operator activated="true" class="time_series:process_windows" compatibility="9.2.001" expanded="true" height="82" name="Process Windows" width="90" x="246" y="238">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="has_indices" value="true"/>
            <parameter key="indices_attribute" value="Date"/>
            <parameter key="window_size" value="40"/>
            <parameter key="no_overlapping_windows" value="true"/>
            <parameter key="step_size" value="1"/>
            <parameter key="create_horizon_(labels)" value="false"/>
            <parameter key="horizon_attribute" value=""/>
            <parameter key="horizon_size" value="1"/>
            <parameter key="horizon_offset" value="0"/>
            <parameter key="add_last_index_in_window_attribute" value="true"/>
            <parameter key="enable_parallel_execution" value="false"/>
            <process expanded="true">
              <operator activated="true" breakpoints="before,after" class="h2o:logistic_regression" compatibility="9.2.000" expanded="true" height="124" name="Logistic Regression" width="90" x="112" y="34">
                <parameter key="solver" value="AUTO"/>
                <parameter key="reproducible" value="false"/>
                <parameter key="maximum_number_of_threads" value="4"/>
                <parameter key="use_regularization" value="false"/>
                <parameter key="lambda_search" value="false"/>
                <parameter key="number_of_lambdas" value="0"/>
                <parameter key="lambda_min_ratio" value="0.0"/>
                <parameter key="early_stopping" value="true"/>
                <parameter key="stopping_rounds" value="3"/>
                <parameter key="stopping_tolerance" value="0.001"/>
                <parameter key="standardize" value="true"/>
                <parameter key="non-negative_coefficients" value="false"/>
                <parameter key="add_intercept" value="true"/>
                <parameter key="compute_p-values" value="true"/>
                <parameter key="remove_collinear_columns" value="true"/>
                <parameter key="missing_values_handling" value="MeanImputation"/>
                <parameter key="max_iterations" value="0"/>
                <parameter key="max_runtime_seconds" value="0"/>
              </operator>
              <connect from_port="windowed example set" to_op="Logistic Regression" to_port="training set"/>
              <connect from_op="Logistic Regression" from_port="model" to_port="output 1"/>
              <portSpacing port="source_windowed example set" spacing="0"/>
              <portSpacing port="source_input 1" spacing="0"/>
              <portSpacing port="sink_output 1" spacing="0"/>
              <portSpacing port="sink_output 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="false" class="h2o:logistic_regression" compatibility="9.2.000" expanded="true" height="124" name="Logistic Regression (2)" width="90" x="246" y="34">
            <parameter key="solver" value="AUTO"/>
            <parameter key="reproducible" value="false"/>
            <parameter key="maximum_number_of_threads" value="4"/>
            <parameter key="use_regularization" value="false"/>
            <parameter key="lambda_search" value="false"/>
            <parameter key="number_of_lambdas" value="0"/>
            <parameter key="lambda_min_ratio" value="0.0"/>
            <parameter key="early_stopping" value="true"/>
            <parameter key="stopping_rounds" value="3"/>
            <parameter key="stopping_tolerance" value="0.001"/>
            <parameter key="standardize" value="true"/>
            <parameter key="non-negative_coefficients" value="false"/>
            <parameter key="add_intercept" value="true"/>
            <parameter key="compute_p-values" value="true"/>
            <parameter key="remove_collinear_columns" value="true"/>
            <parameter key="missing_values_handling" value="MeanImputation"/>
            <parameter key="max_iterations" value="0"/>
            <parameter key="max_runtime_seconds" value="0"/>
          </operator>
          <connect from_op="Subprocess" from_port="out 1" to_op="Process Windows" to_port="example set"/>
          <connect from_op="Process Windows" from_port="output 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • hughesfleming68 Member Posts: 323 Unicorn
    Hi @sgenzer, the problem is simpler. Let's say that I want to re-purpose a time series regression problem into a time series classification problem to predict direction one step ahead. This can be done quickly with the Classify by Trend operator to define the label. Let's assume now that I want to window the data to create lagged numeric attributes to try to capture temporal features. This worked in the old Value Series extension, but the new Windowing operator does not seem to support the binomial label.

    regards,

    Alex
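
The setup described above (a direction label plus lagged numeric attributes) can be sketched outside RapidMiner in plain Python. This is only an illustration of the idea, not the implementation of Classify by Trend or Windowing; the function names `classify_by_trend` and `window_with_label` and the sample values are made up for the example.

```python
def classify_by_trend(values):
    """Label each step "Up" if the next value is higher than the current one, else "Down".

    Hypothetical stand-in for a trend label; labels[i] describes the move
    from values[i] to values[i + 1].
    """
    return ["Up" if nxt > cur else "Down" for cur, nxt in zip(values, values[1:])]


def window_with_label(values, labels, window_size):
    """Pair each window of lagged numeric values with the label of the step that follows it."""
    rows = []
    for end in range(window_size - 1, len(labels)):
        window = values[end - window_size + 1 : end + 1]  # lagged numeric attributes
        rows.append((window, labels[end]))                # nominal label, one step ahead
    return rows


# Made-up series, used only to show the shape of the result.
levels = [10, 12, 11, 9, 8, 10]
labels = classify_by_trend(levels)
rows = window_with_label(levels, labels, window_size=3)
```

Each row keeps the numeric lags as attributes and carries the nominal label unchanged, which is the behaviour asked for in this thread.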
  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Since version 9.1.0, the Windowing operator (as well as the Process Windows operator) can handle nominal attributes, including nominal horizon attributes. So you can use the Windowing operator of the new time series extension. I just checked by applying it to the Golf data set (I know it isn't a time series, but I used it for demonstration purposes).

    If there are any issues, feel free to report them.
    Best regards,
    Fabian
  • hughesfleming68 Member Posts: 323 Unicorn
    Thanks Fabian, I just tried it on a fresh install of 9.21 and it does work. I did have a process last week where I ran into an issue. I will try to reproduce it.
  • hughesfleming68 Member Posts: 323 Unicorn
    @tftemme Hi Fabian, my mistake. The issue I was running into had to do with Forecast Validation and not the Windowing operator. It seems that the Forecast Validation operator still supports numerical labels only. That was the reason I switched back to the old Value Series extension. If there is a way to do it, please let me know, as I might have missed something.

    Kind regards,

    Alex
  • tftemme Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, RMResearcher, Member Posts: 164 RM Research
    Hi, yes. The Forecast Validation operator works with ARIMA or Holt-Winters models, which are only valid for numerical attributes and labels, so the operator does not work for nominal attributes either.
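
For a nominal label, the general idea of validating in time order can still be sketched by hand. The following is a minimal plain-Python illustration, not a RapidMiner operator or a replacement for Forecast Validation: `walk_forward_accuracy` and `majority_label` are hypothetical names, the majority-class "model" is only a placeholder for a real classifier, and the labels are made up.

```python
from collections import Counter


def majority_label(train_labels):
    """Placeholder 'model': predict the most frequent label seen in the training window."""
    return Counter(train_labels).most_common(1)[0][0]


def walk_forward_accuracy(labels, train_size, test_size):
    """Slide a training window through time and test on the steps right after it."""
    correct = 0
    total = 0
    start = 0
    while start + train_size + test_size <= len(labels):
        train = labels[start : start + train_size]
        test = labels[start + train_size : start + train_size + test_size]
        prediction = majority_label(train)
        correct += sum(1 for actual in test if actual == prediction)
        total += len(test)
        start += test_size  # move forward so test data is never used for earlier training
    return correct / total if total else 0.0


# Made-up direction labels, used only to exercise the function.
labels = ["Up", "Up", "Down", "Up", "Up", "Down", "Down", "Down"]
accuracy = walk_forward_accuracy(labels, train_size=3, test_size=1)
```

The point of the sketch is only that training data always precedes test data in time; any classifier that accepts a nominal label could take the place of the placeholder.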