Doing LinearRegression in a loop? [Solved]

cbwq Member Posts: 11 Contributor II
edited June 2019 in Help
Hi,

I have a process that works fine on a subset of my data, but I can't get it to run automatically across the whole dataset.

I want to compute the linear regression gradient (trend slope) of weekly sales for each of a number of products. My input data is of the form:
Product, Week, Quantity
"Product 1", "2012-03-02", 34
"Product 1", "2012-03-09", 72
"Product 2", "2012-03-02", 91
"Product 2", "2012-03-09", 27
etc.

I want to generate a result set that looks like:
Product, Trend_Gradient
Product 1, 39.2
Product 2, 15.2

I have it working for a dataset that contains only one product's sales data, but I can't figure out how to loop over the full dataset so that each iteration sees only the entries for one product. Essentially, I want to apply the Linear Regression operator the way an SQL "GROUP BY Product_ID" would group the rows.
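To make the target concrete, here is a minimal Python sketch of that "GROUP BY" style computation, done outside RapidMiner. It is only an illustration under assumptions: the names `slope` and `trend_by_product` are hypothetical, the x-axis is taken to be weeks elapsed since each product's first entry, and the rows are the four sample rows above (so each product has just two points, and the results will not match the made-up gradients in the desired output).

```python
from collections import defaultdict
from datetime import date


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx  # raises ZeroDivisionError if all xs are equal


def trend_by_product(rows):
    """Group (product, week, quantity) rows by product and fit one slope each."""
    groups = defaultdict(list)
    for product, week, quantity in rows:
        groups[product].append((date.fromisoformat(week), quantity))

    result = {}
    for product, points in groups.items():
        points.sort()  # chronological order within the product
        start = points[0][0]
        # Assumption: x is measured in weeks since the product's first entry.
        xs = [(day - start).days / 7 for day, _ in points]
        ys = [quantity for _, quantity in points]
        result[product] = slope(xs, ys)
    return result


rows = [
    ("Product 1", "2012-03-02", 34),
    ("Product 1", "2012-03-09", 72),
    ("Product 2", "2012-03-02", 91),
    ("Product 2", "2012-03-09", 27),
]
trends = trend_by_product(rows)
for product, gradient in trends.items():
    print(product, gradient)
```

A product with only a single week of data has no defined slope, so a real version would need to skip or flag such groups.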

Any tips?

This is the process I'm trying at the moment. Something is wrong with it, and it's probably the Loop Values operator.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
   <process expanded="true" height="762" width="685">
     <operator activated="true" class="read_csv" compatibility="5.2.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
       <parameter key="csv_file" value="/home/user/Repots/SalesAllProducts/SalesByWeekAllProducts.csv"/>
       <parameter key="column_separators" value=","/>
       <parameter key="date_format" value="yyyy-MM-dd"/>
       <parameter key="first_row_as_names" value="false"/>
       <list key="annotations">
         <parameter key="0" value="Name"/>
       </list>
       <parameter key="encoding" value="UTF-8"/>
       <list key="data_set_meta_data_information">
         <parameter key="0" value="Product.true.polynominal.id"/>
         <parameter key="1" value="Date.true.date.attribute"/>
         <parameter key="2" value="Sold.true.numeric.label"/>
       </list>
     </operator>
     <operator activated="true" class="loop_values" compatibility="5.2.006" expanded="true" height="94" name="Loop Values" width="90" x="246" y="75">
       <parameter key="attribute" value="Product"/>
       <process expanded="true" height="780" width="708">
         <operator activated="true" class="series:moving_average" compatibility="5.1.002" expanded="true" height="76" name="Moving Average" width="90" x="45" y="30">
           <parameter key="attribute_name" value="Sold"/>
           <parameter key="window_width" value="4"/>
           <parameter key="ignore_missings" value="true"/>
           <parameter key="keep_original_attribute" value="false"/>
         </operator>
         <operator activated="true" class="series:replace_missing_series_values" compatibility="5.1.002" expanded="true" height="76" name="Replace Missing Values" width="90" x="179" y="30">
           <parameter key="attribute_name" value="moving_average(Sold)"/>
           <parameter key="replacement" value="next value"/>
         </operator>
         <operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
           <parameter key="old_name" value="moving_average(Sold)"/>
           <parameter key="new_name" value="Sold"/>
           <list key="rename_additional_attributes"/>
         </operator>
         <operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
           <parameter key="name" value="Sold"/>
           <parameter key="target_role" value="label"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="581" y="30"/>
         <connect from_port="example set" to_op="Moving Average" to_port="example set input"/>
         <connect from_op="Moving Average" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
         <connect from_op="Replace Missing Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
         <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
         <connect from_op="Linear Regression" from_port="model" to_port="out 1"/>
         <connect from_op="Linear Regression" from_port="exampleSet" to_port="out 2"/>
         <portSpacing port="source_example set" spacing="0"/>
         <portSpacing port="sink_out 1" spacing="0"/>
         <portSpacing port="sink_out 2" spacing="0"/>
         <portSpacing port="sink_out 3" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
     <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
     <connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

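    You are missing a Filter Examples operator in the loop. Loop Values only sets the loop_value macro for each iteration; the inner process still receives the complete example set, so you have to filter it down to the current product yourself. Please see the attached process for an example.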

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <process expanded="true" height="762" width="685">
          <operator activated="true" class="generate_data" compatibility="5.2.006" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="random classification"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename (2)" width="90" x="179" y="30">
            <parameter key="old_name" value="label"/>
            <parameter key="new_name" value="product"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="loop_values" compatibility="5.2.006" expanded="true" height="94" name="Loop Values" width="90" x="313" y="30">
            <parameter key="attribute" value="product"/>
            <process expanded="true" height="780" width="882">
              <operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="product=%{loop_value}"/>
              </operator>
              <operator activated="true" class="series:moving_average" compatibility="5.1.002" expanded="true" height="76" name="Moving Average" width="90" x="179" y="30">
                <parameter key="attribute_name" value="att1"/>
                <parameter key="window_width" value="4"/>
                <parameter key="ignore_missings" value="true"/>
                <parameter key="keep_original_attribute" value="false"/>
              </operator>
              <operator activated="true" class="series:replace_missing_series_values" compatibility="5.1.002" expanded="true" height="76" name="Replace Missing Values" width="90" x="313" y="30">
                <parameter key="attribute_name" value="moving_average(att1)"/>
                <parameter key="replacement" value="next value"/>
              </operator>
              <operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="447" y="30">
                <parameter key="old_name" value="moving_average(att1)"/>
                <parameter key="new_name" value="att1"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="581" y="30">
                <parameter key="name" value="att1"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="715" y="30"/>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Moving Average" to_port="example set input"/>
              <connect from_op="Moving Average" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
              <connect from_op="Replace Missing Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
              <connect from_op="Linear Regression" from_port="model" to_port="out 1"/>
              <connect from_op="Linear Regression" from_port="exampleSet" to_port="out 2"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
              <portSpacing port="sink_out 3" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Loop Values" to_port="example set"/>
          <connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
          <connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
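The effect of the added filter can be sketched in plain Python. This is an illustration only, on the assumption (consistent with the fix above) that each loop iteration is handed the whole example set and only the loop value changes; `filter_examples` is a hypothetical stand-in, not RapidMiner code, and the rows are the sample rows from the question.

```python
def filter_examples(rows, attribute_index, loop_value):
    """Keep only the rows whose attribute equals the current loop value."""
    return [row for row in rows if row[attribute_index] == loop_value]


rows = [
    ("Product 1", "2012-03-02", 34),
    ("Product 1", "2012-03-09", 72),
    ("Product 2", "2012-03-02", 91),
    ("Product 2", "2012-03-09", 27),
]

# One iteration per distinct product value, as in a loop over attribute values.
products = sorted({row[0] for row in rows})

sizes_without_filter = {}
sizes_with_filter = {}
for loop_value in products:
    # Without a filter, every iteration would work on all rows.
    sizes_without_filter[loop_value] = len(rows)
    # With the filter, each iteration works on one product's rows only.
    subset = filter_examples(rows, 0, loop_value)
    sizes_with_filter[loop_value] = len(subset)

print(sizes_without_filter)
print(sizes_with_filter)
```

Without the filter, every iteration would fit the same model on the same full data, which is why the unfiltered loop cannot produce one gradient per product.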
  • cbwq Member Posts: 11 Contributor II
    Aah, thank you Marius.  That should do the trick.

    My Product ID is polynominal, but other than that, your process is pretty straightforward to adapt.