Generalized Sequential Patterns (GSP) dataset format

abdero Member Posts: 1 Contributor I
edited November 2018 in Help
Hello,

I have seen some posts about this subject, but I didn't see any good answer.

Can anyone tell me the format of the input dataset for GSP?

The only format that gives me any results (bad ones) looks like this:

Client_id, time , feature 1, feature 2, ....
1,1,0,1,0,...
1,2,1,1,1,....
2,1,0,0,0

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,525   Unicorn
    Hi,
    this is already the correct format. You only need to turn the feature 1, feature 2, ... attributes into binominal ones. Use the Numerical to Binominal operator for this.

    Greetings,
      Sebastian
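
As a rough illustration of the layout described above, here is a minimal Python sketch (not RapidMiner code). The column names and the sample rows are hypothetical, and the zero-to-false mapping is only an approximation of what a numerical-to-binominal conversion does to the item columns; the customer and time columns stay numeric.

```python
import csv
import io

# Hypothetical sample in the layout from the question (column names are illustrative).
RAW = """client_id,time,feature_1,feature_2,feature_3
1,1,0,1,0
1,2,1,1,1
2,1,0,0,0
"""

def to_binominal(rows, id_columns=("client_id", "time")):
    """Map numeric 0/1 item flags to 'true'/'false', leaving id and time numeric."""
    converted = []
    for row in rows:
        new_row = {}
        for name, value in row.items():
            if name in id_columns:
                new_row[name] = int(value)
            else:
                # Assumption: zero means "item absent", anything else "item present".
                new_row[name] = "true" if float(value) != 0 else "false"
        converted.append(new_row)
    return converted

rows = to_binominal(csv.DictReader(io.StringIO(RAW)))
for row in rows:
    print(row)
```

Each row is one transaction: one customer at one point in time, with one true/false column per item.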
  • willgouldin Member Posts: 14 Contributor II
    abdero,
    Can you post the XML showing how you got your data into this format:

    Client_id, time , feature 1, feature 2, ....
    1,1,0,1,0,...
    1,2,1,1,1,....
    2,1,0,0,0

    Every time I try to pivot my data from this format:
    Customer, Time, Item
    1,1,a
    1,1,b
    1,2,a
    2,1,c
    etc

    I fail to get your format.
    Thanks,
    Will
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Hi, unfortunately the Pivot operator can currently group by only a single attribute, so you have to combine client id and time before the Pivot operator and separate them again afterwards. Please have a look at the attached process.

    Best regards,
    Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="subprocess" compatibility="5.3.005" expanded="true" height="76" name="Generate Data" width="90" x="45" y="30">
            <process expanded="true">
              <operator activated="true" class="generate_transaction_data" compatibility="5.3.005" expanded="true" height="60" name="Generate Transaction Data" width="90" x="45" y="30"/>
              <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role" width="90" x="180" y="30">
                <parameter key="name" value="Id"/>
                <list key="set_additional_roles"/>
              </operator>
              <operator activated="true" class="generate_id" compatibility="5.3.005" expanded="true" height="76" name="Generate ID" width="90" x="315" y="30"/>
              <operator activated="true" class="rename" compatibility="5.3.005" expanded="true" height="76" name="Rename" width="90" x="450" y="30">
                <parameter key="old_name" value="id"/>
                <parameter key="new_name" value="time"/>
                <list key="rename_additional_attributes"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="5.3.005" expanded="true" height="76" name="Set Role (2)" width="90" x="585" y="30">
                <parameter key="name" value="time"/>
                <parameter key="target_role" value="id"/>
                <list key="set_additional_roles"/>
              </operator>
              <connect from_op="Generate Transaction Data" from_port="output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
              <connect from_op="Generate ID" from_port="example set output" to_op="Rename" to_port="example set input"/>
              <connect from_op="Rename" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
              <connect from_op="Set Role (2)" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="generate_concatenation" compatibility="5.3.005" expanded="true" height="76" name="Generate Concatenation" width="90" x="179" y="30">
            <parameter key="first_attribute" value="Id"/>
            <parameter key="second_attribute" value="time"/>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.005" expanded="true" height="76" name="Pivot" width="90" x="313" y="30">
            <parameter key="group_attribute" value="Id_time"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.005" expanded="true" height="76" name="Split" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Id_time"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <connect from_op="Generate Data" from_port="out 1" to_op="Generate Concatenation" to_port="example set input"/>
          <connect from_op="Generate Concatenation" from_port="example set output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
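
The concatenate, pivot, split idea in the process above can be sketched outside RapidMiner as follows. This is a plain Python illustration under stated assumptions: the sample records are hypothetical (taken from the small example earlier in the thread), the function name is made up, and absent items are written as False directly, whereas the Pivot operator in the thread leaves them as missing values to be replaced in a later step.

```python
from collections import defaultdict

# Hypothetical long-format input: (customer, time, item).
LONG = [(1, 1, "a"), (1, 1, "b"), (1, 2, "a"), (2, 1, "c")]

def pivot_by_customer_and_time(records):
    """Combine customer and time into one key, pivot items into columns, split the key."""
    items = sorted({item for _, _, item in records})
    grouped = defaultdict(set)
    for customer, time, item in records:
        grouped[f"{customer}_{time}"].add(item)   # step 1: concatenate the two keys
    wide = []
    for key, present in grouped.items():          # step 2: one row per combined key
        customer, time = key.split("_")           # step 3: split the key again
        row = {"customer": int(customer), "time": int(time)}
        for item in items:
            row[item] = item in present
        wide.append(row)
    wide.sort(key=lambda r: (r["customer"], r["time"]))
    return wide

wide = pivot_by_customer_and_time(LONG)
for row in wide:
    print(row)
```

The combined key is only a workaround for grouping by two attributes at once; after the split, the table has one row per customer and time again.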
  • willgouldin Member Posts: 14 Contributor II
    Marius,
    Thanks for the timely response. I will examine the code you provided.

    Will
  • willgouldin Member Posts: 14 Contributor II
    Marius,
    I applied your logic in my SQL and concatenated the columns before RapidMiner, which speeds up processing.

    The trouble I have now is that replacing missing values after the pivot doesn't work.

    The process runs without errors (green light), but I still have '?' values in my pivot table.

    Example of my data:

    Time_Customer Item Count
    1_9 a 1
    2_9 b 1
    3_9 c 1
    3_9 d 1
    3_9 e 1
    3_9 f 1
    3_9 e 1
    3_9 b 1
    4_9 c 1
    4_9 b 1
    1_22 c 1
    1_27 c 1
    1_27 a 1
    1_27 g 1
    2_27 c 1
    2_27 h 1
    2_27 g 1
    3_27 c 1


    My code is below:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.005">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.005" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.005" expanded="true" height="60" name="Read Excel" width="90" x="112" y="30">
            <parameter key="excel_file" value="C:\MYFILE"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1:C32256"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Time_Customer.true.polynominal.attribute"/>
              <parameter key="1" value="Item.true.polynominal.attribute"/>
              <parameter key="2" value="Count.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.005" expanded="true" height="76" name="Pivot" width="90" x="246" y="30">
            <parameter key="group_attribute" value="Time_Customer"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.005" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="0"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.005" expanded="true" height="76" name="Split" width="90" x="648" y="30">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.005" expanded="true" height="76" name="Rename" width="90" x="782" y="30">
            <parameter key="old_name" value="Time_Customer_1"/>
            <parameter key="new_name" value="Time"/>
            <list key="rename_additional_attributes">
              <parameter key="Time_Customer_2" value="Customer"/>
            </list>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>



    I greatly appreciate any help you all can offer.

    Will
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Hi,

    Please examine your Replace Missing Values operator. You are replacing the values of only one attribute, but in reality you probably want to replace missing values in *all* attributes, right?

    Best regards,
    Marius
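
To make the difference concrete, here is a small Python sketch (not RapidMiner code). The pivoted rows and the `Count_a` style column names are hypothetical, and `None` stands in for the '?' cells. Restricting the fill to the key column changes nothing, because the missing values sit in the item columns.

```python
# Hypothetical pivoted rows; None stands in for the '?' (missing) cells.
PIVOTED = [
    {"Time_Customer": "1_9", "Count_a": 1, "Count_b": None, "Count_c": None},
    {"Time_Customer": "2_9", "Count_a": None, "Count_b": 1, "Count_c": None},
]

def replace_missing(rows, replacement=0, only=None):
    """Replace None with `replacement`; `only` restricts the fill to the named columns."""
    filled = []
    for row in rows:
        filled.append({
            name: replacement if value is None and (only is None or name in only) else value
            for name, value in row.items()
        })
    return filled

# Mirrors a single-attribute filter on the key column: the '?' cells survive.
single = replace_missing(PIVOTED, only={"Time_Customer"})
# Fill across all columns: no missing cells remain.
everything = replace_missing(PIVOTED)

print(single)
print(everything)
```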
  • willgouldin Member Posts: 14 Contributor II
    Marius,
    Thank you for your help, I got it to work. The code is provided below for reference. I do have one more snag: the GSP Set output displays correctly in a Mac OS X install but not in Windows 7.

    On Windows 7, I see summary data in the results overview tab, but when I move to the GSPSet (GSP) tab, all I see are the annotation options. On the Mac OS X instance, everything appears as one would expect.

    Not sure if I should submit a bug report or what.

    Thanks for your help!

    Will
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.007">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.007" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:myfile.xls"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1:C32256"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="Time_Customer.true.polynominal.attribute"/>
              <parameter key="1" value="Item.true.polynominal.attribute"/>
              <parameter key="2" value="Count.true.binominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.007" expanded="true" height="76" name="Pivot" width="90" x="179" y="30">
            <parameter key="group_attribute" value="Time_Customer"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.007" expanded="true" height="94" name="Replace Missing Values" width="90" x="313" y="30">
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="0"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.007" expanded="true" height="76" name="Split" width="90" x="45" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time_Customer"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <operator activated="true" class="rename" compatibility="5.3.007" expanded="true" height="76" name="Rename" width="90" x="179" y="255">
            <parameter key="old_name" value="Time_Customer_1"/>
            <parameter key="new_name" value="Time"/>
            <list key="rename_additional_attributes">
              <parameter key="Time_Customer_2" value="Customer"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Numerical" width="90" x="380" y="255">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Time"/>
            <parameter key="coding_type" value="unique integers"/>
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="generalized_sequential_patterns" compatibility="5.3.007" expanded="true" height="76" name="GSP" width="90" x="581" y="210">
            <parameter key="customer_id" value="Customer"/>
            <parameter key="time_attribute" value="Time"/>
            <parameter key="min_support" value="0.1"/>
            <parameter key="window_size" value="1.0"/>
            <parameter key="max_gap" value="18.0"/>
            <parameter key="min_gap" value="13.0"/>
            <parameter key="positive_value" value="1"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="GSP" to_port="example set"/>
          <connect from_op="GSP" from_port="patterns" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Hey Will,

    Are you using RapidMiner 5.3.7 on both of your machines?

    Best regards,
    Marius
  • willgouldin Member Posts: 14 Contributor II
    Yes, sir. I updated this morning and it still produces the "error".

    Will
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    I could reproduce that behavior under Windows, and it is obviously a bug. I created an internal bug report for it, so there is no need to submit one from your side.

    Best regards,
    Marius
  • willgouldin Member Posts: 14 Contributor II
    Outstanding Marius,
    Thank you for your assistance!

    Will
  • willgouldin Member Posts: 14 Contributor II
    Marius,
    Another question concerning GSP. I receive the same result sets regardless of my Window, Min Gap and Max Gap settings.
    My raw data uses days between events as the time element.

    Is this caused by the same bug we found previously?


    Thanks,
    Will

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    I can't imagine that the two issues are related.
    Did you inspect your data and make sure that the values you entered would actually make a difference?

    Best regards,
    Marius
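
One way to do the inspection suggested above is to look at the time differences between each customer's transactions and compare them with the gap settings. The Python sketch below is only a rough check under stated assumptions: the events are hypothetical (loosely based on the sample data posted earlier), the function names are made up, only consecutive transactions are compared, and the exact boundary rules GSP applies to its gap parameters are not modelled.

```python
from collections import defaultdict

# Hypothetical (customer, time) pairs; times use the same unit as the gap parameters.
EVENTS = [(9, 1), (9, 2), (9, 3), (9, 4), (27, 1), (27, 2), (27, 3), (22, 1)]

def gaps_per_customer(events):
    """Return the time differences between consecutive transactions for each customer."""
    times = defaultdict(set)
    for customer, time in events:
        times[customer].add(time)
    gaps = {}
    for customer, values in times.items():
        ordered = sorted(values)
        gaps[customer] = [b - a for a, b in zip(ordered, ordered[1:])]
    return gaps

def share_within(gaps, min_gap, max_gap):
    """Fraction of all consecutive gaps that fall inside [min_gap, max_gap]."""
    all_gaps = [g for values in gaps.values() for g in values]
    if not all_gaps:
        return 0.0
    return sum(min_gap <= g <= max_gap for g in all_gaps) / len(all_gaps)

gaps = gaps_per_customer(EVENTS)
print(gaps)
print(share_within(gaps, min_gap=13, max_gap=18))
```

If the observed gaps all sit in a narrow range, many different gap settings can describe the same set of transitions, which is one possible reason for seeing identical results.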
  • Marco_Boeck Team Lead Software Engineering Moderator, Employee, Member, University Professor Posts: 1,806   RM Engineering
    Hi,

    we've just fixed the "empty GSP results" bug. You can either check out the latest SVN version (see here, updated around midnight) and build RapidMiner yourself, or wait for the next release.

    Regards,
    Marco
  • willgouldin Member Posts: 14 Contributor II
    Marco,
    Thanks for the response, I'll check my updates!
    Will
  • lvane Member Posts: 2 Contributor I
    Hello dear Rapid-I developers,

    my empty GSP results problem still exists. How can I update my RapidMiner, or do I need to wait until the next official update? Could anyone tell me when that will be?

    Thank you!

  • willgouldin Member Posts: 14 Contributor II
    I am also curious when the next release that covers this will be.

    Thanks,
    Will
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869   Unicorn
    Will, we don't have a release schedule for the general public yet.

    Best regards,
    Marius
  • willgouldin Member Posts: 14 Contributor II
    Not to dig up an old topic, but I am still having trouble with the data layout for the GSP operator.

    I have combined the time (in day-of-year format) with my customer ID per your instructions. I have a column for the item and a binominal value for the "qty".

    When I import the Excel sheet, pivot, replace the missing values with the value "false", and then split, everything looks good.

    When I attempt to convert the split columns for time and customer from nominal to numerical, per the GSP operator requirements, my pivot is ruined.

    I expect:

    Customer, time, item a, item b, ...
    1,1,TRUE,FALSE
    1,3,TRUE,FALSE
    2,4,FALSE,FALSE
    etc.

    However, it turns time into multiple columns within the pivot as well.

    I can provide a larger data sample if required for troubleshooting.
    Any help that can be provided is appreciated.

    Will
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.3.015">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.3.015" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="read_excel" compatibility="5.3.015" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Users\me\Desktop\input.xls"/>
            <parameter key="sheet_number" value="2"/>
            <parameter key="imported_cell_range" value="A1:C7768"/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="time_customer.true.polynominal.attribute"/>
              <parameter key="1" value="Item.true.polynominal.attribute"/>
              <parameter key="2" value="Qty.true.binominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="pivot" compatibility="5.3.015" expanded="true" height="76" name="Pivot" width="90" x="45" y="120">
            <parameter key="group_attribute" value="time_customer"/>
            <parameter key="index_attribute" value="Item"/>
            <parameter key="consider_weights" value="false"/>
            <parameter key="skip_constant_attributes" value="false"/>
          </operator>
          <operator activated="true" class="replace_missing_values" compatibility="5.3.015" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="210">
            <parameter key="include_special_attributes" value="true"/>
            <parameter key="default" value="value"/>
            <list key="columns"/>
            <parameter key="replenishment_value" value="false"/>
          </operator>
          <operator activated="true" class="split" compatibility="5.3.015" expanded="true" height="76" name="Split" width="90" x="179" y="210">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="time_customer"/>
            <parameter key="split_pattern" value="_"/>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="5.3.015" expanded="true" height="94" name="Nominal to Numerical" width="90" x="313" y="210">
            <parameter key="create_view" value="true"/>
            <list key="comparison_groups"/>
          </operator>
          <connect from_op="Read Excel" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Split" to_port="example set input"/>
          <connect from_op="Split" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
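
One plausible reading of the symptom above, offered as an assumption rather than a confirmed diagnosis: the Nominal to Numerical operator in this process sets neither an attribute filter nor a coding type, while the earlier working process limited it to a single attribute with "unique integers" coding. If the defaults amount to dummy coding over every nominal attribute, each distinct time value becomes its own column. The Python sketch below (not RapidMiner code, with hypothetical rows, function names and column naming) contrasts dummy coding with simply parsing the text as a number.

```python
# Hypothetical split result: time and customer are still text after the split.
ROWS = [
    {"time": "1", "customer": "1"},
    {"time": "3", "customer": "1"},
    {"time": "4", "customer": "2"},
]

def dummy_code(rows, column):
    """One 0/1 column per distinct value, which makes a single column fan out."""
    values = sorted({row[column] for row in rows})
    return [{f"{column} = {v}": int(row[column] == v) for v in values} for row in rows]

def parse_numbers(rows, column):
    """Keep one column and convert its text to integers."""
    return [{column: int(row[column])} for row in rows]

print(dummy_code(ROWS, "time"))     # three columns, one per distinct time value
print(parse_numbers(ROWS, "time"))  # a single numeric time column
```

Note that integer coding by category index and parsing the text as a number are not the same thing: only parsing keeps the real day values, which matters when the gap parameters are expressed in days.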
  • usct01 Member Posts: 10 Contributor II
    Hi,
    Is there an operator to apply GSP rules?

    Thanks
  • MBM Member Posts: 23 Contributor I
    This is a really good question.
