GSP not producing any results [SOLVED]

yannkiraly
edited November 2018 in Help
In case anyone else has a similar problem: the solution was to connect the GSP "patterns" output to a "Write Text" operator instead of wiring it
directly to a result port. You can then view the patterns in the resulting text file with any text editor.
----
Although this seems to be a common question on this board, I have not been able to find an answer to my particular problem.
I am working with dataset 15 (Milan) from http://ailab.wsu.edu/casas/datasets/index.html.
I extracted only the beginnings of actions from this dataset in Excel and introduced artificial "customers" by simply giving
the first x rows the name "1", and so on. I then used this process to convert the time into a numerical value and the
chosen action from a string into binominal attributes (some names are in German: "beschreibung" means "description" and
"datum korrigiert" means "corrected date"):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.007">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="read_excel" compatibility="5.3.007" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
       <parameter key="excel_file" value="/path/to/file"/>
       <parameter key="imported_cell_range" value="A1:C2301"/>
       <parameter key="first_row_as_names" value="false"/>
       <list key="annotations">
         <parameter key="0" value="Name"/>
       </list>
       <list key="data_set_meta_data_information">
         <parameter key="0" value="name.true.integer.attribute"/>
         <parameter key="1" value="beschreibung.true.polynominal.attribute"/>
         <parameter key="2" value="datum korrigiert.true.polynominal.attribute"/>
       </list>
     </operator>
     <operator activated="true" class="nominal_to_binominal" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Binominal" width="90" x="246" y="30">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="beschreibung"/>
     </operator>
     <operator activated="true" class="nominal_to_date" compatibility="5.3.007" expanded="true" height="76" name="Nominal to Date" width="90" x="45" y="165">
       <parameter key="attribute_name" value="datum korrigiert"/>
       <parameter key="date_type" value="date_time"/>
       <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
     </operator>
     <operator activated="true" class="date_to_numerical" compatibility="5.3.007" expanded="true" height="76" name="Date to Numerical" width="90" x="45" y="300">
       <parameter key="attribute_name" value="datum korrigiert"/>
       <parameter key="time_unit" value="year"/>
       <parameter key="keep_old_attribute" value="true"/>
     </operator>
     <operator activated="true" class="date_to_numerical" compatibility="5.3.007" expanded="true" height="76" name="Date to Numerical (2)" width="90" x="179" y="300">
       <parameter key="attribute_name" value="datum korrigiert"/>
       <parameter key="time_unit" value="month"/>
       <parameter key="keep_old_attribute" value="true"/>
     </operator>
     <operator activated="true" class="date_to_numerical" compatibility="5.3.007" expanded="true" height="76" name="Date to Numerical (3)" width="90" x="313" y="300">
       <parameter key="attribute_name" value="datum korrigiert"/>
       <parameter key="time_unit" value="day"/>
       <parameter key="keep_old_attribute" value="true"/>
     </operator>
     <operator activated="true" class="date_to_numerical" compatibility="5.3.007" expanded="true" height="76" name="Date to Numerical (4)" width="90" x="447" y="300">
       <parameter key="attribute_name" value="datum korrigiert"/>
       <parameter key="time_unit" value="hour"/>
       <parameter key="keep_old_attribute" value="true"/>
     </operator>
     <operator activated="true" class="date_to_numerical" compatibility="5.3.007" expanded="true" height="76" name="Date to Numerical (5)" width="90" x="581" y="300">
       <parameter key="attribute_name" value="datum korrigiert"/>
       <parameter key="time_unit" value="minute"/>
       <parameter key="keep_old_attribute" value="true"/>
     </operator>
     <operator activated="true" class="generate_concatenation" compatibility="5.3.007" expanded="true" height="76" name="Generate Concatenation" width="90" x="45" y="435">
       <parameter key="first_attribute" value="datum korrigiert_year"/>
       <parameter key="second_attribute" value="datum korrigiert_month"/>
     </operator>
     <operator activated="true" class="generate_concatenation" compatibility="5.3.007" expanded="true" height="76" name="Generate Concatenation (2)" width="90" x="179" y="435">
       <parameter key="first_attribute" value="datum korrigiert_year_datum korrigiert_month"/>
       <parameter key="second_attribute" value="datum korrigiert_day"/>
     </operator>
     <operator activated="true" class="generate_concatenation" compatibility="5.3.007" expanded="true" height="76" name="Generate Concatenation (3)" width="90" x="313" y="435">
       <parameter key="first_attribute" value="datum korrigiert_year_datum korrigiert_month_datum korrigiert_day"/>
       <parameter key="second_attribute" value="datum korrigiert_hour"/>
     </operator>
     <operator activated="true" class="generate_concatenation" compatibility="5.3.007" expanded="true" height="76" name="Generate Concatenation (4)" width="90" x="447" y="435">
       <parameter key="first_attribute" value="datum korrigiert_year_datum korrigiert_month_datum korrigiert_day_datum korrigiert_hour"/>
       <parameter key="second_attribute" value="datum korrigiert_minute"/>
     </operator>
     <operator activated="true" class="nominal_to_numerical" compatibility="5.3.007" expanded="true" height="94" name="Nominal to Numerical" width="90" x="581" y="435">
       <parameter key="attribute_filter_type" value="single"/>
       <parameter key="attribute" value="datum korrigiert_year_datum korrigiert_month_datum korrigiert_day_datum korrigiert_hour_datum korrigiert_minute"/>
       <parameter key="coding_type" value="unique integers"/>
       <list key="comparison_groups"/>
     </operator>
     <operator activated="true" class="store" compatibility="5.3.007" expanded="true" height="60" name="Store" width="90" x="45" y="525">
       <parameter key="repository_entry" value="PreprocessedMilanData"/>
     </operator>
     <connect from_port="input 1" to_op="Read Excel" to_port="file"/>
     <connect from_op="Read Excel" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
     <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Nominal to Date" to_port="example set input"/>
     <connect from_op="Nominal to Date" from_port="example set output" to_op="Date to Numerical" to_port="example set input"/>
     <connect from_op="Date to Numerical" from_port="example set output" to_op="Date to Numerical (2)" to_port="example set input"/>
     <connect from_op="Date to Numerical (2)" from_port="example set output" to_op="Date to Numerical (3)" to_port="example set input"/>
     <connect from_op="Date to Numerical (3)" from_port="example set output" to_op="Date to Numerical (4)" to_port="example set input"/>
     <connect from_op="Date to Numerical (4)" from_port="example set output" to_op="Date to Numerical (5)" to_port="example set input"/>
     <connect from_op="Date to Numerical (5)" from_port="example set output" to_op="Generate Concatenation" to_port="example set input"/>
     <connect from_op="Generate Concatenation" from_port="example set output" to_op="Generate Concatenation (2)" to_port="example set input"/>
     <connect from_op="Generate Concatenation (2)" from_port="example set output" to_op="Generate Concatenation (3)" to_port="example set input"/>
     <connect from_op="Generate Concatenation (3)" from_port="example set output" to_op="Generate Concatenation (4)" to_port="example set input"/>
     <connect from_op="Generate Concatenation (4)" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
     <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Store" to_port="input"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="source_input 2" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>
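For readers less familiar with these operators, the two conversions done by the first process can be approximated in plain Python: Nominal to Binominal turns the activity column into one true/false column per activity, and the Date to Numerical / Generate Concatenation / Nominal to Numerical chain turns each timestamp into an integer code. This is a minimal sketch, not RapidMiner's implementation. It assumes the `yyyy-MM-dd HH:mm:ss` format from the process, an underscore separator, and (as the observed "row number - 1" result suggests) integers assigned from 0 in order of first appearance. `to_binominal` and `encode_timestamps` are hypothetical helpers, and the sample values are made up.

```python
from datetime import datetime


def to_binominal(values, attribute):
    """Dummy-code a nominal column into one "true"/"false" column per value.

    Column names follow the "attribute = value" pattern that appears in the
    second process. Hypothetical helper, not RapidMiner code.
    """
    distinct = sorted(set(values))
    return [
        {f"{attribute} = {d}": ("true" if v == d else "false") for d in distinct}
        for v in values
    ]


def encode_timestamps(raw_times):
    """Map timestamp strings to integer codes, roughly like the first process.

    Assumptions: input uses yyyy-MM-dd HH:mm:ss, the date parts are joined
    with "_", and each distinct key gets the next integer starting from 0.
    """
    codes = {}
    encoded = []
    for raw in raw_times:
        ts = datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")
        # Year, month, day, hour and minute are kept; seconds are dropped.
        key = "_".join(str(p) for p in (ts.year, ts.month, ts.day, ts.hour, ts.minute))
        if key not in codes:
            codes[key] = len(codes)
        encoded.append(codes[key])
    return encoded


if __name__ == "__main__":
    # Made-up sample values, not taken from the Milan dataset.
    times = [
        "2009-10-16 08:42:01",
        "2009-10-16 08:45:36",
        "2009-10-16 08:45:59",  # same minute as the previous row
        "2009-10-16 09:01:12",
    ]
    print(encode_timestamps(times))  # [0, 1, 1, 2]
    print(to_binominal(["Sleep begin", "Read begin"], "beschreibung")[0])
```

In this sketch two events in the same minute receive the same code, so the "row number - 1" equivalence only holds while every row falls in a distinct minute.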
I then feed the output of the first process, which contains the name (1-6), the timestamp (converted to a numerical value and therefore equal to the row number minus 1), and binominal values
for the different activities, into this process:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.007">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.007" expanded="true" name="Process">
   <process expanded="true">
     <operator activated="true" class="retrieve" compatibility="5.3.007" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
       <parameter key="repository_entry" value="//NewLocalRepository/PreprocessedMilanData"/>
     </operator>
     <operator activated="true" class="select_attributes" compatibility="5.3.007" expanded="true" height="76" name="Select Attributes" width="90" x="180" y="30">
       <parameter key="attribute_filter_type" value="subset"/>
       <parameter key="attributes" value="|beschreibung = Watch_TV begin|beschreibung = Sleep begin|beschreibung = Read begin|beschreibung = Morning_Meds begin|beschreibung = Meditate begin|beschreibung = Master_Bedroom_Activity begin|beschreibung = Master_Bathroom begin|beschreibung = Leave_Home begin|beschreibung = Kitchen_Activity begin|beschreibung = Guest_Bathroom begin|beschreibung = Eve_Meds begin|beschreibung = Dining_Rm_Activity begin|beschreibung = Desk_Activity begin|beschreibung = Chores begin|beschreibung = Bed_to_Toilet begin|beschreibung =  Master_Bathroom begin|beschreibung =  Leave_Home begin|beschreibung =  Chores begin|datum korrigiert_year_datum korrigiert_month_datum korrigiert_day_datum korrigiert_hour_datum korrigiert_minute|name"/>
     </operator>
     <operator activated="true" class="generalized_sequential_patterns" compatibility="5.3.007" expanded="true" height="76" name="GSP" width="90" x="380" y="30">
       <parameter key="customer_id" value="name"/>
       <parameter key="time_attribute" value="datum korrigiert_year_datum korrigiert_month_datum korrigiert_day_datum korrigiert_hour_datum korrigiert_minute"/>
       <parameter key="min_support" value="0.1"/>
       <parameter key="window_size" value="1.0"/>
       <parameter key="max_gap" value="1.1"/>
       <parameter key="min_gap" value="0.0"/>
       <parameter key="positive_value" value="true"/>
     </operator>
     <connect from_op="Retrieve" from_port="output" to_op="Select Attributes" to_port="example set input"/>
     <connect from_op="Select Attributes" from_port="example set output" to_op="GSP" to_port="example set"/>
     <connect from_op="GSP" from_port="example set" to_port="result 1"/>
     <connect from_op="GSP" from_port="patterns" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>
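In the second process, `name` is the customer id and the integer timestamp is the time attribute. Under the usual GSP model, the rows are grouped into one data sequence per customer, ordered by time, where each transaction holds the activities whose value equals `positive_value` ("true" here). A minimal Python sketch of that grouping, assuming rows are plain dicts; `build_sequences` is a hypothetical helper, not RapidMiner's implementation, and the toy rows use shortened activity names.

```python
from collections import defaultdict


def build_sequences(rows, customer_key, time_key, positive_value="true"):
    """Group rows into one time-ordered list of itemsets per customer.

    Every attribute other than the customer and time keys is treated as an
    item that is present when its value equals positive_value.
    Hypothetical helper for illustration only.
    """
    by_customer = defaultdict(list)
    for row in rows:
        items = frozenset(
            name
            for name, value in row.items()
            if name not in (customer_key, time_key) and value == positive_value
        )
        by_customer[row[customer_key]].append((row[time_key], items))
    # Order each customer's transactions by time, then keep only the itemsets.
    return {
        customer: [items for _, items in sorted(transactions, key=lambda t: t[0])]
        for customer, transactions in by_customer.items()
    }


if __name__ == "__main__":
    # Toy rows with shortened activity names; values are made up.
    rows = [
        {"name": 1, "time": 1, "Sleep": "false", "Read": "true"},
        {"name": 1, "time": 0, "Sleep": "true", "Read": "false"},
        {"name": 2, "time": 2, "Sleep": "true", "Read": "false"},
    ]
    print(build_sequences(rows, "name", "time"))
```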
The process does not produce any result, even though min_support is set to 0.1 and, on manual examination, the data
definitely contains repeating subsequences.
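In GSP, the support of a pattern is the fraction of data sequences (one per customer) that contain it as a subsequence, where each pattern element must be a subset of a later transaction. The sketch below computes this under a simplifying assumption: `window_size`, `min_gap` and `max_gap` are ignored, whereas the real operator also enforces those time constraints, so its counts can be lower. `contains` and `support` are hypothetical names, and the toy sequences are made up.

```python
def contains(sequence, pattern):
    """Return True if pattern (a list of itemsets) is a subsequence of sequence.

    Each pattern element must be a subset of a strictly later transaction.
    Greedy earliest matching is sufficient here. window_size, min_gap and
    max_gap are deliberately ignored in this sketch.
    """
    pos = 0
    for element in pattern:
        # Advance to the first transaction that contains every item of element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a later transaction
    return True


def support(sequences, pattern):
    """Fraction of data sequences (one per customer) that contain pattern."""
    if not sequences:
        return 0.0
    hits = sum(1 for seq in sequences if contains(seq, pattern))
    return hits / len(sequences)


if __name__ == "__main__":
    # Toy sequences, one list of itemsets per customer; values are made up.
    sequences = [
        [{"Sleep"}, {"Bed_to_Toilet"}, {"Sleep"}],
        [{"Kitchen_Activity"}, {"Sleep"}],
        [{"Read"}],
    ]
    print(support(sequences, [{"Sleep"}]))  # 0.6666666666666666
    print(support(sequences, [{"Sleep"}, {"Sleep"}]))  # 0.3333333333333333
```

With `min_support` at 0.1, a pattern has to occur in at least 10% of the customer sequences to be reported.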
This is the log output associated with the second process:

Apr 9, 2013 11:06:37 AM INFO: Process //NewLocalRepository/Milan Sequential Classification Step 2 starts
Apr 9, 2013 11:06:37 AM INFO: Loading initial data.
Apr 9, 2013 11:06:37 AM WARNING: Found only 24 sequences. Together with the small minimal support, this could result in very many patterns and a long calculation time.
Apr 9, 2013 11:06:37 AM INFO: Generating Candidates of length 1
Apr 9, 2013 11:06:37 AM INFO: Generating Candidates of length 435
Apr 9, 2013 11:06:37 AM INFO: Building Hashtree for counting candidates of length 2
Apr 9, 2013 11:06:37 AM INFO: Counting supporting sequences for candidates of length 2
Apr 9, 2013 11:06:37 AM INFO: Filtered Candidates. Remaining: 315
Apr 9, 2013 11:06:37 AM INFO: Generating Candidates of length 2
Apr 9, 2013 11:06:37 AM INFO: Generating Candidates of length 4,646
Apr 9, 2013 11:06:37 AM INFO: Building Hashtree for counting candidates of length 3
Apr 9, 2013 11:06:37 AM INFO: Counting supporting sequences for candidates of length 3
Apr 9, 2013 11:06:37 AM INFO: Filtered Candidates. Remaining: 1,664
Apr 9, 2013 11:06:37 AM INFO: Generating Candidates of length 3
Apr 9, 2013 11:06:38 AM INFO: ...
Apr 9, 2013 11:06:38 AM INFO: ...
Apr 9, 2013 11:06:38 AM INFO: Generating Candidates of length 17,020
Apr 9, 2013 11:06:38 AM INFO: Building Hashtree for counting candidates of length 4
Apr 9, 2013 11:06:38 AM INFO: Counting supporting sequences for candidates of length 4
Apr 9, 2013 11:06:38 AM INFO: Filtered Candidates. Remaining: 3,487
Apr 9, 2013 11:06:38 AM INFO: Generating Candidates of length 4
Apr 9, 2013 11:06:39 AM INFO: ...
Apr 9, 2013 11:06:40 AM INFO: Generating Candidates of length 8,132
Apr 9, 2013 11:06:40 AM INFO: Building Hashtree for counting candidates of length 5
Apr 9, 2013 11:06:40 AM INFO: Counting supporting sequences for candidates of length 5
Apr 9, 2013 11:06:40 AM INFO: Filtered Candidates. Remaining: 2,400
Apr 9, 2013 11:06:40 AM INFO: Generating Candidates of length 5
Apr 9, 2013 11:06:40 AM INFO: Generating Candidates of length 1,802
Apr 9, 2013 11:06:40 AM INFO: Building Hashtree for counting candidates of length 6
Apr 9, 2013 11:06:40 AM INFO: Counting supporting sequences for candidates of length 6
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 887
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 6
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 538
Apr 9, 2013 11:06:41 AM INFO: Building Hashtree for counting candidates of length 7
Apr 9, 2013 11:06:41 AM INFO: Counting supporting sequences for candidates of length 7
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 251
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 7
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 90
Apr 9, 2013 11:06:41 AM INFO: Building Hashtree for counting candidates of length 8
Apr 9, 2013 11:06:41 AM INFO: Counting supporting sequences for candidates of length 8
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 54
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 8
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 5
Apr 9, 2013 11:06:41 AM INFO: Building Hashtree for counting candidates of length 9
Apr 9, 2013 11:06:41 AM INFO: Counting supporting sequences for candidates of length 9
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 5
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 9
Apr 9, 2013 11:06:41 AM INFO: Generating Candidates of length 0
Apr 9, 2013 11:06:41 AM INFO: Saving results.
Apr 9, 2013 11:06:41 AM INFO: Process //NewLocalRepository/Milan Sequential Classification Step 2 finished successfully after 3 s
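The log can also be read programmatically. Each "Filtered Candidates. Remaining: N" line appears to report how many candidates of that length survived the support filter, and the warning gives the number of sequences. A short Python sketch that extracts these numbers from the lines quoted above and computes the minimum number of supporting sequences, assuming a pattern is kept when its support fraction is at least `min_support`; `summarize_gsp_log` and `parse_count` are hypothetical helpers.

```python
import math
import re

# Excerpt of the log above (only the lines the parser needs).
LOG = """\
Apr 9, 2013 11:06:37 AM WARNING: Found only 24 sequences. Together with the small minimal support, this could result in very many patterns and a long calculation time.
Apr 9, 2013 11:06:37 AM INFO: Filtered Candidates. Remaining: 315
Apr 9, 2013 11:06:37 AM INFO: Filtered Candidates. Remaining: 1,664
Apr 9, 2013 11:06:38 AM INFO: Filtered Candidates. Remaining: 3,487
Apr 9, 2013 11:06:40 AM INFO: Filtered Candidates. Remaining: 2,400
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 887
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 251
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 54
Apr 9, 2013 11:06:41 AM INFO: Filtered Candidates. Remaining: 5
"""


def parse_count(text):
    """Turn a log number such as "1,664" into an int."""
    return int(text.replace(",", ""))


def summarize_gsp_log(log_text, min_support):
    """Return (sequence count, minimum supporting sequences, survivors per pass)."""
    found = re.search(r"Found only ([\d,]+) sequences", log_text)
    n_sequences = parse_count(found.group(1)) if found else None
    # Assumes a pattern is kept when its support fraction is >= min_support.
    min_count = math.ceil(min_support * n_sequences) if n_sequences else None
    remaining = [parse_count(m) for m in re.findall(r"Remaining: ([\d,]+)", log_text)]
    return n_sequences, min_count, remaining


if __name__ == "__main__":
    n, min_count, remaining = summarize_gsp_log(LOG, 0.1)
    print(n, min_count)  # 24 3
    print(sum(remaining))  # 9063
```

Read this way, thousands of candidates survived the support filter rather than none, which is consistent with the fix described at the top of the post.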
I would be grateful for any hints you could give me on this.