Mining Sequential Association rules / Sequential Pattern Mining

SunnyLotusFlower Member Posts: 37 Contributor II
Hello all again,

I am working on a task right now and have a problem. I need to do Sequential Pattern Mining and wanted to know which operators are most commonly used for this task.

I would be glad if someone could give me a tip.

greetings

SunnyLotusFlower

Answers

  • Options
    SunnyLotusFlower Member Posts: 37 Contributor II
    Can I do this in RapidMiner at all?

    greetings

    SunnyLotusFlower
  • Options
    wessel Member Posts: 537 Maven
    Yes, you can use the "Windowing" operator or the "Moving Average" operator.

    Another way would be to use recurrent neural networks with a delay, but this last option is not possible in RapidMiner.
  • Options
    SunnyLotusFlower Member Posts: 37 Contributor II
    Thanks for the answer. Can you give me a small workflow for this? That would be great. :)

    greetings

    SunnyLotusFlower
  • Options
    wessel Member Posts: 537 Maven
    Okay.

    Can you give me a sequential pattern to analyze?
  • Options
    SunnyLotusFlower Member Posts: 37 Contributor II
    I am a little confused about whether we have the same thing in mind. I want to extract association rules that take the date into account, from data like the following:

    Customer-ID, Product, Date

    10150, softdrink,  1.5.2010
    10150, fruitveg, 1.5.2010
    10236, frozenmeal,  1.5.2010
    10236, beer,  15.5.2010
    10360, fish,  21.6.2010
    10360, cannedveg,  21.6.2010
    10360, beer,  26.6.2010

    And I need association rules like:

    "If customer A buys fish and cannedveg on 21.6.2010, then he will buy beer on 26.6.2010."

    Fish and cannedveg on 21.6 => beer on 26.6

    If you have misunderstood me, I must apologize for that.


    greetings

    Lotus


  • Options
    wessel Member Posts: 537 Maven
    Hmm, this is a very hard problem, because your hypothesis space is extremely large:
    "If a person buys some things at some time, how will this affect his buying in the future?"

    A more specific hypothesis:
    "If a person buys some things at some time, how will this affect his buying the next time he enters the shop?"
    This problem is still hard, but more manageable than the first.

    To solve this problem I would convert the data.
    edit: This might be possible in RapidMiner using the Windowing operator, but it is tricky.
    ID, softdrink, fruitveg, frozenmeal, fish, cannedveg, beer, softdrink2, fruitveg2, frozenmeal2, fish2, cannedveg2, beer2
    10150, 1, 1, 0, 0, 0, 0, ?, ?, ?, ?, ?, ?   (this customer buys softdrink and fruitveg, no info for the next visit)
    10236, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1   (this customer buys frozenmeal, next time beer)
    10236, 0, 0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ?   (same customer as the last entry, but no info on the visit after that)
    10360, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1   (this customer buys fish and cannedveg, next time beer)
    10360, 0, 0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ?   (same customer as the last entry, but again no info on the visit after that)
    (You did not give me much data, so you get a lot of ? symbols.)

    You can run any unsupervised learning algorithm on this data.

    If you want to solve the "If a person buys some things at some time, how will this affect his buying in the future?" problem,
    you will get many more attributes in your dataset. It is possible, but unlikely to yield good results.


    edit:
    you might want to also add the attribute "number of days since last visit"
    to account for the fact that shop visits do not occur at equal intervals.
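
    The conversion described above can be sketched in plain Python. This is only a sketch under assumptions, not a RapidMiner workflow: the function name `build_next_visit_table`, the column name `days_between_visits`, and the fixed product order are made up for illustration, and the dates are assumed to be in day.month.year format as in the sample data.

```python
from collections import defaultdict
from datetime import datetime

# Sample transactions from the thread: (customer ID, product, date as D.M.YYYY).
TRANSACTIONS = [
    (10150, "softdrink", "1.5.2010"),
    (10150, "fruitveg", "1.5.2010"),
    (10236, "frozenmeal", "1.5.2010"),
    (10236, "beer", "15.5.2010"),
    (10360, "fish", "21.6.2010"),
    (10360, "cannedveg", "21.6.2010"),
    (10360, "beer", "26.6.2010"),
]

# Column order chosen to match the table in the post above (an assumption).
PRODUCTS = ["softdrink", "fruitveg", "frozenmeal", "fish", "cannedveg", "beer"]


def build_next_visit_table(transactions, products):
    """Return (header, rows) with one row per shop visit: the current basket,
    the next basket ('?' if unknown) and the number of days between the two."""
    baskets = defaultdict(set)  # (customer, date) -> products bought that day
    for customer, product, date_text in transactions:
        day = datetime.strptime(date_text, "%d.%m.%Y").date()
        baskets[(customer, day)].add(product)

    visits = defaultdict(list)  # customer -> visit dates in ascending order
    for customer, day in sorted(baskets):
        visits[customer].append(day)

    header = ["ID"] + products + [p + "2" for p in products] + ["days_between_visits"]
    rows = []
    for customer, days in visits.items():
        for i, day in enumerate(days):
            current = [int(p in baskets[(customer, day)]) for p in products]
            if i + 1 < len(days):
                next_day = days[i + 1]
                following = [int(p in baskets[(customer, next_day)]) for p in products]
                gap = (next_day - day).days
            else:
                # Last known visit: nothing is known about the next basket.
                following = ["?"] * len(products)
                gap = "?"
            rows.append([customer] + current + following + [gap])
    return header, rows


if __name__ == "__main__":
    header, rows = build_next_visit_table(TRANSACTIONS, PRODUCTS)
    print(", ".join(header))
    for row in rows:
        print(", ".join(str(value) for value in row))
```

    Rows whose next basket is unknown keep the `?` placeholder, so they would have to be filtered out or treated as missing values before learning on the "next visit" columns.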
  • Options
    SunnyLotusFlower Member Posts: 37 Contributor II
    OK, for my task the two definitions of the problem are equivalent, so I choose your second, more manageable one:

    "If a person buys some things at some time, how will this affect his buying the next time he enters the shop?"

    Do you do this conversion with the Windowing operator?

    If I understand it correctly, you get a new basket for every new date. Is that correct?

    And then I should run FP-Growth on that data?


    Could you give me a workflow for this? I have never used the Windowing operator...


    greetings Lotus
  • Options
    wessel Member Posts: 537 Maven
    I tried but could not get it to work in RapidMiner. I normally use Python for preprocessing like this.


    Maybe this code can help, from com.rapidminer.gui.templates.Template@320a80db (market basket):
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <process expanded="true" height="558" width="567">
         <operator activated="true" breakpoints="after" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30"/>
         <operator activated="true" class="set_macro" expanded="true" height="76" name="Define Item Count" width="90" x="179" y="30">
           <parameter key="macro" value="itemCountAttributeName"/>
           <parameter key="value" value="itemCount"/>
         </operator>
         <operator activated="true" class="set_macro" expanded="true" height="76" name="Define Customer" width="90" x="313" y="30">
           <parameter key="macro" value="customerIdAttributeName"/>
           <parameter key="value" value="customerId"/>
         </operator>
         <operator activated="true" class="set_macro" expanded="true" height="76" name="Define Item" width="90" x="447" y="30">
           <parameter key="macro" value="itemIdAttributeName"/>
           <parameter key="value" value="itemId"/>
         </operator>
         <operator activated="true" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="45" y="210">
           <list key="aggregation_attributes">
             <parameter key="%{itemCountAttributeName}" value="sum"/>
           </list>
           <parameter key="group_by_attributes" value="%{customerIdAttributeName}|%{itemIdAttributeName}"/>
         </operator>
         <operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
           <parameter key="group_attribute" value="%{customerIdAttributeName}"/>
           <parameter key="index_attribute" value="%{itemIdAttributeName}"/>
         </operator>
         <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="210">
           <parameter key="name" value="%{customerIdAttributeName}"/>
           <parameter key="target_role" value="id"/>
         </operator>
         <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
         <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="447" y="345"/>
         <connect from_op="Retrieve" from_port="output" to_op="Define Item Count" to_port="through 1"/>
         <connect from_op="Define Item Count" from_port="through 1" to_op="Define Customer" to_port="through 1"/>
         <connect from_op="Define Customer" from_port="through 1" to_op="Define Item" to_port="through 1"/>
         <connect from_op="Define Item" from_port="through 1" to_op="Aggregate" to_port="example set input"/>
         <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
         <connect from_op="Pivot" from_port="example set output" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
         <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
         <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
  • Options
    SunnyLotusFlower Member Posts: 37 Contributor II
    Thank you for your help anyway. It's very nice. :D

    But the problem is that I don't have much time to work on this further.

    Maybe I will find some time later to do this...

    Thanks a lot

    greetings User

  • Options
    B_Miner Member Posts: 72 Contributor II
    Why not use the Generalized Sequential Patterns operator? That is for sequence analysis...
  • Options
    wessel Member Posts: 537 Maven
    Ah cool, I googled the paper:

    Ramakrishnan Srikant, Rakesh Agrawal (1996). Mining Sequential Patterns: Generalizations and Performance Improvements.

    What should the input to RapidMiner look like?
    Like figure 1, like figure 2, or something else?
    http://img441.imageshack.us/img441/9206/inputx.jpg
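
    To make the idea of a sequential pattern concrete, here is a small Python sketch of the support definition used in sequential pattern mining: each customer's transactions form one time-ordered sequence of baskets, and a pattern is supported by a customer if its item sets appear, in order, inside distinct baskets of that sequence. This is not the GSP algorithm itself (no candidate generation, time constraints, or taxonomies), and it says nothing about the input format the RapidMiner operator expects. The helper names `build_sequences`, `contains`, and `support` are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Sample transactions from the thread: (customer ID, product, date as D.M.YYYY).
TRANSACTIONS = [
    (10150, "softdrink", "1.5.2010"),
    (10150, "fruitveg", "1.5.2010"),
    (10236, "frozenmeal", "1.5.2010"),
    (10236, "beer", "15.5.2010"),
    (10360, "fish", "21.6.2010"),
    (10360, "cannedveg", "21.6.2010"),
    (10360, "beer", "26.6.2010"),
]


def build_sequences(transactions):
    """Group transactions into one time-ordered list of baskets per customer."""
    baskets = defaultdict(set)  # (customer, date) -> products bought that day
    for customer, product, date_text in transactions:
        day = datetime.strptime(date_text, "%d.%m.%Y").date()
        baskets[(customer, day)].add(product)

    sequences = defaultdict(list)
    for customer, day in sorted(baskets):  # sorted by customer, then by date
        sequences[customer].append(baskets[(customer, day)])
    return dict(sequences)


def contains(sequence, pattern):
    """True if every item set of `pattern` is a subset of a distinct basket
    in `sequence`, with the baskets occurring in the same order."""
    position = 0
    for basket in sequence:
        if position < len(pattern) and pattern[position] <= basket:
            position += 1
    return position == len(pattern)


def support(sequences, pattern):
    """Fraction of customers whose sequence contains the pattern."""
    hits = sum(contains(sequence, pattern) for sequence in sequences.values())
    return hits / len(sequences)


if __name__ == "__main__":
    sequences = build_sequences(TRANSACTIONS)
    # The rule from the thread: fish and cannedveg, followed later by beer.
    pattern = [{"fish", "cannedveg"}, {"beer"}]
    print(support(sequences, pattern))
```

    On the sample data only customer 10360 supports the pattern "fish and cannedveg, then beer", so its support is one of the three customers.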
  • Options
    SunnyLotusFlower Member Posts: 37 Contributor II
    For my task, the input in figure 1 is the most suitable.

    greetings

    Lotus
    ______________________

    @B_Miner

    The problem is that I would have to use additional algorithms from Weka, but I can only use the algorithms from RapidMiner (this is a requirement of the task).
    And now it looks like RapidMiner can't do a sequential pattern analysis...

    But thanks for the tip.