Mining Sequential Association rules / Sequential Pattern Mining

SunnyLotusFlowe · June 2010

Hello all again,

I am sitting on a task right now and have a problem. I need to do Sequential Pattern Mining and wanted to know which operators are most commonly used for this task.

I would be glad if someone could give me a tip.

greetings

SunnyLotusFlower

SunnyLotusFlowe · June 2010

Can I do this in RapidMiner at all?

greetings

SunnyLotusFlower

wessel · June 2010

Yes, you can use the "windowing" operator or the "moving average" operator.

Another way would be to use recurrent neural networks with a delay, but this last option is not possible in RapidMiner.

SunnyLotusFlowe · June 2010

Thanks for the answer. Could you give me a small workflow for this? That would be great. :)

greetings

SunnyLotusFlower

wessel · June 2010

Okay.

Can you give me a sequential pattern to analyze?

SunnyLotusFlowe · June 2010

I am a little bit confused about whether we have the same thing in mind. I want to extract association rules with a given date, like the following:

Customer-ID, Product, Date

10150, softdrink, 1.5.2010
10150, fruitveg, 1.5.2010
10236, frozenmeal, 1.5.2010
10236, beer, 15.5.2010
10360, fish, 21.6.2010
10360, cannedveg, 21.6.2010
10360, beer, 26.6.2010

And I need association rules like:

"If customer A buys fish and cannedveg on 21.6.2010, then he will buy beer on 26.6.2010."

Fish and cannedveg on 21.6 => beer on 26.6

If you have misunderstood me, I apologize for that.

greetings

Lotus

wessel · June 2010

Hmm, this is a very hard problem, because your hypothesis space is extremely large.
"If a person buys some things at some time, how will this affect his buying in the future?"

A more specific hypothesis:
"If a person buys some things at some time, how will this affect his buying the next time he enters the shop?"
This problem is still hard, but more manageable than the first.

To solve this problem I would convert the data so that each row holds one shop visit plus the same customer's next visit.
edit: This might be possible in RapidMiner using the windowing operator, but it is tricky.
ID, softdrink, fruitveg, frozenmeal, fish, cannedveg, beer, softdrink2, fruitveg2, frozenmeal2, fish2, cannedveg2, beer2
10150, 1, 1, 0, 0, 0, 0, ?, ?, ?, ?, ?, ? (this customer buys softdrink and fruitveg, no info for the next time)
10236, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1 (this customer buys frozenmeal, next time beer)
10236, 0, 0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ? (same customer as the last entry, but no info on the visit after that)
10360, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1 (this customer buys fish and cannedveg, and next time buys beer)
10360, 0, 0, 0, 0, 0, 1, ?, ?, ?, ?, ?, ? (same customer as the last entry, but again no info on the visit after that)
(You did not give me much data, so you get a lot of ? symbols.)

You can run any unsupervised learning algorithm on this data.

If you want to solve the "If a person buys some things at some time, how will this affect his buying in the future?" problem,
you will get many more attributes in your dataset. It is possible, but unlikely to yield good results.

edit:
you might want to also add the attribute "number of days since last visit"
to account for the fact that shop visits do not occur at equal intervals.
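
The conversion described above could be sketched in plain Python roughly as follows. This is only a minimal sketch, not a RapidMiner workflow: the function name `to_visit_pairs` and the record layout are made up for illustration, the dates are assumed to be in day.month.year format, and the "number of days" value is computed as the gap between a visit and the same customer's next visit.

```python
from collections import defaultdict
from datetime import datetime

# Sample transactions from earlier in the thread: (customer id, product, date)
ROWS = [
    (10150, "softdrink", "1.5.2010"),
    (10150, "fruitveg", "1.5.2010"),
    (10236, "frozenmeal", "1.5.2010"),
    (10236, "beer", "15.5.2010"),
    (10360, "fish", "21.6.2010"),
    (10360, "cannedveg", "21.6.2010"),
    (10360, "beer", "26.6.2010"),
]
PRODUCTS = ["softdrink", "fruitveg", "frozenmeal", "fish", "cannedveg", "beer"]


def to_visit_pairs(rows, products):
    """Hypothetical helper: one record per shop visit, holding the current
    basket, the same customer's next basket (None if unknown) and the number
    of days between the two visits (None if unknown)."""
    # Group products into baskets keyed by (customer, visit date).
    baskets = defaultdict(set)
    for customer, product, date in rows:
        day = datetime.strptime(date, "%d.%m.%Y").date()
        baskets[(customer, day)].add(product)

    # Collect the visit dates of each customer.
    visits = defaultdict(list)
    for customer, day in baskets:
        visits[customer].append(day)

    records = []
    for customer in sorted(visits):
        days = sorted(visits[customer])
        for i, day in enumerate(days):
            current = [int(p in baskets[(customer, day)]) for p in products]
            if i + 1 < len(days):
                next_day = days[i + 1]
                nxt = [int(p in baskets[(customer, next_day)]) for p in products]
                gap = (next_day - day).days
            else:
                nxt, gap = None, None  # no later visit known for this customer
            records.append((customer, current, nxt, gap))
    return records


for customer, current, nxt, gap in to_visit_pairs(ROWS, PRODUCTS):
    # Print unknown next baskets as "?" like in the table above.
    shown_next = nxt if nxt is not None else ["?"] * len(PRODUCTS)
    print(customer, current, shown_next, gap)
```

The records could then be written to CSV and loaded into RapidMiner for the learning step.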

SunnyLotusFlowe · June 2010

OK, for my task the two definitions of the problem are equivalent. So I choose your second, more manageable one:

"If a person buys some things at some time, how will this affect his buying the next time he enters the shop?"

Do you do this conversion with the Windowing operator?

If I understand it correctly: for every new date you get a new basket. Is that correct?

And then I should run FP-Growth on that data?

Could you give me a workflow for this? I have never used the Windowing operator...

greetings Lotus

wessel · June 2010

I tried but could not get it to work in RapidMiner; I normally use Python for preprocessing like this.

Maybe this code can help, from com.rapidminer.gui.templates.Template@320a80db (market basket):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="558" width="567">
      <operator activated="true" breakpoints="after" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30"/>
      <operator activated="true" class="set_macro" expanded="true" height="76" name="Define Item Count" width="90" x="179" y="30">
        <parameter key="macro" value="itemCountAttributeName"/>
        <parameter key="value" value="itemCount"/>
      </operator>
      <operator activated="true" class="set_macro" expanded="true" height="76" name="Define Customer" width="90" x="313" y="30">
        <parameter key="macro" value="customerIdAttributeName"/>
        <parameter key="value" value="customerId"/>
      </operator>
      <operator activated="true" class="set_macro" expanded="true" height="76" name="Define Item" width="90" x="447" y="30">
        <parameter key="macro" value="itemIdAttributeName"/>
        <parameter key="value" value="itemId"/>
      </operator>
      <operator activated="true" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="45" y="210">
        <list key="aggregation_attributes">
          <parameter key="%{itemCountAttributeName}" value="sum"/>
        </list>
        <parameter key="group_by_attributes" value="%{customerIdAttributeName}|%{itemIdAttributeName}"/>
      </operator>
      <operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
        <parameter key="group_attribute" value="%{customerIdAttributeName}"/>
        <parameter key="index_attribute" value="%{itemIdAttributeName}"/>
      </operator>
      <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="210">
        <parameter key="name" value="%{customerIdAttributeName}"/>
        <parameter key="target_role" value="id"/>
      </operator>
      <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
      <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="447" y="345"/>
      <connect from_op="Retrieve" from_port="output" to_op="Define Item Count" to_port="through 1"/>
      <connect from_op="Define Item Count" from_port="through 1" to_op="Define Customer" to_port="through 1"/>
      <connect from_op="Define Customer" from_port="through 1" to_op="Define Item" to_port="through 1"/>
      <connect from_op="Define Item" from_port="through 1" to_op="Aggregate" to_port="example set input"/>
      <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
      <connect from_op="Pivot" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
      <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
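
For readers who want to see what the Aggregate and Pivot steps in this process roughly amount to, here is a minimal plain-Python sketch. It is an approximation under assumptions, not the operators' actual implementation: the input rows are invented, the attribute names `customerId`, `itemId` and `itemCount` are taken from the macro values in the process, and filling absent customer/item combinations with 0 is a choice made here for illustration.

```python
from collections import defaultdict

# Hypothetical input rows, using the attribute names from the macros above.
rows = [
    {"customerId": 10360, "itemId": "fish", "itemCount": 1},
    {"customerId": 10360, "itemId": "cannedveg", "itemCount": 2},
    {"customerId": 10360, "itemId": "fish", "itemCount": 1},
    {"customerId": 10236, "itemId": "beer", "itemCount": 3},
]


def aggregate(rows):
    """Sum itemCount per (customerId, itemId) pair, like a group-by with sum."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["customerId"], row["itemId"])] += row["itemCount"]
    return totals


def pivot(totals):
    """Turn the grouped totals into one row per customer and one column per item."""
    items = sorted({item for _, item in totals})
    table = {}
    for (customer, item), count in totals.items():
        # Assumption: items a customer never bought are filled with 0.
        table.setdefault(customer, dict.fromkeys(items, 0))[item] = count
    return items, table


items, table = pivot(aggregate(rows))
print("customerId", *items)
for customer in sorted(table):
    print(customer, *(table[customer][item] for item in items))
```

Note that this layout has one row per customer over all dates, so it carries no order information; it is a plain market basket table.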

SunnyLotusFlowe · June 2010

Thank you for your help anyway. It's very nice.

But the problem is that I don't have much time to work on this further.

Maybe I will find some time later to do this...

Thanks a lot

greetings User

B_Miner · June 2010

Why not use the Generalized Sequential Patterns operator? That is for sequence analysis...

wessel · June 2010

Ah cool, I googled the paper:

Ramakrishnan Srikant, Rakesh Agrawal (1996). Mining Sequential Patterns: Generalizations and Performance Improvements.

What should be the input to RapidMiner?
Like figure 1, like figure 2, or something else?
http://img441.imageshack.us/img441/9206/inputx.jpg
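
To make concrete what a sequential pattern over this kind of data looks like, here is a small Python sketch that counts how many customers support a pattern "items X in one visit, then items Y in a later visit". It is a naive brute-force count for a single given pattern, not the GSP algorithm from the paper and not the RapidMiner operator; the function names are made up, and the rows are the sample data from earlier in the thread.

```python
from collections import defaultdict
from datetime import datetime

# Sample transactions from earlier in the thread: (customer id, product, date)
ROWS = [
    (10150, "softdrink", "1.5.2010"),
    (10150, "fruitveg", "1.5.2010"),
    (10236, "frozenmeal", "1.5.2010"),
    (10236, "beer", "15.5.2010"),
    (10360, "fish", "21.6.2010"),
    (10360, "cannedveg", "21.6.2010"),
    (10360, "beer", "26.6.2010"),
]


def customer_sequences(rows):
    """Build, per customer, the list of baskets ordered by visit date."""
    by_visit = defaultdict(set)
    for customer, product, date in rows:
        day = datetime.strptime(date, "%d.%m.%Y").date()
        by_visit[(customer, day)].add(product)
    sequences = defaultdict(list)
    for customer, day in sorted(by_visit):  # sorted by customer, then date
        sequences[customer].append(frozenset(by_visit[(customer, day)]))
    return dict(sequences)


def sequence_support(sequences, first, later):
    """Count customers who bought all of `first` in one visit and all of
    `later` in some strictly later visit."""
    first, later = frozenset(first), frozenset(later)
    count = 0
    for baskets in sequences.values():
        for i, basket in enumerate(baskets):
            if first <= basket and any(later <= b for b in baskets[i + 1:]):
                count += 1
                break  # count each customer at most once
    return count


seqs = customer_sequences(ROWS)
print(sequence_support(seqs, {"fish", "cannedveg"}, {"beer"}))
print(sequence_support(seqs, {"frozenmeal"}, {"beer"}))
```

A real sequence miner would generate and prune the candidate patterns itself instead of being handed one pattern at a time.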

SunnyLotusFlowe · June 2010

For my task, the input in figure 1 is the most suitable.

greetings

Lotus
______________________

@ B_Miner

The problem is that I would have to use additional algorithms from Weka, but I can only use the algorithms from RapidMiner (this comes from the task).
And now it looks like RapidMiner can't do a sequential pattern analysis...

But thanks for the tip.
