How to use RapidMiner to implement association rule modelling?

vilency Member Posts: 6 Contributor II
edited June 2019 in Help
Sorry, I'm a newcomer and I'm confused about how to use association rule modelling. I have semi-binary data. As I read in the tutorial, the process should be Retrieve -> Preprocessing -> FP-Growth -> Create Association Rules. I have already prepared the data for Retrieve (imported from Excel), but I don't know how to do the preprocessing. Can you help me? Thank you for your attention.
idjual  k226  k227  k228  k229  k230  k231  k232  k233  k237  k239
1  0  0  0  0  0  0  0  0  0  0
2  0  0  0  0  0  0  0  0  0  0
3  0  0  0  0  0  0  0  0  0  0
4  0  0  0  0  0  1  0  0  0  0
5  0  0  0  0  0  0  0  0  0  0
6  0  0  0  0  0  1  0  0  0  0
7  0  0  0  0  0  1  0  0  0  0
8  0  0  0  0  0  1  0  0  1  0
9  0  0  0  0  0  0  0  0  0  0
10  0  0  0  0  0  1  0  0  0  0
11  0  0  0  0  0  0  0  0  0  0
12  0  0  0  0  0  1  0  0  0  0
13  0  0  0  0  0  1  0  0  0  0
14  0  0  0  1  0  1  0  0  0  0
15  0  0  0  0  1  1  0  0  0  0
16  0  0  0  0  0  0  0  0  0  0
17  0  0  0  0  0  0  0  0  0  0
18  1  0  0  1  1  0  0  0  0  0
19  0  1  0  0  0  0  0  0  0  0
20  0  0  0  0  0  0  0  0  0  0
21  0  0  0  0  0  0  0  0  0  0
22  0  0  0  0  0  0  0  0  0  0
23  0  0  0  0  0  1  0  0  0  0
24  0  0  0  0  0  1  0  0  0  0
25  0  0  0  0  0  0  0  0  0  0
26  0  0  0  0  0  1  0  0  0  0
27  0  0  0  0  0  1  0  0  0  0
28  0  0  0  0  0  0  0  0  0  0
29  0  0  0  0  0  0  0  0  0  0
30  0  0  0  0  0  1  0  0  0  0
31  0  0  0  0  0  0  0  0  0  0
32  0  0  0  0  0  0  0  0  0  0
33  0  0  0  0  0  0  0  0  0  0
34  0  1  0  1  0  0  0  0  1  0
35  0  0  0  0  0  0  0  0  0  0
36  0  0  0  0  0  0  0  0  0  0
37  0  0  0  0  0  1  0  0  0  0
38  0  0  0  0  0  1  0  0  0  0
39  0  0  0  0  0  0  0  0  0  0
40  0  0  0  0  0  0  0  0  0  0
41  0  0  0  0  0  0  0  0  0  0
42  0  1  0  0  0  1  0  0  0  0
43  0  0  0  0  0  0  0  0  0  0
44  0  0  0  0  0  0  0  0  0  0
45  0  0  0  0  0  0  0  0  0  0
46  0  0  0  0  0  0  0  1  0  0
47  1  0  0  0  0  1  0  0  0  0
48  0  0  0  0  0  0  0  0  0  0
49  0  0  0  0  0  0  0  0  0  0
50  0  0  0  0  0  0  0  0  0  0
51  0  0  0  0  0  1  1  0  0  0
52  1  0  0  0  0  0  0  0  0  0
53  0  0  0  0  0  0  0  0  0  0
54  0  0  0  0  0  1  0  0  1  0
55  0  0  0  0  0  0  0  0  0  0
56  0  0  0  0  0  0  0  0  0  0
57  0  0  1  0  0  0  0  0  0  0
58  0  0  0  0  0  1  0  0  0  0
idjual is the sales ID.
The k226, k227, ... columns are categories.

Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    You're right: if you have a column of 0's and 1's I would assume it was binary/binominal, but RM needs to be told. Anyway, if I save your data as a CSV I can generate rules from it, but only if I set the bar very low! Here's how..

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="353" width="934">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="96" y="61">
            <parameter key="file_name" value="C:\Haddock\vilency.csv"/>
          </operator>
          <operator activated="true" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="246" y="75">
            <parameter key="name" value="id"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="380" y="75"/>
          <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="506" y="72">
            <parameter key="min_support" value="0.1"/>
          </operator>
          <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="648" y="120">
            <parameter key="min_confidence" value="0.5"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="example set" to_port="result 1"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
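
To see what the process above computes, here is a minimal Python sketch of the same steps: 0/1 columns become items, frequent itemsets are collected, and rules are derived from them. It is not RapidMiner's implementation. It enumerates itemsets by brute force instead of building an FP-tree, which finds the same frequent sets on small data but does not scale. All function names are made up for illustration. The six rows are rows 4, 14, 15, 18, 34 and 42 of the posted sample, restricted to six of the columns.

```python
from itertools import combinations

COLUMNS = ["k226", "k227", "k229", "k230", "k231", "k237"]
# Rows 4, 14, 15, 18, 34 and 42 of the posted sample, restricted to COLUMNS.
ROWS = {
    4:  [0, 0, 0, 0, 1, 0],
    14: [0, 0, 1, 0, 1, 0],
    15: [0, 0, 0, 1, 1, 0],
    18: [1, 0, 1, 1, 0, 0],
    34: [0, 1, 1, 0, 0, 1],
    42: [0, 1, 0, 0, 1, 0],
}


def to_transactions(rows, columns):
    """Turn 0/1 rows into sets of item names (the 'binominal' view)."""
    return [frozenset(c for c, v in zip(columns, values) if v == 1)
            for values in rows.values()]


def frequent_itemsets(transactions, min_support):
    """Brute-force stand-in for FP-Growth; returns {itemset: support}."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
                found = True
        if not found:  # no frequent set of this size, so none larger either
            break
    return result


def rules(itemsets, min_confidence):
    """Split each frequent itemset into premise -> conclusion pairs."""
    out = []
    for itemset, support in itemsets.items():
        for size in range(1, len(itemset)):
            for premise in combinations(sorted(itemset), size):
                premise = frozenset(premise)
                # Every subset of a frequent itemset is frequent, so the lookup is safe.
                confidence = support / itemsets[premise]
                if confidence >= min_confidence:
                    out.append((premise, itemset - premise, support, confidence))
    return out


transactions = to_transactions(ROWS, COLUMNS)
for premise, conclusion, support, confidence in rules(
        frequent_itemsets(transactions, 0.1), 0.5):
    print(sorted(premise), "->", sorted(conclusion),
          round(support, 3), round(confidence, 3))
```

On these six rows a threshold of 0.3 leaves only single items, so no rules can be formed. Dropping it to 0.1, as in the process, lets pairs through, which is the "set the bar very low" effect.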
  • vilency Member Posts: 6 Contributor II
    1. It's helpful. Thanks haddock, I really appreciate it. You said it would generate rules only if the bar is set low (you mean minimum support and minimum confidence, right?).
    2. Is that because the data sample is so small? The real data has 80,000 records. Will that affect the result if I load it into RapidMiner?
    3. What minimum support and minimum confidence should I set for 80,000 records to get accurate results for prediction?
    4. I'm still confused by the results in RapidMiner's table view, maybe because I'm new to mining.
    What do laplace, gain, p-s and conviction mean? I only understand support and confidence. Sorry for asking this...
    Thanks
  • haddock Member Posts: 849 Maven
    Hi there,

    Here are my answers in order..

    1. Yep.
    2. More data, longer run.
    3. Whatever convinces you !!! There is no single correct answer.
    4. I have this bookmarked http://michael.hahsler.net/research/association_rules/measures.html

    Have fun..
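
On questions 2 and 3: min_support in the process above is given as a fraction of all rows (0.1), so the same setting requires far more supporting rows on the full data than on the 58-row sample. A small sketch of that arithmetic, using a hypothetical helper name:

```python
import math


def min_supporting_rows(n_rows, min_support):
    """Smallest number of rows an itemset must occur in to reach min_support."""
    # The tiny epsilon guards against floating-point noise such as 58 * 0.1.
    return math.ceil(n_rows * min_support - 1e-9)


sample_rows = min_supporting_rows(58, 0.1)     # the posted sample
full_rows = min_supporting_rows(80000, 0.1)    # the full data set
print(sample_rows, full_rows)
```

An itemset has to appear in 6 of the 58 sample rows, but in 8000 of the 80,000 full rows, to pass the same 0.1 threshold.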

  • vilency Member Posts: 6 Contributor II
    Hi,
    I want to ask: does placing Set Role before Numerical to Binominal tell RM which fields must be converted to binominal and which fields don't need to be converted?
    Thanks
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    By default, special attributes (attributes with a role other than regular) are excluded from the transformation to binominal. This behavior is defined by the attribute subset selection parameters of the Numerical to Binominal operator.
    So if you set the first column to the special role "id", it will be excluded from this transformation unless you change the parameter settings.


    Greetings,
      Sebastian
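
A rough Python illustration of why the id role matters. The helper below is hypothetical and only mimics the default behaviour described above: regular attributes are converted and special ones are skipped. It assumes that zero maps to false and any other number to true.

```python
def numerical_to_binominal(row, roles):
    """Convert regular numeric attributes to booleans; leave special ones untouched."""
    return {
        name: (value if roles.get(name, "regular") != "regular" else value != 0)
        for name, value in row.items()
    }


# Row 18 of the posted sample, limited to a few columns.
row = {"idjual": 18, "k226": 1, "k227": 0, "k229": 1}

without_role = numerical_to_binominal(row, {})
with_role = numerical_to_binominal(row, {"idjual": "id"})
print(without_role)
print(with_role)
```

Without the role, the sales ID itself becomes a true/false item that is true in every row, so it would show up in the frequent itemsets. With the id role it is left alone.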
  • vilency Member Posts: 6 Contributor II
    Thanks Sebastian.
    I have built the process up to Create Association Rules as haddock described, and it successfully generates some rules.
    1. What I'm wondering is what the related operators do, such as Apply Association Rules, Generalized Sequential Patterns and Unify Item Sets.
    2. If I just want to generate association rules, where should I stop the process: at Create Association Rules or at Unify Item Sets?
    3. In the result view, the association rule result has a table view, text view, graph view and annotations. What are annotations for, and when should we use them?
    4. In the meta data view of the example set after Numerical to Binominal, the statistics show values such as "mode=false(12709), least=true(4161)". What does that mean?
    Sorry if I'm asking too much, and thanks again for replying to my post. Long live RapidMiner! Hehe ^^
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I will answer in order:
    1. Take a look at the operator documentation to get insight into what each operator does.
    2. If you want association rules, you should probably stop after generating them.
    3. You don't have to use the annotation view. Some operators annotate the results they generate. For example, the Read Database operator attaches the query used to retrieve the example set to the example set.
    4. The mode is the most frequently occurring nominal value of an attribute. Least is the one that occurs least often. Big surprise, isn't it? :P

    Greetings,
      Sebastian
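
The mode/least statistic from question 4 can be reproduced in a few lines of Python. This is only a sketch of the idea, using the counts quoted in the question (12709 false, 4161 true) as stand-in data:

```python
from collections import Counter

# Stand-in column with the counts quoted in the question.
values = [False] * 12709 + [True] * 4161

ranked = Counter(values).most_common()   # sorted from most to least frequent
mode, mode_count = ranked[0]
least, least_count = ranked[-1]

# The share of "true" rows is the support of that single item.
true_share = least_count / len(values)
print(mode, mode_count, least, least_count, round(true_share, 3))
```

So "mode=false(12709)" says that false is the most common value with 12709 rows, and "least=true(4161)" says that true is the rarest with 4161 rows.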
  • vilency Member Posts: 6 Contributor II
    Hi there,
    The association rule result shows support, confidence, laplace, gain, conviction, lift and p-s.
    I already understand support, confidence, lift and conviction.
    How are laplace, gain and p-s calculated, and what are they used for in association rules? Does anyone have a tutorial about that?
    Thank you.

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I would suggest taking a look at Wikipedia. Each measure should be explained there; otherwise a Google search will help you. Of course, we gladly offer an introduction to all these measures and other topics connected with association rule mining in our webinars. See the shop for details.

    Greetings,
      Sebastian
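
As a starting point for the question about laplace, gain and p-s, the sketch below computes them from raw counts using the commonly cited textbook definitions. RapidMiner's own output may differ, since its exact formulas and parameter defaults are not shown in this thread. The function name, `theta` and `k` are assumptions made for illustration. The example counts are for the rule k237 -> k231 in the 58-row sample: k237 occurs in 3 rows, k231 in 21, and both together in 2.

```python
def rule_measures(n, n_x, n_y, n_xy, theta=0.5, k=2):
    """Interest measures for a rule X -> Y from row counts.

    n: all rows, n_x: rows with X, n_y: rows with Y, n_xy: rows with both.
    theta (gain) and k (laplace) are illustrative defaults, not RapidMiner's.
    """
    supp_x, supp_y, supp_xy = n_x / n, n_y / n, n_xy / n
    conf = n_xy / n_x
    return {
        "support": supp_xy,
        "confidence": conf,
        "lift": conf / supp_y,
        # Conviction is unbounded when the rule never fails.
        "conviction": float("inf") if conf == 1 else (1 - supp_y) / (1 - conf),
        # Piatetsky-Shapiro: observed joint support minus the support expected
        # if X and Y were independent.
        "p-s": supp_xy - supp_x * supp_y,
        # Gain: joint support minus a theta-weighted share of the premise support.
        "gain": supp_xy - theta * supp_x,
        # Laplace: confidence smoothed so that rules with few rows are pulled down.
        "laplace": (n_xy + 1) / (n_x + k),
    }


measures = rule_measures(n=58, n_x=3, n_y=21, n_xy=2)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

Laplace is useful here because the rule rests on only three rows: its plain confidence is 2/3, while the smoothed value is lower at 3/5.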