Using FP-Growth and Weka-Apriori

neokyrgyz Member Posts: 4 Contributor I
edited November 2018 in Help
Hi, all

To flatten the learning curve, would it be possible to make a short step-by-step tutorial for beginners? I mean complete beginners.
I can't even build an example of FP-Growth and Weka-Apriori with a generated transaction data set, although this should be a really easy process.

Does anyone know if such a tutorial exists? Or could you give a step-by-step tutorial for the example above?

I spent 2 days getting to know the general layout and running some processes, but it seems it will take a month before I can do what I want.

Thanks and regards.
Hoping to be understood and not taken for a lazy "user".


Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    Firstly, welcome to the world of pattern mining. As to finding the tutorial, this might be rather an embarrassing answer for you, but from within RapidMiner try Help->RapidMiner Tutorial. Then do -> Next -> Next in the window that shows and you will see a working example of FP-Growth. It is a smart move to go through that tutorial several times, and to be familiar with all the examples.

    Have fun!


  • neokyrgyz Member Posts: 4 Contributor I
    Hi,
    Thank you very much.
    Sometimes this kind of "pointing" can save a lot of time.

    I've tried to do the same as in the tutorial, but it's not working. I went step by step without FP-Growth and wrote the output after each step, and that works OK. But as soon as I insert FP-Growth, it gives the following error:
    The method getNominalMapping() is not supported by numeric attributes! You probably tried to execute an operator on a numeric data which is only able to handle nominal values.
    So, basically, it means that it can do nominal2binominal without FP-Growth. Is this a bug, or am I doing something wrong?
    Thanks in advance.


    My file:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="all"/>
        <parameter key="logfile" value="C:\AfterRuleAccoss.log"/>
        <parameter key="resultfile" value="C:\afterRuleAccos.res"/>
        <process expanded="true" height="601" width="784">
          <operator activated="true" class="read_aml" expanded="true" height="60" name="Read AML" width="90" x="45" y="120">
            <parameter key="attributes" value="C:\labor-negotiations.aml"/>
          </operator>
          <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="300">
            <parameter key="attributes" value="duration|wage-inc-1st|wage-inc-2nd|wage-inc-3rd|working-hours|standby-pay|shift-differential|statutory-holidays"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="discretize_by_frequency" expanded="true" height="94" name="Discretize" width="90" x="179" y="300">
            <parameter key="range_name_type" value="short"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="313" y="300"/>
          <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
          <operator activated="true" class="write_excel" expanded="true" height="60" name="Write Excel" width="90" x="514" y="30">
            <parameter key="excel_file" value="C:\result_afterRMVDiscretizeNom2BinomFPGrowth.xls"/>
          </operator>
          <connect from_op="Read AML" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="example set" to_op="Write Excel" to_port="input"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_port="result 2"/>
          <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • haddock Member Posts: 849 Maven
    Hi there,

    So close, and yet so far! If you had just ticked the "transform_binominal" tick box in the nominal_to_binominal operator all would have worked fine...like this.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="all"/>
        <parameter key="logfile" value="C:\AfterRuleAccoss.log"/>
        <parameter key="resultfile" value="C:\afterRuleAccos.res"/>
        <process expanded="true" height="404" width="915">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="30" y="53">
            <parameter key="repository_entry" value="//Samples/data/Labor-Negotiations"/>
          </operator>
          <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="45" y="297">
            <parameter key="attributes" value="duration|wage-inc-1st|wage-inc-2nd|wage-inc-3rd|working-hours|standby-pay|shift-differential|statutory-holidays"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="discretize_by_frequency" expanded="true" height="94" name="Discretize" width="90" x="179" y="300">
            <parameter key="range_name_type" value="short"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="380" y="300">
            <parameter key="transform_binominal" value="true"/>
          </operator>
          <operator activated="true" breakpoints="before,after" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="581" y="255"/>
          <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="726" y="263"/>
          <operator activated="true" class="write_excel" expanded="true" height="60" name="Write Excel" width="90" x="514" y="30">
            <parameter key="excel_file" value="C:\result_afterRMVDiscretizeNom2BinomFPGrowth.xls"/>
          </operator>
          <connect from_op="Retrieve" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="example set" to_op="Write Excel" to_port="input"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 2"/>
          <connect from_op="Write Excel" from_port="through" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • neokyrgyz Member Posts: 4 Contributor I
    Thank you very much for your answer. It was really helpful. Learning step 1 is complete :)

    I'm stuck again on step 2.
    I'm trying to use W-Apriori on my data:

    beer,bread,jam,butter,cheese,chips,soda,chocolate
    TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
    TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
    FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
    TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
    FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
    FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
    TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
    TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
    FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
    TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
    TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
    1) I want to count only TRUE values. For instance, I'm not interested in whether someone did not buy something; I'm interested in what else someone bought, given that he/she bought something.
    2) Even if I ignore the first requirement (assuming that, since RapidMiner counts FALSE values, this must be a correct approach) and set M=0.4, it doesn't show what I expect: I expect it to show itemsets with a minimum support of 0.4, but it shows only some of them.
    For the example above it shows the following (I expected beer=TRUE 7, bread=TRUE 9, ...):
    beer=FALSE 4
    jam=FALSE 5
    butter=TRUE 5
    cheese=TRUE 4
    chips=FALSE 5
    soda=TRUE 5
    chocolate=FALSE 5

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <parameter key="logverbosity" value="all"/>
        <parameter key="logfile" value="C:\part1_log.log"/>
        <parameter key="resultfile" value="C:\part1_res.res"/>
        <process expanded="true" height="601" width="784">
          <operator activated="true" class="read_csv" expanded="true" height="60" name="Read CSV" width="90" x="68" y="121">
            <parameter key="file_name" value="C:\part1_data.csv"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="246" y="120"/>
          <operator activated="true" class="weka:W-Apriori" expanded="true" height="60" name="W-Apriori" width="90" x="447" y="165">
            <parameter key="C" value="0.6"/>
            <parameter key="M" value="0.4"/>
            <parameter key="I" value="true"/>
            <parameter key="R" value="true"/>
            <parameter key="V" value="true"/>
            <parameter key="c" value="1.0"/>
          </operator>
          <connect from_op="Read CSV" from_port="output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="W-Apriori" to_port="example set"/>
          <connect from_op="W-Apriori" from_port="associator" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>


    What am I doing wrong? What do I need to do to get what I want?
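
For reference, the TRUE-only counts expected in point 1 can be checked by hand outside RapidMiner. The following is a minimal Python sketch, not a RapidMiner or Weka process: the function name `true_support_counts` is made up for illustration, and it assumes an item counts as frequent when its share of transactions is at least the minimum support (0.4, the same value as M above).

```python
import csv
import io

# The transaction table from the post above (TRUE = the item was bought).
DATA = """beer,bread,jam,butter,cheese,chips,soda,chocolate
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
"""


def true_support_counts(text):
    """Count, per item, the transactions in which the item is TRUE."""
    rows = list(csv.DictReader(io.StringIO(text)))
    counts = {item: 0 for item in rows[0]}
    for row in rows:
        for item, value in row.items():
            if value == "TRUE":
                counts[item] += 1
    return counts, len(rows)


counts, n = true_support_counts(DATA)
min_support = 0.4  # same value as the M parameter in the post

# Assumption: an item is frequent when count / n >= min_support.
frequent = {item: c for item, c in counts.items() if c / n >= min_support}

print(n, counts)
print(frequent)
```

Counted this way, beer is TRUE in 7 of the 11 transactions and bread in 9, as expected in the post. The beer=FALSE 4 reported by W-Apriori is simply the complement, 11 - 7.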

  • haddock Member Posts: 849 Maven
    Hola,

    If you want to thin out the Premises or Conclusions you may find this post interesting.

    http://rapid-i.com/rapidforum/index.php/topic,1887.msg7366.html#msg7366

    It shows how you can convert Association Rules to an ExampleSet, which of course means that all the regular thinning agents can be applied.

    Just a thought.
  • neokyrgyz Member Posts: 4 Contributor I
    Hi haddock,

    I tried to understand what you wrote, but it doesn't seem to be the answer or the way. I'm not sure, though.
    My problem is that I'm trying to get a result from W-Apriori, but the result is not what I expect.
    It's not a minor difference that could come from different implementations; it's totally different from what it should be.

    FP-Growth gives: {bread}, {beer}, {jam}, {chips}, {chocolate}, {bread, jam}, {bread, beer}

    I expect W-Apriori to give results at least 50% similar to the above for such a small data set.

    This makes me think that I'm doing something wrong, such as a checkbox setting, as was the case in the problem above.
    As you can guess, I've spent a week on this but still couldn't solve it.
    Any ideas? Or any working W-Apriori processes?

    Thanks in advance.
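
As a sanity check on the FP-Growth list above, the frequent itemsets over TRUE values can be enumerated by brute force. This is a minimal Python sketch, neither FP-Growth nor Apriori, and the helper name `frequent_itemsets` is hypothetical. The minimum support of 0.5 is an assumption, since the post does not say which threshold FP-Growth was run with.

```python
from itertools import combinations

ITEMS = ["beer", "bread", "jam", "butter", "cheese", "chips", "soda", "chocolate"]

# The rows of the transaction table posted earlier in this thread.
ROWS = """\
TRUE,TRUE,FALSE,TRUE,FALSE,TRUE,FALSE,TRUE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE
FALSE,TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE
TRUE,TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,FALSE
FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,FALSE
FALSE,TRUE,TRUE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,FALSE,FALSE
TRUE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,TRUE
FALSE,TRUE,TRUE,FALSE,TRUE,TRUE,FALSE,FALSE
TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE
TRUE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE
""".split()

# Each transaction becomes the set of items that are TRUE in that row.
TRANSACTIONS = [
    {item for item, flag in zip(ITEMS, line.split(",")) if flag == "TRUE"}
    for line in ROWS
]


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force search: test every item combination up to max_size."""
    n = len(transactions)
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[candidate] = count
    return result


# Assumption: min_support = 0.5 (the threshold is not stated in the post).
found = frequent_itemsets(TRANSACTIONS, min_support=0.5)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), kv[0])):
    print(itemset, count)
```

Under that assumed threshold, the sketch returns exactly the seven itemsets listed for FP-Growth, with {bread, beer} and {bread, jam} each appearing in 6 of the 11 transactions. So the FP-Growth result is consistent with counting TRUE values only.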

  • haddock Member Posts: 849 Maven
    Perhaps you can post the code?