FP-Growth <-> Apriori

cpc2 Member Posts: 18  Maven
edited November 2018 in Help
Hi,
I am currently using the FPGrowth operator and the Weka Apriori operator (W-Apriori) on the Iris data set.
The process looks like this:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ArffExampleSource" class="ArffExampleSource">
        <parameter key="data_file" value="C:\Dokumente und Einstellungen\b\Eigene Dateien\rm_workspace\sample\data\iris.arff"/>
    </operator>
    <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
    </operator>
    <operator name="W-Apriori" class="W-Apriori">
        <parameter key="M" value="0.0010"/>
        <parameter key="I" value="true"/>
    </operator>
</operator>


Both the FPGrowth and the Apriori operators have a min_support of 0.001. The other options are left at their defaults.

My question is: why is the Weka operator able to find itemsets while the FPGrowth operator is not? Even when I lower min_support, FPGrowth doesn't find any itemsets at all.

Answers

  • haddock Member Posts: 849  Guru
    My question is: why is the Weka operator able to find itemsets while the FPGrowth operator is not? Even when I lower min_support, FPGrowth doesn't find any itemsets at all.
    The answer is in the documentation...
    Please note that the given data set is only allowed to contain binominal attributes, i.e. nominal attributes with only two different values. Simply use the provided preprocessing operators in order to transform your data set. The necessary operators are the discretization operators for changing the value types of numerical attributes to nominal and the operator Nominal2Binominal for transforming nominal attributes into binominal / binary ones.
    and here is an example..
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ArffExampleSource" class="ArffExampleSource">
            <parameter key="data_file" value="C:\Documents and Settings\Alien\My Documents\rm_workspace\sample\data\iris.arff"/>
        </operator>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="W-Apriori" class="W-Apriori" activated="no">
            <parameter key="M" value="0.0010"/>
            <parameter key="I" value="true"/>
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
            <parameter key="min_number_of_itemsets" value="1"/>
            <parameter key="min_support" value="0.0010"/>
        </operator>
    </operator>

  • cpc2 Member Posts: 18  Maven
    Thanks man, that helped me a lot.

    There's still something that I don't get.
    The last 4 rules from the Apriori result:

    7. petallength = 1.300=true 7 ==> class = Iris-setosa=true 7    conf:(1)
    8. petallength = 1.600=true 7 ==> class = Iris-setosa=true 7    conf:(1)
    9. petalwidth = 0.400=true 7 ==> class = Iris-setosa=true 7    conf:(1)
    10. petalwidth = 0.300=true 7 ==> class = Iris-setosa=true 7    conf:(1)

    These rules are not generated by FPGrowth. Even the corresponding itemsets are missing (the Apriori operator generates more itemsets than FPGrowth). Do you have any idea why?

    Thanks in advance,
    Birger
  • haddock Member Posts: 849  Guru
    Hi there Birger,

    Don't want to sound like the Thought Police, but you need to check out the algorithms, which take different inputs and produce different outputs, as we have seen. http://en.wikipedia.org/wiki/Association_rule_learning is as good a place to start as any.

    That being said, it would be as useful as a fart in a space-suit if different algorithms were to produce wildly different associations. But fear not! If you set like against like, i.e. give both operators the same minimum support, the results on the Iris set are consistent, as this shows...
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ArffExampleSource" class="ArffExampleSource">
            <parameter key="data_file" value="C:\Documents and Settings\Alien\My Documents\rm_workspace\sample\data\iris.arff"/>
        </operator>
        <operator name="Numerical2Polynominal" class="Numerical2Polynominal">
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
            <parameter key="keep_example_set" value="true"/>
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.1"/>
        </operator>
        <operator name="W-Apriori" class="W-Apriori">
            <parameter key="C" value="0.6"/>
            <parameter key="R" value="true"/>
            <parameter key="c" value="1.0"/>
        </operator>
    </operator>
  • cpc2 Member Posts: 18  Maven
    Thanks a ton, sorry for the stupid question  ;)
  • haddock Member Posts: 849  Guru
    Not stupid at all, glad to be of assistance.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531   Unicorn
    Hi,
    although FP-Growth and Apriori should in theory return exactly the same results, the implementations are quite different. This does not change the result if the input is the same, but the two operators make different assumptions. For example, the FP-Growth operator ignores special attributes, whereas it seems to me that W-Apriori does not. So if your label is a special attribute, for example one with the role label, FP-Growth would ignore it, and hence no frequent itemset containing it would be generated.

    Greetings,
    Sebastian
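
The two points made in this thread (each nominal value has to become its own true/false item, and a miner that skips special attributes can never produce itemsets containing the label) can be illustrated with a small Python sketch. This is not RapidMiner or Weka code: the toy rows, the function names and the `special` parameter are hypothetical, and the search is plain brute force rather than FP-Growth or Apriori.

```python
from itertools import combinations

# Hypothetical toy rows shaped like the discretized Iris data:
# every attribute is nominal, and "class" plays the label role.
ROWS = [
    {"petalwidth": "0.2", "class": "Iris-setosa"},
    {"petalwidth": "0.2", "class": "Iris-setosa"},
    {"petalwidth": "0.4", "class": "Iris-setosa"},
    {"petalwidth": "1.3", "class": "Iris-versicolor"},
    {"petalwidth": "1.3", "class": "Iris-versicolor"},
    {"petalwidth": "2.0", "class": "Iris-virginica"},
]


def to_items(row, special=()):
    """Turn one nominal row into a set of 'attribute = value' items.

    This mirrors the idea of a nominal-to-binominal conversion: each
    (attribute, value) pair becomes its own true/false column, and only
    the true ones are kept as items. Attributes named in `special` are
    skipped, as a miner that ignores special roles would do.
    """
    return frozenset(f"{a} = {v}" for a, v in row.items() if a not in special)


def frequent_itemsets(rows, min_support, special=()):
    """Brute-force frequent itemset search (assumption: tiny inputs only).

    Returns {itemset: relative support} for every itemset whose
    relative support is at least `min_support`.
    """
    transactions = [to_items(r, special) for r in rows]
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            cset = frozenset(candidate)
            count = sum(1 for t in transactions if cset <= t)
            support = count / len(transactions)
            if support >= min_support:
                result[cset] = support
    return result


# Same data, same threshold; the only difference is whether the label
# attribute is treated as a regular attribute or skipped as special.
with_label = frequent_itemsets(ROWS, min_support=0.3)
without_label = frequent_itemsets(ROWS, min_support=0.3, special=("class",))

for itemset, support in sorted(with_label.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(itemset), round(support, 3))
```

With the label kept as a regular attribute, itemsets such as `{petalwidth = 0.2, class = Iris-setosa}` show up. With `class` skipped, only the `petalwidth` items remain, which matches the kind of difference Birger saw between the two operators.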