Input for AssociationRuleGenerator

Legacy User Member Posts: 0 Newbie
edited November 2018 in Help
Hi,
I'm new to RapidMiner, so the question might be dumb, but I hope you'll help me anyway.

I want to use the AssociationRuleGenerator and I found the tutorial on how to use it, but my input format is different and I'm not sure how to configure RapidMiner to work with my input.

I've two formats available:

1. 1NF
A CSV file with two columns. The first column contains the transaction-ID, the second the items.
Example:
TID  ITEM
0    1
0    2
0    3
1    1
1    3
...  ...
2. Binary Bitmap
A CSV file with the transaction-ID and the items as columns. The values for the items are 0 and 1 to indicate whether the transaction contains the item or not.
Example:
TID  1    2    3
0    1    1    1
1    1    0    1
...  ...  ...  ...
Can anyone tell me how I can use one or both of the formats for generating association rules? I would personally prefer the first format, since we need to convert the data to get the second, but any solution which makes it work will really help me.
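For reference, the conversion from the first format to the second can be sketched outside RapidMiner in a few lines of Python. This is only an illustration under assumptions: the comma delimiter, the `TID`/`ITEM` header line, and the helper name `pivot_to_bitmap` are made up for the example and are not part of RapidMiner.

```python
import csv
import io

def pivot_to_bitmap(rows):
    """Turn (tid, item) pairs into one 0/1 row per transaction."""
    transactions = {}
    for tid, item in rows:
        transactions.setdefault(tid, set()).add(item)
    # Values stay strings as read from the CSV, so sorting is lexicographic.
    items = sorted({item for bought in transactions.values() for item in bought})
    header = ["TID"] + items
    table = [
        [tid] + [1 if item in bought else 0 for item in items]
        for tid, bought in sorted(transactions.items())
    ]
    return header, table

# The sample from the post as CSV text (the comma delimiter is an assumption).
one_nf = "TID,ITEM\n0,1\n0,2\n0,3\n1,1\n1,3\n"
reader = csv.reader(io.StringIO(one_nf))
next(reader)  # skip the header line
header, table = pivot_to_bitmap((tid, item) for tid, item in reader)

print(header)
for row in table:
    print(row)
```

For the five sample rows this yields the two bitmap rows shown in the second format: transaction 0 contains items 1, 2 and 3, and transaction 1 contains items 1 and 3.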

Thanks a lot in advance,

Stampede

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,645  RM Founder
    Hi,

    you could simply use the second option: just load the data set with one of the file based input operators and transform the numbers into binominal values with the corresponding preprocessing operators. Then you can apply FPGrowth.

    Cheers,
    Ingo
  • Legacy User Member Posts: 0 Newbie
    Hi,
    thanks for the answer, but I knew that much from the tutorial. Maybe I'm just slow on this one, but I just can't find a fitting Input and Preprocessing Operator.

    My main problem is: I tried a lot of programs for association rule generation, but all of them interpreted the zeros as values and not as "false", so I got negative association rules. I hope that RapidMiner will solve this problem, since I read that it can handle bitmaps.

    If someone could just tell me the fitting input operator and, if needed, the correct preprocessor, that would be really great.

    Thanks a lot,

    Stampede
  • Legacy User Member Posts: 0 Newbie
    Hi,
    sorry for the inconvenience, but I found a partially working solution.

    I get association rules from my input data now, but the rules make no sense (on a manually generated example).

    for example:

    I have the (little bit stupid) example:
    CAR    APPARTEMENT  VILLA  POOR   AVERAGE  RICH
    false  true         false  true   false    false
    true   true         false  false  true     false
    true   false        true   false  false    true
    But I get rules like:
    CAR -> POOR
    AVERAGE -> POOR
    RICH -> POOR
    VILLA -> POOR
    CAR v APPARTEMENT v VILLA -> POOR
    CAR v APPARTEMENT v VILLA v RICH -> POOR

    And I don't know where these rules came from. I used CSVExampleSource, FPGrowth and AssociationRuleGenerator. It worked without any configuration.

    If anyone can tell me whether I made any mistakes on this one or missed something (for example preprocessing, but I can't think of any), I would be really thankful!

    Greetings,

    Stampede
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,645  RM Founder
    Hi,

    the reason is that you have to define which value should be regarded as "negative" and which value should be regarded as "positive". You can do this by using an .aml file and using the ExampleSource operator (I have attached the .aml file and the corresponding .dat file to this message). Then the result will be like you would expect it. Please note that the first value which is defined for each attribute in the .aml file will be seen as positive - in this case it is "true".

    And here are the generated rules:

    [VILLA] --> [RICH] (confidence: 1.000)
    [RICH] --> [VILLA] (confidence: 1.000)
    [CAR] --> [VILLA] (confidence: 1.000)
    [CAR] --> [RICH] (confidence: 1.000)
    [APPARTEMENT] --> [POOR] (confidence: 1.000)
    [CAR] --> [AVERAGE] (confidence: 1.000)
    [APPARTEMENT] --> [AVERAGE] (confidence: 1.000)
    [VILLA, POOR] --> [RICH] (confidence: 1.000)
    [RICH, POOR] --> [VILLA] (confidence: 1.000)
    [VILLA, AVERAGE] --> [RICH] (confidence: 1.000)
    [RICH, AVERAGE] --> [VILLA] (confidence: 1.000)
    [CAR] --> [VILLA, RICH] (confidence: 1.000)
    [VILLA, CAR] --> [RICH] (confidence: 1.000)
    [RICH, CAR] --> [VILLA] (confidence: 1.000)
    ...
    Hope that helps,
    Ingo

    [attachment deleted by admin]
  • Stampede Member Posts: 2 Contributor I
    Hi,
    Thanks a lot for the answer and the help, but it still doesn't work. In your example result (I was able to produce the same results), the following rules are generated:
    mierswa wrote:

    [VILLA, POOR] --> [RICH] (confidence: 1.000)
    [RICH, POOR] --> [VILLA] (confidence: 1.000)
    [VILLA, AVERAGE] --> [RICH] (confidence: 1.000)
    [RICH, AVERAGE] --> [VILLA] (confidence: 1.000)
    Since there is no transaction where someone is POOR and RICH or AVERAGE and RICH, this doesn't make sense. I'll keep trying, but I hope you might have some ideas.
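The observation above can be checked by hand. The following is a minimal sketch, not RapidMiner's implementation: it assumes that only attributes with the value true count as items, and the helpers `count`, `support` and `confidence` are hypothetical names introduced for the illustration.

```python
# The three example transactions, keeping only the attributes that are true.
transactions = [
    {"APPARTEMENT", "POOR"},
    {"CAR", "APPARTEMENT", "AVERAGE"},
    {"CAR", "VILLA", "RICH"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return count(itemset) / len(transactions)

def confidence(premise, conclusion):
    """count(premise and conclusion) / count(premise); None if the premise never occurs."""
    base = count(premise)
    if base == 0:
        return None
    return count(premise | conclusion) / base

print(support({"VILLA", "POOR"}))               # 0.0
print(confidence({"VILLA", "POOR"}, {"RICH"}))  # None
print(confidence({"VILLA"}, {"RICH"}))          # 1.0
print(confidence({"CAR"}, {"VILLA"}))           # 0.5
```

Under this reading the premise [VILLA, POOR] never occurs, so a confidence for [VILLA, POOR] --> [RICH] is not even defined, and a rule like [CAR] --> [VILLA] would only reach a confidence of 0.5.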

    Thanks a lot,

    Stampede
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,645  RM Founder
    Hi,

    thanks again for this note - I totally missed this. I tried FPGrowth on this data set after first applying the operator Nominal2Binominal, and then the results seem to be correct:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\fp_growth\transformed.aml"/>
        </operator>
        <operator name="Nominal2Binominal" class="Nominal2Binominal">
        </operator>
        <operator name="FPGrowth" class="FPGrowth">
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_support" value="0.3"/>
        </operator>
        <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
            <parameter key="gain_theta" value="0.0"/>
            <parameter key="keep_frequent_item_sets" value="true"/>
        </operator>
    </operator>

    The drawback however is that the rules also contain XXX=false items which are often not desired. We will check the behavior of FPGrowth but we will not manage this before the next release.

    Thanks again and cheers,
    Ingo
  • Stampede Member Posts: 2 Contributor I
    Hi,
    thank you very much for your help. It works now and I'll just try to remove the negative association rules afterwards.
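Removing the negative rules afterwards could look roughly like the sketch below. It assumes the rules have been exported as (premise, conclusion) pairs whose items are written as "NAME = true" or "NAME = false"; that representation, the sample rules, and the function names are assumptions for the illustration, not an actual RapidMiner export format.

```python
def is_negative(item):
    """True for items of the form 'NAME = false' (spacing and case ignored)."""
    name, sep, value = item.partition("=")
    return sep == "=" and value.strip().lower() == "false"

def drop_negative_rules(rules):
    """Keep only rules whose premise and conclusion contain no '= false' item."""
    return [
        (premise, conclusion)
        for premise, conclusion in rules
        if not any(is_negative(item) for item in premise + conclusion)
    ]

# Hypothetical rules, written as (premise, conclusion) lists of item names.
rules = [
    (["VILLA = true"], ["RICH = true"]),
    (["VILLA = false"], ["RICH = false"]),
    (["CAR = true", "POOR = false"], ["APPARTEMENT = false"]),
]
kept = drop_negative_rules(rules)
print(kept)
```

Only the first of the three sample rules survives, because the other two contain at least one "= false" item.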

    Thanks again,

    Stampede
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,645  RM Founder
    Hello,

    you could also get rid of the ... = false features by removing them first with the FeatureNameRemoval or the new AttributeFilter operator. This should also reduce running time.

    Cheers,
    Ingo