Input for AssociationRuleGenerator

Legacy User · June 2008

Hi,
I'm new to RapidMiner, so the question might be dumb, but I hope you'll help me anyway.

I want to use the AssociationRuleGenerator and I found the tutorial on how to use it, but my input format is different and I'm not sure how to configure RapidMiner to work with my input.

I've two formats available:

1. 1NF
A CSV file with two columns. The first column contains the transaction-ID, the second the items.
Example:

TID	ITEM
0	1
0	2
0	3
1	1
1	3
...	...

2. Binary Bitmap
A CSV file with the transaction-ID and the items as columns. The values for the items are 0 and 1 to indicate whether the transaction contains the item or not.
Example:

TID	1	2	3
0	1	1	1
1	1	0	1
...	...	...	...

Can anyone tell me how I can use one or both of the formats for generating association rules? I would personally prefer the first format, since we need to convert the data to get the second, but any solution which makes it work will really help me.

Thanks a lot in advance,

Stampede
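For readers who only have the first (1NF) format, the conversion into one row per transaction can be done outside RapidMiner before loading. The following is only a minimal Python sketch of that pivot, not a RapidMiner feature: the function name `pivot_transactions` is made up for illustration, the tab delimiter is assumed from the example above, and writing true/false instead of 1/0 is a choice made here so the values read as binominal.

```python
import csv
import io


def pivot_transactions(rows):
    """Turn (tid, item) pairs into one true/false row per transaction.

    Hypothetical helper for illustration; not part of RapidMiner.
    """
    # Every distinct item becomes one column.
    items = sorted({item for _, item in rows})
    # Collect the set of items for each transaction ID.
    baskets = {}
    for tid, item in rows:
        baskets.setdefault(tid, set()).add(item)
    header = ["TID"] + items
    table = [
        [tid] + ["true" if item in baskets[tid] else "false" for item in items]
        for tid in sorted(baskets)
    ]
    return header, table


# The sample data from the question, assumed to be tab-separated.
sample = "TID\tITEM\n0\t1\n0\t2\n0\t3\n1\t1\n1\t3\n"
reader = csv.reader(io.StringIO(sample), delimiter="\t")
next(reader)  # skip the header line
rows = [(tid, item) for tid, item in reader]

header, table = pivot_transactions(rows)
for line in [header] + table:
    print("\t".join(line))
```

Note that IDs and items are compared as strings here, so a real data set with numeric IDs above 9 would need an integer sort key.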

IngoRM · June 2008

Hi,

you could simply use the second option: just load the data set with one of the file based input operators and transform the numbers into binominal values with the corresponding preprocessing operators. Then you can apply FPGrowth.

Cheers,
Ingo

Legacy User · June 2008

Hi,
thanks for the answer, but I knew that much from the tutorial. Maybe I'm just slow on this one, but I just can't find a fitting Input and Preprocessing Operator.

My main problem is: I tried a lot of programs for association rule generation, but all of them interpreted the zeros as values and not as "false". So I got negative association rules. I hope that RapidMiner will solve this problem, since I read that it can handle bitmaps.

If someone could just tell me the fitting input operator and, if needed, the correct preprocessor, that would be really great.

Thanks a lot,

Stampede

Legacy User · June 2008

Hi,
sorry for the inconvenience, but I found a partially working solution.

I get association rules from my input data now, but the rules make no sense (on a manually generated example).

For example, I have this (slightly silly) data set:

CAR	APPARTEMENT	VILLA	POOR	AVERAGE	RICH
false	true	false	true	false	false
true	true	false	false	true	false
true	false	true	false	false	true

But I get rules like:
CAR -> POOR
AVERAGE -> POOR
RICH -> POOR
VILLA -> POOR
CAR v APPARTEMENT v VILLA -> POOR
CAR v APPARTEMENT v VILLA v RICH -> POOR

And I don't know where these rules came from. I used CSVExampleSource, FPGrowth and AssociationRuleGenerator. It worked without any configuration.

If anyone can tell me whether I made any mistakes here or missed something (for example preprocessing, but I can't think of any), I would be really thankful!

Greetings,

Stampede

IngoRM · July 2008

Hi,

the reason is that you have to define which value should be regarded as "negative" and which value should be regarded as "positive". You can do this by using an .aml file and the ExampleSource operator (I have attached the .aml file and the corresponding .dat file to this message). Then the results will be what you would expect. Please note that the first value which is defined for each attribute in the .aml file will be seen as positive - in this case it is "true".

And here are the generated rules:

[VILLA] --> [RICH] (confidence: 1.000)
[RICH] --> [VILLA] (confidence: 1.000)
[CAR] --> [VILLA] (confidence: 1.000)
[CAR] --> [RICH] (confidence: 1.000)
[APPARTEMENT] --> [POOR] (confidence: 1.000)
[CAR] --> [AVERAGE] (confidence: 1.000)
[APPARTEMENT] --> [AVERAGE] (confidence: 1.000)
[VILLA, POOR] --> [RICH] (confidence: 1.000)
[RICH, POOR] --> [VILLA] (confidence: 1.000)
[VILLA, AVERAGE] --> [RICH] (confidence: 1.000)
[RICH, AVERAGE] --> [VILLA] (confidence: 1.000)
[CAR] --> [VILLA, RICH] (confidence: 1.000)
[VILLA, CAR] --> [RICH] (confidence: 1.000)
[RICH, CAR] --> [VILLA] (confidence: 1.000)
...

Hope that helps,
Ingo

[attachment deleted by admin]

Stampede · July 2008

Hi,
Thanks a lot for the answer and the help, but it still doesn't work. In your example result (I was able to produce the same results), the following rules are generated:

mierswa wrote:

[VILLA, POOR] --> [RICH] (confidence: 1.000)
[RICH, POOR] --> [VILLA] (confidence: 1.000)
[VILLA, AVERAGE] --> [RICH] (confidence: 1.000)
[RICH, AVERAGE] --> [VILLA] (confidence: 1.000)

Since there is no transaction where someone is POOR and RICH or AVERAGE and RICH, this doesn't make sense. I'll keep trying, but I hope you might have some ideas.

Thanks a lot,

Stampede
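The objection above can be checked by hand on the three example rows. The following is a minimal Python sketch that counts support and confidence when only "true" counts as "the transaction contains the item"; it is a plain recomputation, not what FPGrowth does internally, and the helper names `support_count` and `confidence` are made up for illustration.

```python
COLUMNS = ["CAR", "APPARTEMENT", "VILLA", "POOR", "AVERAGE", "RICH"]
ROWS = [
    [False, True, False, True, False, False],
    [True, True, False, False, True, False],
    [True, False, True, False, False, True],
]

# Each transaction is the set of columns whose value is true.
TRANSACTIONS = [{c for c, v in zip(COLUMNS, row) if v} for row in ROWS]


def support_count(items):
    """Number of transactions that contain every item in `items`."""
    return sum(1 for t in TRANSACTIONS if set(items) <= t)


def confidence(premise, conclusion):
    """support(premise and conclusion) / support(premise), or None if the premise never occurs."""
    covered = support_count(premise)
    if covered == 0:
        return None
    return support_count(set(premise) | set(conclusion)) / covered


print(support_count(["VILLA", "POOR"]))         # premise of the first quoted rule
print(confidence(["VILLA", "POOR"], ["RICH"]))  # undefined when the premise never occurs
print(confidence(["VILLA"], ["RICH"]))
```

Under this reading the premise {VILLA, POOR} occurs in no transaction at all, so a rule built on it cannot be backed by the data, which matches the complaint.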

IngoRM · July 2008

Hi,

thanks again for this note - I totally missed this. I tried FPGrowth on this data set after first applying the operator Nominal2Binominal, and then the results seem to be correct:

<operator name="Root" class="Process" expanded="yes">
    <operator name="ExampleSource" class="ExampleSource">
        <parameter key="attributes" value="C:\Dokumente und Einstellungen\Mierswa\Eigene Dateien\rm_workspace\fp_growth\transformed.aml"/>
    </operator>
    <operator name="Nominal2Binominal" class="Nominal2Binominal">
    </operator>
    <operator name="FPGrowth" class="FPGrowth">
        <parameter key="find_min_number_of_itemsets" value="false"/>
        <parameter key="min_support" value="0.3"/>
    </operator>
    <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
        <parameter key="gain_theta" value="0.0"/>
        <parameter key="keep_frequent_item_sets" value="true"/>
    </operator>
</operator>

The drawback however is that the rules also contain XXX=false items which are often not desired. We will check the behavior of FPGrowth but we will not manage this before the next release.

Thanks again and cheers,
Ingo

Stampede · July 2008

Hi,
thank you very much for your help. It works now and I'll just try to remove the negative association rules afterwards.

Thanks again,

Stampede
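Removing the negative rules afterwards could look roughly like the Python sketch below, applied to rules exported from RapidMiner. Everything about the data shape is an assumption: the rules are taken to be (premise, conclusion) pairs of item strings, the item labels are taken to look like "CAR = false", and `is_negative` and `drop_negative_rules` are hypothetical helper names. The three sample rules are made up for illustration and are not real output.

```python
def is_negative(item):
    """True if the item label has the assumed form '<name> = false'."""
    _name, sep, value = item.partition("=")
    return sep == "=" and value.strip().lower() == "false"


def drop_negative_rules(rules):
    """Keep only rules in which no item, on either side, is a '= false' item."""
    return [
        (premise, conclusion)
        for premise, conclusion in rules
        if not any(is_negative(i) for i in list(premise) + list(conclusion))
    ]


# Made-up rules, only to show the filtering.
rules = [
    (["VILLA = true"], ["RICH = true"]),
    (["CAR = false"], ["POOR = true"]),
    (["APPARTEMENT = true"], ["VILLA = false"]),
]

kept = drop_negative_rules(rules)
for premise, conclusion in kept:
    print(premise, "-->", conclusion)
```

The actual label format in an export may differ, so the check in `is_negative` would need to be adapted to whatever the rule output really contains.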

IngoRM · July 2008

Hello,

you could also get rid of the ... = false features by removing them first with the FeatureNameRemoval or the new AttributeFilter operator. This should also reduce running time.

Cheers,
Ingo
