"FP_Growth : must_contain"
Hi,
I use RapidMiner 5.0, and I want to run FP-Growth with the parameter "must_contain". I hope this parameter helps the algorithm (FP-Growth) use less memory.
To test it, I used the Golf data set, one of the sample data sets that come with RapidMiner, and set must_contain to "CAR = true".
In the result, every item set contains CAR = true as its rightmost item. It seems that must_contain does nothing except attach its value to every frequent item set in the final result.
Can anyone give me some advice?
Thank you in advance.
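The hope that a must-contain constraint saves memory rests on a general idea: every frequent item set that contains a required item X can be found by mining only the transactions that contain X, as long as counts are still divided by the total number of transactions. The Python sketch below illustrates that constraint push-down with a brute-force enumerator standing in for FP-Growth; the `mine_containing` helper and the sample transactions are hypothetical, and nothing here describes how RapidMiner actually implements must_contain.

```python
from itertools import combinations


def mine_containing(transactions, required, min_support):
    """Return {itemset: support} for all frequent item sets that contain `required`.

    Hypothetical helper; brute-force enumeration stands in for FP-Growth.
    Only transactions containing `required` are scanned, but support is
    still relative to ALL transactions, so the result equals "mine everything,
    then keep the sets that contain `required`".
    """
    n = len(transactions)
    # Project the database: keep only transactions that contain the required item.
    projected = [set(t) - {required} for t in transactions if required in t]
    if not projected or len(projected) / n < min_support:
        return {}
    result = {frozenset([required]): len(projected) / n}
    others = sorted(set().union(*projected))
    for size in range(1, len(others) + 1):
        found_any = False
        for combo in combinations(others, size):
            count = sum(1 for t in projected if set(combo) <= t)
            if count / n >= min_support:
                result[frozenset(combo) | {required}] = count / n
                found_any = True
        if not found_any:
            break  # anti-monotonicity: no larger set can be frequent either
    return result


# Hypothetical transactions, not taken from any RapidMiner sample data.
transactions = [{"A", "B", "C"}, {"A", "C"}, {"B", "C"}, {"A", "B"}]
print(mine_containing(transactions, "A", 0.5))
```

Because the projected database holds only the transactions that contain the required item, the work scales with that subset rather than with the whole data set; an FP-Growth implementation could build its FP-tree from the projected transactions in the same way.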
Answers
This is a known bug that will be fixed with the next update.
Greetings,
Sebastian
Using RM 5.2, the Transactions data set, and the sample process "25_FPGrowth", I changed must_contain to "CAR = true".
This returns a single item set: 1 0.667 CAR = true
Is there any way to return all item sets in which at least one item matches the pattern? (This is what I expected must_contain to do.)
In other words, I want to mimic the 'Contains Item' behavior of the UI and get (size, support, items):
1 0.667 CAR = true
2 0.333 CAR = true APPARTEMENT = true
2 0.333 CAR = true VILLA = true
2 0.333 CAR = true RICH = true
2 0.333 CAR = true AVERAGE = true
3 0.333 CAR = true APPARTEMENT = true AVERAGE = true
3 0.333 CAR = true VILLA = true RICH = true
Thank you,
Bemoose
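The 'Contains Item' behavior described above amounts to a post-filter over the mined item sets: keep every set in which at least one item matches the pattern. A minimal Python sketch, assuming each item set is a (support, items) pair with items rendered like `CAR = true` and that the pattern must match a whole item; the `filter_item_sets` helper and the support values below are hypothetical, not RapidMiner output.

```python
import re


def filter_item_sets(item_sets, pattern):
    """Keep item sets in which at least one item fully matches `pattern`.

    Hypothetical helper; assumes whole-item (full) regex matching.
    """
    regex = re.compile(pattern)
    return [
        (support, items)
        for support, items in item_sets
        if any(regex.fullmatch(item) for item in items)
    ]


# Hypothetical input: (support, items) pairs, not actual RapidMiner output.
item_sets = [
    (0.6, ("CAR = true",)),
    (0.4, ("VILLA = true",)),
    (0.2, ("CAR = true", "VILLA = true")),
]
print(filter_item_sets(item_sets, "CAR = true"))
```

Filtering after mining returns every matching set, but it does not reduce memory use during mining, since all frequent item sets are still generated first.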
Any clarification on the expected behavior of must_contain would be appreciated.
Thank you,
Bemoose
There was still a bug in FP-Growth in combination with must_contain. It has been fixed, and the fix will be included in the next version.
Best, Marius
Which version with the fix were you referring to in your last post?
I upgraded to 5.2.008, and must_contain does not seem to work correctly.
Thanks,
Bemoose
FP-Growth was patched in version 5.2.008.
Best,
Nils
What exactly does not work? Can you provide a sample process and sample data that cause the problem?
Best,
Marius
1. Changing must_contain in the FPGrowth operator to "CAR = true" throws an exception:
Aug 21, 2012 5:02:40 PM SEVERE: Process failed: operator cannot be executed. Check the log messages...
Aug 21, 2012 5:02:40 PM SEVERE: Here: Root[1] (Process)
subprocess 'Main Process'
+- Retrieve[1] (Retrieve)
+- Nominal2Binominal[1] (Nominal to Binominal)
+- AttributeFilter[1] (Select Attributes)
+- FPGrowth[1] (FP-Growth)
==> +- AssociationRuleGenerator[1] (Create Association Rules)
Aug 21, 2012 5:02:40 PM SEVERE: java.lang.NullPointerException
2. Changing must_contain in the FPGrowth operator to "CAR" (erroneously?) returns FrequentItemSets that do not contain CAR, e.g. "VILLA = true".
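Case 2 hinges on how the pattern is compared with item names such as `CAR = true`. If it must match the whole item (as with Java's `String.matches`), "CAR" matches nothing; if a partial match is enough (as with `Matcher.find`), it matches `CAR = true`. Which rule RapidMiner applies is an assumption here, but under either rule an item set consisting only of "VILLA = true" should not pass the filter. A minimal Python sketch with a hypothetical `items_matching` helper:

```python
import re


def items_matching(items, pattern, full=True):
    """Return the items that match `pattern`.

    Hypothetical helper: full=True mirrors whole-string matching
    (like Java's String.matches); full=False mirrors partial matching
    (like Java's Matcher.find).
    """
    regex = re.compile(pattern)
    match = regex.fullmatch if full else regex.search
    return [item for item in items if match(item)]


items = ["CAR = true", "VILLA = true", "RICH = true"]
for pattern in ("CAR = true", "CAR", "nomatch"):
    print(pattern, items_matching(items, pattern), items_matching(items, pattern, full=False))
```

With whole-item matching, both "CAR" and "nomatch" select no items, so a must-contain filter built this way would return no item sets at all rather than every set.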
The FPGrowth operator works as it should, as you can see in the process I posted above.
You have to set the must_contain parameter to "CAR = true", and you will get FrequentItemSets that contain CAR = true.
The problem you describe concerns the AssociationRuleGenerator operator. We are aware of it and will hopefully provide a bugfix with the next release.
Best,
Nils
What about my case #2? It looks like FPGrowth should return nothing in this case, but it returns all item sets. To make this clearer, set must_contain in the FPGrowth operator to "nomatch": it still returns all the sets. Minor, though.
Best, Marius