Problem with FPGrowth?

earmijo Member Posts: 270 Unicorn
edited November 2018 in Help
I'm using the attached dataset to illustrate the problem. It is a very basic process to compute association rules: I read the binary matrix, transform the 1/0 values to true/false, and compute the frequent itemsets with the FPGrowth operator. This is where the problems start. "Blouse" is an item that appears in only 3 out of 20 transactions, yet the program reports a support of 0.85. Obviously, the error carries over to the rule calculation.

Here's my code in case I did something silly.
<operator name="Root" class="Process" expanded="yes">
   <operator name="CSVExampleSource" class="CSVExampleSource" breakpoints="after">
       <parameter key="filename" value="K:\clothingstore.csv"/>
       <parameter key="id_name" value="tid"/>
   </operator>
   <operator name="Numerical2Binominal" class="Numerical2Binominal" breakpoints="after">
   </operator>
   <operator name="FPGrowth" class="FPGrowth" breakpoints="after">
       <parameter key="min_support" value="0.2"/>
   </operator>
   <operator name="AssociationRuleGenerator" class="AssociationRuleGenerator">
       <parameter key="min_confidence" value="0.7"/>
   </operator>
</operator>
If I try the Apriori algorithm from the Weka list, everything is fine. I've noticed this problem with other (bigger) datasets. Can you replicate my problem? I'm using version 4.3.



[attachment deleted by admin]

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    your process is just fine. But RapidMiner does something in this situation that might be surprising: it uses the first nominal value as "false" and the second as "true". The Numerical2Binominal operator sometimes gives "true" the index 0, which causes this problem. With the meaning inverted, a support of 0.85 is exactly correct: 17 of the 20 transactions do not contain the blouse.
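    The arithmetic behind the two support values can be checked with a small Python sketch. It is an illustration only, not RapidMiner code: the row positions in `blouse` are made up because the attachment is not available, and only the counts (3 out of 20) and the `min_support` of 0.2 come from the thread.

```python
# Hypothetical 0/1 column: "blouse" is present in 3 of 20 transactions.
# The positions of the 1s are invented; only the counts come from the post.
blouse = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]

MIN_SUPPORT = 0.2  # same threshold as min_support in the process above


def support(column, present_value):
    """Fraction of transactions whose value equals present_value."""
    return sum(1 for v in column if v == present_value) / len(column)


intended = support(blouse, 1)  # 1 is read as "item present"
inverted = support(blouse, 0)  # 0 is read as "item present" (the swapped case)

print("intended support:", intended)
print("inverted support:", inverted)
print("frequent as intended:", intended >= MIN_SUPPORT)
print("frequent when inverted:", inverted >= MIN_SUPPORT)
```

    The two supports always add up to 1, so a rare item turns into a very frequent one when the meaning is swapped, and it then passes the minimum support threshold.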

    The solution is quite easy: store your data in a RapidMiner file using the ExampleSetWriter and then sort the nominal mappings. Sorry for the inconvenience, but we are working to solve this problem once and for all in RapidMiner 5.0.

    Greetings,
      Sebastian
  • earmijo Member Posts: 270 Unicorn
    Thanks Sebastian. I'm still a bit confused though. If I understand you correctly, the problem is created by the Numerical2Binominal operator. Am I right? But when I place a breakpoint after the conversion, everything looks fine. There are only 3 "true" values for blouse, for instance. I even did the true/false coding myself in Excel, read the file, and still had the problem. I thought that the problem was FPGrowth, since the operator Weka.Apriori doesn't have any problems.
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    the trouble comes from the internal data handling within RapidMiner. I will try to summarize briefly how nominal data is handled:
    Nominal attributes hold a mapping from numbers to the actual nominal string values, so internally nominal values are just numbers. Binominal attributes are not restricted to true/false. They can have any two nominal values, such as "1"/"0" or "yes"/"no". The FPGrowth operator therefore assumes that the first nominal value (index 0) means false and the second (index 1) means true.
    If true is mapped onto 0 and false onto 1, the meaning is switched.
    This has happened because the Numerical2Binominal operator simply adds each value to the mapping when it first occurs, so the first value encountered gets index 0. If that value happens to be true, true gets index 0. The displayed values are still correct, which is why the data looks fine when you inspect it after the conversion.
    To overcome this problem, you can save the data with the ExampleSetWriter. The .aml file contains information about the mapping, and the mapping can be switched there. If this is impractical because you have too many attributes, you could instead add an artificial example as the first one, containing only numbers that are mapped onto false.
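    A minimal Python sketch of this mechanism, under the assumptions stated in this post (indices assigned in order of first occurrence, index 1 read as true). The function names and the sample column are hypothetical and are not part of RapidMiner.

```python
def build_mapping(values):
    """Assign indices to nominal values in order of first occurrence
    (assumed behaviour, as described in the post)."""
    mapping = {}
    for v in values:
        if v not in mapping:
            mapping[v] = len(mapping)
    return mapping


def support_index_one(values, mapping):
    """Support when the value stored at index 1 is read as 'item present'."""
    return sum(1 for v in values if mapping[v] == 1) / len(values)


# Hypothetical column: 3 "true" out of 20 rows, and the first row is "true".
column = ["true"] + ["false"] * 10 + ["true", "true"] + ["false"] * 7

swapped = build_mapping(column)  # "true" occurs first, so it gets index 0
print(swapped, support_index_one(column, swapped))

# Workaround from the post: an artificial first example that is all "false"
# forces "false" onto index 0. The support is computed on the original rows;
# leaving the artificial row in the data would change it slightly (3/21).
fixed = build_mapping(["false"] + column)
print(fixed, support_index_one(column, fixed))
```

    With the swapped mapping the sketch reproduces the reported 0.85, and with the artificial first example it gives the expected 0.15.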

    Greetings,
      Sebastian

    PS: We are currently working hard on removing this troublesome rough edge from RapidMiner.