FPGrowth results inconsistent
Hi,
I'm a student using RapidMiner in a university data mining course. So far I've had no issues, but now I've arrived at association rules and hit a bump in the road, and I hope someone can point me in the right direction. I've chosen a dataset with a bunch of transactions (21,000). Every item has its own example (so many examples can belong to the same transaction), so I've filtered out the irrelevant attributes and then converted it to binominal to get the items as attributes.
I've tried many different ways to use the Aggregate operator. I ended up using concatenation and two Replace operators: the first converts everything containing true into true, and the second converts everything containing false into false. Then I remove the transaction number with Select Attributes and convert the result to binominal.
This was a messy way to solve it, but it appeared to work. I wanted to do it more cleanly, so I started experimenting with different approaches. One was converting to numerical instead of binominal, so that I could use the sum function in the Aggregate operator. I then remove the transaction attribute and convert the result to binominal.
As far as I can tell, the results are the same: 9531 examples and 95 attributes, and comparing them side by side, the true and false values match for every attribute. However, the results from the FPGrowth operator that follows differ: one shows bread with a support of 0.675 and the other gives bread 0.325. I calculated it manually and the correct result is 0.325, which means the concatenation approach is incorrect. Now my question is: why? What am I missing? As far as I can tell, the input to the FPGrowth operator is the same. I'm aware that there must be much better ways to solve this problem, but what I'm most interested in is why the results differ between these two methods.
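For readers who want to reproduce the manual check, here is a minimal Python sketch of it: group the item rows by transaction, then compute the fraction of transactions containing an item. The rows and the helper names `to_transactions` and `support` are made up for illustration; this is not the dataset from the post and not how RapidMiner does it internally.

```python
from collections import defaultdict

# Hypothetical item-per-row data: (transaction id, item).
# These rows are invented for illustration, not taken from the poster's dataset.
rows = [
    (1, "bread"), (1, "coffee"),
    (2, "tea"),
    (3, "bread"),
    (4, "coffee"), (4, "cake"),
]


def to_transactions(rows):
    """Group item rows into one set of items per transaction id."""
    baskets = defaultdict(set)
    for transaction_id, item in rows:
        baskets[transaction_id].add(item)
    return dict(baskets)


def support(baskets, item):
    """Fraction of transactions that contain the given item."""
    hits = sum(1 for items in baskets.values() if item in items)
    return hits / len(baskets)


baskets = to_transactions(rows)
bread_support = support(baskets, "bread")
print(bread_support)
```

With these invented rows, bread appears in two of the four transactions, so the computed support is one half.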
Thankful for any help.
Best Regards
David
It seems I'm not allowed to make links, but my dataset was downloaded from github.com/viktree/curlyoctochainsaw
<?xml version="1.0" encoding="UTF8"?><process version="9.3.001">
Best Answer

gmeier (Employee, RM Engineering)
Hi @zept,
Please set the parameter "positive value" of your first FPGrowth operator to "true". If you cannot see this parameter, first click "Show advanced parameters" at the bottom of the Parameters panel. Then both operators yield the same result.
This is necessary because the Nominal to Binominal operator before FPGrowth does not correctly recognize that "true" should be the positive value everywhere, since you created the true and false values by replacing something else instead of using a Numerical to Binominal operator as in the alternative below.
Hope that helps!
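The two support values in the question, 0.675 and 0.325, add up to 1, which is what one would expect if one run counted "false" as the positive value for bread. The Python sketch below illustrates only that arithmetic. The column contents and the helper `support` are hypothetical, and this makes no claim about how the FPGrowth operator is implemented.

```python
def support(column, positive_value):
    """Fraction of rows whose value equals the chosen positive value."""
    return sum(1 for value in column if value == positive_value) / len(column)


# Hypothetical "bread" column: 13 of 40 transactions contain bread.
# The counts are invented so that the ratio matches the 0.325 from the question.
bread = ["true"] * 13 + ["false"] * 27

support_true = support(bread, "true")    # "true" treated as the positive value
support_false = support(bread, "false")  # "false" mistakenly treated as positive

print(support_true, support_false)
```

The same column gives complementary supports depending on which value is treated as positive, so two inputs that look identical side by side can still produce different frequent item sets.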