FP-Growth results inconsistent

zeptzept Member Posts: 2 Newbie
Hi,

I'm a student using RapidMiner in a university data mining course. So far it's gone without any issues, and now I've arrived at association rules. Here I've hit a bump in the road, and I hope someone can point me in the right direction. I've chosen a dataset with a bunch of transactions (21 000). Every item has its own example (many examples can belong to the same transaction), so I've filtered out the irrelevant attributes and then converted the data into binominal to get the items as attributes.

I've tried many different ways to use the Aggregate operator. I ended up using concatenation and two Replace operators, first converting everything containing true into true, and then everything containing false into false. Then I remove the transaction number with select and convert the result into binominal.

This was a messy way to solve it, but it appeared to be working. I then wanted to do it more cleanly, so I started experimenting with different approaches. One was converting the data into numerical instead of binominal, so that I could use the sum function in the Aggregate operator. I then remove the transaction attribute and convert the result into binominal.
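To make the sum-based approach concrete, here is a minimal sketch in plain Python rather than RapidMiner. The rows and item names are made up for illustration and are not taken from the dataset. It groups one-item-per-row data by transaction, sums the item counts, and treats any count above zero as "item present":

```python
from collections import defaultdict

# Hypothetical rows: one (transaction_id, item) pair per example.
rows = [
    (1, "bread"), (1, "milk"),
    (2, "coffee"),
    (3, "bread"), (3, "bread"),  # same item listed twice in one transaction
]

items = sorted({item for _, item in rows})

# Sum the item counts per transaction, similar in spirit to aggregating
# 0/1 item columns with a sum function grouped by transaction.
counts = defaultdict(lambda: dict.fromkeys(items, 0))
for tid, item in rows:
    counts[tid][item] += 1

# Any count above zero means the item occurs in the transaction.
baskets = {
    tid: {item: n > 0 for item, n in row.items()}
    for tid, row in counts.items()
}

# Support of an item = share of transactions that contain it.
support_bread = sum(b["bread"] for b in baskets.values()) / len(baskets)
print(baskets)
print(support_bread)
```

Because the values come from numbers, there is no ambiguity about which of the two values means "present": anything greater than zero does.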

As far as I can tell, the results appear to be the same: 9531 examples, 95 attributes. Comparing side by side, the true and false values look the same for every attribute. However, the results from the subsequent FP-Growth operator differ: one shows bread with a support of 0.675 and the other gives bread 0.325. I calculated it manually and the correct result is 0.325, which means the concatenation approach is incorrect. Now my question is why? What am I missing? As far as I can tell, the input into the FP-Growth operator is the same. I'm aware that there must be much better ways to solve this problem, but what I'm most interested in is why the results differ between these two methods.

Thankful for any help.

Best Regards
David

It seems I'm not allowed to make links, but my dataset was downloaded from github.com/viktree/curly-octo-chainsaw
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">




Best Answer

  • gmeiergmeier Posts: 15   RM Engineering
    Solution Accepted
    Hi @zept,

    Please set the parameter "positive value" of your first FP-Growth operator to "true". If you cannot see this parameter, first click on "Show advanced parameters" at the bottom of the Parameters panel. Then both operators yield the same result.

    This is necessary because the Nominal to Binominal operator before FP-Growth does not correctly recognize that "true" should be the positive value everywhere, since you created the true and false values by replacing something else instead of using a Numerical to Binominal operator as in the alternative below.
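The effect of the positive value can be illustrated outside RapidMiner with a small Python sketch. The 40-value column below is invented so that the ratios match the numbers in the question, and the `support` helper is hypothetical, not part of any RapidMiner API. Counting "false" as the positive value gives the complement of the true support, and the two reported values, 0.325 and 0.675, do add up to 1:

```python
# Hypothetical "bread" column over 40 transactions: 13 contain bread, 27 do not.
bread = ["true"] * 13 + ["false"] * 27

def support(column, positive_value):
    """Share of transactions whose value equals the chosen positive value."""
    return sum(value == positive_value for value in column) / len(column)

support_true = support(bread, "true")    # "true" means the item is present
support_false = support(bread, "false")  # mapping flipped: "false" counted as present

print(support_true)
print(support_false)
print(support_true + support_false)
```

Both columns look identical when inspected as data; only the choice of which value counts as "present" changes the support.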

    Hope that helps!

Answers

  • zeptzept Member Posts: 2 Newbie
    Tested and it works

    I suspected it had something to do with the replaced values not being registered correctly. However, when looking at the ExampleSet output, I concluded that binominal could only mean one of two things. So if the conversion didn't succeed, I assumed it would throw an error or exclude the values, which would result in fewer examples in the output. It is a little illogical that the datasets going into the FP-Growth operator look identical but produce different results. Regardless, I'm thankful for the help and glad to have found the cause. Thank you for the quick response!

    Best Regards
    David