I'm a student using RapidMiner in a university data mining course. So far it has gone without any issues, but now I've arrived at association rules and hit a bump in the road, and I hope someone can point me in the right direction. I've chosen a dataset with a bunch of transactions (21,000). Every item has its own example (so many examples can belong to the same transaction), so I've filtered out the irrelevant attributes and then converted it into binominal to get the items as attributes.
I've tried many different ways to use the Aggregate operator. I ended up using concatenation and two Replace operators, the first converting everything containing true into true, and the second converting everything containing false into false. Then I remove the transaction number with select and convert the result into binominal.
This was a messy way to solve it, but it appeared to be working. I then wanted to do it more cleanly, so I started experimenting with different approaches. One was converting the items into numerical instead of binominal, so that I could use the sum function in the Aggregate operator. I then remove the transaction attribute and convert the result into binominal.
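To make the second approach concrete outside of RapidMiner, here is a minimal Python sketch of the same idea: count how often each item occurs per transaction, then treat any count above zero as true. The toy rows and the function name `pivot_by_sum` are made up for illustration; this is not the RapidMiner process itself.

```python
from collections import defaultdict

# Hypothetical toy data: one row per purchased item, (transaction id, item).
rows = [
    (1, "bread"),
    (1, "coffee"),
    (2, "bread"),
    (3, "tea"),
    (3, "coffee"),
    (3, "coffee"),  # the same item may occur twice in one transaction
]


def pivot_by_sum(rows):
    """Return {transaction: {item: bool}} by summing counts, then thresholding."""
    items = sorted({item for _, item in rows})
    # Every transaction starts with a zero count for every known item.
    counts = defaultdict(lambda: dict.fromkeys(items, 0))
    for transaction, item in rows:
        counts[transaction][item] += 1
    # A count above zero means the item is present in the transaction.
    return {
        transaction: {item: count > 0 for item, count in per_item.items()}
        for transaction, per_item in counts.items()
    }


basket = pivot_by_sum(rows)
```

Each transaction ends up as one row with one true/false flag per item, which is the shape I expect FP-Growth to need.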
As far as I can tell, the results appear to be the same: 9531 examples and 95 attributes. Comparing the two side by side, the true and false values look identical for every attribute. However, the results from the following FP-Growth operator differ: one shows bread with a support of 0.675 and the other gives bread 0.325. I calculated it manually and the correct result is 0.325, which means the concatenation approach is incorrect. Now my question is why? What am I missing? As far as I can tell, the input to the FP-Growth operator is the same in both cases. I'm aware that there must be much better ways to solve this problem, but what I'm most interested in is why the results differ between these two methods.
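For reference, the manual check amounts to dividing the number of transactions that contain the item by the total number of transactions. Below is a small Python sketch of that calculation on made-up data (the `support` helper and the eight toy baskets are hypothetical, not my real dataset). It also computes the fraction of transactions where the flag is false, because 0.675 and 0.325 add up to exactly 1, so one figure is the complement of the other. Whether true and false are actually being swapped somewhere in my process is only a guess on my part.

```python
def support(baskets, item, value=True):
    """Fraction of transactions whose flag for `item` equals `value`."""
    hits = sum(1 for basket in baskets if basket[item] == value)
    return hits / len(baskets)


# Hypothetical toy data: 8 transactions, bread present in 3 of them.
baskets = [{"bread": i < 3} for i in range(8)]

present = support(baskets, "bread", value=True)   # 3 / 8 = 0.375
absent = support(baskets, "bread", value=False)   # 5 / 8 = 0.625

# The two fractions are complements of each other.
print(present, absent, present + absent)
```

If the operator counted the false flags as the "item present" value, it would report the second number instead of the first.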
Thankful for any help.
It seems I'm not allowed to post links, but my dataset was downloaded from github.com/viktree/curly-octo-chainsaw
<?xml version="1.0" encoding="UTF-8"?><process version="9.3.001">