"Preprocessing for FPGrowth"

guilhermecr Member Posts: 4 Contributor I
edited June 2019 in Help
I am working on market basket analysis. I am already generating the binomial format using other programs.

What RM operator can I use to transform the dataset from this format:


to this:


Thanks in advance :)


  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,529   Unicorn
    Your data format is called sparse, because it stores only the indices of the nonzero columns. RapidMiner supports a sparse format, but it differs slightly from yours. If you can bring your data into the following format, you can load it easily:
    1:1 3:1
    2:1 3:1 4:1
    1:1 2:1 3:1

    If you then use the operator SparseFormatExampleSource with the parameter format set to no_label and the parameter dimension set to the number of dimensions (the highest index occurring in your file), it works.

    Sebastian
  • guilhermecr Member Posts: 4 Contributor I
    I am just starting with market basket analysis, so I have been practicing with datasets available on the internet.
    I have used the 'retail' data set available at http://fimi.cs.helsinki.fi/data/retail.dat, which is in the sparse format (one basket per line, listing only the item IDs it contains).

    But since I will get my own data from a friend's shop, my question is:

    What is the best format for a market basket analysis with RM?


    PS: I will probably use Apriori and FPGrowth.
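
For files like retail.dat, the conversion to the `index:1` layout described in the reply above could be done with a short script before loading. The following is only a minimal sketch under stated assumptions: the input has one basket per line with space-separated integer item IDs, the function name `to_sparse_lines` is hypothetical, and the `offset` parameter is an assumption for shifting 0-based item IDs to 1-based ones (whether RapidMiner requires 1-based indices is not confirmed in this thread). Blank lines are skipped, so empty baskets are dropped.

```python
from typing import Iterable, List, Tuple


def to_sparse_lines(transactions: Iterable[str], offset: int = 0) -> Tuple[List[str], int]:
    """Convert lines of space-separated item IDs into 'index:1' lines.

    Returns the converted lines and the highest index seen, which is the
    value the reply suggests for the 'dimension' parameter.
    'offset' is an assumption: use offset=1 if the item IDs start at 0.
    """
    out: List[str] = []
    max_index = 0
    for line in transactions:
        # Deduplicate and sort the item IDs of one basket.
        items = sorted({int(tok) + offset for tok in line.split()})
        if not items:
            continue  # skip blank lines (empty baskets are dropped)
        max_index = max(max_index, items[-1])
        out.append(" ".join(f"{i}:1" for i in items))
    return out, max_index


if __name__ == "__main__":
    # Three small baskets, matching the example lines in the reply.
    sample = ["1 3", "2 3 4", "1 2 3"]
    lines, dimension = to_sparse_lines(sample)
    print("\n".join(lines))
    print("dimension =", dimension)
```

To process a real file, the same function can be fed an open file handle (for example `to_sparse_lines(open("retail.dat"))`), and the returned lines written out to a new file for the operator to read.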