
"Preprocessing for FPGrowth"

guilhermecr Member Posts: 4 Contributor I
edited June 5 in Help
I am working on market basket analysis. Currently I generate the binomial format with other programs.

What RM operator can I use to transform the dataset from this format:

1,3
2,3,4
1,2,3

to this:

1,0,1,0
0,1,1,1
1,1,1,0

Thanks in advance :)
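For reference, the transformation asked about here (item lists to one 0/1 column per item) can be done outside RapidMiner in a few lines. This is a minimal sketch, not a RapidMiner operator: it assumes items are positive integers numbered from 1, that column i stands for item i, and the function name `to_binary_matrix` is hypothetical.

```python
def to_binary_matrix(lines):
    """Convert transactions such as "1,3" into 0/1 rows over items 1..max_item."""
    # Parse each non-blank line into a set of integer item IDs.
    transactions = [
        {int(tok) for tok in line.split(",") if tok.strip()}
        for line in lines
        if line.strip()
    ]
    # Assumption: items are numbered from 1, so the highest ID gives the column count.
    max_item = max((max(t) for t in transactions if t), default=0)
    # Column (item - 1) is 1 when the item occurs in the transaction, else 0.
    return [
        [1 if item in t else 0 for item in range(1, max_item + 1)]
        for t in transactions
    ]


if __name__ == "__main__":
    raw = ["1,3", "2,3,4", "1,2,3"]
    for row in to_binary_matrix(raw):
        print(",".join(map(str, row)))
```

Run on the three example transactions, it prints the three binomial rows shown above.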

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,527   Unicorn
    Hi,
    your data format is called sparse, because it only stores the indices of the columns that are not 0. RapidMiner supports a sparse format, but it differs slightly from yours. If you can bring your data into the following format, you can load it easily:
    1:1 3:1
    2:1 3:1 4:1
    1:1 2:1 3:1

    If you then use the operator SparseFormatExampleSource with the parameter format set to no_label and the parameter dimension set to the number of dimensions (the highest number occurring in your file), it works.

    Greetings,
      Sebastian
  • guilhermecr Member Posts: 4 Contributor I
    I am new to market basket analysis, so I have been practicing with datasets available on the internet.
    I used the 'retail' dataset available at http://fimi.cs.helsinki.fi/data/retail.dat, which is in the sparse format.

    But since I will get my own data from a friend's shop, my question is:

    What is the best format for a market basket analysis with RM?


    Thanks

    PS: I will probably use Apriori and FPGrowth.
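To get data like the examples in this thread into the `index:value` layout from the first answer, a small preprocessing script is enough. The sketch below is an illustration, not part of RapidMiner: the function name `to_sparse_lines` is hypothetical, and it assumes items are integers separated by commas or whitespace. It writes each present item as `index:1`, keeping the indices unchanged as in the answer's example, and reports the highest index, which the answer says to use for the `dimension` parameter.

```python
import re


def to_sparse_lines(lines):
    """Rewrite transactions such as "2,3,4" as "2:1 3:1 4:1" and return the dimension."""
    out = []
    dimension = 0
    for line in lines:
        # Accept commas and/or whitespace as item separators (an assumption about the input).
        tokens = [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]
        if not tokens:
            continue  # skip blank lines
        # Drop duplicate items and list them in ascending order.
        items = sorted({int(tok) for tok in tokens})
        # The dimension is the highest item index seen anywhere in the data.
        dimension = max(dimension, items[-1])
        out.append(" ".join(f"{i}:1" for i in items))
    return out, dimension


if __name__ == "__main__":
    raw = ["1,3", "2,3,4", "1,2,3"]
    sparse, dim = to_sparse_lines(raw)
    print("\n".join(sparse))
    print("dimension =", dim)
```

For the three example transactions this yields the `1:1 3:1`, `2:1 3:1 4:1` and `1:1 2:1 3:1` lines from the first answer, with a dimension of 4. Writing the lines to a file, one transaction per line, should give an input for SparseFormatExampleSource with format set to no_label.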