What is the best dataset format for mining with the FP-Growth algorithm in RM?

brenda_natasha Member Posts: 1 Newbie
edited January 17 in Help
Does anyone know the best criteria, or at least the rules, for a dataset that is going to be mined with FP-Growth?
And as for the format, which one is better?
1. order_id | item1 | item2 | item3
or
2. order_id | item {} 
or
3. order_id | book (T/F) | pencil (T/F) | bag (T/F)

Every example I have read uses form #2, but what about #1 and #3?

Answers

  • MarcoBarradas RapidMiner Certified Analyst, Member Posts: 36  Guru
    Hi @brenda_natasha
    The format does not affect the outcome as long as you have the order ID and the item IDs.
    The real difference is in performance when you explore your data.
    In forms 1 and 3 you may have a column for each product. Depending on your use case that could be any number of columns, and as the number grows the table gets bigger and uses more of your computer's resources.
    The main difference between 1 and 3 is binary encoding versus the quantity of each product in the order. Since the amount ordered of each product doesn't affect the resulting rules, either way is OK.
    In the end, the process transforms the dataset (DS) into a binary matrix.
    I prefer form 2, since you only need two columns in your DS and it's easier to obtain that structure from any transactional software.
    Hope this answers your question.
    Best regards.
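
To illustrate the point that all three forms end up as the same binary matrix, here is a minimal sketch in plain Python. It is not RapidMiner code and does not show what the FP-Growth operator does internally. The sample orders, the helper names, and the reading of form 2 as one row per order/item pair are all assumptions made for the example.

```python
# Hypothetical sample data: the same three orders written in each layout.
form1 = [  # order_id | item1 | item2 | item3 (None pads shorter orders)
    (1, "book", "pencil", None),
    (2, "pencil", "bag", None),
    (3, "book", "pencil", "bag"),
]
form2 = [  # order_id | item (assumed: one row per order/item pair)
    (1, "book"), (1, "pencil"),
    (2, "pencil"), (2, "bag"),
    (3, "book"), (3, "pencil"), (3, "bag"),
]
form3 = [  # order_id | book | pencil | bag (True/False flags)
    (1, True, True, False),
    (2, False, True, True),
    (3, True, True, True),
]
FORM3_ITEMS = ("book", "pencil", "bag")  # column names of form 3


def baskets_from_form1(rows):
    """Collect the non-empty item cells of each order into a set."""
    return {oid: {i for i in items if i is not None} for oid, *items in rows}


def baskets_from_form2(rows):
    """Group the (order_id, item) pairs by order_id."""
    baskets = {}
    for oid, item in rows:
        baskets.setdefault(oid, set()).add(item)
    return baskets


def baskets_from_form3(rows, item_names):
    """Keep the names of the columns whose flag is True."""
    return {
        oid: {name for name, flag in zip(item_names, flags) if flag}
        for oid, *flags in rows
    }


def to_binary_matrix(baskets):
    """Return the sorted item names and one True/False row per order."""
    items = sorted({i for basket in baskets.values() for i in basket})
    matrix = {
        oid: [item in basket for item in items]
        for oid, basket in sorted(baskets.items())
    }
    return items, matrix


b1 = baskets_from_form1(form1)
b2 = baskets_from_form2(form2)
b3 = baskets_from_form3(form3, FORM3_ITEMS)

items, matrix = to_binary_matrix(b2)
print(items)
for oid, row in matrix.items():
    print(oid, row)
```

All three converters produce the same baskets, so the binary matrix is identical whichever form you start from. Form 2 needs only two columns no matter how many distinct products exist, while forms 1 and 3 get wider as items are added.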