Options

Market Basket Analysis

Spcalan14Spcalan14 Member Posts: 16 Learner I
I have a very simple excel file with 2 columns.
Invoice #
Item #

About 12k entries

I want to know what the most commonly purchased products are..
If Product A is always sold with Product B.. we can make a package deal.

Thoughts ?

Best Answer

  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Solution Accepted
    ah ok.. thank you very much for your help!
    My first time post and you saved the day !

Answers

  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Sorry to be vague.

    I have opened the Market Basket Analysis template, imported data, and I get the Association rules output..
    Largest support is 0.026 for Product 12 and Product 15..

    29 sets... Support = 0.047 ( highest )...

    So does this mean Product 12 and Product 15 are most commonly purchased together and there are 29 sets to confirm ?
    Can't be... Item 15 was only purchased 1x.. and Product 12 was purchased 32 times...
  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @Spcalan14,

    Yes, you can take a look at the process template called "Market Basket Analysis" which include the 2 following operators : 
     - FP-Growth
     - Create Association Rules



    Hope this helps,

    Regards,

    Lionel
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Thank you.. Yes, this is the one I used.
    But how do I interpret the results ?

  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    According to the Association Rules.. Product 12, 27, and 20 have the most number of sets with a Support value of 0.006.
    of the 12k data points... only make up 1084...
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Should I focus on the greatest number of sets, Support, Confidence, Lift ?

    I don't want to predict.. I just want to know what is my 2 most commonly purchased items on the same invoice
  • Options
    sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Product 20 is the most commonly purchased product (1042 of 12k ), followed by Product 33 (887 of 12k ).
    I would assume that these would be in the mix..
  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Spcalan14,

    The "support" is defined by the proportion of transactions T which contain both X and Y.
    So I would say that to find "the 2 most commonly purchased items on the same invoice" you have to find the association with the hightest value of "support".(for that you can sort the results of the Create Association Rules operator). 



    Regards,

    Lionel
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Question.. How can I see the actual name of the product.. instead of a "Product # ?
    My descriptions are "CC-TT", and "CC-TTG".. not numbers..
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    I don't know what the output is referring to, since I am not using the exact words Product 1/2/3/ ...
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Thanks Lionel ...
    How can I see the actual Product Description ( instead of Product 1, 2, 3 ) ?
  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Spcalan14,

    1. Go to the results of "Association Rules" generated by the operator Create Association Rules. 
    2. Sort the table by descending order of "support" by clicking on the name of the column "Support"
    3. The first row (Premises and Conclusion) indicates the "2 most commonly purchased items on the same invoice" 

    Regards,

    Lionel
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    My question is.. what is
     Product 12 and Product 15 ?
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    This is very clear..
    1. Go to the results of "Association Rules" generated by the operator Create Association Rules. 
    2. Sort the table by descending order of "support" by clicking on the name of the column "Support"
    3. The first row (Premises and Conclusion) indicates the "2 most commonly purchased items on the same invoice" 

    But what is Product 12 and Product 15 ?
    I need my product names...

  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Spcalan14


    My screenshots are coming from the RapidMiner Template which are fictive examples and not from your own data...

    As said, run the process with your own data and go to the Association Rules results and you will see the 2 most commonly purchased items on the same invoice" of your own data....

    If you are lost after this explanation, please share your data...

    Regards,

    Lionel


  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Wow.. don't I feel like a dumbxxx.

    See below...
    This makes MUCH more sense ( considering this is my data )..
    LGL-TEE and CL-TEE makes much more sense....

    But since I have 4 groups with same Support (0.017).. the are even..
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    So the Support is the key indicator..
    What does the first column represent ?
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    Does the first column represent the number of bundles that included these 2 products ?
    If I first column..
    Then STP-Tee and CP-TEE have a value of (59)...
    Does that mean there were 59 instances of that specific bundle ?
  • Options
    Spcalan14Spcalan14 Member Posts: 16 Learner I
    See below..
  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Spcalan14

    Yes, Support is the key indicator.
    I must admit that I don't know what the first column represent...

    Regards,

    Lionel
  • Options
    lionelderkrikorlionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    @Spcalan14,

    After reflexion, the first column is a kind of "Id", the number of the association rules...
    By playing with the "Min. Criterion Value", you will see that there are more or less association rules : 


    Regards,

    Lionel

Sign In or Register to comment.