
Market Basket Analysis: First Timer

WindsAloft (Member, Posts: 11, Contributor II)
edited June 2019 in Help
Okay, first of all, the tutorials are nice. I watched them all but still can't figure out how to do a Market Basket Analysis.

I got so frustrated with the error messages that I deleted everything I had created. I'm starting over and typing this out step by step so that maybe someone can point out my mistake.

1. Open RapidMiner.
2. File > Import Data > Import CSV file.
3. I selected a .csv file that I am using as a sample. It has 3 headers plus sample data: CustomerID, itemID, itemCount.
4. The wizard suggests CustomerID as nominal, itemID as nominal and itemCount as integer. Here is a sample row of my data: CustomerID = D21953, itemID = E3, itemCount = 1.
5. The wizard suggests I set all roles to regular.
6. I choose my Local Repository as the location and name it DATA.
7. I go to File > Open Template > Market Basket Analysis > Next.
8. I leave the values the same, since I made my example headers match perfectly.
9. For Retrieve.repository_entry, I manually type in //My Repository/DATA, because when I click the little folder icon and select DATA in my repository, the field stays blank.

I get 3 red errors:
"The Attribute customerIDAttributeName is missing in the input example set" - from Pivot
"The Attribute itemIDAttributeName is missing in the input example set" - from Pivot
"The Attribute customerIDAttributeName is missing in the example set" - from Set Role (quickfix)

Now what? 

Answers

  • WindsAloft (Member, Posts: 11, Contributor II)
    Um, did I perhaps post this in the wrong forum?
  • haddock (Member, Posts: 849, Maven)
    Greetings Windsaloft!

    Looks like the lights are on but nobody is at home, so let me confuse you further...

    I've used the same template, and it needs some attention. Specifically, it uses macros (the RM equivalent of variables, which show up as %{XXXX} in parameters) but does not assign values to them, so no wonder it confuses you! I've butchered the template by replacing the data call with a generator, like this...

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="431" width="915">
          <operator activated="false" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Transactions"/>
          </operator>
          <operator activated="false" class="set_macro" expanded="true" height="60" name="Define Item Count" width="90" x="179" y="30">
            <parameter key="macro" value="itemCountAttributeName"/>
            <parameter key="value" value="itemCount"/>
          </operator>
          <operator activated="false" class="set_macro" expanded="true" height="60" name="Define Customer" width="90" x="313" y="30">
            <parameter key="macro" value="customerIdAttributeName"/>
            <parameter key="value" value="customerId"/>
          </operator>
          <operator activated="false" breakpoints="after" class="set_macro" expanded="true" height="60" name="Define Item" width="90" x="447" y="30">
            <parameter key="macro" value="itemIdAttributeName"/>
            <parameter key="value" value="itemId"/>
          </operator>
          <operator activated="false" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="45" y="255">
            <list key="aggregation_attributes">
              <parameter key="amount" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="customer_id|product_id"/>
          </operator>
          <operator activated="true" class="generate_transaction_data" expanded="true" height="60" name="Generate Transaction Data" width="90" x="4" y="113">
            <parameter key="number_clusters" value="1"/>
          </operator>
          <operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
            <parameter key="group_attribute" value="Id"/>
            <parameter key="index_attribute" value="Item"/>
          </operator>
          <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="246" y="75">
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="380" y="75"/>
          <operator activated="false" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="300">
            <parameter key="name" value="%{customerIdAttributeName}"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
          <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="581" y="210">
            <parameter key="min_confidence" value="0.1"/>
          </operator>
          <connect from_op="Generate Transaction Data" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    Actually, there are also relevant samples (1-25 and 2-23), and this subject has raised its ugly head before, as a quick search for "Market Basket" shows.

    Pip Pip  ;D
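    For readers new to the template: the active part of this process reshapes long transaction rows into one row per customer, with a true/false column for each item. It does this with Pivot, then Replace Missing Values with default "zero", then Numerical to Binominal. The result has the shape that FP-Growth expects, which is why FP-Growth warns when regular attributes are not binominal. Below is a minimal Python sketch of that reshaping, done outside RapidMiner. It assumes (customer, item, count) tuples like the CSV in the first post. `to_baskets` is a hypothetical helper. Duplicate rows are simply added up, which may differ from how RapidMiner's Pivot treats them.

```python
from collections import defaultdict

def to_baskets(rows):
    """Turn (customer, item, count) rows into one dict of item -> bool per customer."""
    counts = defaultdict(lambda: defaultdict(int))
    items = set()
    for customer, item, count in rows:
        counts[customer][item] += count  # assumption: duplicates are summed
        items.add(item)
    columns = sorted(items)
    table = {}
    for customer, bought in counts.items():
        # Never-bought items count as 0 (like Replace Missing Values -> zero);
        # then 0 becomes False and any other count True (like Numerical to Binominal).
        table[customer] = {item: bought.get(item, 0) != 0 for item in columns}
    return columns, table

# D21953/E3 come from the first post; the other rows are made up.
rows = [("D21953", "E3", 1), ("D21953", "E5", 2), ("D11124", "E5", 1)]
columns, table = to_baskets(rows)
for customer, flags in table.items():
    print(customer, flags)
# D21953 {'E3': True, 'E5': True}
# D11124 {'E3': False, 'E5': True}
```

    The customer column ends up as a row label rather than an item column. This matches the template's intent that the customer attribute has the id role rather than being treated as an item.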
  • WindsAloft (Member, Posts: 11, Contributor II)
    Thanks for the reply! I did see a lot of XML when I was doing preliminary searches, but it was so far over my head that I couldn't understand what was going on. I had no idea you could just paste the code in, go back to the design view and see the process visually!

    I tried your process, but I didn't get any results that I could see... I am going to mess around with it, though.

    Thanks for the reply!
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi,
    are there still any problems?

    Greetings,
      Sebastian
  • WindsAloft (Member, Posts: 11, Contributor II)
    Yes, actually. With the above process I keep getting an error saying that regular attributes must be of type binominal. The preview shows the ID field as nominal (I assume that's my problem).

    What's weird is that I actually *get* results with the process above (it generates its own record set). My OWN record set has an ID field that is text. When I replace the first operator with a Retrieve, everything runs fine except that I don't get any results. I'm betting the nominal field is the problem.

    I've tried adding a type conversion operator in between (Nominal to Binominal), but that didn't work either.

  • haddock (Member, Posts: 849, Maven)
    Ooops,

    Mea maxima culpa :-[ I pasted in completely the wrong code. This is what should have been there:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input>
         <location/>
       </input>
       <output>
         <location/>
         <location/>
         <location/>
       </output>
       <macros/>
     </context>
     <operator activated="true" class="process" expanded="true" name="Process">
       <process expanded="true" height="391" width="915">
         <operator activated="false" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
           <parameter key="repository_entry" value="//Samples/data/Transactions"/>
         </operator>
         <operator activated="false" class="set_macro" expanded="true" height="60" name="Define Item Count" width="90" x="179" y="30">
           <parameter key="macro" value="itemCountAttributeName"/>
           <parameter key="value" value="itemCount"/>
         </operator>
         <operator activated="false" class="set_macro" expanded="true" height="60" name="Define Customer" width="90" x="313" y="30">
           <parameter key="macro" value="customerIdAttributeName"/>
           <parameter key="value" value="customerId"/>
         </operator>
         <operator activated="false" breakpoints="after" class="set_macro" expanded="true" height="60" name="Define Item" width="90" x="447" y="30">
           <parameter key="macro" value="itemIdAttributeName"/>
           <parameter key="value" value="itemId"/>
         </operator>
         <operator activated="false" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="45" y="255">
           <list key="aggregation_attributes">
             <parameter key="amount" value="sum"/>
           </list>
           <parameter key="group_by_attributes" value="customer_id|product_id"/>
         </operator>
         <operator activated="true" class="generate_transaction_data" expanded="true" height="60" name="Generate Transaction Data" width="90" x="4" y="113">
           <parameter key="number_clusters" value="1"/>
         </operator>
         <operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
           <parameter key="group_attribute" value="Id"/>
           <parameter key="index_attribute" value="Item"/>
         </operator>
         <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="246" y="75">
           <parameter key="default" value="zero"/>
           <list key="columns"/>
         </operator>
         <operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="380" y="75"/>
         <operator activated="false" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="300">
           <parameter key="name" value="%{customerIdAttributeName}"/>
           <parameter key="target_role" value="id"/>
         </operator>
         <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
         <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="581" y="210">
           <parameter key="min_confidence" value="0.1"/>
         </operator>
         <connect from_op="Generate Transaction Data" from_port="output" to_op="Pivot" to_port="example set input"/>
         <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
         <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
         <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
         <connect from_op="Numerical to Binominal" from_port="original" to_port="result 2"/>
         <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
         <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
         <portSpacing port="sink_result 3" spacing="0"/>
       </process>
     </operator>
    </process>

    Hope that goes a bit better!

  • WindsAloft (Member, Posts: 11, Contributor II)
    The grey boxes can be deleted, correct?

    And is it okay that I still get the FP-Growth warning that regular attributes must be binominal? I will experiment with this, put my own dataset into the input, and see if it works.
  • WindsAloft (Member, Posts: 11, Contributor II)
    I think the problem may be that my ID field, which happens to be all text, has descriptive statistics such as Range. My dataset has OrderID and ProductID, and both are text, e.g. E30098A, E230843F, E230289D; Product0001, Product0002, Product00003.

    Perhaps that is my problem. I get the warning with the process you posted, but it actually succeeds despite the warning. Is that probably because the IDs are numbers?
  • haddock (Member, Posts: 849, Maven)
    Hi,

    Yep, you can bin the gray jobs, and you can ignore the warning, especially as it all runs OK. So all you need to do is replace the example generator, and all should be well...

    Make sure that your meta-data matches on attribute name and content:

    Role      Name     Content
    id        Id       nominal
    regular   Item     nominal
    regular   Amount   integer
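    You can check the meta-data against this table before wiring the data into the process. Below is a minimal Python sketch that reads the CSV with the standard `csv` module. `EXPECTED` and `check_row` are hypothetical names. The sketch only checks attribute names (case-sensitively) and that Amount parses as an integer. It does not reproduce RapidMiner's own type guessing.

```python
import csv
import io

# (role, name, value type) from the table above; names are compared case-sensitively.
EXPECTED = [("id", "Id", "nominal"), ("regular", "Item", "nominal"), ("regular", "Amount", "integer")]

def check_row(row):
    """Return a list of problems found in one CSV row (a dict from csv.DictReader)."""
    problems = []
    for _role, name, kind in EXPECTED:
        if name not in row:
            problems.append(f"missing attribute {name!r}")
            continue
        if kind == "integer":
            try:
                int(row[name])
            except (TypeError, ValueError):  # None for a short row, or non-numeric text
                problems.append(f"{name!r} should be an integer, got {row[name]!r}")
    return problems

# Headers from the first post, which do not match the table above.
sample = "CustomerID,itemID,itemCount\nD21953,E3,1\n"
for row in csv.DictReader(io.StringIO(sample)):
    print(check_row(row))
```

    With the headers from the first post, all three expected attributes are reported missing. That is the same kind of mismatch the template's errors were complaining about.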



  • WindsAloft (Member, Posts: 11, Contributor II)
    If my example set differs from the generator in content (i.e. my IDs are text), should I be applying the to-binominal conversion at the beginning?
  • haddock (Member, Posts: 849, Maven)
    Hi there,

    Don't think so. Concentrate first on loading the data and checking (from the meta-data) that RM sees it as I described.

  • WindsAloft (Member, Posts: 11, Contributor II)
    OK. I had been modifying the operators in the graphical view as I switched to my example set, because the column names were slightly different.

    Instead of doing that, I'll simply create a new set of data that has the names and content you describe above. That should rule out any mistakes I made while reconfiguring the different operators.
  • haddock (Member, Posts: 849, Maven)
    Hi there,

    I've imported data in this CSV format into the repository and substituted that repository entry for the generator:

    Id,Item,Amount
    E30098AE,Product0001,1
    E230843F,Product0001,1
    E230289D,Product0002,2
    E30098AE,Product0002,1
    E230843F,Product0001,1
    E230289D,Product0002,2

    And it works ( in the sense it doesn't fall over ).

    ;D
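    The sample above can be collapsed into baskets by hand to see what the process is working with. E30098AE bought Product0001 and Product0002, E230843F only Product0001, and E230289D only Product0002. Repeated rows add nothing once items are treated as simply present or absent. Below is a minimal Python sketch of that step, plus the support of an itemset, meaning the fraction of baskets that contain all of its items. It uses only the standard library, and the function names are illustrative.

```python
import csv
import io
from collections import defaultdict

SAMPLE = """Id,Item,Amount
E30098AE,Product0001,1
E230843F,Product0001,1
E230289D,Product0002,2
E30098AE,Product0002,1
E230843F,Product0001,1
E230289D,Product0002,2
"""

def baskets_from_csv(text):
    """Map each Id to the set of items it bought (Amount > 0 counts as bought)."""
    baskets = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        if int(row["Amount"]) > 0:
            baskets[row["Id"]].add(row["Item"])
    return dict(baskets)

def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for items in baskets.values() if set(itemset) <= items) / len(baskets)

baskets = baskets_from_csv(SAMPLE)
print(support({"Product0001"}, baskets))
print(support({"Product0001", "Product0002"}, baskets))
```

    Both single items get support 2/3, and the pair gets 1/3. In this sketch's terms, a minimum support above 1/3 leaves no two-item set, so there is nothing to build a rule from.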
  • WindsAloft (Member, Posts: 11, Contributor II)
    Okay, I'm moving forward. I stopped trying to be smart and just renamed my headers to match the process, so that I didn't have that problem.

    Now it doesn't break, but my association rules are blank. This might mean I'm filtering out rules that exist in my data but don't meet a criterion.


    To get the maximum number of results, I set:

    FP-Growth
    min number of items = 0
    positive value = [blank]
    min support = 0
    max items = -1
    must contain = [blank]

    Create Association Rules
    Min Conf = 0
    Gain theta = 0
    laplace k = 0

    But I still can't see any rules.

    Here are some real rows from my data for which I would expect some sort of rule:
    Id Item Amount
    D11131 E1 1
    D11131 E5 1
    D11124 E5 1
    D11125 E5 1


    I should see a rule linking E1 and E5, right?

    Now that we're on the right track, I'm thinking my example data isn't very good. :)
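    Working the quoted rows through by hand gives three baskets: D11131 = {E1, E5}, D11124 = {E5}, D11125 = {E5}. Below is a minimal Python sketch that computes support, confidence and lift for one rule under the usual textbook definitions:

    - confidence = support(lhs ∪ rhs) / support(lhs)
    - lift = confidence / support(rhs)

    The helper names are illustrative, and this is not a claim about how RapidMiner computes these values internally.

```python
from collections import defaultdict

# The four rows quoted in the post above: (Id, Item, Amount)
ROWS = [("D11131", "E1", 1), ("D11131", "E5", 1), ("D11124", "E5", 1), ("D11125", "E5", 1)]

def baskets(rows):
    """Group rows into one item set per Id, ignoring zero amounts."""
    out = defaultdict(set)
    for basket_id, item, amount in rows:
        if amount > 0:
            out[basket_id].add(item)
    return dict(out)

def rule_stats(lhs, rhs, bs):
    """Support, confidence and lift of the rule lhs -> rhs over baskets bs."""
    n = len(bs)
    n_lhs = sum(1 for items in bs.values() if lhs <= items)
    n_rhs = sum(1 for items in bs.values() if rhs <= items)
    n_both = sum(1 for items in bs.values() if (lhs | rhs) <= items)
    support = n_both / n
    confidence = n_both / n_lhs if n_lhs else 0.0
    lift = confidence / (n_rhs / n) if n_rhs else 0.0
    return support, confidence, lift

bs = baskets(ROWS)
print(rule_stats({"E1"}, {"E5"}, bs))
print(rule_stats({"E5"}, {"E1"}, bs))
```

    E1 -> E5 comes out with support 1/3 and confidence 1.0, so it should pass any minimum support of 1/3 or lower. Its lift is exactly 1, though. Every basket contains E5, so knowing that a basket has E1 tells you nothing new about E5. That is one reason three baskets make weak test data.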

  • haddock (Member, Posts: 849, Maven)
    All good. Now get sensible data and lower the criterion constraints until rules emerge.

    Happy dredging!
  • WindsAloft (Member, Posts: 11, Contributor II)
    I really appreciate your help; I hope I can learn how to use this tool!

    Could you help me find the settings that would give the maximum number of results? Or did I have it right in my previous post?
  • haddock (Member, Posts: 849, Maven)

    Take it step by step. The first thing is to understand frequent item sets and the parameters for generating them. If in doubt, as always, check out Wikipedia. Then move on to the rule-building end.
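    The two stages described above can be sketched in plain Python. First, find the itemsets whose support reaches a minimum. Then build the rules from them that reach a minimum confidence. This sketch uses a naive level-wise (Apriori-style) search, not FP-Growth. Both should find the same frequent itemsets for the same minimum support. However, this version skips Apriori's subset-pruning step and is only meant for toy data. The function names are illustrative.

```python
from itertools import combinations

def frequent_itemsets(baskets, min_support):
    """Level-wise search for itemsets whose support is >= min_support."""
    n = len(baskets)
    level = [frozenset([item]) for item in sorted({i for b in baskets for i in b})]
    result = {}
    while level:
        kept = []
        for candidate in level:
            s = sum(1 for b in baskets if candidate <= b) / n
            # Itemsets that never occur are skipped even when min_support is 0.
            if s > 0 and s >= min_support:
                result[candidate] = s
                kept.append(candidate)
        # Join step: unions of surviving k-itemsets that have exactly k + 1 items.
        size = len(level[0]) + 1
        level = list({a | b for a in kept for b in kept if len(a | b) == size})
    return result

def association_rules(freq, min_confidence):
    """Rules lhs -> rhs from frequent itemsets, kept if confidence >= min_confidence."""
    rules = []
    for itemset, s in freq.items():
        for k in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(sorted(itemset), k)):
                # Every subset of a frequent itemset is itself frequent, so freq[lhs] exists.
                confidence = s / freq[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, s, confidence))
    return rules

baskets = [{"E1", "E5"}, {"E5"}, {"E5"}]
for lhs, rhs, s, c in association_rules(frequent_itemsets(baskets, 0.3), 0.5):
    print(sorted(lhs), "->", sorted(rhs), f"support={s:.2f} confidence={c:.2f}")
```

    On the three E1/E5 baskets, a minimum support of 0.3 keeps {E1}, {E5} and {E1, E5}. With a minimum confidence of 0.5, that yields the single rule E1 -> E5. Raising the minimum support to 0.5 drops {E1}, and with it every rule. That is "lower the constraints until rules emerge" in miniature.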

    pip pip

  • steve0 (Member, Posts: 6, Contributor II)
    Hi

    I have just been reading this thread, and it is very good. I have a question: how would I modify the process to include a zip code, so that it produces association rules for each zip code?

    Thank you
  • steve0 (Member, Posts: 6, Contributor II)
    Following up on my previous question: is a clustering method needed for something like this? All the zip codes are already there as an attribute. I just want to see how the market basket analysis can be done by zip code, so that the association rules appear per zip code.

    Thanks
  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert, Member, Posts: 2,531, Unicorn)
    Hi,
    do you want a rule set per zip code? Then you would have to split your data according to the zip codes and run the process on each of these subsets. You could do this with a Filter Examples operator inside a Loop Values operator.

    Greetings,
      Sebastian
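    Here is the split-and-loop idea outside RapidMiner. First, group the rows by the splitting attribute (which is what Loop Values with an inner Filter Examples achieves). Then build rules separately inside each group. Below is a minimal Python sketch. It assumes rows are dicts with hypothetical `State`, `Id` and `Item` keys. To stay short, it only mines one-item -> one-item rules.

```python
from collections import defaultdict

def split_by(rows, key):
    """Group rows by one attribute's value (Loop Values + Filter Examples, in spirit)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)

def to_baskets(rows):
    """One set of items per transaction Id."""
    baskets = defaultdict(set)
    for row in rows:
        baskets[row["Id"]].add(row["Item"])
    return list(baskets.values())

def pair_rules(baskets, min_confidence):
    """Single-item rules a -> c with confidence = count(a and c) / count(a)."""
    items = sorted({i for b in baskets for i in b})
    rules = []
    for a in items:
        n_a = sum(1 for b in baskets if a in b)
        for c in items:
            if c == a:
                continue
            n_ac = sum(1 for b in baskets if a in b and c in b)
            if n_ac and n_ac / n_a >= min_confidence:
                rules.append((a, c, n_ac / n_a))
    return rules

def rules_per_group(rows, key, min_confidence):
    """Run the rule mining separately for every value of `key`."""
    groups = split_by(rows, key)
    return {value: pair_rules(to_baskets(groups[value]), min_confidence) for value in sorted(groups)}

rows = [  # made-up example data
    {"State": "NY", "Id": "o1", "Item": "A"},
    {"State": "NY", "Id": "o1", "Item": "B"},
    {"State": "NY", "Id": "o2", "Item": "A"},
    {"State": "TX", "Id": "o3", "Item": "B"},
    {"State": "TX", "Id": "o3", "Item": "C"},
]
for state, rules in rules_per_group(rows, "State", 0.6).items():
    print(state, rules)
```

    Printing each group's rules in turn gives the "one result set per state" view. Each group is mined independently, so a rule can pass the thresholds in one state and fail in another.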
  • steve0 (Member, Posts: 6, Contributor II)
    Hi Sebastian

    Yes, it is per zip code.

    I am using this process:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="391" width="915">
          <operator activated="false" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="30">
            <parameter key="repository_entry" value="//Samples/data/Transactions"/>
          </operator>
          <operator activated="false" class="set_macro" expanded="true" height="60" name="Define Item Count" width="90" x="179" y="30">
            <parameter key="macro" value="itemCountAttributeName"/>
            <parameter key="value" value="itemCount"/>
          </operator>
          <operator activated="false" class="set_macro" expanded="true" height="60" name="Define Customer" width="90" x="313" y="30">
            <parameter key="macro" value="customerIdAttributeName"/>
            <parameter key="value" value="customerId"/>
          </operator>
          <operator activated="false" breakpoints="after" class="set_macro" expanded="true" height="60" name="Define Item" width="90" x="447" y="30">
            <parameter key="macro" value="itemIdAttributeName"/>
            <parameter key="value" value="itemId"/>
          </operator>
          <operator activated="false" class="aggregate" expanded="true" height="76" name="Aggregate" width="90" x="45" y="255">
            <list key="aggregation_attributes">
              <parameter key="amount" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="customer_id|product_id"/>
          </operator>
          <operator activated="true" class="generate_transaction_data" expanded="true" height="60" name="Generate Transaction Data" width="90" x="4" y="113">
            <parameter key="number_clusters" value="1"/>
          </operator>
          <operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
            <parameter key="group_attribute" value="Id"/>
            <parameter key="index_attribute" value="Item"/>
          </operator>
          <operator activated="true" class="replace_missing_values" expanded="true" height="94" name="Replace Missing Values" width="90" x="246" y="75">
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="380" y="75"/>
          <operator activated="false" class="set_role" expanded="true" height="76" name="Set Role" width="90" x="313" y="300">
            <parameter key="name" value="%{customerIdAttributeName}"/>
            <parameter key="target_role" value="id"/>
          </operator>
          <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth" width="90" x="447" y="210"/>
          <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules" width="90" x="581" y="210">
            <parameter key="min_confidence" value="0.1"/>
          </operator>
          <connect from_op="Generate Transaction Data" from_port="output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
          <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="Numerical to Binominal" from_port="original" to_port="result 2"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>

    Where exactly do I put these operators in this process? Rather than zip code, I am looking at State.

    Thank you
  • steve0 (Member, Posts: 6, Contributor II)
    This is what I have tried:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input>
          <location/>
        </input>
        <output>
          <location/>
          <location/>
        </output>
        <macros/>
      </context>
      <operator activated="true" class="process" expanded="true" name="Process">
        <process expanded="true" height="566" width="915">
          <operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve (2)" width="90" x="45" y="30">
            <parameter key="repository_entry" value="Total Sales by State"/>
          </operator>
          <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="45" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Item|Products_Sold"/>
          </operator>
          <operator activated="true" class="rename" expanded="true" height="76" name="Rename (2)" width="90" x="45" y="300">
            <parameter key="old_name" value="Products_Sold"/>
            <parameter key="new_name" value="Customer Buys"/>
          </operator>
          <operator activated="true" class="loop_values" expanded="true" height="76" name="Loop Values (2)" width="90" x="246" y="300">
            <parameter key="attribute" value="State"/>
            <process expanded="true" height="415" width="689">
              <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="30">
                <parameter key="condition_class" value="attribute_value_filter"/>
                <parameter key="parameter_string" value="State=%{loop_value}"/>
              </operator>
              <operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="179" y="30">
                <parameter key="group_attribute" value="Id"/>
                <parameter key="index_attribute" value="Item"/>
              </operator>
              <operator activated="true" class="numerical_to_binominal" expanded="true" height="76" name="Numerical to Binominal" width="90" x="313" y="30"/>
              <operator activated="true" class="fp_growth" expanded="true" height="76" name="FP-Growth (2)" width="90" x="447" y="30"/>
              <operator activated="true" class="create_association_rules" expanded="true" height="60" name="Create Association Rules (2)" width="90" x="581" y="120">
                <parameter key="min_confidence" value="0.95"/>
              </operator>
              <connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
              <connect from_op="Filter Examples" from_port="example set output" to_op="Pivot" to_port="example set input"/>
              <connect from_op="Pivot" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
              <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth (2)" to_port="example set"/>
              <connect from_op="FP-Growth (2)" from_port="example set" to_op="Create Association Rules (2)" to_port="item sets"/>
              <connect from_op="Create Association Rules (2)" from_port="rules" to_port="out 1"/>
              <portSpacing port="source_example set" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve (2)" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
          <connect from_op="Rename (2)" from_port="example set output" to_op="Loop Values (2)" to_port="example set"/>
          <connect from_op="Loop Values (2)" from_port="out 1" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    but I keep getting "Process failed. Reason: com.rapidminer.example.set.NonSpecialAttributesExampleSet cannot be cast to com.rapidminer.operator.learner.associations.FrequentItemSets".

    I want to show the associations by State as results one after another.
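    Compare this with the earlier processes in the thread. Here, `FP-Growth (2)` is connected from its `example set` port to the `item sets` input of `Create Association Rules (2)`. The other processes use FP-Growth's `frequent sets` port for that connection instead. That mismatch very likely explains the cast error, since the rule operator receives an example set where it expects frequent item sets.

    Separately, Select Attributes keeps only `Item` and `Products_Sold`. As a result, `State` and `Id`, which Loop Values and Pivot refer to, may not reach those operators.

    The toy Python sketch below mirrors the two-output wiring with made-up classes and functions. None of these names are RapidMiner's API, and the "FP-Growth" stand-in only counts one- and two-item sets.

```python
from dataclasses import dataclass
from itertools import combinations

@dataclass
class ExampleSet:
    baskets: list  # one set of items per transaction

@dataclass
class FrequentItemSets:
    supports: dict  # frozenset of items -> support

def fp_growth(example_set, min_support=0.0):
    """Toy stand-in with two outputs, like the 'example set' and 'frequent sets' ports."""
    n = len(example_set.baskets)
    candidates = {frozenset(c) for b in example_set.baskets
                  for k in (1, 2) for c in combinations(sorted(b), k)}
    supports = {c: sum(1 for b in example_set.baskets if c <= b) / n for c in candidates}
    frequent = {c: s for c, s in supports.items() if s >= min_support}
    return example_set, FrequentItemSets(frequent)

def create_association_rules(item_sets, min_confidence=0.8):
    """Refuses anything that is not FrequentItemSets (a rough analogue of the cast error)."""
    if not isinstance(item_sets, FrequentItemSets):
        raise TypeError(f"expected FrequentItemSets, got {type(item_sets).__name__}")
    rules = []
    for itemset, s in item_sets.supports.items():
        if len(itemset) == 2:
            for lhs in sorted(itemset):
                confidence = s / item_sets.supports[frozenset([lhs])]
                if confidence >= min_confidence:
                    (rhs,) = itemset - {lhs}
                    rules.append((lhs, rhs, confidence))
    return rules

example_out, frequent_out = fp_growth(ExampleSet([{"E1", "E5"}, {"E5"}, {"E5"}]))
print(create_association_rules(frequent_out))  # wired like "frequent sets" -> "item sets"
# create_association_rules(example_out) raises TypeError, like "example set" -> "item sets"
```

    Inside the loop, the same rule applies. The rules operator must be fed from FP-Growth's frequent-sets output, while the example-set output is only a pass-through of the input data.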