Association Rules Customer data

tolga_zm95 Member Posts: 2 Contributor I
edited December 2018 in Help

Hello everybody!


I am a student who recently started working with RapidMiner for a school project, and I have to apply association analysis to some data sets. Up until now I have only used it for market basket analysis, to determine co-occurrence and relationships between items. But our professor gave us a data set to apply our assigned models to. Basically, I have customer data with 10,000 examples and an attribute that says whether each customer is likely to buy a bike or not. Of the 10,000 examples, 90% are non-bike buyers and 10% are bike buyers. I am not sure whether this data set is appropriate for association analysis, but I have done some analysis anyway. I ran the association analysis and created some simple rules, but I don't know whether they make sense, and all of my rules conclude in non-buyers. What do I have to do to get rules for the buyers only?


I appreciate your help and your time.
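One likely reason every rule ends in a non-buyer is the 90/10 split itself: almost any premise gives high confidence for "non-buyer" simply because most customers are non-buyers. A quick way to see this is to compare confidence with lift. The sketch below is plain Python rather than RapidMiner, and all counts and the `Home Owner = Yes` premise are made up for illustration.

```python
def rule_metrics(n_total, n_premise, n_conclusion, n_both):
    """Return (support, confidence, lift) for a rule premise -> conclusion."""
    support = n_both / n_total
    confidence = n_both / n_premise
    # Lift compares the rule's confidence with the base rate of the conclusion.
    lift = confidence / (n_conclusion / n_total)
    return support, confidence, lift

# Hypothetical counts for 10,000 customers with a 90/10 class split.
N_TOTAL = 10_000
N_HOME_OWNERS = 6_000   # assumed number of rows matching the premise
N_NON_BUYERS = 9_000
N_BUYERS = 1_000

# Home Owner = Yes -> BikeBuyer = No (assume 5,300 rows match both sides)
sup_no, conf_no, lift_no = rule_metrics(N_TOTAL, N_HOME_OWNERS, N_NON_BUYERS, 5_300)

# Home Owner = Yes -> BikeBuyer = Yes (the remaining 700 home owners)
sup_yes, conf_yes, lift_yes = rule_metrics(N_TOTAL, N_HOME_OWNERS, N_BUYERS, 700)

print(f"non-buyer rule: support={sup_no:.2f} confidence={conf_no:.2f} lift={lift_no:.2f}")
print(f"buyer rule:     support={sup_yes:.2f} confidence={conf_yes:.2f} lift={lift_yes:.2f}")
```

With these invented numbers the non-buyer rule passes any reasonable confidence threshold even though its lift is below 1, while the buyer rule has a lift above 1 but far too little support and confidence to survive the usual thresholds. That is why buyer rules tend to vanish unless the thresholds are lowered or the data is restricted to buyers first.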


  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    @tolga_zm95 If this were strictly a classification problem (i.e. buyers vs. non-buyers) and you wanted to predict potential buyers, I'd say you have a very unbalanced data set and would need to do some balancing. But since you only want to know the rules, can you not just filter on the buyers using a Filter Examples operator, filtering for Yes on BikeBuyer?


    <?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
      <operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
            <parameter key="csv_file" value="C:\Users\Thomas Ott\Downloads\CustomerBase.csv"/>
            <parameter key="column_separators" value=","/>
            <parameter key="first_row_as_names" value="false"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <parameter key="encoding" value="windows-1251"/>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="ID.true.real.id"/>
              <parameter key="1" value="Marital Status.true.polynominal.attribute"/>
              <parameter key="2" value="Gender.true.polynominal.attribute"/>
              <parameter key="3" value="Yearly Income.true.real.attribute"/>
              <parameter key="4" value="Children.true.real.attribute"/>
              <parameter key="5" value="Education.true.polynominal.attribute"/>
              <parameter key="6" value="Occupation.true.polynominal.attribute"/>
              <parameter key="7" value="Home Owner.true.polynominal.attribute"/>
              <parameter key="8" value="Cars.true.real.attribute"/>
              <parameter key="9" value="Commute Distance.true.polynominal.attribute"/>
              <parameter key="10" value="Region.true.polynominal.attribute"/>
              <parameter key="11" value="Age.true.real.attribute"/>
              <parameter key="12" value="BikeBuyer.true.polynominal.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="8.1.000" expanded="true" height="82" name="Select Attributes" width="90" x="45" y="187">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="ID|BikeBuyer|Commute Distance|Home Owner|Region|Yearly Income|Age"/>
          </operator>
          <operator activated="true" class="discretize_by_frequency" compatibility="8.1.000" expanded="true" height="103" name="Discretize" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="Yearly Income|Age"/>
            <parameter key="number_of_bins" value="5"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="8.1.000" expanded="true" height="103" name="Filter Examples" width="90" x="246" y="238">
            <list key="filters_list">
              <parameter key="filters_entry_key" value="BikeBuyer.equals.Yes"/>
            </list>
          </operator>
          <operator activated="true" class="nominal_to_numerical" compatibility="8.1.000" expanded="true" height="103" name="Nominal to Numerical" width="90" x="447" y="238">
            <list key="comparison_groups"/>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="8.1.000" expanded="true" height="82" name="Numerical to Binominal" width="90" x="447" y="34"/>
          <operator activated="true" class="fp_growth" compatibility="8.1.000" expanded="true" height="82" name="FP-Growth" width="90" x="581" y="34">
            <parameter key="min_support" value="0.5"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="8.1.000" expanded="true" height="82" name="Create Association Rules" width="90" x="715" y="34"/>
          <connect from_op="Read CSV" from_port="output" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Discretize" to_port="example set input"/>
          <connect from_op="Discretize" from_port="example set output" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Nominal to Numerical" to_port="example set input"/>
          <connect from_op="Nominal to Numerical" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
          <connect from_op="Create Association Rules" from_port="item sets" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
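For anyone who wants to see the idea outside RapidMiner, here is a rough Python sketch of the same steps: keep only the buyers, turn each row into a set of "attribute = value" items, count the frequent item sets, and derive rules from them. It is not what the operators do internally (it counts by brute force instead of FP-Growth), the rows are made up, the 0.6 confidence threshold is picked only so the toy data yields rules, and the constant BikeBuyer column is dropped after filtering, which the process above does not do.

```python
from collections import Counter
from itertools import combinations

# Hypothetical rows; column names mirror the process above, values are invented.
rows = [
    {"BikeBuyer": "Yes", "Region": "Europe", "Home Owner": "Yes", "Commute Distance": "0-1 Miles"},
    {"BikeBuyer": "Yes", "Region": "Europe", "Home Owner": "Yes", "Commute Distance": "2-5 Miles"},
    {"BikeBuyer": "Yes", "Region": "Pacific", "Home Owner": "Yes", "Commute Distance": "0-1 Miles"},
    {"BikeBuyer": "No", "Region": "Europe", "Home Owner": "No", "Commute Distance": "10+ Miles"},
    {"BikeBuyer": "Yes", "Region": "Europe", "Home Owner": "No", "Commute Distance": "0-1 Miles"},
]

def to_items(row):
    """Turn a row into a set of 'attribute = value' items, skipping the label."""
    return frozenset(f"{key} = {value}" for key, value in row.items() if key != "BikeBuyer")

def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force support counting for item sets of up to max_size items."""
    counts = Counter()
    for transaction in transactions:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(transaction), size):
                counts[frozenset(combo)] += 1
    total = len(transactions)
    return {items: n / total for items, n in counts.items() if n / total >= min_support}

def rules(itemsets, min_confidence):
    """Yield (premise, conclusion, support, confidence) from two-item sets."""
    for items, support in itemsets.items():
        if len(items) != 2:
            continue
        for premise in items:
            conclusion = next(item for item in items if item != premise)
            # Every subset of a frequent set is frequent, so this lookup succeeds.
            confidence = support / itemsets[frozenset([premise])]
            if confidence >= min_confidence:
                yield premise, conclusion, support, confidence

# The equivalent of Filter Examples with BikeBuyer = Yes.
buyers = [to_items(row) for row in rows if row["BikeBuyer"] == "Yes"]
itemsets = frequent_itemsets(buyers, min_support=0.5)
buyer_rules = sorted(rules(itemsets, min_confidence=0.6))

for premise, conclusion, support, confidence in buyer_rules:
    print(f"{premise} -> {conclusion} (support={support:.2f}, confidence={confidence:.2f})")
```

Because the filter runs first, support and confidence are measured among buyers only, so the rules describe what buyers have in common rather than what separates them from non-buyers.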

  • tolga_zm95 Member Posts: 2 Contributor I

    Thank you so much for the help Thomas!

    I have another question, this time about discretizing. I want to create 3 groups for Age, for example Young (0-35), Middle (35-50) and Old (50-100), but when I use the Discretize by User Specification operator I can only define the upper limit. Do you know how to solve this problem?


    Thank you so much in advance!
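As far as I understand it, upper limits alone are enough to describe these three groups, because each class implicitly starts where the previous one ends: Young with upper limit 35, Middle with upper limit 50, and Old with a large or infinite upper limit. The Python sketch below shows that logic. The class names and limits come from the question; treating the upper limit as inclusive is an assumption, so check how the operator handles a value that sits exactly on a boundary such as 35.

```python
import math
from bisect import bisect_left

# (class name, upper limit) in ascending order of upper limit.
# The lower limit of each class is implicitly the previous upper limit.
age_classes = [("Young", 35), ("Middle", 50), ("Old", math.inf)]

def discretize(value, classes):
    """Return the name of the first class whose upper limit is >= value.

    Assumption: upper limits are inclusive, so 35 counts as Young.
    """
    upper_limits = [upper for _, upper in classes]
    index = bisect_left(upper_limits, value)
    return classes[index][0]

for age in (20, 35, 36, 50, 72):
    print(age, discretize(age, age_classes))
```

Using an infinite upper limit for the last class avoids leaving ages above 100 without a class.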




  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    (friendly reminder from moderator – don't forget to mark replies as solutions)




