
"pruning redundant association rules"

André_RM Member Posts: 4 Contributor I
edited May 2019 in Help
Hi everyone,

I created association rules in RapidMiner and everything is OK, but there are a lot of redundant rules, for example A=>B and B=>A. As you can see, it's the same rule.

My question is how to remove them?

I need help, please. I've looked for information about it, but I didn't find anything.

Answers

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi @André_RM,

    I would have rather said that A=>B and B=>A are two different rules...

    Anyway, can you share your XML process and your dataset(s)?

    Regards,

    Lionel


  • André_RM Member Posts: 4 Contributor I
    Hi @lionelderkrikor,

    Can you explain to me why A=>B and B=>A are different rules? For example, if I have two rules, one says "IF buy Milk THEN buy Bread" and the other says "IF buy Bread THEN buy Milk". Aren't they redundant rules?

    Sorry, how can I share my XML process? I am new to RapidMiner.

    Regards,

    Andre
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager


    :smiley:
  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
    Hi again @André_RM,

    I think they are two different rules, because they correspond to different customer behaviours.
    If a client is buying bread, you will suggest some milk to him;
    for another client who is buying milk, you will suggest some bread.
     ==> These are two different customer behaviours, and thus you have to apply a different recommendation (rule) in each case.
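
To make the point about direction concrete, here is a minimal Python sketch (outside RapidMiner) of how a recommender would use the rules. The rule list and its confidence values are made up for illustration, and `recommend` is a hypothetical helper, not a RapidMiner function. Only rules whose premise matches the item in the basket are used, so A=>B and B=>A serve different customers.

```python
# Hypothetical rules as (premise, conclusion, confidence); the numbers are made up.
RULES = [
    ("bread", "milk", 0.60),
    ("milk", "bread", 0.35),
    ("bread", "butter", 0.40),
]


def recommend(item, rules, min_confidence=0.3):
    """Return conclusions of rules whose premise is `item`, best confidence first."""
    matches = [
        (conf, conclusion)
        for premise, conclusion, conf in rules
        if premise == item and conf >= min_confidence
    ]
    return [conclusion for conf, conclusion in sorted(matches, reverse=True)]


print(recommend("bread", RULES))  # ['milk', 'butter']
print(recommend("milk", RULES))   # ['bread']
```

If the rule milk=>bread were deleted as "redundant", the customer buying milk would get no suggestion at all in this sketch.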

    Anyway, I managed without your process and your data. Below is a process that does what you want (you will need to adapt it to your own data):

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.000-RC">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.2.000-RC" expanded="true" name="Process" origin="GENERATED_TEMPLATE">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.000-RC" expanded="true" height="68" name="Load Transactions" origin="GENERATED_TEMPLATE" width="90" x="45" y="187">
            <parameter key="repository_entry" value="//Samples/Templates/Market Basket Analysis/Transactions"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="6.0.006" expanded="true" height="82" name="Aggregate" origin="GENERATED_TEMPLATE" width="90" x="179" y="136">
            <parameter key="use_default_aggregation" value="false"/>
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="default_aggregation_function" value="average"/>
            <list key="aggregation_attributes">
              <parameter key="product 1" value="concatenation"/>
            </list>
            <parameter key="group_by_attributes" value="Invoice"/>
            <parameter key="count_all_combinations" value="false"/>
            <parameter key="only_distinct" value="false"/>
            <parameter key="ignore_missings" value="true"/>
          </operator>
          <operator activated="true" class="rename" compatibility="9.2.000-RC" expanded="true" height="82" name="Rename" origin="GENERATED_TEMPLATE" width="90" x="313" y="136">
            <parameter key="old_name" value="concat(product 1)"/>
            <parameter key="new_name" value="Products"/>
            <list key="rename_additional_attributes"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="9.2.000-RC" expanded="true" height="82" name="Set Role" origin="GENERATED_TEMPLATE" width="90" x="447" y="136">
            <parameter key="attribute_name" value="Invoice"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="concurrency:fp_growth" compatibility="9.2.000-RC" expanded="true" height="82" name="FP-Growth" origin="GENERATED_TEMPLATE" width="90" x="581" y="136">
            <parameter key="input_format" value="item list in a column"/>
            <parameter key="item_separators" value="|"/>
            <parameter key="use_quotes" value="false"/>
            <parameter key="quotes_character" value="&quot;"/>
            <parameter key="escape_character" value="\"/>
            <parameter key="trim_item_names" value="true"/>
            <parameter key="positive_value" value="true"/>
            <parameter key="min_requirement" value="support"/>
            <parameter key="min_support" value="0.005"/>
            <parameter key="min_frequency" value="100"/>
            <parameter key="min_items_per_itemset" value="1"/>
            <parameter key="max_items_per_itemset" value="0"/>
            <parameter key="max_number_of_itemsets" value="1000000"/>
            <parameter key="find_min_number_of_itemsets" value="false"/>
            <parameter key="min_number_of_itemsets" value="100"/>
            <parameter key="max_number_of_retries" value="15"/>
            <parameter key="requirement_decrease_factor" value="0.9"/>
            <enumeration key="must_contain_list"/>
          </operator>
          <operator activated="true" class="create_association_rules" compatibility="9.2.000-RC" expanded="true" height="82" name="Create Association Rules" origin="GENERATED_TEMPLATE" width="90" x="715" y="187">
            <parameter key="criterion" value="confidence"/>
            <parameter key="min_confidence" value="0.1"/>
            <parameter key="min_criterion_value" value="0.8"/>
            <parameter key="gain_theta" value="2.0"/>
            <parameter key="laplace_k" value="1.0"/>
          </operator>
          <operator activated="true" class="converters:rules_2_example_set" compatibility="0.4.001" expanded="true" height="82" name="Association Rules to ExampleSet" width="90" x="916" y="187"/>
          <operator activated="true" class="multiply" compatibility="9.2.000-RC" expanded="true" height="82" name="Multiply" width="90" x="1117" y="187"/>
          <operator activated="true" class="select_attributes" compatibility="9.2.000-RC" expanded="true" height="82" name="Select Attributes" width="90" x="1251" y="187">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value="Premises"/>
            <parameter key="attributes" value="Conclusion|Premises"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="9.2.000-RC" expanded="true" height="124" name="Multiply (2)" width="90" x="1385" y="85"/>
          <operator activated="true" class="generate_attributes" compatibility="9.2.000-RC" expanded="true" height="82" name="Generate Attributes" width="90" x="1519" y="34">
            <list key="function_descriptions">
              <parameter key="Conclusion" value="Premises"/>
            </list>
            <parameter key="keep_all" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-RC" expanded="true" height="82" name="Generate ID" width="90" x="1653" y="85">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="9.2.000-RC" expanded="true" height="82" name="Generate Attributes (2)" width="90" x="1519" y="136">
            <list key="function_descriptions">
              <parameter key="Premises" value="Conclusion"/>
            </list>
            <parameter key="keep_all" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-RC" expanded="true" height="82" name="Generate ID (2)" width="90" x="1653" y="187">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000-RC" expanded="true" height="82" name="Join" width="90" x="1787" y="136">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-RC" expanded="true" height="82" name="Generate ID (3)" width="90" x="1519" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="cross_distances" compatibility="9.2.000-RC" expanded="true" height="103" name="Cross Distances" width="90" x="1921" y="136">
            <parameter key="measure_types" value="NominalMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="SimpleMatchingSimilarity"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="GeneralizedIDivergence"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="only_top_k" value="false"/>
            <parameter key="k" value="10"/>
            <parameter key="search_for" value="nearest"/>
            <parameter key="compute_similarities" value="true"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="9.2.000-RC" expanded="true" height="82" name="Generate ID (4)" width="90" x="1385" y="238">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="concurrency:join" compatibility="9.2.000-RC" expanded="true" height="82" name="Join (2)" width="90" x="2055" y="187">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="inner"/>
            <parameter key="use_id_attribute_as_key" value="false"/>
            <list key="key_attributes">
              <parameter key="request" value="id"/>
            </list>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_examples" compatibility="9.2.000-RC" expanded="true" height="103" name="Filter Examples" width="90" x="2189" y="187">
            <parameter key="parameter_expression" value=""/>
            <parameter key="condition_class" value="custom_filters"/>
            <parameter key="invert_filter" value="false"/>
            <list key="filters_list">
              <parameter key="filters_entry_key" value="distance.ne.1"/>
            </list>
            <parameter key="filters_logic_and" value="true"/>
            <parameter key="filters_check_metadata" value="true"/>
          </operator>
          <operator activated="true" class="remove_duplicates" compatibility="9.2.000-RC" expanded="true" height="103" name="Remove Duplicates" width="90" x="2323" y="187">
            <parameter key="attribute_filter_type" value="single"/>
            <parameter key="attribute" value="Premises"/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
            <parameter key="treat_missing_values_as_duplicates" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="9.2.000-RC" expanded="true" height="82" name="Select Attributes (2)" width="90" x="2524" y="187">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="request|distance|document"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="true"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <connect from_op="Load Transactions" from_port="output" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Rename" to_port="example set input"/>
          <connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
          <connect from_op="Create Association Rules" from_port="rules" to_op="Association Rules to ExampleSet" to_port="rules input"/>
          <connect from_op="Association Rules to ExampleSet" from_port="example set" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Select Attributes" to_port="example set input"/>
          <connect from_op="Select Attributes" from_port="example set output" to_op="Multiply (2)" to_port="input"/>
          <connect from_op="Select Attributes" from_port="original" to_op="Generate ID (4)" to_port="example set input"/>
          <connect from_op="Multiply (2)" from_port="output 1" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Multiply (2)" from_port="output 2" to_op="Generate Attributes (2)" to_port="example set input"/>
          <connect from_op="Multiply (2)" from_port="output 3" to_op="Generate ID (3)" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Generate Attributes (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
          <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Join" from_port="join" to_op="Cross Distances" to_port="request set"/>
          <connect from_op="Generate ID (3)" from_port="example set output" to_op="Cross Distances" to_port="reference set"/>
          <connect from_op="Cross Distances" from_port="result set" to_op="Join (2)" to_port="left"/>
          <connect from_op="Generate ID (4)" from_port="example set output" to_op="Join (2)" to_port="right"/>
          <connect from_op="Join (2)" from_port="join" to_op="Filter Examples" to_port="example set input"/>
          <connect from_op="Filter Examples" from_port="example set output" to_op="Remove Duplicates" to_port="example set input"/>
          <connect from_op="Remove Duplicates" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <description align="left" color="yellow" colored="false" height="70" resized="false" width="850" x="20" y="25">MARKET BASKET ANALYSIS&lt;br&gt;Model associations between products by determining sets of items frequently purchased together and building association rules to derive recommendations.</description>
          <description align="left" color="blue" colored="true" height="185" resized="true" width="550" x="20" y="105">Step 1:&lt;br/&gt;Load transaction data containing a transaction id, a product id and a quantifier. The data denotes how many times a certain product has been purchased as part of a transactions.</description>
          <description align="left" color="purple" colored="true" height="341" resized="true" width="549" x="20" y="300">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 2:&lt;br&gt;Edit, transform &amp;amp; load (ETL) - Aggregate transaction data via concatenation so that the products in a transaction are in one entry, separated by the pipe symbol.&lt;br&gt;</description>
          <description align="left" color="green" colored="true" height="310" resized="true" width="290" x="580" y="105">Step 3:&lt;br/&gt;Using FP-Growth, determine frequent item sets. A frequent item sets denotes that the items (products) in the set have been purchased together frequently, i.e. in a certain ratio of transactions. This ratio is given by the support of the item set.</description>
          <description align="left" color="green" colored="true" height="215" resized="true" width="286" x="579" y="425">&lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; &lt;br&gt; Step 4:&lt;br/&gt;Create association rules which can be used for product recommendations depending on the confidences of the rules.&lt;br&gt;</description>
          <description align="left" color="yellow" colored="false" height="35" resized="true" width="849" x="20" y="655">Outputs: association rules, frequent item set&lt;br&gt;</description>
        </process>
      </operator>
    </process>
    
    This is the XML of the process.

    To share your own process as XML, open it in RapidMiner and open the XML panel.

    Copy the XML code from there and paste it somewhere else, for example into a forum post here on the community portal. By the way, if you post your XML here, please use the code environment, which you get by clicking on the </> icon in the toolbar of the post.

    To import an XML description of a process, e.g. to use a process someone else has posted here in the forum, follow these steps:

    1. Create a new process and go to the XML panel (see above).
    2. Clear the view and paste the XML code you got into that panel.
    3. Then press the green checkmark icon on top of the panel.
    4. Switch back to the Process panel.
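
For readers who want to cross-check the pruned result outside RapidMiner, the goal (keeping only one rule from each mirrored pair A=>B / B=>A) can be sketched in plain Python. This is not a translation of the process above, only an illustration of the same idea under stated assumptions: the rules are assumed to be available as (premise, conclusion, confidence) tuples, `prune_mirrored` is a hypothetical helper, and keeping the higher-confidence direction is just one possible tie-breaking choice.

```python
def prune_mirrored(rules):
    """Keep one rule per mirrored pair (A=>B, B=>A): the one with higher confidence.

    `rules` is a list of (premise, conclusion, confidence) tuples, where premise
    and conclusion are tuples of item names.
    """
    best = {}
    for premise, conclusion, confidence in rules:
        # The key ignores direction: it is identical for A=>B and B=>A.
        key = frozenset([frozenset(premise), frozenset(conclusion)])
        if key not in best or confidence > best[key][2]:
            best[key] = (premise, conclusion, confidence)
    return list(best.values())


# Made-up example rules.
rules = [
    (("milk",), ("bread",), 0.35),
    (("bread",), ("milk",), 0.60),
    (("bread",), ("butter",), 0.40),
]
pruned = prune_mirrored(rules)
for premise, conclusion, confidence in pruned:
    print(premise, "=>", conclusion, confidence)
```

Note that this throws away the direction with the lower confidence, so it is only appropriate when the direction of the rule really does not matter for the use case.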
     
    I hope it helps,

    Regards,

    Lionel

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    To reiterate, with association rules it is definitely the case that "if A then B" and "if B then A" are different rules and should be handled independently.  They can have completely different performance metrics (confidence, conviction, etc.), and you might have another rule that is even stronger/better, such as "if B then C."  AR output is not symmetrical, and browsing through any real-life shopping cart data or similar datasets will make that pretty clear.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • Lolitaminer Member Posts: 2 Contributor I
    As my teacher explained to me, in theory they are different rules. For example:
    the customer buys Coca-Cola because he is buying rum; this is a conditional probability. It means that when he buys rum he will also buy Coca-Cola to mix his drink, but that is not the same as saying he will buy rum because he buys Coca-Cola. Buying Coca-Cola does not force him to buy rum.
    The first rule will have a higher correlation, but the second one will not.
    You can confirm this with the confidence and with the lift.
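
The rum and cola example can be checked by hand. Below is a minimal Python sketch using the textbook definitions of support, confidence and lift on a made-up list of six transactions; the data and the helper functions are assumptions for illustration and are not RapidMiner output. In this toy data the confidence differs strongly between the two directions, while support and lift, by their definitions, come out the same for both.

```python
# Made-up transactions: everyone who buys rum also buys cola, but not the reverse.
transactions = [
    {"rum", "cola"},
    {"rum", "cola"},
    {"cola"},
    {"cola", "bread"},
    {"cola"},
    {"bread"},
]


def support(items, transactions):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)


def confidence(premise, conclusion, transactions):
    """Estimate of P(conclusion | premise)."""
    both = set(premise) | set(conclusion)
    return support(both, transactions) / support(premise, transactions)


def lift(premise, conclusion, transactions):
    """Confidence divided by the support of the conclusion."""
    return confidence(premise, conclusion, transactions) / support(conclusion, transactions)


for premise, conclusion in [({"rum"}, {"cola"}), ({"cola"}, {"rum"})]:
    print(
        sorted(premise), "=>", sorted(conclusion),
        "confidence:", round(confidence(premise, conclusion, transactions), 2),
        "lift:", round(lift(premise, conclusion, transactions), 2),
    )
```

So confidence is the metric to look at when comparing A=>B with B=>A; lift only tells you whether the two items occur together more often than chance would suggest.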