FP-Growth produces infinite confidence values for association rules

knichcha Member Posts: 3 Contributor I
edited December 2018 in Product Feedback - Resolved

Hello, all.
I used to run out of memory with the old version of FP-Growth, so I am now using FP-Growth ver 8.2. It's great that I can limit the results with the min and max items per itemset parameters; I set min items per itemset = 2 and max items per itemset = 2 so that the process completes within my memory limits. Everything runs fine and the frequent itemsets (FI) pass through Create Association Rules, but the confidence is shown as infinity. How can I get the real confidence in this situation?
(When I run without setting min and max items per itemset (defaults = 1 and 0), the confidence values are correct.)
Thank you very much in advance.

aKe.

<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="8.1.000" expanded="true" height="68" name="Read Pivot" width="90" x="45" y="136">
<parameter key="csv_file" value="I:\Google Drive\iAnA\elderly\ana_data\elderly_P1_F551_test.csv"/>
<parameter key="column_separators" value=",\s*|;\s*"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="TIS-620"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ID.true.polynominal.attribute"/>
<parameter key="1" value="A1.true.integer.attribute"/>
<parameter key="2" value="A3.true.integer.attribute"/>
<parameter key="3" value="A4.true.integer.attribute"/>
<parameter key="4" value="A5.true.integer.attribute"/>
<parameter key="5" value="A6.true.integer.attribute"/>
<parameter key="6" value="A6_1.true.attribute_value.attribute"/>
<parameter key="7" value="A7.true.integer.attribute"/>
<parameter key="8" value="A8.true.integer.attribute"/>
<parameter key="9" value="A9.true.integer.attribute"/>
<parameter key="10" value="A10.true.integer.attribute"/>
<parameter key="11" value="A11.true.integer.attribute"/>
<parameter key="12" value="A12.true.integer.attribute"/>
<parameter key="13" value="F55.true.integer.attribute"/>
</list>
</operator>
<operator activated="true" class="numerical_to_binominal" compatibility="8.2.000" expanded="true" height="82" name="Numerical to Binominal" width="90" x="45" y="238"/>
<operator activated="true" class="remove_useless_attributes" compatibility="8.2.000" expanded="true" height="82" name="Remove Useless Attributes" width="90" x="45" y="340">
<parameter key="nominal_useless_below" value="0.2"/>
<description align="left" color="transparent" colored="false" width="126">Remove 1-itemset that support &amp;lt; &amp;quot;nominal useless below&amp;quot;</description>
</operator>
<operator activated="true" breakpoints="after" class="concurrency:fp_growth" compatibility="8.2.000" expanded="true" height="82" name="FP-Growth" width="90" x="179" y="340">
<parameter key="positive_value" value="true"/>
<parameter key="min_support" value="0.2"/>
<parameter key="min_items_per_itemset" value="2"/>
<parameter key="max_items_per_itemset" value="2"/>
<enumeration key="must_contain_list"/>
</operator>
<operator activated="true" breakpoints="after" class="create_association_rules" compatibility="8.2.000" expanded="true" height="82" name="Create Association Rules" width="90" x="313" y="340">
<parameter key="min_confidence" value="0.5"/>
</operator>
<connect from_op="Read Pivot" from_port="output" to_op="Numerical to Binominal" to_port="example set input"/>
<connect from_op="Numerical to Binominal" from_port="example set output" to_op="Remove Useless Attributes" to_port="example set input"/>
<connect from_op="Remove Useless Attributes" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
<connect from_op="FP-Growth" from_port="frequent sets" to_op="Create Association Rules" to_port="item sets"/>
<connect from_op="Create Association Rules" from_port="rules" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<description align="left" color="yellow" colored="false" height="76" resized="true" width="559" x="34" y="23">Problem (R4): Generate Report didn't work because the xls-output file is zero bytes, Solution: use Item Sets to Data and Association Rules to Exmaple to change FI and AR to example set inorder to write out to a file by using Write Excel or Write CSV</description>
</process>
</operator>
</process>

Declined · Last Updated

no update in a year - please comment if still relevant

Comments

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Solution Accepted

    Hi,

    Good find.

    We are looking into it. Stay tuned for updates.

    Best,
    David

  • knichcha Member Posts: 3 Contributor I

    Thank you so much, David, for your quick response.
    I'm looking forward to your update.

  • jczogalla Employee, Member Posts: 144 RM Engineering

    Hi @knichcha!

    Sorry for the long wait. We looked into this, and everything works as expected.

    Association rules are calculated iteratively from the supports of the frequent itemsets; see Wikipedia for an explanation. If there is no support for single items (i.e. FP-Growth does not return itemsets of size 1), you cannot calculate confidences for itemsets of size 2. So if you want to use the Create Association Rules operator, you need to have FP-Growth calculate those single-item itemsets. I hope this helps!

    Cheers

    Jan
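
The dependency described above can be illustrated with a minimal Python sketch. It is not RapidMiner's implementation; the function name `confidence` and the support values are made up for illustration. It only uses the standard definition confidence(X → Y) = support(X ∪ Y) / support(X), so the rule cannot be scored when the support of the premise X was never reported.

```python
def confidence(supports, premise, conclusion):
    """Return confidence(premise -> conclusion), or None if it cannot be computed.

    `supports` maps frozensets of items to their relative support.
    This is an illustrative helper, not part of any RapidMiner API.
    """
    premise = frozenset(premise)
    whole_rule = premise | frozenset(conclusion)
    if premise not in supports or whole_rule not in supports:
        # The support of the premise (or of the full itemset) is missing,
        # so the ratio support(X u Y) / support(X) is undefined.
        return None
    return supports[whole_rule] / supports[premise]


# Hypothetical supports with min items per itemset = 1 (1-itemsets included).
all_sizes = {
    frozenset({"A"}): 0.5,
    frozenset({"B"}): 0.4,
    frozenset({"A", "B"}): 0.3,
}

# The same result restricted to min = max = 2 items per itemset.
only_pairs = {
    itemset: support for itemset, support in all_sizes.items() if len(itemset) == 2
}

with_singles = confidence(all_sizes, {"A"}, {"B"})
without_singles = confidence(only_pairs, {"A"}, {"B"})
print(with_singles, without_singles)
```

With the 1-itemsets present, the rule A → B gets a confidence of 0.3 / 0.5; with only the 2-itemsets, the denominator is unavailable and no meaningful value can be returned.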

  • knichcha Member Posts: 3 Contributor I

    Thank you for looking into this.
    I understand how support and confidence are calculated.
    Your team has improved the FP-Growth operator to limit the itemsets it returns, which is very good for reducing memory use.
    It would be great if you did the same for the Create Association Rules operator.
    I mean that even when we limit the itemsets to size 2, association rules should be created from the 2-itemsets and the confidence should be calculated correctly.
    Thank you in advance.
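
As a rough sketch of the idea in this request (not an existing RapidMiner feature), rules over 2-itemsets can be scored without materialising larger itemsets if the single-item counts are gathered directly from the data in the same pass. The function name `pair_rules`, its parameters, and the toy transactions below are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support, min_confidence):
    """Build rules a -> b from 2-itemsets only (illustrative, hypothetical helper).

    Single-item counts are taken straight from the transactions, so the
    confidence count(a, b) / count(a) is always well defined.
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))
        item_counts.update(items)                   # support counts of 1-itemsets
        pair_counts.update(combinations(items, 2))  # support counts of 2-itemsets

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each frequent pair yields two candidate rules: a -> b and b -> a.
        for premise, conclusion in ((a, b), (b, a)):
            conf = count / item_counts[premise]
            if conf >= min_confidence:
                rules.append((premise, conclusion, support, conf))
    return sorted(rules)


# Toy data, invented for this example.
transactions = [
    {"A", "B"},
    {"A", "B"},
    {"A"},
    {"B", "C"},
    {"A", "B", "C"},
]

for premise, conclusion, support, conf in pair_rules(transactions, 0.3, 0.5):
    print(f"{premise} -> {conclusion}: support={support:.2f}, confidence={conf:.2f}")
```

The extra memory here is one counter per item plus one per co-occurring pair, which is the kind of bounded cost the size limit is meant to achieve; whether this fits the operator's internal design is not something the thread confirms.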

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    Moving to Product Ideas.
