
"Recognition of IDs in FP-Growth"

spoorthy9547 Member Posts: 3 Contributor I
edited June 2019 in Help
Hello,

I am dealing with transaction data where each transaction has an ID. Every ID has three or more items belonging to that transaction, so in the Excel sheet an ID appears three times if its transaction contains three items. Now I have to find frequent itemsets in the transactions. I tried the FP-Growth algorithm, but I don't get the expected output. Is there a way to group each ID with all of its items?

Thanks,

Answers

  • haddock Member Posts: 849 Maven
    Hi,
    You'll need to pivot your data into binominal attributes first. If that means nothing to you, check out the examples; believe me, it saves time in the long run.
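    One way to read "pivot into binominals" for this data is the operator chain below, a minimal sketch assuming the columns CustomerId, itemId and itemCount from the follow-up question. Pivot turns the several-rows-per-ID layout into one row per customer with one column per item; items a customer never bought come out as missing, so Replace Missing Values sets them to zero; Numerical to Binominal then maps the counts to true/false, which is the input FP-Growth expects. It assumes CustomerId still carries the id role after Pivot, so the conversion skips it (special attributes are not converted by default); if it doesn't, re-assign the role with Set Role first. Operator classes and parameter keys are the RapidMiner 5 ones used in the process further down; layout attributes are omitted, so treat this as a fragment, not a complete process.

    ```xml
    <!-- Sketch: long-format rows (CustomerId, itemId, itemCount) to FP-Growth input.
         The Pivot input is assumed to be fed by an upstream reader or Aggregate. -->
    <operator activated="true" class="pivot" expanded="true" name="Pivot">
      <!-- one example per CustomerId, one attribute per distinct itemId -->
      <parameter key="group_attribute" value="CustomerId"/>
      <parameter key="index_attribute" value="itemId"/>
    </operator>
    <operator activated="true" class="replace_missing_values" expanded="true" name="Replace Missing Values">
      <!-- items a customer did not buy are missing after Pivot; set them to 0 -->
      <parameter key="default" value="zero"/>
      <list key="columns"/>
    </operator>
    <operator activated="true" class="numerical_to_binominal" expanded="true" name="Numerical to Binominal">
      <!-- convert all regular attributes; with the assumed default thresholds min = max = 0,
           a count of 0 becomes false and any other count becomes true -->
      <parameter key="attribute_filter_type" value="all"/>
    </operator>
    <operator activated="true" class="fp_growth" expanded="true" name="FP-Growth">
      <!-- "true" means "item is present in this customer's basket" -->
      <parameter key="positive_value" value="true"/>
      <parameter key="min_support" value="0.1"/>
    </operator>
    <connect from_op="Pivot" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
    <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
    <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
    <connect from_op="FP-Growth" from_port="frequent sets" to_port="result 1"/>
    ```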

    Good luck!
  • spoorthy9547 Member Posts: 3 Contributor I
    Thanks for your reply!!

    I tried the Market Basket Analysis template, and everything works fine up to the Aggregate operator. My input is an Excel sheet with three columns: CustomerId, itemId, itemCount. CustomerId is an integer and I set it as the id, itemId is nominal, and itemCount is an integer. When I pass the output of Aggregate to Pivot, it throws an error stating "the exampleset doesn't contain itemid and itemcount".

    What should I do? Here is my XML:

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <description>Reads a data set consisting of three columns: customerId, itemId, and itemCount. The item count is summed up per item and customer, pivoting is performed to have one attribute per item, and finally, association rules are generated.</description>
        <process expanded="true" height="578" width="840">
          <operator activated="true" class="read_excel" compatibility="5.2.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="30">
            <parameter key="excel_file" value="C:\Users\mc29546\Documents\SPOORTHY\Grill\EXCEL\test1.xlsx"/>
            <parameter key="imported_cell_range" value="A1:C7"/>
            <list key="annotations">
              <parameter key="0" value="Name"/>
            </list>
            <list key="data_set_meta_data_information">
              <parameter key="0" value="CustomerId.true.integer.id"/>
              <parameter key="1" value="itemId.true.nominal.attribute"/>
              <parameter key="2" value="itemCount.true.integer.attribute"/>
            </list>
          </operator>
          <operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Define Item Count" width="90" x="179" y="30">
            <parameter key="macro" value="itemCountAttributeName"/>
            <parameter key="value" value="itemCount"/>
          </operator>
          <operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Define Customer" width="90" x="313" y="30">
            <parameter key="macro" value="customerIdAttributeName"/>
            <parameter key="value" value="CustomerId"/>
          </operator>
          <operator activated="true" class="set_macro" compatibility="5.2.006" expanded="true" height="76" name="Define Item" width="90" x="447" y="30">
            <parameter key="macro" value="itemIdAttributeName"/>
            <parameter key="value" value="itemId"/>
          </operator>
          <operator activated="true" class="aggregate" compatibility="5.1.006" expanded="true" height="76" name="Aggregate" width="90" x="45" y="210">
            <list key="aggregation_attributes">
              <parameter key="itemCount" value="sum"/>
            </list>
            <parameter key="group_by_attributes" value="CustomerId|itemId"/>
          </operator>
          <operator activated="true" breakpoints="after" class="pivot" compatibility="5.2.006" expanded="true" height="76" name="Pivot" width="90" x="179" y="210">
            <parameter key="group_attribute" value="CustomerId"/>
            <parameter key="index_attribute" value="itemId"/>
          </operator>
          <operator activated="false" class="replace_missing_values" compatibility="5.2.006" expanded="true" height="94" name="Replace Missing Values" width="90" x="447" y="300">
            <parameter key="default" value="zero"/>
            <list key="columns"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="313" y="210">
            <parameter key="name" value="CustomerId"/>
            <parameter key="target_role" value="id"/>
            <list key="set_additional_roles">
              <parameter key="CustomerId" value="id"/>
              <parameter key="itemId" value="regular"/>
              <parameter key="itemCount" value="regular"/>
            </list>
          </operator>
          <operator activated="true" class="numerical_to_binominal" compatibility="5.2.006" expanded="true" height="76" name="Numerical to Binominal" width="90" x="581" y="210">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="|sum(itemCount)"/>
          </operator>
          <operator activated="true" class="nominal_to_binominal" compatibility="5.2.006" expanded="true" height="94" name="Nominal to Binominal" width="90" x="727" y="225">
            <parameter key="attributes" value="|itemId"/>
          </operator>
          <operator activated="true" class="fp_growth" compatibility="5.2.006" expanded="true" height="76" name="FP-Growth" width="90" x="581" y="75">
            <parameter key="positive_value" value="true"/>
            <parameter key="min_support" value="0.1"/>
          </operator>
          <operator activated="false" class="create_association_rules" compatibility="5.2.006" expanded="true" height="76" name="Create Association Rules" width="90" x="715" y="75"/>
          <connect from_op="Read Excel" from_port="output" to_op="Define Item Count" to_port="through 1"/>
          <connect from_op="Define Item Count" from_port="through 1" to_op="Define Customer" to_port="through 1"/>
          <connect from_op="Define Customer" from_port="through 1" to_op="Define Item" to_port="through 1"/>
          <connect from_op="Define Item" from_port="through 1" to_op="Aggregate" to_port="example set input"/>
          <connect from_op="Aggregate" from_port="example set output" to_op="Pivot" to_port="example set input"/>
          <connect from_op="Pivot" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
          <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
          <connect from_op="Nominal to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
          <connect from_op="FP-Growth" from_port="frequent sets" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="180"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
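    A likely cause of the error, judging from the process above: after Aggregate the summed column is named sum(itemCount), and after Pivot neither itemId nor itemCount exists as a column any more, because Pivot spreads the values into one attribute per item. Set Role still lists itemId and itemCount under set_additional_roles, and Nominal to Binominal (|itemId) and Numerical to Binominal (|sum(itemCount)) also name columns that Pivot has removed. A minimal sketch of a Set Role that only references the column that keeps its name through Pivot, using the same operator and keys as in the process with the extra roles dropped:

    ```xml
    <operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="313" y="210">
      <!-- only CustomerId survives Pivot under its original name; keep it as the id -->
      <parameter key="name" value="CustomerId"/>
      <parameter key="target_role" value="id"/>
      <!-- no additional roles: itemId and itemCount are no longer attributes after Pivot -->
      <list key="set_additional_roles"/>
    </operator>
    ```

    With this change the Nominal to Binominal step can probably be dropped, since no nominal item column remains after Pivot, and Numerical to Binominal would need a filter that selects the pivoted item columns rather than the single name sum(itemCount).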
  • haddock Member Posts: 849 Maven
    Hi there,

    There is an example for FP-Growth in the Samples; I'd start there.

    Good luck.