Buggy warning on FP-Growth "non-binominal attribute detected"

Tripartio Member Posts: 37 Maven
Hello,

In the latest version of RapidMiner (9.10.1), I have noticed an erroneous warning on FP-Growth that did not appear in previous versions. Here is a sample process that illustrates the problem:


<?xml version="1.0" encoding="UTF-8"?><process version="9.10.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.10.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="1234"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.10.001" expanded="true" height="68" name="Retrieve Transactions" width="90" x="45" y="34">
        <parameter key="repository_entry" value="//Samples/Templates/Market Basket Analysis/Transactions"/>
      </operator>
      <operator activated="true" class="blending:pivot" compatibility="9.10.001" expanded="true" height="82" name="Pivot" width="90" x="179" y="34">
        <parameter key="group_by_attributes" value="Invoice"/>
        <parameter key="column_grouping_attribute" value="product 1"/>
        <list key="aggregation_attributes">
          <parameter key="Orders" value="count"/>
        </list>
        <parameter key="use_default_aggregation" value="false"/>
        <parameter key="default_aggregation_function" value="first"/>
      </operator>
      <operator activated="true" class="rename_by_replacing" compatibility="9.10.001" expanded="true" height="82" name="Rename by Replacing" width="90" x="313" y="136">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="replace_what" value="count\(Orders\)_"/>
        <parameter key="replace_by" value=""/>
      </operator>
      <operator activated="true" class="set_role" compatibility="9.10.001" expanded="true" height="82" name="Set Role" width="90" x="447" y="136">
        <parameter key="attribute_name" value="Invoice"/>
        <parameter key="target_role" value="id"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="replace_missing_values" compatibility="9.10.001" expanded="true" height="103" name="Replace Missing Values" width="90" x="581" y="136">
        <parameter key="return_preprocessing_model" value="false"/>
        <parameter key="create_view" value="false"/>
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="default" value="zero"/>
        <list key="columns"/>
      </operator>
      <operator activated="true" class="numerical_to_binominal" compatibility="9.10.001" expanded="true" height="82" name="Numerical to Binominal" width="90" x="715" y="136">
        <parameter key="attribute_filter_type" value="all"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <parameter key="min" value="0.0"/>
        <parameter key="max" value="0.0"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="9.10.001" expanded="true" height="82" name="Select Attributes" width="90" x="849" y="136">
        <parameter key="attribute_filter_type" value="value_type"/>
        <parameter key="attribute" value=""/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="binominal"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
      </operator>
      <operator activated="true" class="concurrency:fp_growth" compatibility="9.10.001" expanded="true" height="82" name="FP-Growth" width="90" x="983" y="136">
        <parameter key="input_format" value="items in dummy coded columns"/>
        <parameter key="item_separators" value="|"/>
        <parameter key="use_quotes" value="false"/>
        <parameter key="quotes_character" value="&quot;"/>
        <parameter key="escape_character" value="\"/>
        <parameter key="trim_item_names" value="true"/>
        <parameter key="min_requirement" value="support"/>
        <parameter key="min_support" value="0.05"/>
        <parameter key="min_frequency" value="100"/>
        <parameter key="min_items_per_itemset" value="1"/>
        <parameter key="max_items_per_itemset" value="0"/>
        <parameter key="max_number_of_itemsets" value="1000000"/>
        <parameter key="find_min_number_of_itemsets" value="true"/>
        <parameter key="min_number_of_itemsets" value="100"/>
        <parameter key="max_number_of_retries" value="15"/>
        <parameter key="requirement_decrease_factor" value="0.9"/>
        <enumeration key="must_contain_list"/>
      </operator>
      <connect from_op="Retrieve Transactions" from_port="output" to_op="Pivot" to_port="input"/>
      <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
      <connect from_op="Pivot" from_port="original" to_port="result 1"/>
      <connect from_op="Rename by Replacing" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
      <connect from_op="Replace Missing Values" from_port="example set output" to_op="Numerical to Binominal" to_port="example set input"/>
      <connect from_op="Numerical to Binominal" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
      <connect from_op="FP-Growth" from_port="example set" to_port="result 2"/>
      <connect from_op="FP-Growth" from_port="frequent sets" to_port="result 3"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
    </process>
  </operator>
</process>


As you can see, even though I select only binominal attributes, I still get a warning that a "non-binominal attribute" was detected.


After some testing, the problem seems to be that the attribute with the ID role (Invoice) is triggering this warning. That is, the FP-Growth operator detects that the ID is not binominal and so flags it. However, the false warning does not seem to affect the correct operation of the FP-Growth operator in 9.10.1; it runs just fine despite the warning.

When I enable "include special attributes" on Select Attributes (so that the binominal filter also applies to the special ID attribute and therefore removes it), the FP-Growth warning goes away.
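
For reference, here is a minimal sketch of that change to the Select Attributes operator. It assumes the only difference from the sample process above is the `include_special_attributes` flag; the parameters not shown are assumed to keep the values they have in that process.

```xml
<operator activated="true" class="select_attributes" compatibility="9.10.001" expanded="true" height="82" name="Select Attributes" width="90" x="849" y="136">
  <parameter key="attribute_filter_type" value="value_type"/>
  <parameter key="value_type" value="binominal"/>
  <!-- Changed from "false" in the sample process: the binominal filter is now
       applied to special attributes too, so the non-binominal Invoice ID
       is dropped before the data reaches FP-Growth. -->
  <parameter key="include_special_attributes" value="true"/>
</operator>
```

Note that with this setting the Invoice ID no longer reaches FP-Growth or its example set output, so this only works as a workaround if the ID is not needed downstream.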



So, this seems to be a buggy false warning that does not otherwise affect the operator's correct operation. Could someone please confirm that this is indeed a bug, that is, that I am not the one who misunderstands the correct operation of the operator? And is this the correct place to report such a bug?

Best Answer

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,320 RM Data Scientist
    Solution Accepted
    I can open a ticket for it, but it will take a while to be fixed.
    Best,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,320 RM Data Scientist
    Hi,
    It is complaining about the ID attribute. Apparently the metadata check here also covers the special attributes, which is not really what you want. That said, these metadata warnings tend to be overly talkative anyway and can often simply be ignored.

    BR,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • Tripartio Member Posts: 37 Maven
    @mschmitz, yes, but previous versions of RapidMiner never complained about this. I teach RapidMiner, and these kinds of false warnings cause students a lot of confusion. So, rather than simply ignoring it, I would prefer that the false warning be removed. Is this the right place to officially submit a bug report?
  • Tripartio Member Posts: 37 Maven
    Thanks, @mschmitz. As long as it gets reported for fixing in the next version of RapidMiner (maybe 9.10.002?), that's fine for now. Note that the report should indicate that this is a regression (that is, something that worked fine before was broken by an update); that information may help the developers identify the problem more easily.
  • Tripartio Member Posts: 37 Maven
    @MartinLiebig, is there any progress on correcting this bug?