Bug in number of attributes processed v9.1 Beta2

ad2045 Member Posts: 5 Newbie
edited December 2018 in Help
It appears that RapidMiner 9.1 Beta 2 will only process 100 attributes. For instance, the replace missing values operator will only show the first 100 attributes. I have noticed that this behavior of working on or showing only the first 100 attributes is common in other data manipulation processes.

Best Answer

  • gmeier Posts: 11   RM Engineering
    Solution Accepted
    You only see 100 attributes in the metadata (and in the selectors) because the attributes produced by Pivot are determined by the values of your attribute CATEGORY, and the metadata for CATEGORY lists only 100 values, since it is just a preview.
    If you really want more than 100 attributes in the metadata, you have two options:
    • You can go to Process > Synchronize Meta Data with Real Data and then run your process until a breakpoint before Rename by Replacing
    • You can go to Settings > Preferences > Maximum number of nominal values in meta data and increase the number there. But setting this number too high might lead to memory problems.
    In general, it is not advisable to select hundreds of attributes by hand, so Jan's regex solution is preferable. Or if you want to select all but a small number of attributes, you can also select the small number and use invert selection. Furthermore, you can use attribute names in the selector even if they are not shown on the right side by just typing the names.
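    As a rough sketch of the invert-selection idea applied to the process in this thread: the fragment below names only ID (the group-by attribute from the Pivot operator) in the subset and inverts the selection, so every other attribute is renamed. That ID is the only attribute you want to leave out is an assumption on my side. In the process XML I have seen, several names in `attributes` are separated by `|`. The XML comments are annotations for the reader and can be removed before pasting.

    ```xml
    <operator activated="true" class="rename_by_replacing" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Rename by Replacing" width="90" x="179" y="34">
      <parameter key="attribute_filter_type" value="subset"/>
      <!-- Typed-in name; it does not have to appear in the selector list. Assumption: ID is the only attribute to leave out. -->
      <parameter key="attributes" value="ID"/>
      <!-- Invert the selection: rename everything except the attribute named above -->
      <parameter key="invert_selection" value="true"/>
      <parameter key="replace_what" value="count\(DOMAIN\)_"/>
    </operator>
    ```

    This does not depend on the 100-value metadata preview, because no attribute has to be picked from the list on the right side.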

Answers

  • jczogalla Employee, Member Posts: 93   RM Engineering
    Can you provide some more details please? What do you mean by the operator only shows the first 100 attributes? Do you see only 100 attributes when you hover over the output port? Or does the resulting example set have only 100 attributes?
    Maybe you can share the process here? To do that, open the XML view in Studio (e.g. by typing "XML" in the search field in the top right corner and clicking "Display Panel: XML"). Copy everything from the XML view and paste it here, using the "</>" button.
    Cheers
    Jan
  • ad2045 Member Posts: 5 Newbie
    Here it is, as requested. If I set attribute filter type = subset in Rename by Replacing, I do not see any attributes past the 100th. I have to set the filter type to all to rename all attributes. My attributes come from the values of CATEGORY: there are 105 of them, but I only see 100.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.1.000-BETA2">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.1.000-BETA2" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.1.000-BETA2" expanded="true" height="68" name="Retrieve data-FINAL-10days" width="90" x="45" y="34">
            <parameter key="repository_entry" value="//Local Repository/data/data-FINAL-10days"/>
          </operator>
          <operator activated="true" class="subprocess" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Subprocess" width="90" x="313" y="34">
            <process expanded="true">
              <operator activated="true" class="blending:pivot" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Pivot" width="90" x="45" y="34">
                <parameter key="group-by_attributes" value="ID"/>
                <parameter key="column_grouping_attribute" value="CATEGORY"/>
                <list key="aggregation_attributes">
                  <parameter key="DOMAIN" value="count"/>
                </list>
              </operator>
              <operator activated="true" class="rename_by_replacing" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Rename by Replacing" width="90" x="179" y="34">
                <parameter key="attribute_filter_type" value="subset"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="replace_what" value="count\(DOMAIN\)_"/>
              </operator>
              <operator activated="true" class="replace_missing_values" compatibility="9.1.000-BETA2" expanded="true" height="103" name="Replace Missing Values" width="90" x="313" y="34">
                <parameter key="return_preprocessing_model" value="false"/>
                <parameter key="create_view" value="false"/>
                <parameter key="attribute_filter_type" value="all"/>
                <parameter key="attribute" value=""/>
                <parameter key="attributes" value=""/>
                <parameter key="use_except_expression" value="false"/>
                <parameter key="value_type" value="attribute_value"/>
                <parameter key="use_value_type_exception" value="false"/>
                <parameter key="except_value_type" value="time"/>
                <parameter key="block_type" value="attribute_block"/>
                <parameter key="use_block_type_exception" value="false"/>
                <parameter key="except_block_type" value="value_matrix_row_start"/>
                <parameter key="invert_selection" value="false"/>
                <parameter key="include_special_attributes" value="false"/>
                <parameter key="default" value="zero"/>
                <list key="columns"/>
              </operator>
              <operator activated="true" class="set_role" compatibility="9.1.000-BETA2" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
                <parameter key="attribute_name" value="Malware"/>
                <parameter key="target_role" value="label"/>
                <list key="set_additional_roles"/>
              </operator>
              <connect from_port="in 1" to_op="Pivot" to_port="input"/>
              <connect from_op="Pivot" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
              <connect from_op="Rename by Replacing" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
              <connect from_op="Replace Missing Values" from_port="example set output" to_op="Set Role" to_port="example set input"/>
              <connect from_op="Set Role" from_port="example set output" to_port="out 1"/>
              <portSpacing port="source_in 1" spacing="0"/>
              <portSpacing port="source_in 2" spacing="0"/>
              <portSpacing port="sink_out 1" spacing="0"/>
              <portSpacing port="sink_out 2" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="concurrency:parallel_decision_tree" compatibility="9.1.000-BETA2" expanded="true" height="103" name="Decision Tree" width="90" x="581" y="34">
            <parameter key="criterion" value="least_square"/>
            <parameter key="maximal_depth" value="10"/>
            <parameter key="apply_pruning" value="true"/>
            <parameter key="confidence" value="0.1"/>
            <parameter key="apply_prepruning" value="true"/>
            <parameter key="minimal_gain" value="0.01"/>
            <parameter key="minimal_leaf_size" value="2"/>
            <parameter key="minimal_size_for_split" value="4"/>
            <parameter key="number_of_prepruning_alternatives" value="3"/>
          </operator>
          <connect from_op="Retrieve data-FINAL-10days" from_port="output" to_op="Subprocess" to_port="in 1"/>
          <connect from_op="Subprocess" from_port="out 1" to_op="Decision Tree" to_port="training set"/>
          <connect from_op="Decision Tree" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • jczogalla Employee, Member Posts: 93   RM Engineering
    Are you sure you can only see the first 100 attributes? The order of the attributes is lexicographic, so the attributes att1, att10 and att100 for example would appear directly after each other. Can you see more than the first 100 attributes in the following process?

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.003">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="9.0.003" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_data" compatibility="9.0.003" expanded="true" height="68" name="Generate Data" width="90" x="45" y="34">
            <parameter key="number_of_attributes" value="105"/>
          </operator>
          <operator activated="true" class="rename_by_replacing" compatibility="9.0.003" expanded="true" height="82" name="Rename by Replacing" width="90" x="179" y="34">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="regular_expression" value="att.*"/>
            <parameter key="replace_what" value="([0-9]+)"/>
            <parameter key="replace_by" value="1-$1"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Rename by Replacing" to_port="example set input"/>
          <connect from_op="Rename by Replacing" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    You can also use a regex to select all relevant attributes if they are already named similarly. If they look like "CATEGORY_NAME1"...."CATEGORY_NAME2", you could use "CATEGORY_.*" as a regex to select them. I'm sorry, but I don't have your data, so I cannot tell you the exact names.
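
    A minimal sketch of what that could look like in the operator, assuming the attributes really do share the placeholder prefix "CATEGORY_" (adjust both the pattern and `replace_what` to your actual names). The XML comment is only an annotation and can be dropped:

    ```xml
    <operator activated="true" class="rename_by_replacing" compatibility="9.0.003" expanded="true" height="82" name="Rename by Replacing" width="90" x="179" y="34">
      <parameter key="attribute_filter_type" value="regular_expression"/>
      <!-- "CATEGORY_" is a placeholder prefix, not taken from the real data -->
      <parameter key="regular_expression" value="CATEGORY_.*"/>
      <parameter key="replace_what" value="CATEGORY_"/>
    </operator>
    ```

    With the filter type set to regular_expression, the selection is made by pattern rather than by picking names from the list.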

    Cheers
    Jan