How do I store Outlier Score (LOF) for each process?

TAF Member Posts: 8 Contributor II
edited November 2018 in Help

I'm trying to create an iterative process in RapidMiner where I:

1. Plot 2 attributes against each other (whilst keeping the ID for each tuple)
2. Apply the LOF operator (from the Anomaly Detection Extension) to get an outlier score for each tuple
3. Detect top 10 outliers (currently using Detect Outlier(Distances))
4. Filter them out (i.e. keeping only outliers)
5. KEEP Outlier score (that was generated from step 2) for the 10 outliers in step 3
6. Copy results into table, where I need columns for: the 'ID' (of the respective detected outliers) and their 'OutlierScore'
7. Repeat the process, each time choosing 2 different attributes from the same dataset and carrying out the remaining steps, then add that iteration's outlier scores to the result table using the Join operator (even if the iterations have no outlying tuples in common).

I am stuck on steps 5 and 6 since I can't seem to store the Outlier Score! How do I store the Outlier Score (LOF) for each process and eventually have them all in 1 result table? Any help would be greatly appreciated!

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi there,

    could you provide an example process with your first steps?

    Cheers,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • TAF Member Posts: 8 Contributor II
    Sure thing Martin, thanks for replying!

    Basically I have a dataset called the Bodyfat dataset, and it has multiple columns like 'age', 'height', 'weight', 'percentagebodyfat', etc.

    The process starts by retrieving data from the dataset, manually selecting 2 attributes and the ID (for example 'Id', 'age' and 'weight'), and filtering the data to keep only the first 99 records. Then I apply the LOF operator to get the outlier scores for each record, apply Detect Outlier (Distances) to get only the top 10 outliers, and use a filter to keep only the outliers.

    Here is a screenshot of the example process:
    image

    and this is the result:
    image

    As you can see, the outlier score is not kept. How can I (for example) add a column to the result table with the outlier score for each outlier, please?
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    Detect Outlier (Distances) does not preserve the top 10; it calculates new scores. I would recommend sorting by score and then using Filter Example Range.

    I think if you want to preserve both scores, you need to rename the attributes and give them different roles.

    Cheers,
    Martin
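
The "sort by score, then keep a range" idea from the reply above can be sketched outside RapidMiner in plain Python. This is only an illustration of the logic, not the RapidMiner operators themselves: the rows, the scores, and the helper name `top_n_outliers` are made up, and the score column is assumed to be called `outlier` as in the processes in this thread.

```python
from operator import itemgetter


def top_n_outliers(rows, n=10, score_key="outlier"):
    """Return the n rows with the highest score, keeping every column."""
    # Sorting in decreasing order and slicing keeps the original score,
    # unlike running a second detector that computes new scores.
    return sorted(rows, key=itemgetter(score_key), reverse=True)[:n]


# Hypothetical LOF output: one dict per example, score stored under "outlier".
scored = [
    {"id": 1, "age": 23, "weight": 154.25, "outlier": 1.02},
    {"id": 2, "age": 22, "weight": 173.25, "outlier": 2.75},
    {"id": 3, "age": 81, "weight": 161.00, "outlier": 1.90},
    {"id": 4, "age": 26, "weight": 184.75, "outlier": 0.98},
]

top = top_n_outliers(scored, n=2)

# Steps 5 and 6 of the question: keep only the ID and its score.
result = [{"id": row["id"], "OutlierScore": row["outlier"]} for row in top]
print(result)
```

Because the rows are only sorted and sliced, the score produced by the first detector travels with each row, which is what steps 5 and 6 of the question need.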
  • TAF Member Posts: 8 Contributor II
    Thanks a lot Martin, that makes sense! I arranged the process accordingly and removed the 'Detect Outlier (Distances)' operator, so now the process looks as follows. (Sorry for bombarding you with multiple screenshots!)

    image

    and the consequent result table:

    image

    Now here is where I encounter another problem. I copy and paste the entire process but change the attributes selected in the 'Select Attributes' operator to 'id', 'percentagebodyfat' and 'height', run the same process, and then join (outer) the results from both processes:

    image

    and this is the result! :

    image

    As you can see, I'm losing the outlier scores of the second process! Also, repeating outliers (i.e. records that are outliers in both processes) are not even showing in the results (in fact there are 4 records completely missing). I wish to auto-generate a new column for each process, where the respective outlier score is immediately appended to the result table, so that whenever the same outlier is present in several processes, it has several outlier scores associated with it. Where the outlier is not present, a null (or even a question mark) would be fine. (I also tried an inner join, but it removed all the outliers which aren't common to both processes, and still kept only the outlier score generated by the second process.) What am I doing wrong?
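
The result table described above amounts to a full outer join on the ID, with one distinctly named score column per process and an empty cell where a record was not an outlier. A minimal plain-Python sketch of that target shape follows. The helper `outer_join`, the column names `score_age_weight` and `score_pcbfat_height`, and all values are hypothetical, and this does not reproduce how RapidMiner's Join operator works internally.

```python
def outer_join(left, right, key="id"):
    """Full outer join of two lists of dicts on `key`; missing cells become None."""
    left_by_key = {row[key]: row for row in left}
    right_by_key = {row[key]: row for row in right}

    # Collect every column name once, in first-seen order.
    columns = []
    for row in left + right:
        for name in row:
            if name not in columns:
                columns.append(name)

    joined = []
    for k in sorted(left_by_key.keys() | right_by_key.keys()):
        merged = {name: None for name in columns}  # None plays the role of "?"
        merged.update(left_by_key.get(k, {}))
        merged.update(right_by_key.get(k, {}))
        joined.append(merged)
    return joined


# Hypothetical top outliers of two processes; note the distinct score names.
scores_age_weight = [
    {"id": 39, "score_age_weight": 2.1},
    {"id": 41, "score_age_weight": 1.7},
]
scores_pcbfat_height = [
    {"id": 41, "score_pcbfat_height": 3.0},
    {"id": 42, "score_pcbfat_height": 1.9},
]

table = outer_join(scores_age_weight, scores_pcbfat_height)
for row in table:
    print(row)
```

Record 41 is an outlier in both inputs and ends up with two scores, while 39 and 42 each keep one score and get an empty cell for the other.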
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    Did you know that you can simply copy and paste the XML code from the XML view? It's way easier to share processes that way.
    The problem is that each role in an example set needs to be unique. Join therefore throws away the second outlier score if its role is also "outlier". Simply set a different role there. The same might be true for identical attribute names; the Join operator has an option for this.

    Attached is an example process on sonar.

    Cheers,
    Martin
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.4.000" expanded="true" height="60" name="Retrieve Sonar" width="90" x="112" y="75">
            <parameter key="repository_entry" value="//Samples/data/Sonar"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="6.4.000" expanded="true" height="76" name="Generate ID" width="90" x="246" y="75"/>
          <operator activated="true" class="multiply" compatibility="6.4.000" expanded="true" height="94" name="Multiply" width="90" x="380" y="75"/>
          <operator activated="true" class="anomalydetection:Histogram-based Outlier Score (HBOS)" compatibility="2.3.000" expanded="true" height="76" name="Histogram-based Outlier Score (HBOS)" width="90" x="514" y="165">
            <list key="histogram properties">
              <parameter key="attribute_1" value="fixed binwidth.-1"/>
              <parameter key="attribute_10" value="fixed binwidth.-1"/>
              <parameter key="attribute_11" value="fixed binwidth.-1"/>
              <parameter key="attribute_12" value="fixed binwidth.-1"/>
              <parameter key="attribute_13" value="fixed binwidth.-1"/>
              <parameter key="attribute_14" value="fixed binwidth.-1"/>
              <parameter key="attribute_15" value="fixed binwidth.-1"/>
              <parameter key="attribute_16" value="fixed binwidth.-1"/>
              <parameter key="attribute_17" value="fixed binwidth.-1"/>
              <parameter key="attribute_18" value="fixed binwidth.-1"/>
              <parameter key="attribute_19" value="fixed binwidth.-1"/>
              <parameter key="attribute_2" value="fixed binwidth.-1"/>
              <parameter key="attribute_20" value="fixed binwidth.-1"/>
              <parameter key="attribute_21" value="fixed binwidth.-1"/>
              <parameter key="attribute_22" value="fixed binwidth.-1"/>
              <parameter key="attribute_23" value="fixed binwidth.-1"/>
              <parameter key="attribute_24" value="fixed binwidth.-1"/>
              <parameter key="attribute_25" value="fixed binwidth.-1"/>
              <parameter key="attribute_26" value="fixed binwidth.-1"/>
              <parameter key="attribute_27" value="fixed binwidth.-1"/>
              <parameter key="attribute_28" value="fixed binwidth.-1"/>
              <parameter key="attribute_29" value="fixed binwidth.-1"/>
              <parameter key="attribute_3" value="fixed binwidth.-1"/>
              <parameter key="attribute_30" value="fixed binwidth.-1"/>
              <parameter key="attribute_31" value="fixed binwidth.-1"/>
              <parameter key="attribute_32" value="fixed binwidth.-1"/>
              <parameter key="attribute_33" value="fixed binwidth.-1"/>
              <parameter key="attribute_34" value="fixed binwidth.-1"/>
              <parameter key="attribute_35" value="fixed binwidth.-1"/>
              <parameter key="attribute_36" value="fixed binwidth.-1"/>
              <parameter key="attribute_37" value="fixed binwidth.-1"/>
              <parameter key="attribute_38" value="fixed binwidth.-1"/>
              <parameter key="attribute_39" value="fixed binwidth.-1"/>
              <parameter key="attribute_4" value="fixed binwidth.-1"/>
              <parameter key="attribute_40" value="fixed binwidth.-1"/>
              <parameter key="attribute_41" value="fixed binwidth.-1"/>
              <parameter key="attribute_42" value="fixed binwidth.-1"/>
              <parameter key="attribute_43" value="fixed binwidth.-1"/>
              <parameter key="attribute_44" value="fixed binwidth.-1"/>
              <parameter key="attribute_45" value="fixed binwidth.-1"/>
              <parameter key="attribute_46" value="fixed binwidth.-1"/>
              <parameter key="attribute_47" value="fixed binwidth.-1"/>
              <parameter key="attribute_48" value="fixed binwidth.-1"/>
              <parameter key="attribute_49" value="fixed binwidth.-1"/>
              <parameter key="attribute_5" value="fixed binwidth.-1"/>
              <parameter key="attribute_50" value="fixed binwidth.-1"/>
              <parameter key="attribute_51" value="fixed binwidth.-1"/>
              <parameter key="attribute_52" value="fixed binwidth.-1"/>
              <parameter key="attribute_53" value="fixed binwidth.-1"/>
              <parameter key="attribute_54" value="fixed binwidth.-1"/>
              <parameter key="attribute_55" value="fixed binwidth.-1"/>
              <parameter key="attribute_56" value="fixed binwidth.-1"/>
              <parameter key="attribute_57" value="fixed binwidth.-1"/>
              <parameter key="attribute_58" value="fixed binwidth.-1"/>
              <parameter key="attribute_59" value="fixed binwidth.-1"/>
              <parameter key="attribute_6" value="fixed binwidth.-1"/>
              <parameter key="attribute_60" value="fixed binwidth.-1"/>
              <parameter key="attribute_7" value="fixed binwidth.-1"/>
              <parameter key="attribute_8" value="fixed binwidth.-1"/>
              <parameter key="attribute_9" value="fixed binwidth.-1"/>
            </list>
          </operator>
          <operator activated="true" class="sort" compatibility="6.4.000" expanded="true" height="76" name="Sort (2)" width="90" x="648" y="165">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="6.4.000" expanded="true" height="76" name="Set Role" width="90" x="782" y="165">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="target_role" value="outlier2"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.000" expanded="true" height="94" name="Local Outlier Factor (LOF)" width="90" x="514" y="30"/>
          <operator activated="true" class="sort" compatibility="6.4.000" expanded="true" height="76" name="Sort" width="90" x="648" y="30">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="join" compatibility="6.4.000" expanded="true" height="76" name="Join" width="90" x="916" y="75">
            <list key="key_attributes"/>
          </operator>
          <connect from_op="Retrieve Sonar" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Local Outlier Factor (LOF)" to_port="example set"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Histogram-based Outlier Score (HBOS)" to_port="example set"/>
          <connect from_op="Histogram-based Outlier Score (HBOS)" from_port="example set" to_op="Sort (2)" to_port="example set input"/>
          <connect from_op="Sort (2)" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Local Outlier Factor (LOF)" from_port="example set" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
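
As a rough analogy for the uniqueness point made in this reply (plain Python dictionaries, not RapidMiner's actual join logic): when two rows use the same key for their score, only one value can survive a merge, and giving each score its own name keeps both. The names `outlier_lof` and `outlier_hbos` and the values are invented for illustration. Which of the two values is lost may differ between tools; the point is only that one of them is.

```python
def rename(row, old, new):
    """Return a copy of `row` with the key `old` renamed to `new`."""
    return {(new if key == old else key): value for key, value in row.items()}


left_row = {"id": 7, "outlier": 1.8}   # hypothetical LOF score
right_row = {"id": 7, "outlier": 0.4}  # hypothetical HBOS score

# Same key on both sides: one score silently replaces the other.
naive = {**left_row, **right_row}

# Distinct names on each side: both scores are kept.
fixed = {
    **rename(left_row, "outlier", "outlier_lof"),
    **rename(right_row, "outlier", "outlier_hbos"),
}

print(naive)
print(fixed)
```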
  • TAF Member Posts: 8 Contributor II
    Thanks again for your help Martin, I'll check it out and if I encounter another problem get back to you :-)
  • TAF Member Posts: 8 Contributor II
    Hi again Martin! I seem to be stuck on the same issue from a different perspective now :-(

    I used Set Role to change the Outlier attribute to a regular attribute as you indicated, and it worked for the 1st and 2nd process. In fact, 2 columns with 2 outlier scores were generated in the result table. However, when I try to join the output from three processes (i.e. using 1 Join to combine the output of the first 2 processes and then another Join to combine the output of the first Join with the output of the 3rd process), I'm only getting the output from processes 1 and 2 in the result table!

    Here is the XML code for my current process:




    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.3.000">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="6.3.000" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="6.3.000" expanded="true" height="60" name="Retrieve BodyFat v3" width="90" x="45" y="165">
            <parameter key="repository_entry" value="BodyFat v3"/>
          </operator>
          <operator activated="true" class="generate_id" compatibility="6.3.000" expanded="true" height="76" name="Generate ID" width="90" x="179" y="165">
            <parameter key="create_nominal_ids" value="false"/>
            <parameter key="offset" value="0"/>
          </operator>
          <operator activated="true" class="multiply" compatibility="6.3.000" expanded="true" height="112" name="Multiply" width="90" x="312" y="165"/>
          <operator activated="true" class="select_attributes" compatibility="6.3.000" expanded="true" height="76" name="Select Attributes (3)" width="90" x="447" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="height|pcbfat"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="6.3.000" expanded="true" height="76" name="Filter Example Range (2)" width="90" x="648" y="165">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="100"/>
            <parameter key="invert_filter" value="false"/>
          </operator>
          <operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.001" expanded="true" height="94" name="Local Outlier Factor (2)" width="90" x="782" y="165">
            <parameter key="k_min (MinPtsLB)" value="10"/>
            <parameter key="k_max (MinPtsUB)" value="20"/>
            <parameter key="measure_types" value="MixedMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="GeneralizedIDivergence"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="parallelize evaluation process" value="false"/>
            <parameter key="number of threads" value="8"/>
          </operator>
          <operator activated="true" class="sort" compatibility="6.3.000" expanded="true" height="76" name="Sort (2)" width="90" x="983" y="165">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="6.3.000" expanded="true" height="76" name="Set Role (2)" width="90" x="1116" y="165">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="target_role" value="regular"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="6.3.000" expanded="true" height="76" name="Select Attributes (5)" width="90" x="1250" y="165">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="outlier|id"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="6.3.000" expanded="true" height="76" name="Select Attributes (2)" width="90" x="447" y="75">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="age|weight"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="6.3.000" expanded="true" height="76" name="Filter Example Range (3)" width="90" x="648" y="30">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="100"/>
            <parameter key="invert_filter" value="false"/>
          </operator>
          <operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.001" expanded="true" height="94" name="Local Outlier Factor (LOF)" width="90" x="782" y="30">
            <parameter key="k_min (MinPtsLB)" value="10"/>
            <parameter key="k_max (MinPtsUB)" value="20"/>
            <parameter key="measure_types" value="MixedMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="GeneralizedIDivergence"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="parallelize evaluation process" value="false"/>
            <parameter key="number of threads" value="8"/>
          </operator>
          <operator activated="true" class="sort" compatibility="6.3.000" expanded="true" height="76" name="Sort" width="90" x="983" y="30">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="6.3.000" expanded="true" height="76" name="Set Role" width="90" x="1116" y="30">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="target_role" value="regular"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="6.3.000" expanded="true" height="76" name="Select Attributes (4)" width="90" x="1250" y="75">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="outlier|id"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="join" compatibility="6.3.000" expanded="true" height="76" name="Join" width="90" x="1385" y="120">
            <parameter key="remove_double_attributes" value="false"/>
            <parameter key="join_type" value="outer"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="true"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="6.3.000" expanded="true" height="76" name="Select Attributes (6)" width="90" x="447" y="255">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="chestcirc|frarmcirc"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="filter_example_range" compatibility="6.3.000" expanded="true" height="76" name="Filter Example Range (4)" width="90" x="648" y="300">
            <parameter key="first_example" value="1"/>
            <parameter key="last_example" value="100"/>
            <parameter key="invert_filter" value="false"/>
          </operator>
          <operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.001" expanded="true" height="94" name="Local Outlier Factor (3)" width="90" x="782" y="300">
            <parameter key="k_min (MinPtsLB)" value="10"/>
            <parameter key="k_max (MinPtsUB)" value="20"/>
            <parameter key="measure_types" value="MixedMeasures"/>
            <parameter key="mixed_measure" value="MixedEuclideanDistance"/>
            <parameter key="nominal_measure" value="NominalDistance"/>
            <parameter key="numerical_measure" value="EuclideanDistance"/>
            <parameter key="divergence" value="GeneralizedIDivergence"/>
            <parameter key="kernel_type" value="radial"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_degree" value="3.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="parallelize evaluation process" value="false"/>
            <parameter key="number of threads" value="8"/>
          </operator>
          <operator activated="true" class="sort" compatibility="6.3.000" expanded="true" height="76" name="Sort (3)" width="90" x="983" y="300">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="sorting_direction" value="decreasing"/>
          </operator>
          <operator activated="true" class="set_role" compatibility="6.3.000" expanded="true" height="76" name="Set Role (3)" width="90" x="1117" y="255">
            <parameter key="attribute_name" value="outlier"/>
            <parameter key="target_role" value="regular"/>
            <list key="set_additional_roles"/>
          </operator>
          <operator activated="true" class="select_attributes" compatibility="6.3.000" expanded="true" height="76" name="Select Attributes (7)" width="90" x="1251" y="255">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value="outlier|id"/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="attribute_value"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="time"/>
            <parameter key="block_type" value="attribute_block"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="value_matrix_row_start"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="join" compatibility="6.3.000" expanded="true" height="76" name="Join (2)" width="90" x="1518" y="165">
            <parameter key="remove_double_attributes" value="true"/>
            <parameter key="join_type" value="right"/>
            <parameter key="use_id_attribute_as_key" value="true"/>
            <list key="key_attributes"/>
            <parameter key="keep_both_join_attributes" value="false"/>
          </operator>
          <connect from_op="Retrieve BodyFat v3" from_port="output" to_op="Generate ID" to_port="example set input"/>
          <connect from_op="Generate ID" from_port="example set output" to_op="Multiply" to_port="input"/>
          <connect from_op="Multiply" from_port="output 1" to_op="Select Attributes (2)" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes (3)" to_port="example set input"/>
          <connect from_op="Multiply" from_port="output 3" to_op="Select Attributes (6)" to_port="example set input"/>
          <connect from_op="Select Attributes (3)" from_port="example set output" to_op="Filter Example Range (2)" to_port="example set input"/>
          <connect from_op="Filter Example Range (2)" from_port="example set output" to_op="Local Outlier Factor (2)" to_port="example set"/>
          <connect from_op="Local Outlier Factor (2)" from_port="example set" to_op="Sort (2)" to_port="example set input"/>
          <connect from_op="Sort (2)" from_port="example set output" to_op="Set Role (2)" to_port="example set input"/>
          <connect from_op="Set Role (2)" from_port="example set output" to_op="Select Attributes (5)" to_port="example set input"/>
          <connect from_op="Select Attributes (5)" from_port="example set output" to_op="Join" to_port="right"/>
          <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Filter Example Range (3)" to_port="example set input"/>
          <connect from_op="Filter Example Range (3)" from_port="example set output" to_op="Local Outlier Factor (LOF)" to_port="example set"/>
          <connect from_op="Local Outlier Factor (LOF)" from_port="example set" to_op="Sort" to_port="example set input"/>
          <connect from_op="Sort" from_port="example set output" to_op="Set Role" to_port="example set input"/>
          <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes (4)" to_port="example set input"/>
          <connect from_op="Select Attributes (4)" from_port="example set output" to_op="Join" to_port="left"/>
          <connect from_op="Join" from_port="join" to_op="Join (2)" to_port="left"/>
          <connect from_op="Select Attributes (6)" from_port="example set output" to_op="Filter Example Range (4)" to_port="example set input"/>
          <connect from_op="Filter Example Range (4)" from_port="example set output" to_op="Local Outlier Factor (3)" to_port="example set"/>
          <connect from_op="Local Outlier Factor (3)" from_port="example set" to_op="Sort (3)" to_port="example set input"/>
          <connect from_op="Sort (3)" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
          <connect from_op="Set Role (3)" from_port="example set output" to_op="Select Attributes (7)" to_port="example set input"/>
          <connect from_op="Select Attributes (7)" from_port="example set output" to_op="Join (2)" to_port="right"/>
          <connect from_op="Join (2)" from_port="join" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>




    Any suggestions?
  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi again!

    I don't think Join is the way to go here. A combination of Append and Pivot works fine.
    Attached is an example process on the Sonar sample data. You can re-enable your own files. I also cleaned up the process a bit and used a loop. You just need to enable the Select Attributes operators inside Select Subprocess to make it runnable.

    Cheers,
    Martin

    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="6.4.000">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="6.4.000" expanded="true" name="Process">
       <process expanded="true">
         <operator activated="false" class="retrieve" compatibility="6.4.000" expanded="true" height="60" name="Retrieve BodyFat v3" width="90" x="45" y="30">
           <parameter key="repository_entry" value="BodyFat v3"/>
         </operator>
         <operator activated="true" class="retrieve" compatibility="6.4.000" expanded="true" height="60" name="Retrieve Sonar" width="90" x="45" y="165">
           <parameter key="repository_entry" value="//Samples/data/Sonar"/>
         </operator>
         <operator activated="true" class="generate_id" compatibility="6.4.000" expanded="true" height="76" name="Generate ID" width="90" x="179" y="165"/>
         <operator activated="true" class="loop_parameters" compatibility="6.4.000" expanded="true" height="76" name="Loop Parameters" width="90" x="313" y="165">
           <list key="parameters">
             <parameter key="Select Subprocess.select_which" value="[1.0;3;2;linear]"/>
           </list>
           <process expanded="true">
             <operator activated="true" class="select_subprocess" compatibility="6.4.000" expanded="true" height="76" name="Select Subprocess" width="90" x="45" y="30">
               <parameter key="select_which" value="3"/>
               <process expanded="true">
                 <operator activated="false" class="select_attributes" compatibility="6.4.000" expanded="true" height="76" name="Select Attributes (8)" width="90" x="45" y="30">
                   <parameter key="attribute_filter_type" value="subset"/>
                   <parameter key="attributes" value="age|weight"/>
                 </operator>
                 <connect from_port="input 1" to_port="output 1"/>
                 <portSpacing port="source_input 1" spacing="0"/>
                 <portSpacing port="source_input 2" spacing="0"/>
                 <portSpacing port="sink_output 1" spacing="0"/>
                 <portSpacing port="sink_output 2" spacing="0"/>
               </process>
               <process expanded="true">
                 <operator activated="false" class="select_attributes" compatibility="6.4.000" expanded="true" height="76" name="Select Attributes (9)" width="90" x="45" y="30">
                   <parameter key="attribute_filter_type" value="subset"/>
                   <parameter key="attributes" value="height|pcbfat"/>
                 </operator>
                 <connect from_port="input 1" to_port="output 1"/>
                 <portSpacing port="source_input 1" spacing="0"/>
                 <portSpacing port="source_input 2" spacing="0"/>
                 <portSpacing port="sink_output 1" spacing="0"/>
                 <portSpacing port="sink_output 2" spacing="0"/>
               </process>
               <process expanded="true">
                 <operator activated="false" class="select_attributes" compatibility="6.4.000" expanded="true" height="76" name="Select Attributes (10)" width="90" x="45" y="30">
                   <parameter key="attribute_filter_type" value="subset"/>
                   <parameter key="attributes" value="chestcirc|frarmcirc"/>
                 </operator>
                 <connect from_port="input 1" to_port="output 1"/>
                 <portSpacing port="source_input 1" spacing="0"/>
                 <portSpacing port="source_input 2" spacing="0"/>
                 <portSpacing port="sink_output 1" spacing="0"/>
                 <portSpacing port="sink_output 2" spacing="0"/>
               </process>
               <description align="center" color="transparent" colored="false" width="126">different select Attribute Operators</description>
             </operator>
             <operator activated="true" class="filter_example_range" compatibility="6.4.000" expanded="true" height="76" name="Filter Example Range (5)" width="90" x="179" y="30">
               <parameter key="first_example" value="1"/>
               <parameter key="last_example" value="100"/>
             </operator>
             <operator activated="true" class="anomalydetection:Local Outlier Factor (LOF)" compatibility="2.3.000" expanded="true" height="94" name="Local Outlier Factor (4)" width="90" x="313" y="30">
               <parameter key="number of threads" value="8"/>
             </operator>
             <operator activated="true" class="sort" compatibility="6.4.000" expanded="true" height="76" name="Sort (4)" width="90" x="514" y="30">
               <parameter key="attribute_name" value="outlier"/>
               <parameter key="sorting_direction" value="decreasing"/>
             </operator>
             <operator activated="true" class="select_attributes" compatibility="6.4.000" expanded="true" height="76" name="Select Attributes" width="90" x="648" y="30">
               <parameter key="invert_selection" value="true"/>
             </operator>
             <operator activated="true" class="set_role" compatibility="6.4.000" expanded="true" height="76" name="Set Role (4)" width="90" x="782" y="30">
               <parameter key="attribute_name" value="outlier"/>
               <list key="set_additional_roles"/>
             </operator>
             <operator activated="true" class="generate_attributes" compatibility="6.4.000" expanded="true" height="76" name="Generate Attributes" width="90" x="906" y="30">
               <list key="function_descriptions">
                 <parameter key="Iteration" value="%{a}"/>
               </list>
               <description align="center" color="transparent" colored="false" width="126">%{a} is a macro which returns the number of times this operator has been executed</description>
             </operator>
             <connect from_port="input 1" to_op="Select Subprocess" to_port="input 1"/>
             <connect from_op="Select Subprocess" from_port="output 1" to_op="Filter Example Range (5)" to_port="example set input"/>
             <connect from_op="Filter Example Range (5)" from_port="example set output" to_op="Local Outlier Factor (4)" to_port="example set"/>
             <connect from_op="Local Outlier Factor (4)" from_port="example set" to_op="Sort (4)" to_port="example set input"/>
             <connect from_op="Sort (4)" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
             <connect from_op="Select Attributes" from_port="example set output" to_op="Set Role (4)" to_port="example set input"/>
             <connect from_op="Set Role (4)" from_port="example set output" to_op="Generate Attributes" to_port="example set input"/>
             <connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
             <portSpacing port="source_input 1" spacing="0"/>
             <portSpacing port="source_input 2" spacing="0"/>
             <portSpacing port="sink_performance" spacing="0"/>
             <portSpacing port="sink_result 1" spacing="0"/>
             <portSpacing port="sink_result 2" spacing="0"/>
           </process>
         </operator>
         <operator activated="true" class="append" compatibility="6.4.000" expanded="true" height="76" name="Append" width="90" x="447" y="165"/>
         <operator activated="true" class="set_role" compatibility="6.4.000" expanded="true" height="76" name="Set Role" width="90" x="581" y="165">
           <parameter key="attribute_name" value="id"/>
           <list key="set_additional_roles"/>
         </operator>
         <operator activated="true" class="pivot" compatibility="6.4.000" expanded="true" height="76" name="Pivot" width="90" x="715" y="165">
           <parameter key="group_attribute" value="id"/>
           <parameter key="index_attribute" value="Iteration"/>
           <parameter key="skip_constant_attributes" value="false"/>
         </operator>
         <connect from_op="Retrieve Sonar" from_port="output" to_op="Generate ID" to_port="example set input"/>
         <connect from_op="Generate ID" from_port="example set output" to_op="Loop Parameters" to_port="input 1"/>
         <connect from_op="Loop Parameters" from_port="result 1" to_op="Append" to_port="example set 1"/>
         <connect from_op="Append" from_port="merged set" to_op="Set Role" to_port="example set input"/>
         <connect from_op="Set Role" from_port="example set output" to_op="Pivot" to_port="example set input"/>
         <connect from_op="Pivot" from_port="example set output" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="126"/>
         <portSpacing port="sink_result 2" spacing="54"/>
       </process>
     </operator>
    </process>
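
For anyone documenting this, the Append-then-Pivot idea can be sketched outside RapidMiner. The snippet below is only an illustration in plain Python, not what the operators do internally: the scores are made up, and the `outlier_<iteration>` column naming is an assumption of the sketch. It shows why stacking the per-iteration results and then pivoting on `Iteration` keeps every ID, whereas an inner join would keep only the IDs present in both inputs.

```python
from collections import defaultdict

def append(*tables):
    """Stack row lists on top of each other (the role Append plays in the process)."""
    return [row for table in tables for row in table]

def pivot(rows, group_key, index_key, value_key):
    """Turn long rows into one wide row per group_key value.

    Combinations that never occurred are filled with None.
    """
    index_values = sorted({row[index_key] for row in rows})
    wide = defaultdict(dict)
    for row in rows:
        wide[row[group_key]][f"{value_key}_{row[index_key]}"] = row[value_key]
    columns = [f"{value_key}_{i}" for i in index_values]
    return [
        {group_key: key, **{c: cells.get(c) for c in columns}}
        for key, cells in sorted(wide.items())
    ]

# Hypothetical outlier scores from two loop iterations (not real BodyFat or Sonar data).
iteration_1 = [
    {"id": 7, "Iteration": 1, "outlier": 2.4},
    {"id": 12, "Iteration": 1, "outlier": 1.9},
]
iteration_2 = [
    {"id": 7, "Iteration": 2, "outlier": 1.3},
    {"id": 30, "Iteration": 2, "outlier": 3.1},
]

result = pivot(append(iteration_1, iteration_2), "id", "Iteration", "outlier")
for row in result:
    print(row)
```

IDs 12 and 30 each appear in only one iteration, yet both survive in the wide table with an empty cell for the other iteration.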
  • Options
    TAFTAF Member Posts: 8 Contributor II
    It works perfectly! Thank you so much for your continuous and prompt help Martin!!!!  ;D

  • Options
    TAFTAF Member Posts: 8 Contributor II
    Hi again Martin,

    Apologies for more inconvenience!

    I'm currently doing the documentation of this process and so I'm going over it in as much depth as possible. However I'm stuck on a particular step in this process. In the 'Generate Attributes' operator you used the word 'Iteration' as attribute name and '%{a}' as function expressions. What do these mean exactly?

    All I gathered is that the percentage sign calculates a modulus. I couldn't find anything else online that could explain the rest though.

    Any advice would be greatly appreciated!
  • Options
    MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,510 RM Data Scientist
    Hi,

    %{someName} indicates a so-called macro. Macros are system variables inside RapidMiner. You can define macros either via the context view or with the explicit operators (Generate Macro, Set Macro, Extract Macro). For example, you can extract the number of rows and use it as a parameter of some operator.
    In Generate Attributes you can use either macro("someName") or %{someName}. If you want to use a macro in an operator, you need to use %{someName}.

    This video seems to explain them quite well: https://www.youtube.com/watch?v=K4aBq-apeqM

    There are a handful of predefined macros. One of them is %{a}, which is always the number of times an operator has been executed. You can find a list of them here: http://rapidminernotes.blogspot.de/2013/05/built-in-macros.html
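
As a rough mental model only (this is not RapidMiner's actual implementation), macro expansion can be thought of as text substitution that happens before the operator evaluates its parameter. The Python sketch below uses hypothetical helper names, `expand_macros` and `run_loop`, and mimics `%{a}` with a counter that increases on each execution, which is how the `Iteration` attribute ends up as 1, 2, 3 across the loop.

```python
import re

# Matches %{name} and captures the macro name.
MACRO_PATTERN = re.compile(r"%\{(\w+)\}")

def expand_macros(text, macros):
    """Replace each %{name} with its value; unknown names are left untouched."""
    return MACRO_PATTERN.sub(
        lambda m: str(macros.get(m.group(1), m.group(0))), text
    )

def run_loop(expression, iterations):
    """Expand the expression once per iteration; 'a' stands in for the execution count."""
    results = []
    for count in range(1, iterations + 1):
        results.append(expand_macros(expression, {"a": count}))
    return results

values = run_loop("%{a}", 3)
print(values)
```

So the percent sign here is not a modulus operator: `%{...}` as a whole is just the marker that tells the tool "substitute the macro value here".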

    Cheers,
    Martin
  • Options
    TAFTAF Member Posts: 8 Contributor II
    Great, thanks a lot for the clarification! :)